Today we address some accountability improvements to the XML Global Substitution web application we last left off with at Spreadsheet and XML Global Substitution Genericization Tutorial. You may recall, then, that we had a two step procedure of โฆ
- From filespec create CSV mapping (global_substitution.csv) for second part via section tags containing role=โparagraphโ and old_label=โ[OldLabel]โ and label=โ[NewLabel]โ all before section tag end
- From input index file and CSV above create Korn Shell (ksh) file that runs โฆ ksh -x global_substitution.ksh โฆ to update for new index file (but backup yourself first)
โฆ where, if that sounds a bit new to you, that is because we bit the bullet, and renamed the web application with the more apt name โglobal_substitutionโ wherever, earlier on, we had the name โtr_mappingโ. And this is a fairly big topic regarding โaccountabilityโ. To us, โaccountabilityโ is mainly about being able to revisit some software, work out what it does, and be confident to use it when and if and how it is applicable to an issue you have and/or a job you want done.
โAccountabilityโ with respect to any application goes hand in hand with listening to the client as to how their life can be made easier because of your applicationโs workings. With our XML โglobal substitutionโ work weโve had quite a few user driven changes of recent times, and we decided to assemble them into a blog posting today. First off, what files are involved as inputs, where bold parts are new since that last incarnation โฆ
- โFilespec filesโ: File specification for input XML data files โฆ whose data can be used to populate CSV and KSH files
- โIndex fileโ: File name for input (and output) index XML file
- โCSV fileโ: This file, previously, was mainly thought of as an intermediate or output file, but today we focus more on it being an input file โฆ to quote the code today โฆ
Allow for jobs where supplied with index file and intermediate CSV and a โhalfway prefixโ to search for in CSV file, but no KSH file yet
And what are the intermediate and/or output files?
- โCSV fileโ: as mentioned above
- โKSH fileโ: Our Korn Shell scripting โIndex fileโ amender is created via โFilespec filesโ processing or, as of today, via reading the โCSV Fileโ as an input file โฆ and now we encourage its use more often to improve the reporting functionality talked about below
- โHTML fileโ: COM_.htm report file using the talents of the HTML div element (remember HTML Textarea and Div Talents Primer Tutorial?)
- โTXT fileโ: New โsanity checkโ extracting files of contents of optionally designated tag of interest
- โIndex fileโ: File name for input (and output) index XML file
Sound a bit familiar? Well, yes, remember with XML Subtraction and Addition Genericization Tutorial how we said โฆ
Both above were challenging, and with the second we were glad we could call on โฆ
- HTML Textarea and Div Talents Primer Tutorial proved, at least to us, that HTML div elements had to be involved in some way shape or form โฆ along with the web application user experience (UX) thoughts of โฆ
- HTML5 Details Summary Primer Tutorial taught us, recently, about a great new HTML5 reveal CSS styling idea
โฆ and we also call on the HTML textarea element for improvements today.
Have been told that those โHTML fileโ and โTXT fileโ seal the deal as far as โfeeling goodโ regarding the data, which is important, of course.
Another user driven improvement with the โFilespec filesโ processing (to create that โCSV fileโ) we add more data intelligence via โฆ
- optional โFilespec filesโ first label in the โCSV fileโ first column via new โNew label regarding global substitutions start delimitationโ [ uci=โ] and โNew label regarding global substitutions end delimitationโ [โ]
- optional โFilespec filesโ title in the โCSV fileโ last column via new โAfter Find Title (CSV last field) tag prefixโ [>] and โ After Find Title (CSV last field) tag suffix [</title>]
A couple of new options that you might expect as an HTML input type=checkbox element are, instead, today, handled by clicking a link for those decisions โฆ
- if โKSH fileโ exists ahead of a โฆ
Allow for jobs where supplied with index file and intermediate CSV and a โhalfway prefixโ to search for in CSV file, but no KSH file yet
โฆ type of execution, we allow the user to delete that โKSH fileโ ahead of the run
- we used to demand the existence of both oldlabel and label attributes โฆ but we allow for just label existant data as the default but if the user does not like this idea, they can go back to the original idea that both oldlabel and label have to exist in the XML data
All very โbusiness logicโ feeling, but there you are, businesses know the way they want the data to work for them.
Here are the downloadable PHP global_substitutionphp (changed thisway based on that previous tr_mapping.php incarnation). Maybe you can adapt it to something you are working at with XML data, and you can see us using it with todayโs PDF
slideshow presentation?
Previous relevant Spreadsheet and XML Global Substitution Genericization Tutorial is shown below.
Itโs one thing to write a useful one off web application with quite a few hard codings, but what about an attempt to genericize it, and by so doing, oftentimes you are improving its documentation aspects, so that, if the code is revisited years later โฆ
- its generic qualities will be plain to all โฆ and at the same time โฆ
- it will be far easier to imagine as far as inputs are concerned โฆ
- the user can (still) break the job up
- the user has less to worry about as far as a backup of data goes
- it does not feel like a one off any more
- it is less likely to be ill used for an inapplicable application
โฆ so that, all in all, we feel much more confident such code can last the test of time and usefulness into the future than yesterdayโs (albeit useful) one off feeling version of the code you can see at Spreadsheet and XML Global Substitution CSV Tutorial as shown below.
Whatโs the main driver of genericization, in our book (but not our pamphlettes) for small jobs?
- turn all hardcodings you can into parameterizable variables, and today that is via PHP $_GET[] variables off the web browser address bar
- allow the user to change these, as the hardcodings just become defaults, and are presented in a submittable HTML form whose action is to recall the same piece of PHP software (code)
โฆ simple, huh?!
Again, you can see the various aspects of this, in play, with todayโs tutorial picture, and though it is not much use to run the PHP code live, its style is far more generic now, so we want to share tr_mappingphp (changed thisway) with you for your perusal โฆ just in case (it is of use for you).
Previous relevant Spreadsheet and XML Global Substitution CSV Tutorial is shown below.
Programmatically, we came in half way with the programming when we presented Spreadsheet and XML Global Substitution Primer Tutorial as shown below. The programming, then, had two inputs, namely โฆ
- input spreadsheetโs CSV file manually created
- index XML file
โฆ but that CSV file can be programmatically created rather than manually created. And while weโre at programmatically creating the CSV we could also programmatically create the Korn Shell (ksh) easier there too, with the same program, rather than using TextWranglerโs Grep (RegEx) talents โฆ not that weโre ungrateful or anything โฆ but it is good to mix things up to improve procedures sometimes.
And what programming language can we use, and what environment for that programming code? We think โฆ
- coding wise, weโll use PHP (starring PHPโs glob() method) โฆ and the environment for that will be that โฆ
- weโll use a (local Apache/PHP/MySql web server) MAMP subfolder (ie. how desktop application โcan meetโ web application) off its Document Root (/Applications/MAMP/htdocs/) โฆ /Applications/MAMP/htdocs/tr_mapping/ โฆ to store the XML data files (no CSV needed as input this way, as it will be programmatically created in part 1 of 2 parts to the whole job) โฆ which becomes accessible in two ways โฆ
- //localhost:8888/tr_mapping/tr_mapping.php#in_a_web_browser
- at Mac OS X Terminal desktop application command line via โฆ
cd /Applications/MAMP/htdocs/tr_mapping
ksh -x tr_mapping.ksh
Again, email is the conduit for both sides of โฆ
- input in
- output out
โฆ to complete proceedings. You can see the various aspects of this, in play, with todayโs tutorial picture, and though it is not much use to run the PHP code live, its style is leaning towards the generic side enough for us to want to share tr_mappingphp with you for your perusal โฆ just in case (it is of use for you).
Previous relevant Spreadsheet and XML Global Substitution Primer Tutorial is shown below.
Yesterday when we were discussing Worldbank API World Country Reporting Regex Tutorial we mentioned โฆ
โฆ and we use
it(ie. Regex) with serverside PHP today, under the auspices of the preg_match function, though we most often use RegEx thinking with the Javascript replace function, as the way to make substitutions for more than one occurrence, (the one occurrence design being) a default โcuriosityโ (but can be useful too) about Javascriptโs version of substitution. You may know this RegEx usage of the Javascript replace function as โglobal substitutionโ.
โฆ and that term โglobal substitutionโ. Many editing jobs, especially text file based ones, require or benefit from โglobal substitutionโ carefully applied, that is. It is common to see an editor who shies away from โglobal substitutionโ methods, and in many cases that is wise, but โglobal substitutionโ gets good results when you โฆ
- substitute things you know exist in the precise form you intend to search for, and only there, where you want to replace โฆ to
- replacements should not feed back into the substitution list โฆ doh โฆ or you will end up with a confused unintended result
In real life, it is often the case that the conditions above are easy to obey, because you are mapping an old numbering and/or naming system to a completely new and dissimilar numbering and/or naming system. Thatโs the case in a little job we drilled down into, to show you what we did, that involved RegEx thoughts, to solve a problem.
So, with our job we had โฆ
- Aim: Change some XML in one file to have the text in one column of a Spreadsheet be mapped to the contents of another column of that same Spreadsheet
- Inputs: Excel Spreadsheet with those two columns as mentioned above and the one input XML file
- From the User: Asked for the user to send the Excel Spreadsheet โฆ Saved As Comma Separated Values (CSV) in MS-DOS format and the one input XML file as two attached files in an Email
- Processing:
- Opened Email with Gmail web application in Safari web browser desktop application, on a MacBook Pro laptop
- Downloaded the two Attachments and copied over to where we like to work โฆ the home of MAMP local Apache/PHP/MySql web server โฆ on a Mac OS X system is /Applications/MAMP/htdocs (which weโll access later with the Mac OS X Terminal desktop application later via โcd /Applications/MAMP/htdocsโ)
- Opened our favourite Text Editor desktop application, called TextWrangler, whose โFind and Replaceโ โGrepโ suboption will be a feature of todayโs solution
- File -> Open the Spreadsheet CSV file
- Search -> Findโฆ โฆ Matching Mode: Grep โฆ Find: ^ Replace: # โฆ Replace All โฆ remember our โRegExโ โcheat sheetโ discussion (lots of which is relevant to TextWrangler Matching Mode: Grep as well) at that aforesaid mentioned tutorial โฆ
- ^ can mean โstart ofโ
- $ can mean โend ofโ
- . can sometimes mean โone existant character wildcardโ โฆ or sometimes it is % or ? for this in other โsystemsโ
- * can often mean โzero or more of preceding character wildcardโ
- [] and () bracketing rules are pretty crucial for the more esoteric usages โฆ also study | usage
? โฆ well, we want to start out mapping all lines to non-acting Korn Shell command lines
- Typed as the new top line #!/bin/ksh โฆ just for completeness sake โฆ is optional step
- Search -> Findโฆ โฆ No Matching Mode โฆ Find: #,[ Replace: cat COMM.MIL~INDEX.xml | sed โ/\[ โฆ Replace All
- Search -> Findโฆ โฆ Matching Mode: Grep โฆ Find: ]$ Replace: \\]/gโ > x.xxx ; cat x.xxx > COMM.MIL~INDEX.xml ; rm -f x.xxx โฆ Replace All
- File -> Save Asโฆ fix_csv.ksh (to /Applications/MAMP/htdocs directory)
- Opened Terminal desktop application that has a default Bash environment (a lot like Linux, but is (giving you access to) a Mac OS X BSD operating system, really)
- Typed in: cd /Applications/MAMP/htdocs # to get to data
- Typed in: cp COMM.MIL~INDEX.xml COMM.MIL~INDEX_original.xml # to backup data ahead of processing, as well as to compare file sizes with later, as a sanity check
- Typed in: ksh -x fix_csv.ksh # access Korn Shell interpreter and run the TextWrangler created Korn Shell Script (and the -x switch tells the interpreter to be verbose with output reporting)
- Typed in: ls -l COMM.MIL~INDEX*.xml # first sanity check verified files different, and not disastrously so โฆ good first sign
- Typed in: fgrep -c โ[S1.12.4.20]โ COMM.MIL~INDEX*.xml ; fgrep -c โ[CCR.28E.20]โ COMM.MIL~INDEX*.xml # second sanity check to prove old/new parts of first/last relevant Spreadsheet CSV file records were correctly mapped โฆ and they were โฆ so
- Opened Email with Gmail web application in Safari web browser desktop application (and used โForwardโ option, attaching that new XML file), on a MacBook Pro laptop โฆ so that โฆ
- Output: One XML file with the global substitutions expressed in the Excel Spreadsheet performed, returned to User via Email โForwardโ option, attaching that new XML file
We hope you can see the good use you can make with Email and a good Text Editor and Linux type shell scripting, influenced by RegEx pattern matching regarding โฆ
Which leaves us with todayโs PDF slideshow of snapshots of making this job work, here.
If this was interesting you may be interested in this too.
If this was interesting you may be interested in this too.
If this was interesting you may be interested in this too.
If this was interesting you may be interested in this too.
One Response to Spreadsheet and XML Global Substitution Accountability Tutorial