Spreadsheet and XML Global Substitution Primer Tutorial

Spreadsheet and XML Global Substitution Primer Tutorial

Spreadsheet and XML Global Substitution Primer Tutorial

Yesterday when we were discussing Worldbank API World Country Reporting Regex Tutorial we mentioned …

… and we use it (ie. Regex) with serverside PHP today, under the auspices of the preg_match function, though we most often use RegEx thinking with the Javascript replace function, as the way to make substitutions for more than one occurrence, (the one occurrence design being) a default “curiosity” (but can be useful too) about Javascript’s version of substitution. You may know this RegEx usage of the Javascript replace function as “global substitution”.

… and that term “global substitution”. Many editing jobs, especially text file based ones, require or benefit from “global substitution” carefully applied, that is. It is common to see an editor who shies away from “global substitution” methods, and in many cases that is wise, but “global substitution” gets good results when you …

  • substitute things you know exist in the precise form you intend to search for, and only there, where you want to replace … to
  • replacements should not feed back into the substitution list … doh … or you will end up with a confused unintended result

In real life, it is often the case that the conditions above are easy to obey, because you are mapping an old numbering and/or naming system to a completely new and dissimilar numbering and/or naming system. That’s the case in a little job we drilled down into, to show you what we did, that involved RegEx thoughts, to solve a problem.

So, with our job we had …

  • Aim: Change some XML in one file to have the text in one column of a Spreadsheet be mapped to the contents of another column of that same Spreadsheet
  • Inputs: Excel Spreadsheet with those two columns as mentioned above and the one input XML file
  • From the User: Asked for the user to send the Excel Spreadsheet … Saved As Comma Separated Values (CSV) in MS-DOS format and the one input XML file as two attached files in an Email
  • Processing:
    1. Opened Email with Gmail web application in Safari web browser desktop application, on a MacBook Pro laptop
    2. Downloaded the two Attachments and copied over to where we like to work … the home of MAMP local Apache/PHP/MySql web server … on a Mac OS X system is /Applications/MAMP/htdocs (which we’ll access later with the Mac OS X Terminal desktop application later via “cd /Applications/MAMP/htdocs”)
    3. Opened our favourite Text Editor desktop application, called TextWrangler, whose “Find and Replace” “Grep” suboption will be a feature of today’s solution
    4. File -> Open the Spreadsheet CSV file
    5. Search -> Find… … Matching Mode: Grep … Find: ^ Replace: # … Replace All … remember our “RegEx” “cheat sheet” discussion (lots of which is relevant to TextWrangler Matching Mode: Grep as well) at that aforesaid mentioned tutorial …
      • ^ can mean “start of”
      • $ can mean “end of”
      • . can sometimes mean “one existant character wildcard” … or sometimes it is % or ? for this in other “systems”
      • * can often mean “zero or more of preceding character wildcard”
      • [] and () bracketing rules are pretty crucial for the more esoteric usages … also study | usage

      ? … well, we want to start out mapping all lines to non-acting Korn Shell command lines

    6. Typed as the new top line #!/bin/ksh … just for completeness sake … is optional step
    7. Search -> Find… … No Matching Mode … Find: #,[ Replace: cat COMM.MIL~INDEX.xml | sed ‘/\[ … Replace All
    8. Search -> Find… … Matching Mode: Grep … Find: ]$ Replace: \\]/g’ > x.xxx ; cat x.xxx > COMM.MIL~INDEX.xml ; rm -f x.xxx … Replace All
    9. File -> Save As… fix_csv.ksh (to /Applications/MAMP/htdocs directory)
  • Opened Terminal desktop application that has a default Bash environment (a lot like Linux, but is (giving you access to) a Mac OS X BSD operating system, really)
  • Typed in: cd /Applications/MAMP/htdocs # to get to data
  • Typed in: cp COMM.MIL~INDEX.xml COMM.MIL~INDEX_original.xml # to backup data ahead of processing, as well as to compare file sizes with later, as a sanity check
  • Typed in: ksh -x fix_csv.ksh # access Korn Shell interpreter and run the TextWrangler created Korn Shell Script (and the -x switch tells the interpreter to be verbose with output reporting)
  • Typed in: ls -l COMM.MIL~INDEX*.xml # first sanity check verified files different, and not disastrously so … good first sign
  • Typed in: fgrep -c ‘[S1.12.4.20]’ COMM.MIL~INDEX*.xml ; fgrep -c ‘[CCR.28E.20]’ COMM.MIL~INDEX*.xml # second sanity check to prove old/new parts of first/last relevant Spreadsheet CSV file records were correctly mapped … and they were … so
  • Opened Email with Gmail web application in Safari web browser desktop application (and used “Forward” option, attaching that new XML file), on a MacBook Pro laptop … so that …
  • Output: One XML file with the global substitutions expressed in the Excel Spreadsheet performed, returned to User via Email “Forward” option, attaching that new XML file

We hope you can see the good use you can make with Email and a good Text Editor and Linux type shell scripting, influenced by RegEx pattern matching regarding …

Which leaves us with today’s PDF slideshow of snapshots of making this job work, here.

If this was interesting you may be interested in this too.

This entry was posted in eLearning, Operating System, Tutorials and tagged , , , , , , , , , , , , , , , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *