Many projects that involve data will surprise you when you get to the phase of running your planned code over real data. Real business data can throw up surprises like …
- the length of time to process the data or the amount of data to process
- the sheer size of one element of that data (causing the problem shown in today’s tutorial picture)
… and, respectively, when the PHP “surfing the net” mode of use is in play, we found it useful to …
- add, at the top of the PHP, a code line like ini_set('max_execution_time', 6000); to set the maximum execution time (in seconds) above PHP’s default of 30 seconds
- switch, for a file of 7.7 MB, from the PHP “surfing the net” mode of use to an exec([PHP “command line” mode of use command]) scenario (that is a vastly less memory hungry mode of use), just for that file
… and for our PHP mil_mapping.php “XML Adderer and Subtractor” web application (last talked about at XML Subtraction and Addition Accountability Tutorial) run of 93 files, we found that we needed to change it for the “processing” surprises above, in this way.
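As a minimal sketch of the execution time change above, assuming the line sits at the very top of the PHP before any heavy processing starts. The 6000 second figure is the one quoted above, and the $previous variable is only there for illustration.

```php
<?php
// Raise the time limit for this run only (PHP's web default is 30 seconds).
// ini_set() returns the old value as a string, or false on failure.
$previous = ini_set('max_execution_time', '6000');
if ($previous === false) {
    // Some hosts lock this setting down, in which case the default stays.
    error_log('Could not change max_execution_time');
}
echo 'max_execution_time is now ' . ini_get('max_execution_time') . "\n";
```

The setting only lasts for the current run, so nothing needs to be undone afterwards.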
Another very important aspect to software processing (and deployment for that matter) is that you either …
- log the actions made in files (especially for batch processing) … and/or …
- see for yourself in realtime that the software proceeded and completed correctly
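The logging option above might be sketched as follows. The log_action() helper, the log file location and the messages are all hypothetical names for illustration, not taken from mil_mapping.php.

```php
<?php
// Hypothetical helper: append one timestamped line per action to a log file.
function log_action(string $logfile, string $message): bool
{
    $line = date('Y-m-d H:i:s') . ' ' . $message . PHP_EOL;
    // FILE_APPEND keeps earlier entries; LOCK_EX guards against overlapping runs.
    return file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX) !== false;
}

$logfile = sys_get_temp_dir() . '/mil_mapping_demo.log'; // assumed location
log_action($logfile, 'Started processing example.xml');
log_action($logfile, 'Finished processing example.xml');
```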
With our “XML Adderer and Subtractor” project we always envisaged a “surfing the net” processing run, as we like a “cockpit” style of working here, which fits in with the second scenario above. Hence the “just for that file” modification design. That large file broke the initial run in the middle of the list of files, but the logic changes above allowed the whole run to proceed “surfing the net”, with a switch over to “command line usage” behind the scenes for “files over 4000000 bytes” (which was our business logic for this decision).
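The “files over 4000000 bytes” decision could look something like the sketch below. The function names, the “php” binary name and the argument layout are assumptions for illustration; only the threshold comes from the text above.

```php
<?php
// The 4000000 byte threshold is the business rule quoted above.
const SIZE_THRESHOLD_BYTES = 4000000;

// Decide how one file should be processed, given its size in bytes.
function choose_mode(int $bytes, int $threshold = SIZE_THRESHOLD_BYTES): string
{
    return ($bytes > $threshold) ? 'command line' : 'surfing the net';
}

// Build the command handed to exec() for an oversized file.
// The "php" binary name and the argument order are assumptions.
function build_command(string $script, string $filename): string
{
    return 'php ' . escapeshellarg($script) . ' ' . escapeshellarg($filename);
}

foreach ([1200000, 7700000] as $bytes) {
    echo $bytes . ' bytes: ' . choose_mode($bytes) . "\n";
}
// In a real run, exec(build_command($script, $filename), $output, $status)
// would be called only when choose_mode(filesize($filename)) says 'command line'.
```

Keeping the decision in one small function means the threshold can be tuned later without touching the processing logic.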
Of course, if your project sponsor can supply you with the data size and numbers of components straight away as you unit test, that can be a good thing too. Take the opportunity to really “load test” your software that way if you get the chance … the earlier the better.
Previous relevant XML Subtraction and Addition Accountability Tutorial is shown below.
We like to make our web applications “accountable” to those users using them. “Sharing” functionality often features when improving the “accountability” of a web application, at least for us.
We’re trying to improve on the “accountability” of yesterday’s XML Subtraction and Addition Modes of Use Tutorial by …
- adding optional email reporting of the command line modes of use to improve the web application’s reporting capabilities
- leaving a script file on the web server, that at least for a short time, reflects that last command line execution
- adding back in the HTML div element change elements even when using command line functionalities (as buttons) off the “surfing the net” mode of use
We hope you try to implement the PHP mil_mapping.php changed for the “accountability” aims above, in this way, to get this into context for some of your own XML data.
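A rough sketch of the first two “accountability” ideas above, leaving a script file behind and optionally emailing a report. The record_last_command() helper, the file location and the sample command are hypothetical, and mail() only delivers anything if the web server has a working mail setup.

```php
<?php
// Hypothetical helper: record the last command line execution in a small
// script file, and optionally email the same text as a report.
function record_last_command(string $command, string $scriptfile, string $email = ''): bool
{
    $body = '# Last run: ' . date('c') . "\n" . $command . "\n";
    $ok = file_put_contents($scriptfile, $body) !== false;
    if ($ok && $email !== '') {
        // Skipped entirely when no email address is supplied.
        mail($email, 'Command line run report', $body);
    }
    return $ok;
}

$scriptfile = sys_get_temp_dir() . '/last_command_demo.sh'; // assumed location
record_last_command('php mil_mapping.php', $scriptfile);   // illustrative command
echo file_get_contents($scriptfile);
```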
Previous relevant XML Subtraction and Addition Modes of Use Tutorial is shown below.
Today’s job is to extend the modes of functionality for the “XML Subtraction and Addition” PHP web application we’ve been developing off yesterday’s XML Subtraction and Addition Genericization Tutorial.
You may recall from a few tutorials at this blog that we are sometimes keen, when it seems apt, to develop PHP (only) web applications to use the three modes of use, identified at PHP Find in Context Primer Tutorial, as …
- “surfing the net” calling the PHP from a web browser’s address bar
- “curl” call of the PHP
- “command line” execution of PHP
You can see this in play with today’s tutorial picture, where two additional HTML input type=button elements give us three buttons, one for each of the PHP “modes of use” above, the latter two linked into the “surfing the net” PHP web application mode of use via PHP’s exec method, leaving us with PHP mil_mapping.php changed for the aims above, with lots of delimitation (inhouse design) issues, in this way.
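One way a PHP script might tell the three modes of use apart is sketched below. This is an assumption about how it could be done, not how mil_mapping.php does it: php_sapi_name() reports “cli” for command line runs, and curl identifies itself in the User-Agent header unless told otherwise. The detect_mode() name is hypothetical.

```php
<?php
// Classify the current run into one of the three modes of use.
// $sapi is what php_sapi_name() returns; $userAgent is the HTTP User-Agent.
function detect_mode(string $sapi, string $userAgent = ''): string
{
    if ($sapi === 'cli') {
        return 'command line';
    }
    // curl announces itself as "curl/<version>" by default.
    if (stripos($userAgent, 'curl/') === 0) {
        return 'curl';
    }
    return 'surfing the net';
}

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
echo detect_mode(php_sapi_name(), $agent) . "\n";
```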
It’s not just an intellectual exercise for us. It’s a reminder of what another dynamic duo are …
- PHP
- MAMP local Apache/PHP/MySql web server
… and how PHP’s exec method can help link modern “online” “surfing the net” methodologies with the history of those operating systems which both …
- preceded in history those web browsers … and which, it should be noted …
- still underpin these “online” “surfing the net” methodologies in the web application “client” sense, and the web application “server” sense … PHP brings this together as well
We recognize four other sets of dynamic duos involving PHP … think …
- PHP
- MySql database … we think Fred and Ginger
… and …
- PHP
- HTML so that the “server” can directly amend (or even create from scratch … as for our web application in this blog posting “thread”) the “client” webpage contents (on its way through)
… and …
- PHP
- Javascript so that the “server” can directly amend the “client” dynamics and interaction (on its way through)
… and …
- PHP
- Apache web servers as a great web server environment team
Previous relevant XML Subtraction and Addition Genericization Tutorial is shown below.
If you are a programmer not interested in “genericization”, and that’s because you can think on your feet without the “label” … “I tips my hats to ya” … well done. Think, though, that most of us, myself included, don’t always think that purely, alas. Sometimes, quality levels here can depend on how your day is going.
Today’s “genericization” work, following up on yesterday’s XML Subtraction and Addition Primer Tutorial, had the simplest of underlying aims … doh! … eliminate the hardcodings in that code snippet of yesterday, perhaps with the exception of “<” and/or “>” … the kiss approach. There are two things we are doing today …
- genericizing, via an HTML form (via GET arguments web surfing, only, for now), but in so doing we left behind the command line mode usage, which we might talk about later … and …
- presenting these XML “additions” and “subtractions” usefully, “in application”
Both above were challenging, and with the second we were glad we could call on …
- HTML Textarea and Div Talents Primer Tutorial proved, at least to us, that HTML div elements had to be involved in some way shape or form … along with the web application user experience (UX) thoughts of …
- HTML5 Details Summary Primer Tutorial taught us, recently, about a great new HTML5 reveal CSS styling idea
… to crystallize ideas, and to end up with PHP mil_mapping.php changed for the aims above in PHP this way. You can set up your own work with it via a download, then via (perhaps via the MAMP Apache/PHP/MySql) local web server mode usage.
Previous relevant XML Subtraction and Addition Primer Tutorial is shown below.
XML being the intelligent data protocol it is, and given the “genericization” efforts we went to with our last XML “command line or local web server (ours being the MAMP Apache/PHP/MySql web server we run our code through) web browsing” mode application, written when we presented Spreadsheet and XML Global Substitution Genericization Tutorial, you’d think we’d use it for everything. Information Technology “life” doesn’t work that way though, and the overarching data change aims we’d list as …
- XML global substitution
- XML data driven subtraction of data
- XML data driven addition of data
… with the previous blog posting above tailored for genericization of the first of above, but a stretch too far to imagine the usage for a scenario involving those other two “subtraction” and “addition” aims listed above. It gets ridiculous to not do a bit of division of application aims when thinking about these things, otherwise the complexity of your application may lead to a one-off usage, not the happiest of scenarios.
In our first draft today we show the bits before our genericization “push” with our job, and here is where a code comment can play a big part and be of use. It is a code feature we normally don’t give a huge amount of credence to, instead figuring that web inspectors these days are a better analyzing tool than relying on code source comments.
So if we show you the code below, developed with the user and public tester regarding requirements, via email …
// </title><section role="annot_cont"
// Note: $filespec is set elsewhere, and str_replace_first() is not a PHP
// built-in, so it is assumed to be a helper defined elsewhere in mil_mapping.php
$startfind='role="annot_cont"';
foreach (glob($filespec) as $filename) {
  $precont=@file_get_contents($filename);
  $cont=$precont;
  $sections=explode($startfind, $cont);
  for ($i=1; $i<sizeof($sections); $i++) {
    $prev=$sections[$i - 1];
    $prevbits=explode("<", $prev);
    // check that "<section " preceded
    if (explode(" ", $prevbits[sizeof($prevbits) - 1])[0] == "section") {
      // before that last end tag should be </title>
      $tagbits=explode("<", str_replace("<?", " ", str_replace("<section ", " ", $prev)));
      if (explode(">", $tagbits[sizeof($tagbits) - 1])[0] == "/title") {
        // after that will be "<leg-history "
        if (strpos($sections[$i], "<leg-history ") !== false) {
          // and in between will be no already done <title></title> or other type of tag actually
          // (one check for "<" covers "<title", "</title>" and any other tag)
          if (strpos(explode("<leg-history ", $sections[$i])[0], "<") === false) {
            $bits=explode("<leg-history ", $sections[$i]);
            if (strpos(explode(">", $sections[$i])[0], ' href="') !== false) {
              $hrefv=explode('"',explode('href="', explode(">", $sections[$i])[0])[1])[0];
              $newsectionone=str_replace($bits[0], str_replace(' href="' . $hrefv . '"', '', $bits[0] . '<title arch="online">Note</title>'), $sections[$i]);
              $cont=str_replace($startfind . $sections[$i], $startfind . $newsectionone, $cont);
              $sections=explode($startfind, $cont);
            } else {
              $newsectionone=str_replace_first($bits[0], $bits[0] . '<title arch="online">Note</title>', $sections[$i]);
              $cont=str_replace($startfind . $sections[$i], $startfind . $newsectionone, $cont);
              $sections=explode($startfind, $cont);
            }
          }
        } else if (strpos($sections[$i], "<notes ") !== false) {
          // and in between will be no already done <title></title> or other type of tag actually
          if (strpos(explode("<notes ", $sections[$i])[0], "<") === false) {
            $bits=explode("<notes ", $sections[$i]);
            if (strpos(explode(">", $sections[$i])[0], ' href="') !== false) {
              $hrefv=explode('"',explode('href="', explode(">", $sections[$i])[0])[1])[0];
              $newsectionone=str_replace($bits[0], str_replace(' href="' . $hrefv . '"', '', $bits[0] . '<title arch="online">Note</title>'), $sections[$i]);
              $cont=str_replace($startfind . $sections[$i], $startfind . $newsectionone, $cont);
              $sections=explode($startfind, $cont);
            } else {
              $newsectionone=str_replace_first($bits[0], $bits[0] . '<title arch="online">Note</title>', $sections[$i]);
              $cont=str_replace($startfind . $sections[$i], $startfind . $newsectionone, $cont);
              $sections=explode($startfind, $cont);
            }
          }
        }
      }
    }
  }
  // only rewrite the file (after a one-time backup) if something changed
  if ($cont != $precont) {
    if (!file_exists($filename . "_original_backup")) copy($filename, $filename . "_original_backup");
    file_put_contents($filename, $cont);
  }
}
… we came up with a first draft PHP mil_mapping.php that “does the job”, the first thing to home in on with any job … doh! Follow up tutorials improve on this initial draft in a couple of ways, but please, rather than concentrating on bells and whistles, do not take your eye off the “application doing what the user wanted it to do” primary edict that should apply with any user based Information Technology work.
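The code above calls str_replace_first(), which is not a PHP built-in function, so presumably it is defined elsewhere in mil_mapping.php. A hypothetical minimal version, assuming it should replace only the first occurrence of the search text, might be:

```php
<?php
// Hypothetical version of str_replace_first(): replace only the first
// occurrence of $search in $subject, leaving later occurrences alone.
if (!function_exists('str_replace_first')) {
    function str_replace_first($search, $replace, $subject)
    {
        $pos = ($search === '') ? false : strpos($subject, $search);
        if ($pos === false) {
            return $subject; // nothing to replace
        }
        return substr_replace($subject, $replace, $pos, strlen($search));
    }
}

// Illustrative data only: insert a title after the first ">" and nowhere else.
echo str_replace_first('>', '><title arch="online">Note</title>', ' id="n1"><notes><p>x</p></notes>') . "\n";
```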
Previous relevant Spreadsheet and XML Global Substitution Genericization Tutorial is shown below.
It’s one thing to write a useful one off web application with quite a few hard codings, but what about an attempt to genericize it? By so doing, oftentimes you are improving its documentation aspects, so that, if the code is revisited years later …
- its generic qualities will be plain to all … and at the same time …
- it will be far easier to imagine as far as inputs are concerned …
- the user can (still) break the job up
- the user has less to worry about as far as a backup of data goes
- it does not feel like a one off any more
- it is less likely to be ill used for an inapplicable application
… so that, all in all, we feel much more confident such code can last the test of time and usefulness into the future than yesterday’s (albeit useful) one off feeling version of the code you can see at Spreadsheet and XML Global Substitution CSV Tutorial as shown below.
What’s the main driver of genericization, in our book (but not our pamphlettes) for small jobs?
- turn all hardcodings you can into parameterizable variables, and today that is via PHP $_GET[] variables off the web browser address bar
- allow the user to change these, as the hardcodings just become defaults, and are presented in a submittable HTML form whose action is to recall the same piece of PHP software (code)
… simple, huh?!
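The two points above can be sketched like this. The parameter names and default values are illustrative assumptions rather than the ones tr_mapping.php uses, and read_params() and render_form() are hypothetical helper names.

```php
<?php
// Former hardcodings become defaults; names and values here are illustrative.
$defaults = ['filespec' => '*.xml', 'indexfile' => 'COMM.MIL~INDEX.xml'];

// Merge the defaults with whatever arrived on the address bar ($_GET).
function read_params(array $defaults, array $get): array
{
    $params = [];
    foreach ($defaults as $name => $default) {
        $given = isset($get[$name]) && is_string($get[$name]) && $get[$name] !== '';
        $params[$name] = $given ? $get[$name] : $default;
    }
    return $params;
}

// Build a form that calls the same PHP again, pre-filled with current values.
function render_form(array $params, string $action): string
{
    $html = '<form method="GET" action="' . htmlspecialchars($action) . '">';
    foreach ($params as $name => $value) {
        $html .= '<input type="text" name="' . htmlspecialchars($name)
               . '" value="' . htmlspecialchars($value) . '">';
    }
    return $html . '<input type="submit" value="Go"></form>';
}

$params = read_params($defaults, $_GET);
echo render_form($params, basename(__FILE__)) . "\n";
```

Escaping the values with htmlspecialchars() matters here because anything typed on the address bar ends up back in the page.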
Again, you can see the various aspects of this, in play, with today’s tutorial picture, and though it is not much use to run the PHP code live, its style is far more generic now, so we want to share tr_mapping.php (changed this way) with you for your perusal … just in case (it is of use for you).
Previous relevant Spreadsheet and XML Global Substitution CSV Tutorial is shown below.
Programmatically, we came in half way with the programming when we presented Spreadsheet and XML Global Substitution Primer Tutorial as shown below. The programming, then, had two inputs, namely …
- input spreadsheet’s CSV file manually created
- index XML file
… but that CSV file can be programmatically created rather than manually created. And while we’re at programmatically creating the CSV we could also programmatically create the Korn Shell (ksh) script more easily there too, with the same program, rather than using TextWrangler’s Grep (RegEx) talents … not that we’re ungrateful or anything … but it is good to mix things up to improve procedures sometimes.
And what programming language can we use, and what environment for that programming code? We think …
- coding wise, we’ll use PHP (starring PHP’s glob() method) … and the environment for that will be that …
- we’ll use a (local Apache/PHP/MySql web server) MAMP subfolder (ie. how desktop application “can meet” web application) off its Document Root (/Applications/MAMP/htdocs/) … /Applications/MAMP/htdocs/tr_mapping/ … to store the XML data files (no CSV needed as input this way, as it will be programmatically created in part 1 of 2 parts to the whole job) … which becomes accessible in two ways …
- http://localhost:8888/tr_mapping/tr_mapping.php#in_a_web_browser
- at Mac OS X Terminal desktop application command line via …
cd /Applications/MAMP/htdocs/tr_mapping
ksh -x tr_mapping.ksh
Again, email is the conduit for both sides of …
- input in
- output out
… to complete proceedings. You can see the various aspects of this, in play, with today’s tutorial picture, and though it is not much use to run the PHP code live, its style is leaning towards the generic side enough for us to want to share tr_mapping.php with you for your perusal … just in case (it is of use for you).
Previous relevant Spreadsheet and XML Global Substitution Primer Tutorial is shown below.
Yesterday when we were discussing Worldbank API World Country Reporting Regex Tutorial we mentioned …
… and we use it (ie. Regex) with serverside PHP today, under the auspices of the preg_match function, though we most often use RegEx thinking with the Javascript replace function, as the way to make substitutions for more than one occurrence, (the one occurrence design being) a default “curiosity” (but can be useful too) about Javascript’s version of substitution. You may know this RegEx usage of the Javascript replace function as “global substitution”.
… and that term “global substitution”. Many editing jobs, especially text file based ones, require or benefit from “global substitution” carefully applied, that is. It is common to see an editor who shies away from “global substitution” methods, and in many cases that is wise, but “global substitution” gets good results when you …
- substitute things you know exist in the precise form you intend to search for, and only there, where you want to replace … and …
- ensure replacements do not feed back into the substitution list … doh … or you will end up with a confused unintended result
In real life, it is often the case that the conditions above are easy to obey, because you are mapping an old numbering and/or naming system to a completely new and dissimilar numbering and/or naming system. That’s the case in a little job we drilled down into, to show you what we did, that involved RegEx thoughts, to solve a problem.
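The “replacements should not feed back” condition can be shown with a small PHP sketch. This swaps in PHP’s strtr() as the technique, which is not the sed approach used in this job, and the mapping data and load_mapping() helper are made up for illustration (simple two column CSV lines with no quoted fields assumed).

```php
<?php
// Parse simple "old,new" CSV lines into an old => new map.
function load_mapping(string $csv): array
{
    $map = [];
    foreach (preg_split('/\R/', trim($csv)) as $line) {
        $cols = explode(',', $line, 2);
        if (count($cols) === 2 && $cols[0] !== '') {
            $map[$cols[0]] = $cols[1];
        }
    }
    return $map;
}

$csv = "[A.1],[B.1]\n[B.1],[C.1]"; // illustrative mapping only
$xml = '<ref>[A.1]</ref><ref>[B.1]</ref>';

// strtr() with an array never rescans text it has already replaced,
// so the new "[B.1]" produced from "[A.1]" is not turned into "[C.1]".
echo strtr($xml, load_mapping($csv)) . "\n";
```

Applying the same two substitutions one after the other, as a chain of separate replace calls would, turns both references into “[C.1]”, which is exactly the confused result to avoid.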
So, with our job we had …
- Aim: Change some XML in one file to have the text in one column of a Spreadsheet be mapped to the contents of another column of that same Spreadsheet
- Inputs: Excel Spreadsheet with those two columns as mentioned above and the one input XML file
- From the User: Asked for the user to send the Excel Spreadsheet … Saved As Comma Separated Values (CSV) in MS-DOS format and the one input XML file as two attached files in an Email
- Processing:
- Opened Email with Gmail web application in Safari web browser desktop application, on a MacBook Pro laptop
- Downloaded the two Attachments and copied over to where we like to work … the home of MAMP local Apache/PHP/MySql web server … on a Mac OS X system is /Applications/MAMP/htdocs (which we’ll access later with the Mac OS X Terminal desktop application via “cd /Applications/MAMP/htdocs”)
- Opened our favourite Text Editor desktop application, called TextWrangler, whose “Find and Replace” “Grep” suboption will be a feature of today’s solution
- File -> Open the Spreadsheet CSV file
- Search -> Find… … Matching Mode: Grep … Find: ^ Replace: # … Replace All … remember our “RegEx” “cheat sheet” discussion (lots of which is relevant to TextWrangler Matching Mode: Grep as well) at that aforesaid tutorial …
- ^ can mean “start of”
- $ can mean “end of”
- . can sometimes mean “one existent character wildcard” … or sometimes it is % or ? for this in other “systems”
- * can often mean “zero or more of preceding character wildcard”
- [] and () bracketing rules are pretty crucial for the more esoteric usages … also study | usage
? … well, we want to start out mapping all lines to non-acting Korn Shell command lines
- Typed as the new top line #!/bin/ksh … just for completeness sake … is optional step
- Search -> Find… … No Matching Mode … Find: #,[ Replace: cat COMM.MIL~INDEX.xml | sed '/\[ … Replace All
- Search -> Find… … Matching Mode: Grep … Find: ]$ Replace: \\]/g' > x.xxx ; cat x.xxx > COMM.MIL~INDEX.xml ; rm -f x.xxx … Replace All
- File -> Save As… fix_csv.ksh (to /Applications/MAMP/htdocs directory)
- Opened Terminal desktop application that has a default Bash environment (a lot like Linux, but is (giving you access to) a Mac OS X BSD operating system, really)
- Typed in: cd /Applications/MAMP/htdocs # to get to data
- Typed in: cp COMM.MIL~INDEX.xml COMM.MIL~INDEX_original.xml # to backup data ahead of processing, as well as to compare file sizes with later, as a sanity check
- Typed in: ksh -x fix_csv.ksh # access Korn Shell interpreter and run the TextWrangler created Korn Shell Script (and the -x switch tells the interpreter to be verbose with output reporting)
- Typed in: ls -l COMM.MIL~INDEX*.xml # first sanity check verified files different, and not disastrously so … good first sign
- Typed in: fgrep -c '[S1.12.4.20]' COMM.MIL~INDEX*.xml ; fgrep -c '[CCR.28E.20]' COMM.MIL~INDEX*.xml # second sanity check to prove old/new parts of first/last relevant Spreadsheet CSV file records were correctly mapped … and they were … so
- Opened Email with Gmail web application in Safari web browser desktop application (and used “Forward” option, attaching that new XML file), on a MacBook Pro laptop … so that …
- Output: One XML file with the global substitutions expressed in the Excel Spreadsheet performed, returned to User via Email “Forward” option, attaching that new XML file
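The RegEx “cheat sheet” items in the processing steps above can be tried out in PHP with preg_replace and preg_match, as in this small sketch. The sample line is made up from the two codes quoted above and is not a real record from the Spreadsheet.

```php
<?php
$line = '[S1.12.4.20],[CCR.28E.20]'; // illustrative sample only

// ^ anchors at the start of the line: prefix the line with "#".
echo preg_replace('/^/', '#', $line) . "\n";

// $ anchors at the end of the line: does the line end with "]"?
var_dump(preg_match('/\]$/', $line) === 1);

// . matches any one character, * repeats the preceding item zero or more times.
var_dump(preg_match('/^\[S1.*\]$/', $line) === 1);

// () groups and | offers alternatives.
var_dump(preg_match('/\[(S1|CCR)\./', $line) === 1);
```

Note that [ and ] need a backslash in front when they are meant literally, which is the same reason the sed commands above escape them.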
We hope you can see the good use you can make with Email and a good Text Editor and Linux type shell scripting, influenced by RegEx pattern matching regarding …
Which leaves us with today’s PDF slideshow of snapshots of making this job work, here.
If this was interesting you may be interested in this too.