If you read MySql Down Thinking Followup Tutorial below you’ll get the gist of how our rjmprogramming.com.au web server’s Linux Watchdog works, trying to …
- spot troubles with MySql database(s) … and if found …
- effectively restart MySql via …
service mysql restart
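As a rough illustration of that spot-and-restart idea, here is a minimal shell sketch. It is not our actual watchdog: the BLOG_URL value, the 20 second timeout and the function names are assumptions made up for the example, and only the WordPress error message and the restart command come from this posting.

```shell
#!/bin/sh
# Hypothetical sketch only: BLOG_URL and the function names are assumptions.
BLOG_URL="${BLOG_URL:-http://localhost/ITblog/}"
DB_ERROR_MARKER="Error establishing a database connection"

# Succeeds (exit 0) when the page text arriving on stdin contains the
# WordPress database error message.
page_shows_db_error() {
    grep -q -F "$DB_ERROR_MARKER"
}

# Fetch the blog page and restart MySql only when the marker is present.
# A page that hangs or times out is NOT treated as a database error here.
check_and_restart() {
    if curl --silent --max-time 20 "$BLOG_URL" | page_shows_db_error; then
        service mysql restart
    fi
}
```

Calling check_and_restart from a scheduled job would be one way to use it. Note that it only reacts to the one symptom it knows about, which is exactly the limitation discussed next.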
This strategy works for the most part, but …
- if the web server is flooded with MySql database requests … or …
- develops a weakness in a crucial table (that causes a backlog of queries for MySql to try to handle, or some disk space issue)
… this can spell T.r.o.u.b.l.e (with a capital “T”), as happened this morning AEST (Australian Eastern Standard Time) on 25th August 2020.
You see, it’s all well and good me telling you to do “this, that and the other” but if you run an Apache web server with limited resources (and so a limited MySql connection pool), it is quite possible that on occasions it will become …
MySql down
Apache challenged
… so that getting to a webpage on the (MySql database) WordPress blog website you are reading now will either …
- hang, and time out … or …
- come up with the error message …
Error establishing a database connection
… an error message particular to WordPress (and used by our in-house watchdog)
… but what if a problem develops in another of our MySql products, like ZenCart? Yes, it panned out with symptoms a lot like those of …
- Troubleshooting CentOS Web Server Disk Zencart Issue Tutorial (though not with table “sessions” but with table “whos_online” today) … how did we know? … get into (the wonderful) phpMyAdmin and the table “whos_online” will show the words “in use” over the spread of the rightmost five columns of the ZenCart database table report … requiring (phpMyAdmin) …
repair table whos_online
… in phpMyAdmin ZenCart database … as well as there being a …
- WordPress database table issue of some sort with table wps_options … requiring (phpMyAdmin) …
repair table wps_options
… in phpMyAdmin WordPress database
… this multiple MySql database set of problems perhaps confusing our Linux Watchdog.
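For readers who prefer the command line to phpMyAdmin, the same repairs can be expressed as SQL statements. The sketch below just builds those statements from table names. The function name is made up for illustration, and as far as we know REPAIR TABLE only helps with MyISAM-style tables, so treat it as a starting point rather than a cure-all.

```shell
#!/bin/sh
# Hypothetical helper: prints one REPAIR TABLE statement per table name
# given as an argument. It does not touch any database by itself.
repair_statements() {
    for table_name in "$@"; do
        printf 'REPAIR TABLE `%s`;\n' "$table_name"
    done
}
```

The output could then be piped into the mysql client, for example repair_statements whos_online | mysql -u your_user -p your_zencart_db, where the user and database names are placeholders you would replace with your own.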
Solving this took hours, first working to regain access (we’ll outline this below for ssh, sftp, Control Panel and Power Management), but we probably could have saved two hours by realizing that halfway along we had a scenario whereby WordPress errored out as above, ZenCart errored out in a similar way to Troubleshooting CentOS Web Server Disk Zencart Issue Tutorial, but Joomla (also MySql) did not error out. It didn’t click with us then and there that this meant our Linux Watchdog “restart of all MySql” approach was not the best approach, and that what would have saved time would have been to go straight to phpMyAdmin looking for “in use” MySql database tables to “repair”. Instead …
- at first viewing …
- MySql websites hung or gave errors and non-MySql websites hung or were incredibly slow
- ssh did not work “Operation timed out”
- sftp did not work “Operation timed out”
- Control Panel hung
… leaving just Power Management (which we ended up using twice, and incredibly slowly) to Stop and Start the Virtual Server (that is, the rjmprogramming.com.au Apache/PHP/MySql web server)
- and at that second (Power Management “Start”) go we kept on trying “ssh” until “Operation timed out” turned to “Connection refused” and then, eventually, to a connection (all the while Control Panel was still hanging) … where we (on the Linux command line) ran …
service cpanel restart
service httpd restart
service mysql restart
… the last one repeated until …
- Control Panel came good, and around about this time that WordPress (no), ZenCart (no), Joomla (yes) finding still had not clicked with us … so …
- generic MySql restarts proved fruitless … until …
- using Control Panel to get into phpMyAdmin we discovered the “in use” tables above, and “repair table” of these followed by …
service httpd restart
service mysql restart
… got things going better again, eventually.
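The “keep trying ssh” stage of the story above can be sketched as a small retry loop. This is our reading of the two error messages (“Operation timed out” meaning the server is not reachable yet, “Connection refused” meaning it is reachable but not yet accepting ssh logins), and the stage labels, function names, 15 second pause and user@host argument are all assumptions for the example.

```shell
#!/bin/sh
# Hypothetical sketch: map an ssh error message to a rough recovery stage.
ssh_stage() {
    case "$1" in
        *"Operation timed out"*) echo "unreachable" ;;
        *"Connection refused"*)  echo "booting" ;;
        *)                       echo "unknown" ;;
    esac
}

# Keep trying ssh (argument 1 is user@host) until a connection succeeds,
# printing the stage after each failed attempt.
wait_for_ssh() {
    while :; do
        ssh_error=$(ssh -o ConnectTimeout=10 -o BatchMode=yes "$1" true 2>&1) && break
        echo "$(date '+%H:%M:%S') $(ssh_stage "$ssh_error")"
        sleep 15
    done
}
```

BatchMode=yes stops ssh asking for a password, so the loop assumes key-based login is set up.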
We hope this snapshot into some of this troubleshooting might be of use to some readers.
Previous relevant MySql Down Thinking Followup Tutorial is shown below.
When we discussed Linux Watchdog Primer Tutorial in that very generic fashion below, we were being just that … generic. Today we turn to specifics in that regard. On the rjmprogramming.com.au domain we use a …
- (software) watchdog which checks the health of the MySql Service and, if it is not healthy, restores it to health … which is all fine and good from the point of view of the domain rjmprogramming.com.au … but depending on what the user was doing we could …
- code for intervention within the MySql-using piece of software, writing out the database error to the webpage and presenting alternative navigation
… and with that latter thought, which we put into play today for a WordPress 4.1.1 blog, we realized we needed to intervene in …
[DocumentRoot]/ITblog/wp-includes/functions.php
… with the code intervention (the $ubitsare redirect logic) as below …
// Otherwise, be terse.
status_header( 500 );
nocache_headers();
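// Intervention: if the URL carries a blog post slug, redirect to the static slideshow rather than show the error page
$ubitsare = explode("/", str_replace("/?p=", "", $_SERVER['REQUEST_URI']));
if (sizeof($ubitsare) > 2) {
    if (strpos(str_replace("-", "%20", $ubitsare[2]), "slideshow.htm") === false) {
        // Title is the slug with hyphens turned into %20, cut at any "&" ([0] must index the explode() result)
        header("Location: /slideshow.html?title=" . explode("&", str_replace("-", "%20", str_replace("/?p=", "", $ubitsare[2])))[0]);
        exit;
    } else {
        header( 'Content-Type: text/html; charset=utf-8' );
    }
} else {
    header( 'Content-Type: text/html; charset=utf-8' );
}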
?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"<?php if ( is_rtl() ) echo ' dir="rtl"'; ?>>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title><?php _e( 'Database Error' ); ?></title>
</head>
<body>
<h1><?php _e( 'Error establishing a database connection'); ?></h1>
</body>
</html>
<?php
die();
}
… where the redirection to /slideshow.html effectively “Cuts to the Chase” for the gist of that blog posting you never got to, as MySql is currently down.
Notice how specific the actions can be when you write code “dependent” on another software component, as distinct from the “watchdog” approach we often want to be “independent” in its thinking.
Guess this raises the question: how did we work out where to intervene? We just got to folders on the rjmprogramming.com.au web server with the WordPress (Codex) software and went (showing you the one that yielded the nicest result for us) …
pwd # and we are at [DocumentRoot]/ITblog
cd wp-includes
fgrep 'Error establishing a database connection' *.php
Previous relevant Linux Watchdog Primer Tutorial is shown below.
Do you have a computer operational problem? What the … what is that humanoid on about? Well, is there a computer operation you have to do routinely, to fix an ongoing problem, that relies on your personal intervention that, one day, you’d like to not have to worry about, or better still … you usually intervene, but would like a backup approach should you get sick and can’t do it … here’s where you need Fido, a watchdog
… am not recommending a Golden Retriever, unless you want your process licked rather than attended to? (had you forgotten this was a question, CedricLinuxNala)? … oh, moi?)
Why is bellybutton fluff blue? But we digress, or at Buckingham Palace one digresses.
Right, back at watchdogs, there are schools of thought …
- The process is down for too long and we need to do something about it, because customers are leaving
- If you need a watchdog to save your bacon, then (clearly you’re not with it, because the dog’s eaten the bacon and) there is something else fundamentally wrong or something you do not understand, which is what you really should resolve, either way
My view is that, if the underlying process would take years to understand or if it is written in legacy code, I’d go more with the former idea, especially as there is delight in creating a really good watchdog (training one? not so easy) … it can be really hard to do … for this reason we cannot really give code here, because there is no “out of the box” solution that would be responsible to advise … you have to study the issue and cut it into its components, unit test the solutions to the components of your watchdog solution, and retest with the interactions of those components. However, this is a coding enthusiast’s view, and is a bit short-sighted, perhaps. In any case, what will save the day is that this decision will probably be made for you by an operations expert, if you work in a large organization.
Some other watchdog considerations should be …
- Is the attempt to automate the solution that the watchdog will provide technically possible … may not be?!
- We can resolve it with personal intervention … can the watchdog simulate each step of the human intervention? If so, go for that approach if possible.
- Be very careful of approaches that involve mouse clicks, as they are quite often relative to too many other environmental issues … try to restrict the watchdog solution to command line/scripting/keyboard ideas … on Windows, AutoHotKey is an excellent recording program of interest (would recommend just using it for keyboard recordings, if using it for a watchdog … by the way, tomorrow (tomorrow arrived today) we do a tutorial showing you how to create an AutoHotKey terminate-and-stay-resident program on Windows).
- Have we identified the real intervention points? If not, you might succeed some of the time, but not all the time, and you may cause damage on those times when you have made some assumptions, with your incomplete understanding.
Here is an example. You have an overnight batch process run, and it falls over at a certain point, and you get paged at 3 or 4 (it’s bound to be AM). It has been tentatively decided you might want to create a watchdog … what are some considerations …
- What do log files tell me? Find out.
- Is it a single thing that is missing that would resolve the problem once and for all? If yes, well, you know not to deviate from this one thing … ignore ideas below.
- If there are a few problems, is it worth proceeding with the watchdog idea? The number of separate issues can often multiply the points of complication by a factor of ten, and maybe you should stick with human intervention.
- Break the watchdog problem into these problem issues, each as a separate unit-testable piece of scripting code (or whatever your watchdog solution entails) … test each for success … retest for their interaction with each other (ie. that they don’t interfere with each other).
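To make “unit-testable piece of scripting code” a little more concrete, here is a small shell sketch for the log file consideration above. It is only an illustration and not an out-of-the-box watchdog: the function names, the log file path and the failure pattern are all hypothetical and would need to come from studying your own batch process.

```shell
#!/bin/sh
# Hypothetical sketch: each function does one small job so it can be
# tested on its own before being combined with the others.

# Print how many lines of log file $1 match failure pattern $2.
# "|| true" is there because grep exits non-zero when the count is 0.
count_failures() {
    grep -c -e "$2" "$1" 2>/dev/null || true
}

# Succeed (exit 0) only when at least one failure line was logged.
# A missing or unreadable log file counts as "no failures seen".
needs_intervention() {
    failure_count=$(count_failures "$1" "$2")
    [ "${failure_count:-0}" -gt 0 ]
}
```

A watchdog could then read something like needs_intervention /path/to/batch.log 'FATAL' before deciding whether to do anything at all.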
The title of this tutorial mentions Linux, and while generic thinking like the above applies to other operating systems too, there are some great Linux (or Unix) commands that we should point out … thanks to Hscripts.com and CyberCiti.biz for this …
- crontab ( eg. */5 * * * * /path/to/command # where /path/to/command gets run every five minutes as the user who owns the crontab … arranged via crontab -e … the system-wide /etc/crontab takes an extra Username field just before the command ) … for Windows, equivalent would be Task Scheduler in Windows Primer Tutorial
- nohup
- bg
- nice
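As a hedged sketch of how nohup, nice and backgrounding (the & that bg relates to) can combine, the hypothetical run_detached function below starts a command at low priority, immune to hangups, in the background, and prints its process id. The function name and the niceness of 10 are our choices for the example.

```shell
#!/bin/sh
# Hypothetical helper: run the given command detached from the terminal.
#   nohup      keeps it alive if the login session hangs up
#   nice -n 10 lowers its priority so it does not starve the web server
#   &          puts it in the background; $! is its process id
run_detached() {
    nohup nice -n 10 "$@" >/dev/null 2>&1 &
    echo $!
}
```

For example, run_detached /path/to/command would print a process id you could later check with kill -0.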
And here are some of the practicalities of a watchdog …
- Where does it run? If at more than one place, consider each place separately. If it runs on more than one computer, then clearly this is important. Do the directory and file permissions allow the watchdog to run, but not allow other users to misuse it? (Please say yes here.) A generic thing about crontab or nohup (or Windows start) scripting arrangements is that you should not assume the environment of these processes is the same as that of your current running command line process … you should write as if you have just logged on and have done nothing … so just about the first decision of the script is to “cd” itself to the proper place where it was designed to run.
- Which user(s) (on whichever computer(s)) can run it?
- When does it run? If the solution is uncomplicated enough, maybe you can use a pre-emptive approach. For example, if you know the problem is to do with a file that goes missing when a non-critical process fails, and that file is looked for later on, then why not pre-emptively get the watchdog to create that file (with default data) ahead of the time the process run would crash?
- As part of the question above, does one subprocess need to end before another starts? If yes, you need to intervene in such a way that that process architecture remains, and you need to work out an independent way for your independent watchdog to step in, at the correct time, and take over the same task, as required. But if it gets to this, don’t you understand the underlying process well enough to have a crack at doing the “real” solution (for all time)? Have a think, now, and keep checking in on the issue.
- You need to log the workings of the watchdog both for information and for further research which might help in achieving a “real” solution (without the watchdog) further down the track.
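Some of the practicalities above (assume nothing about the environment, “cd” to the proper place, pre-emptively create a missing file, log everything) can be sketched as the opening of a watchdog script. Everything named here (WATCHDOG_HOME, WATCHDOG_LOG, the /tmp default, the function names) is a hypothetical placeholder, not something from a real setup.

```shell
#!/bin/sh
# Hypothetical watchdog skeleton: write as if we had just logged on.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

WATCHDOG_HOME="${WATCHDOG_HOME:-/tmp}"                      # assumed location
WATCHDOG_LOG="${WATCHDOG_LOG:-$WATCHDOG_HOME/watchdog.log}" # assumed log file

# First decision of the script: go to the place it was designed to run.
cd "$WATCHDOG_HOME" || exit 1

# Append a timestamped message to the watchdog log.
log_line() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$WATCHDOG_LOG"
}

# Pre-emptive approach: create file $1 with default data $2 if it is missing.
ensure_default_file() {
    [ -e "$1" ] || {
        printf '%s\n' "$2" > "$1"
        log_line "created default file $1"
    }
}
```

The log then doubles as research material for the “real” solution further down the track.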
So why was this posting called a tutorial? Well, there’s some homework. You see there are these Daleks, and we sort of need to know when they’re going to invade Earth again, and Dr Who is not always available, so, was wondering, if it wouldn’t be too inconvenient … if you wouldn’t mind writing that watchdog to detect a Dalek invasion and shoo them off … 6 hours … okay?
Woofsky!
If this was interesting you may be interested in this too.