If you read MySql Down Thinking Followup Tutorial below you’ll get the gist of how our rjmprogramming.com.au web server’s Linux Watchdog works trying to …
- spot troubles with MySql database(s) … and if found …
- effectively restart MySql via …
service mysql restart
This strategy works for the most part, but …
- if the web server is flooded with MySql database requests … or …
- develops a weakness in a crucial table (that causes a bank up of queries for MySql to try to handle, or some disk space issue)
… can spell T.r.o.u.b.l.e (with a capital “T”) as happened this morning AEST (Australian Eastern Standard Time on 25th August 2020).
You see, it’s all well and good me telling you to do “this that and the other” but if you run an Apache web server with limited resources (and so a limited MySql connection pool), it is quite possible that on occasions it will become …
MySql down
Apache challenged
… so that getting to a webpage on the (MySql database) WordPress blog website you are reading now will either …
- hang, and timeout … or …
- come up with the error message …
Error establishing a database connection
… an error message particular to WordPress (and used by our inhouse watchdog)
… but what if a problem develops in another of our MySql products, like ZenCart? Yes, it panned out symptoms a lot like those of …
- Troubleshooting CentOS Web Server Disk Zencart Issue Tutorial (but not with table “sessions” but table “whos_online” today) … how did we know? … get into (the wonderful) phpMyAdmin and the table “whos_online” will show the words “in use” over the spread of the rightmost five columns of the ZenCart database table report … requiring (phpMyAdmin) …
repair table whos_online
… in phpMyAdmin ZenCart database … as well as there being a … - WordPress database table issue of some sort with table wps_options … requiring (phpMyAdmin) …
repair table wps_options
… in phpMyAdmin WordPress database
… this multiple MySql database set of problems perhaps confusing our Linux Watchdog.
Solving this took hours, first working it to get access (we’ll outline below for ssh, sftp, Control Panel, Power Management), but we probably could have saved two hours by realizing that half way along we had a scenario whereby WordPress errored out as above, ZenCart errored out in a similar way to Troubleshooting CentOS Web Server Disk Zencart Issue Tutorial but Joomla (also MySql) did not error out. It didn’t tweak with us then and there that this meant our Linux Watchdog “restart of all MySql” approach was not the best approach, and what would have saved time would have been to go straight to phpMyAdmin looking for “in use” MySql database tables to “repair”, instead …
- at first viewing …
- MySql websites hung or gave errors and non-MySql websites hung or were incredibly slow
- ssh did not work “Operation timed out”
- sftp did not work “Operation timed out”
- Control Panel hung
… leaving just Power Management (we ended up using twice, and incredibly slowly) to Stop and Start the Virtual Server (that is the rjmprogramming.com.au Apache/PHP/MySql web server)
- and at that second (Power Management “Start”) go kept on trying “ssh” until “Operation timed out” turned to “Connection refused” and onto eventually connecting (all the while Control Panel was still hanging) … where we (on Linux command line) …
service cpanel restart
service httpd restart
service mysql restart
… the last one repeated until … - Control Panel came good and around about this time that WordPress (no), ZenCart (no), Joomla (yes) finding did not tweak with us … so …
- fruitless generic MySql restartings were ineffective … until …
- using Control Panel to get into phpMyAdmin we discovered the “in use” tables above, and “repair table” of these followed by …
service httpd restart
service mysql restart
… got things going better again, eventually
We hope this snapshot into some of this troubleshooting might be of use to some readers.
Previous relevant MySql Down Thinking Followup Tutorial is shown below.
When we discussed Linux Watchdog Primer Tutorial in that very generic fashion below, we were being that … generic. Today we turn to specifics in that regard. On the rjmprogramming.com.au domain we use a …
- (software) watchdog which checks for the health of the MySql Service, and if not healthy, restore it to health … which is all fine and good from the point of view of the domain rjmprogramming.com.au … but depending on what the user was doing we could …
- code for intervention within the MySql using piece of software, and writing out the database error to the webpage, and present alternative navigation
… on the understanding with that latter thought, that we put into play today for a WordPress 4.1.1 blog, we realized we needed to intervene in …
[DocumentRoot]/ITblog/wp-includes/functions.php
… in the emboldened code intervention as below …
// Otherwise, be terse.
status_header( 500 );
nocache_headers();
$ubitsare=explode("/", str_replace("/?p=","",$_SERVER['REQUEST_URI']));
if (sizeof($ubitsare) > 2) {
if (strpos(str_replace("-","%20",$ubitsare[2]), "slideshow.htm") === false) {
header("Location: /slideshow.html?title=" . explode("&", str_replace("-","%20",str_replace("/?p=","",$ubitsare[2]))[0]));
exit;
} else {
header( 'Content-Type: text/html; charset=utf-8' );
}
} else {
header( 'Content-Type: text/html; charset=utf-8' );
}
?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"<?php if ( is_rtl() ) echo ' dir="rtl"'; ?>>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title><?php _e( 'Database Error' ); ?></title>
</head>
<body>
<h1><?php _e( 'Error establishing a database connection'); ?></h1>
</body>
</html>
<?php
die();
}
… where the redirection to /slideshow.html effectively “Cuts to the Chase” for the gist of that blog posting you never got to, as MySql is currently down.
Notice how specific the actions can be when you write code “dependent” on another software component, as distinct from the “watchdog” approach we often want to be “independent” in its thinking.
Guess this begs the question? How did we work out where to intervene? We just got to folders on the rjmprogramming.com.au web server with the WordPress (Codex) software and went (showing you the one that yielded the nicest result for us) …
pwd // and we are at [DocumentRoot]/ITblog
cd wp-includes
fgrep 'Error establishing a database connection' *.php
Previous relevant Linux Watchdog Primer Tutorial is shown below.
“
Do you have a computer operational problem? What the … what is that humanoid on about? Well, is there a computer operation you have to do routinely, to fix an ongoing problem, that relies on your personal intervention that, one day, you’d like to not have to worry about, or better still … you usually intervene, but would like a backup approach should you get sick and can’t do it … here’s where you need Fido a watchdog
… am not recommending a Golden Retriever, unless you want your process licked rather than attended to? (had you forgotten this was a question,
CedricLinuxNala)? …
oh, moi?)
“
Why is bellybutton fluff blue? But we digress, or at Buckingham Palace one digresses.
Right, back at watchdogs, there are schools of thought …
- The process is down for too long and we need to do something about it, because customers are leaving
- If you need a watchdog to save your bacon, then (clearly you’re not with it, because the dog’s eaten the bacon and) there is something else fundamentally wrong or something you do not understand, which is what you really should resolve, either way
My view is that, if the underlying process would take years to understand or if it is written in legacy code, I’d go more with the former idea, especially as there is delight in creating a really good watchdog (training one? not so easy) … it can be really hard to do … for this reason cannot give code here really, because there is no “out of the box” that is a responsible approach to advise … you have to study the issue and cut it into its components, unit test the solutions to the components of your watchdog solution, and retest with the interactions of those components. However, this is a coding enthusiast’s view, and is a bit short-sighted, perhaps. In any case, what will save the day is that this decision will probably be made for you by an operations expert, if you work in a large organization.
Some other watchdog considerations should be …
- Is the attempt to automate the solution that the watchdog will provide technically possible … may not be?!
- We can resolve it with personal intervention … can the watchdog simulate each step of the human intervention? If so, go for that approach if possible.
- Be very careful of approaches that involve mouse clicks, as they are quite often relative to too many other environmental issues … try to restrict the watchdog solution to command line/scripting/keyboard ideas … on Windows, AutoHotKey is an excellent recording program of interest (would recommend just using it for keyboard recordings, if using it for a watchdog … by the way, tomorrow (tomorrow arrived today) we do a tutorial showing you how to create an AutoHotKey terminate-and-stay-resident program on Windows).
- Have we identified the real intervention points? If not, you might succeed some of the time, but not all the time, and you may cause damage on those times when you have made some assumptions, with your incomplete understanding.
Here is an example. You have an overnight batch process run, and it falls over at a certain point, and you get paged at 3 or 4 (it’s bound to be AM). It has been tentatively decided you might want to create a watchdog … what are some considerations …
- What do log files tell me? Find out.
- Is it a single thing that is missing that would resolve the problem once and for all? If yes, well, you know not to deviate from this one thing … ignore ideas below.
- Of the few problems, is it worth proceeding with the watchdog idea, because the number of separate issues can often cause a factor of ten more complication issue points, and maybe you should stick with human intervention.
- Break the watchdog problem into these problem issues as a separate unit-testable piece of scripting code (or whatever your watchdog solution entails) … test each for success … retest for their interaction with each other (ie. that they don’t interfere with each other).
The title of this tutorial mentions Linux but generic thinking like above covers other operating system thought patterns, but there are some Linux (or Unix) tools that are great Linux commands that we should point out … thanks to Hscripts.com and CyberCiti.biz for this …
- crontab ( eg. */5 * * * * Username /path/to/command # where /path/to/command gets run by Username every five minutes … arranged via crobtab -e ) … for Windows, equivalent would be Task Schedular in Windows Primer Tutorial
- nohup
- bg
- nice
And here are some of the practicalities of a watchdog …
- Where does it run? If at more than one place, consider each place separately. If it runs on more than one computer, then clearly this is important. Does the directory and file permissions allow the watchdog to run, but do not allow other users to misuse it? (Please say yes here.) A generic thing about crontab or nohup (or Windows start) scripting arrangements are that you should not assume the environments of these processes is the same as your current running command line process … you should write as if you have just logged on and have done nothing … so just about the first decision of the script is to “cd” itself to the proper place where it was designed to run.
- Which user(s) (on whichever computer(s)) can run it?
- When does it run? If the solution is uncomplicated enough, maybe you can use a pre-emptive approach. For example, you know the problem is to do with a file missing when a non-critical process fails but later on that file is looked for, then why not pre-emptively get the watchdog to create that file (with default data) ahead of the process run crash point time.
- As part of the question above, does one subprocess need to end before another starts? If yes, you need to intervene in such a way that that process architecture remains, and you need to work out an independent way for your independent watchdog to step in, at the correct time, and take over the same task, as required. But if it gets to this, don’t you understand the underlying process well enough to have a crack at doing the “real” solution (for all time)? Have a think, now, and keep checking in on the issue?
- You need to log the workings of the watchdog both for information and for further research which might help in achieving a “real” solution (without the watchdog) further down the track.
So why was this posting called a tutorial? Well, there’s some homework. You see there are these Daleks, and we sort of need to know when they’re going to invade Earth again, and Dr Who is not always available, so, was wondering, if it wouldn’t be too inconvenient … if you wouldn’t mind writing that watchdog to detect a Dalek invasion and shoo them off … 6 hours … okay?
“
Woofsky!
“
If this was interesting you may be interested in this too.
If this was interesting you may be interested in this too.
If this was interesting you may be interested in this too.