“
Do you have a computer operational problem? What the … what is that humanoid on about? Well, is there a computer operation you have to do routinely, to fix an ongoing problem, that relies on your personal intervention and that, one day, you’d like to not have to worry about? Or better still … you usually intervene, but would like a backup approach in case you get sick and can’t do it … here’s where you need Fido, a watchdog
… am not recommending a Golden Retriever, unless you want your process licked rather than attended to? (had you forgotten this was a question,
CedricLinuxNala)? …
oh, moi?)
“
Why is bellybutton fluff blue? But we digress, or at Buckingham Palace one digresses.
Right, back at watchdogs, there are schools of thought …
- The process is down for too long and we need to do something about it, because customers are leaving
- If you need a watchdog to save your bacon, then (clearly you’re not with it, because the dog’s eaten the bacon and) there is something else fundamentally wrong or something you do not understand, which is what you really should resolve, either way
My view is that, if the underlying process would take years to understand or if it is written in legacy code, I’d go more with the former idea, especially as there is delight in creating a really good watchdog (training one? not so easy) … it can be really hard to do … for this reason I cannot really give ready-made code here, because there is no “out of the box” approach that would be responsible to advise … you have to study the issue and cut it into its components, unit test the solutions to the components of your watchdog solution, and retest the interactions of those components. However, this is a coding enthusiast’s view, and is a bit short-sighted, perhaps. In any case, what will save the day is that this decision will probably be made for you by an operations expert, if you work in a large organization.
Some other watchdog considerations should be …
- Is the attempt to automate the solution that the watchdog will provide technically possible … may not be?!
- We can resolve it with personal intervention … can the watchdog simulate each step of the human intervention? If so, go for that approach if possible.
- Be very careful of approaches that involve mouse clicks, as they are quite often relative to too many other environmental issues … try to restrict the watchdog solution to command line/scripting/keyboard ideas … on Windows, AutoHotKey is an excellent recording program of interest (would recommend just using it for keyboard recordings, if using it for a watchdog … by the way, tomorrow (tomorrow arrived today) we do a tutorial showing you how to create an AutoHotKey terminate-and-stay-resident program on Windows).
- Have we identified the real intervention points? If not, you might succeed some of the time, but not all the time, and you may cause damage on those times when you have made some assumptions, with your incomplete understanding.
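To make the “simulate each step of the human intervention” idea a bit more concrete, here is a minimal shell sketch … not an out-of-the-box watchdog, just one way the idea could look. The function names run_steps and log, and the log location, are made up for illustration. The one design choice worth noting is that it stops at the first step that fails, rather than pressing on with assumptions that may no longer hold.

```shell
#!/bin/sh
# Sketch: replay a manual intervention as ordered, named steps.
# run_steps stops at the first failing step, so the watchdog never
# carries on once one of its assumptions has turned out to be wrong.

LOG_FILE="${LOG_FILE:-/tmp/watchdog_steps.log}"   # hypothetical log location

log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Usage: run_steps step_one step_two ...
# Each argument is the name of a shell function (or command) to call.
run_steps() {
    for step in "$@"; do
        log "starting step: $step"
        if "$step"; then
            log "step ok: $step"
        else
            log "step FAILED: $step (stopping, a human is needed)"
            return 1
        fi
    done
    return 0
}
```

You would write one small function per thing the human does at 3 AM, then call something like run_steps check_disk restart_batch verify_output (all three names hypothetical).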
Here is an example. You have an overnight batch process run, and it falls over at a certain point, and you get paged at 3 or 4 (it’s bound to be AM). It has been tentatively decided you might want to create a watchdog … what are some considerations …
- What do log files tell me? Find out.
- Is it a single thing that is missing that would resolve the problem once and for all? If yes, well, you know not to deviate from this one thing … ignore ideas below.
- If there are a few separate problems, is it worth proceeding with the watchdog idea? Each extra issue can often add a factor of ten more points of complication, and maybe you should stick with human intervention.
- Break the watchdog problem into these separate issues, each one a separate unit-testable piece of scripting code (or whatever your watchdog solution entails) … test each for success … retest for their interaction with each other (ie. that they don’t interfere with each other).
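As a hedged illustration of what one “unit-testable piece” might look like, here is a tiny shell function that answers a single question about a log file. The name log_has_failure and the three return codes are assumptions made for this sketch … your real log files will tell you what to look for.

```shell
#!/bin/sh
# Sketch: one small, separately testable piece of a watchdog.
# It only reads the log; it changes nothing.

# Usage: log_has_failure LOGFILE PATTERN
# Returns 0 if PATTERN (a fixed string, not a regex) appears in LOGFILE,
# 1 if it does not, and 2 if the log cannot be read at all.
log_has_failure() {
    logfile=$1
    pattern=$2
    [ -r "$logfile" ] || return 2
    if grep -F -q -- "$pattern" "$logfile"; then
        return 0
    fi
    return 1
}
```

Because the piece is this small, you can test it against a hand-made log file long before it gets anywhere near the overnight batch run. Keeping “could not read the log” separate from “no failure found” matters, because a watchdog that treats a missing log as good news is making exactly the kind of assumption that causes damage.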
The title of this tutorial mentions Linux, and though generic thinking like the above applies to other operating systems too, there are some great Linux (or Unix) commands that we should point out … thanks to Hscripts.com and CyberCiti.biz for this …
- crontab ( eg. */5 * * * * /path/to/command # where /path/to/command gets run every five minutes by the user who owns the crontab … arranged via crontab -e … the system-wide /etc/crontab takes an extra Username field just before the command ) … for Windows, equivalent would be Task Scheduler in Windows Primer Tutorial
- nohup
- bg
- nice
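Here is a small sketch of how these commands might be combined in a script … the function name start_detached, the niceness of 10 and the paths are all assumptions for illustration. Note that bg is for interactive job control (resuming a stopped job in the background), so inside a script the trailing & does that job instead.

```shell
#!/bin/sh
# Sketch: start a command so that it survives logout (nohup), runs at
# lower priority (nice) and does not block the caller (&).
#
# A per-user crontab line (edited with crontab -e) that runs a watchdog
# every five minutes could look like this; the paths are placeholders:
#   */5 * * * * /path/to/watchdog.sh >> /tmp/watchdog.log 2>&1

# Usage: start_detached OUTFILE COMMAND [ARGS...]
# Appends the command's output to OUTFILE and prints the background PID.
start_detached() {
    outfile=$1
    shift
    nohup nice -n 10 "$@" >> "$outfile" 2>&1 < /dev/null &
    echo $!
}
```

Something like start_detached /tmp/job.log /path/to/long_job.sh would then return straight away, leaving the job running politely in the background.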
And here are some of the practicalities of a watchdog …
- Where does it run? If at more than one place, consider each place separately. If it runs on more than one computer, then clearly this is important. Do the directory and file permissions allow the watchdog to run, but not allow other users to misuse it? (Please say yes here.) A generic thing about crontab or nohup (or Windows start) scripting arrangements is that you should not assume the environment of these processes is the same as that of your current running command line process … you should write as if you have just logged on and have done nothing … so just about the first decision of the script is to “cd” itself to the proper place where it was designed to run.
- Which user(s) (on whichever computer(s)) can run it?
- When does it run? If the solution is uncomplicated enough, maybe you can use a pre-emptive approach. For example, if you know the problem is to do with a file that goes missing when a non-critical process fails, but later on that file is looked for, then why not pre-emptively get the watchdog to create that file (with default data) ahead of the time the process run would crash?
- As part of the question above, does one subprocess need to end before another starts? If yes, you need to intervene in such a way that that process architecture remains, and you need to work out an independent way for your independent watchdog to step in, at the correct time, and take over the same task, as required. But if it gets to this, don’t you understand the underlying process well enough to have a crack at doing the “real” solution (for all time)? Have a think, now, and keep checking in on the issue?
- You need to log the workings of the watchdog both for information and for further research which might help in achieving a “real” solution (without the watchdog) further down the track.
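Several of these practicalities can be sketched together in shell … again, a sketch under assumptions, not a finished watchdog. Every path and name here (WATCHDOG_HOME, WATCHDOG_LOG, ensure_file, watchdog_main) is hypothetical. It shows the script putting itself in its proper directory, not trusting the inherited environment, logging what it does, and pre-emptively creating a missing file with default data.

```shell
#!/bin/sh
# Sketch of a watchdog skeleton for cron or nohup.
# All paths are placeholders (assumptions), not from any real system.

WATCHDOG_HOME="${WATCHDOG_HOME:-/opt/watchdog}"   # where it was designed to run
WATCHDOG_LOG="${WATCHDOG_LOG:-$WATCHDOG_HOME/watchdog.log}"

log() {
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$$" "$*" >> "$WATCHDOG_LOG"
}

# Pre-emptive fix: create FILE with DEFAULT content only if it is missing.
ensure_file() {
    file=$1
    default=$2
    if [ -e "$file" ]; then
        log "present, left alone: $file"
    else
        printf '%s\n' "$default" > "$file" || return 1
        log "created with default data: $file"
    fi
}

# Usage: watchdog_main FILE DEFAULT
# The body runs in a subshell, so the cd and PATH changes do not leak out.
watchdog_main() (
    # Cron supplies a minimal environment: make sure the standard
    # directories are on PATH rather than assuming they are.
    PATH="/usr/local/bin:/usr/bin:/bin${PATH:+:$PATH}"
    export PATH
    cd "$WATCHDOG_HOME" || exit 1     # first decision: go to the proper place
    log "watchdog run started in $(pwd)"
    ensure_file "$1" "$2" || { log "could not create $1"; exit 1; }
    log "watchdog run finished"
)
```

Scheduled a little ahead of the known crash point, a call like watchdog_main /path/to/expected.dat "default" leaves a real file untouched and only fills the gap when the file is missing, and the log keeps a record to feed the hunt for the “real” solution.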
So why was this posting called a tutorial? Well, there’s some homework. You see there are these Daleks, and we sort of need to know when they’re going to invade Earth again, and Dr Who is not always available, so, was wondering, if it wouldn’t be too inconvenient … if you wouldn’t mind writing that watchdog to detect a Dalek invasion and shoo them off … 6 hours … okay?
“
Woofsky!
“