Sometime, somewhere, when you least expect it, you're going to find yourself sitting on the wrong side of those MTBF statistics. Trust me.
As packet nerds, sometimes it's easy for us to forget that all of these bits and bytes actually flow over hardware. And hardware has this nasty physicality about it, unlike the ephemeral world of packets and protocols. If you think facing your own mortality is scary, then you don't even want to consider the consequences of that shiny new 7,200 RPM drive in your mailserver deciding to become a 0 RPM drive. Once again: trust me.
We all know the machines within our organization that we deem "business critical." But increasingly, as we all find ourselves piling more and more functionality onto existing equipment in order to stretch our IT budgets, there is a class of machines that doesn't quite make it into the "business critical" category but still carries enough of a load that losing one will cause you incredible headaches. I would place these machines in the "PITA" category, because it's a Pain In The A** if they die.
Last night, one of my PITA machines died. Because stress levels in the IT industry are high enough already, rather than raise anyone's blood pressure with anticipatory angst, I'll clue you in right away. While the trip itself wasn't pretty, there is a happy ending: the machine is up and running again. Along the way, though, I learned a few things that I thought I'd pass along. Consider these "Rules of Thumb For Hardware Nightmares":
1) Rack mount machines look nice, no doubt about it, but they were never meant to be worked on. If the fellow who invented them were anywhere near me last night, he would be 1U high right now. Enough said.
2) Cables, once removed, will become so horribly entangled with each other, themselves, cables leading to other equipment, your shoestrings, cobwebs, belt loops, and, generally, anything longer than it is wide, that they will never again reach back to where they were originally placed.
3) Did I mention that I hate that rack-mount guy?
4) Backups are a good thing.
5) The final screw holding up a rackmount server is always possessed by demons. As is the first one you try to put in when reinstalling the server.
6) Shutting down a server to replace one component will cause several others to fail. One must assume that God has it in for IT people.
7) Backups are a very, very good thing.
8) You will always remember that you forgot to replace at least one cable AFTER you've tightened the last screw when placing the server back in the rack. Twice.
9) Perhaps God and the guy who invented rack mount servers are in cahoots.
10) Did I mention that backups are a good thing?
So, Tom, what's the point? Well, aside from the fact that the Internet could have come to a flaming end sometime last night and I would have been too busy to notice... and thus I needed some "filler" for my diary... it's simply this: Check your backups. Make sure that you're backing up not just the "business critical" machines in your infrastructure, but also your PITA machines. Make sure that your backups work. Make sure that you can RESTORE them when you need them.
Of all the problems and hassles and "issues" that I had breathing life back into my dead PITA server, the ONLY thing that went 100% smoothly was restoring from backup.
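If "make sure you can RESTORE them" feels a bit abstract, here's one way to make it concrete. This is a minimal sketch, assuming you can restore a backup into a scratch directory next to (a copy of) the live data: it walks the original tree and checks that every file exists in the restored tree with an identical SHA-256 hash. The names `sha256_of` and `verify_restore` are hypothetical helpers for illustration, not part of any backup product. It only checks file contents; it won't catch extra files, permissions, ownership, or anything your backup tool keeps outside the file tree.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_root: Path, restored_root: Path) -> List[str]:
    """Compare every regular file under original_root with its counterpart
    under restored_root (hypothetical helper). Returns a list of problems;
    an empty list means every original file came back byte-for-byte."""
    problems = []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue  # skip directories; only file contents are compared
        rel = src.relative_to(original_root)
        dst = restored_root / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems


if __name__ == "__main__":
    import shutil
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        orig = Path(tmp) / "orig"
        restored = Path(tmp) / "restored"
        (orig / "etc").mkdir(parents=True)
        (orig / "etc" / "mail.cf").write_text("relayhost = example\n")
        shutil.copytree(orig, restored)  # stand-in for a real restore
        print(verify_restore(orig, restored))  # prints []
```

In practice you'd point `restored_root` at wherever your backup software drops a test restore, and run the check on a schedule rather than on the night the drive hits 0 RPM.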
Now if you'll excuse me, I need to find my cable stretcher...
Handler on Duty: Tom Liston (http://www.labreatechnologies.com)
Sep 1st 2004