The Victim of Bad Device Drivers


I've been trying to deal with a dodgy disk array for a few weeks. Because of the recent floods in Thailand, we were unable to get the high-capacity drives to build the 2TB array for the server, so an old email server's drive array was pressed into use, and it's been a bit dodgy, to say the least.

To be fair, I'm glad we had the old array to press into service. If I had been forced to wait for the estimated 3 months, that would certainly have been worse. But I still have to say that bad device drivers are a pain, and I would really like them fixed.

So here's what's been happening… I come in in the morning and I see the mount point for this drive array is there in the filesystem, but all the jobs referencing it are failing. Wonderful. So I try to take a look at it:

  $ cd /plogs/Engine/dumps
  $ ls
  ls: cannot open directory .: Input/output error

No amount of unmounting and remounting helps, because the OS simply cannot see the drive array. We have to reboot the box, and then it comes back online.
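The failure mode described above (the mount point is present, but listing it fails with an I/O error) is something a small periodic check could catch before the morning. What follows is only a sketch under assumptions: the function name `check_mount` and the path `/plogs` are hypothetical, since the post doesn't say which directory is the actual mount point, and the check only detects the problem. It does nothing to bring the array back.

```python
import errno
import os


def check_mount(path):
    """Classify the state of a mount point.

    Returns "ok", "not-mounted", "io-error", or "error: <reason>".
    """
    # List the directory first: with a dead array this is the call
    # that fails, matching the "Input/output error" that ls reports.
    try:
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return "io-error"
        return "error: " + str(exc)
    # The listing worked, but we may be looking at the bare directory
    # underneath rather than the mounted array.
    if not os.path.ismount(path):
        return "not-mounted"
    return "ok"


if __name__ == "__main__":
    # "/plogs" is an assumed mount point, used here for illustration only.
    print(check_mount("/plogs"))
```

Run from cron every few minutes, anything other than `ok` could trigger an alert, so the outage is noticed when it starts rather than after a night of failed jobs.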

The problem with this approach is that I've got a ton of exchange feed recorders running on this box, and it's the only backup we have to production. If we miss recording one of these feeds, then it's gone, as the exchanges aren't in the business of replaying their entire day just because we had a hardware problem.

So I'm trying to get a few things done. The first is getting a real backup to the recorders in a second datacenter. The second is getting this drive array working properly on Ubuntu 10, hopefully with a kernel update that's in the offing. It is a decent array. I like it. But it's got to work first, and then I'll be happy.