More Drive Array Problems

Today my UDP feed recorders blew out a 3TB+ disk array, and when the admins unmounted and remounted it, I had 2.4TB free. Something was going on. I had changed the file writing from a buffer-and-write-once style to an append-incremental-updates scheme, and all of a sudden things blew up. So of course, I think it's me. So I decided to check.
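For context, the change was roughly from the first shape below to the second. This is just a sketch with made-up function names to show the two styles - not my actual recorder code:

  #include <fstream>
  #include <string>

  // old style: build the whole thing in memory, write it out once
  void writeOnce(const std::string& name, const std::string& data) {
    std::ofstream  file(name.c_str(), std::ios::out | std::ios::binary);
    file << data;
  }

  // new style: open in append mode and add each update as it arrives
  void appendUpdate(const std::string& name, const std::string& update) {
    std::ofstream  file(name.c_str(), (std::ios::out |
                                       std::ios::binary |
                                       std::ios::app));
    file << update;
  }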

The first thing was to get a simple test app, run it both on the drive array and off it, and compare the results. Thankfully, there were drives on my troubled box that weren't part of the array - my home directory, for one. So I just needed a test app to run in my home directory and then on the drive array, and then compare the results.

My little test app was simple:

  #include <fstream>
  #include <string>
  #include <stdint.h>

  int main() {
    std::string  name("local.bin");
    std::string  buffer("Now is the time for all good men to "
                        "come to the aid of their party\n");

    // open in binary append mode, write one 67-byte line, and close -
    // 10,000 times, for a 670,000 byte file
    for (uint16_t i = 0; i < 10000; ++i) {
      std::ofstream  file(name.c_str(), (std::ios::out |
                                         std::ios::binary |
                                         std::ios::app));
      file << buffer;
      file.close();
    }

    return 0;
  }

and then I compiled it and ran it. In my home directory I got:

  $ ls -lsa
  656 -rw-r--r--  1 rbeaty UnixUsers 670000 Feb 22 16:44 local.bin

and when I ran it on my suspect drive array I got:

  $ ls -lsa
  262144 -rw-r--r--  1 rbeaty UnixUsers 670000 Feb 22 16:44 local.bin

So it's clear that the byte counts are right - 670000 in both cases - and the blocks used are reasonable on my home directory drive (656 1K blocks for a 670000-byte file), but the drive array is totally wigged out at 262144 blocks. This explains the problem I've been seeing - when I append to a file, the drive array allocates all kinds of extra blocks for it, but doesn't corrupt the byte count. Very odd.
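You can see the same thing without ls - here's a minimal sketch using stat(2) to compare the logical size to what's actually allocated. Note that st_blocks is in 512-byte units, while ls -s reports 1K blocks by default:

  #include <iostream>
  #include <sys/stat.h>

  int main() {
    struct stat  info;
    if (stat("local.bin", &info) == 0) {
      // st_size is the logical byte count; st_blocks is the number
      // of 512-byte blocks the filesystem actually allocated
      std::cout << "bytes: " << info.st_size
                << "  allocated: " << (info.st_blocks * 512)
                << std::endl;
    }
    return 0;
  }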

So I sent this to the admins to use as a test case for trying to fix this. I'm sure hoping they can do something to fix this guy - I need to have this running as soon as possible.

UPDATE: that's 256k blocks - exactly. That's 262144 1K blocks, or 256MB allocated for a 670000-byte file. This is interesting - it means it's not accidental. There's something about the driver that's allocating 256MB for the binary append, and doing it over and over again. Interesting, but it's just all the more evidence that this is a drive array bug.

[2/23] UPDATE: turns out to be an XFS mount option: allocsize=262144k, and that was easily fixed by the admins. I'm guessing my home directory wasn't on XFS, or had a better default allocation size. But it's fixed. Good.
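For reference, since allocsize is just a mount option, the fix amounts to remounting with a saner value. A sketch of what the /etc/fstab entry might look like - the device and mount point here are made up, and 64k is just one reasonable power-of-two choice:

  /dev/sdb1  /data  xfs  defaults,allocsize=64k  0  2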