Archive for February, 2011

Amazing Difference A Character Can Make

Friday, February 18th, 2011

bug.gif

Today I had an amazing debugging session with some code I'd written and thought was in pretty good shape. It took most of the day, but there were really several things I needed to figure out in the test case I was given. Basically, it was an extreme edge case, and yet, it was possible, so it needed to be handled properly.

But the final bug was a doozie. The original code looked like this:

  const uint128_t  & ckey = aMessage->getConflationKey();

and in that simple line there was the final bug.

If we had multiple threads hitting the same code, it was possible to have these messages cleaned out from under us. But what happens when you delete aMessage while you're still holding a reference to a value in it? Well... it goes to poo, that's what. The fix was simple:

  uint128_t  ckey = aMessage->getConflationKey();

One character - the ampersand. That's it. Make a copy as opposed to holding onto a reference. One character. Amazing.
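To make the failure mode concrete, here's a minimal sketch - the Message class below is a stand-in of my own invention (the real one returns a uint128_t; a uint64_t keeps the sketch self-contained), not the production code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the real message class
struct Message {
  uint64_t key;
  Message( uint64_t aKey ) : key(aKey) { }
  const uint64_t & getConflationKey() const { return key; }
};

// UNSAFE: holds a reference into the message, then the message dies
uint64_t badRead( Message *aMessage ) {
  const uint64_t & ckey = aMessage->getConflationKey();
  delete aMessage;   // simulates another thread's cleanup
  return ckey;       // undefined behavior - ckey dangles
}

// SAFE: one character different - copy the value before the delete
uint64_t goodRead( Message *aMessage ) {
  uint64_t ckey = aMessage->getConflationKey();
  delete aMessage;
  return ckey;       // fine - we own our own copy
}
```

The badRead() version is exactly the shape of the original bug: the reference outlives the object it points into.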

Some days I love this job.

Google Chrome dev 11.0.672.2 is Out

Friday, February 18th, 2011

V8 Javascript Engine

This morning I noticed that Google Chrome dev 11.0.672.2 was out and with it quite a few changes. While the major version number doesn't mean a great deal, in my experience, the addition of the new V8 engine as well as a few nice Mac-specific features makes this nice to have. I'm sure there's more to it than just this, but this is enough.

The Lockless MP/SC Ring FIFO Revisited

Thursday, February 17th, 2011

bug.gif

Today has been an interesting day... it started off with one of my servers having a lot of trouble with my new RingFIFO. Because it was still very early in the day (before the open), I just swapped out the RingFIFO with my LinkedFIFO and everything ran fine today. Still... when I had some time later today, I wanted to dig back into the issues with my RingFIFO and see if I couldn't fix it up and put it back into play for tomorrow.

Turns out, all I needed was a break and a little out of the box thinking. What I had originally used as the 'ring' was a simple template array of values:

  /**
   * We have a very simple structure - an array of values of a fixed
   * size and a simple head and tail.
   */
  volatile T         mData[eCapacity];
  volatile size_t    mHead;
  volatile size_t    mTail;
  volatile size_t    mInsPt;

but the problem with this is that I was concerned about how to atomically move the tail and place a value in the spot while checking for queue crashing. It was just too hard. In my old code, the push() referenced the head and the tail, and the same for the pop(). This was clearly a recipe for disaster.

So I simplified. Significantly. The results are pretty nice.

First thing I realized is that the location is really orthogonal to the validity of the data in the queue. This means that I should really have something that 'tags' the data elements as 'valid' or 'invalid', and then use that to know when I can pop, or if I've overrun the queue.

If the head is pointing to the next good value, then all I need to do is check its 'valid' flag. If it's valid, then I can pop it. Move the head up, snatch the old value and invalidate it, and we're done.

It's all easily done if we change our data elements to something like this:

  /**
   * In order to simplify the ring buffer access, I'm going to actually
   * have the ring be a series of 'nodes', and for each, there will be a
   * value and a valid 'flag'. If the flag isn't true, then the value
   * in the node isn't valid. We'll use this to decouple the push()
   * and pop() so that each only needs to know its own location and
   * then interrogate the node for state.
   */
  struct Node {
    T      value;
    bool   valid;
 
    Node() : value(), valid(false) { }
    Node( const T & aValue ) : value(aValue), valid(true) { }
    ~Node() { }
  };
 
  /**
   * We have a very simple structure - an array of nodes of a fixed
   * size and a simple head and tail.
   */
  volatile Node      mData[eCapacity];
  volatile size_t    mHead;
  volatile size_t    mTail;

Now I can have a far more cleanly separated push() and pop(). This is going to make a significant difference, I hope. I can say this for sure - it's considerably faster in my tests - about twice as fast. Very nice.
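To show how the decoupling plays out, here's a simplified, single-threaded sketch of a push()/pop() pair against that Node array - the real code uses the GCC atomic builtins, so treat this as the shape of the logic, not the production implementation:

```cpp
#include <cassert>
#include <cstddef>

// Simplified sketch of the valid-flag scheme; names mirror the post,
// but this is a single-threaded illustration, not the real RingFIFO.
template <typename T, size_t N>
class RingSketch {
  struct Node {
    T    value;
    bool valid;
    Node() : value(), valid(false) { }
  };
  Node   mData[N];
  size_t mHead;
  size_t mTail;
public:
  RingSketch() : mHead(0), mTail(0) { }
  bool push( const T & aValue ) {
    Node & n = mData[mTail % N];
    if (n.valid) return false;     // slot still full - we'd crash the queue
    n.value = aValue;
    n.valid = true;                // 'publish' the value last
    ++mTail;
    return true;
  }
  bool pop( T & aValue ) {
    Node & n = mData[mHead % N];
    if (!n.valid) return false;    // nothing to pop
    aValue = n.value;
    n.valid = false;               // invalidate the old slot
    ++mHead;
    return true;
  }
};
```

The key property is that push() only touches the tail and pop() only touches the head - each side learns everything else it needs from the node's valid flag.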

Wicked Error with (Thankfully) Simple Solution

Thursday, February 17th, 2011

bug.gif

This morning I got an error that looked like this:

  *** glibc detected *** tpClient: double free or
     corruption (fasttop): 0x0...b0 ***

and after a bit of googling, it appears that this check comes from glibc's runtime checking on the calls to free(). It seems that the default behavior is to check the argument to free() to see if it's a double-free. Sounds nice, in theory, but if we're hammering the system, it seems like this might not be a good thing to do.

Thankfully, the solution is easy - don't check. To turn it off, simply say:

  export MALLOC_CHECK_=0

and the checks will not be done. Good enough for me. I'm still going to try out the TCMalloc from google, but that's a previous post. Still don't have it built for x86_64 yet.

Interesting Thread-Caching malloc() Replacement from Google

Thursday, February 17th, 2011

google-labs-logo.gif

This morning I was looking into the problem with my ticker plant clients and the older kernel shipping with CentOS 5. Basically, when I increased the message rate, we crossed over some threshold on CentOS and we started getting a lot of heap corruptions manifesting themselves as malloc() or free() problems in ZeroMQ. On Ubuntu 10.04.1 everything was fine, most likely because the kernel in Ubuntu 10.04.1 was significantly newer than the one in CentOS 5. So I went on a search for a better malloc() and free().

What I came across was the google-perftools. This is pretty amazing stuff. It's a thread-caching replacement for malloc() and free() that is as simple as adding a -ltcmalloc to the build line. It's got profiling tools as well, but that's not what interests me as much - it's the amazing speed gains that it provides. The graphs in the paper show about a 4x increase in operations per second when using this.

It's not conceptually hard - the TCMalloc library grabs a large block of memory from the system and then offers it up to the application. This puts the allocation calls in user space, and the control of memory there as well. Because their design has the smaller blocks held per-thread, it's possible to see no locking contention on malloc() and free(), which should be a major boon to me.
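Hooking it in really is as simple as the build line suggests. Assuming the library is installed in a default search path (the .so location here is my guess, not gospel), either of these works:

```shell
# link it in at build time...
g++ -O2 -o tpClient tpClient.cpp -ltcmalloc

# ...or swap the allocator in without rebuilding at all
LD_PRELOAD=/usr/lib/libtcmalloc.so ./tpClient
```

The LD_PRELOAD form is handy for a quick before/after comparison on a box you can't rebuild on.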

I have to get it built by the Unix Admins for Ubuntu 10.04.1 - I've already built the x86_64 RPMs for CentOS 5 and installed them on a test box I have access to, but I really want to start on the Ubuntu boxes. Simple change, should see major improvement. Very exciting.

UPDATE: it's built on all my boxes and ready to go for tomorrow. I'm excited about the possibilities.

Google Chrome dev 10.0.648.82 is Out

Thursday, February 17th, 2011

This morning I saw that Google Chrome dev 10.0.648.82 was out with a stability release update. It's worded a little strangely, but then again, these guys are engineers, not English majors, so you have to cut them a little slack now and again.

Nice to see the improvements. It's getting better and better.

Lockless Multiple-Producer/Single-Consumer Ring FIFO Queue

Wednesday, February 16th, 2011

Professor.jpg

I'm still on the quest for better performance out of my ticker plants and their clients, and to that end today I spent quite a bit of time on building a multiple-producer/single-consumer, lockless, FIFO ring buffer. The problem I have with my LinkedFIFO queue is that it's constantly doing malloc and free - OK, new and delete, but it's the same thing. The problem is all this thrashing. What I want is to be able to pre-allocate the memory for the buffer and then just use it.

So I spent a lot of time - more than I'd like to have spent, getting this guy right. It wasn't as easy as I'd have hoped, but in the end, I think it's going to be very useful for me.

The hard part was the multiple-producers. No question. The obvious solution to the multiple-producers is to get a concept of the 'new' Tail, put your data at the old Tail, and update the new Tail. You have to check for queue crashes in there, and all this has to be atomic. It's that last part that's the kicker.

What I came up with was the idea that the queue had an insert point. This was not the Tail, because the insert point would be atomically moved by the push() method, and then, after the crashing was checked and we've set the value, we can update the tail.

    bool push( T & anElem )
    {
      bool    error = false;
 
      // move the insertion point and get the old value for me
      size_t  insPt = __sync_fetch_and_add(&mInsPt, 1);
      // see if we've crashed the queue all the way around...
      if (__sync_bool_compare_and_swap(&mHead, (insPt + 1), mHead)) {
        error = true;
        cLog.error("[push] the queue is full! Can't push this item");
      } else {
        // save the data in the right spot for the size
        const_cast<T &>(mData[insPt & eBitMask]) = anElem;
        // ...and finally update the 'tail' for a new item
        __sync_fetch_and_add(&mTail, 1);
      }
 
      return !error;
    }

If you move the tail before it's got a value, then it's possible that the consumer can see that the tail has moved, assume that it contains valid data, and process it. Bad move. So it's clear you have to update the tail last. But then how do you know where to put the new data? Hence the insert point.

This is working fine for me, and while it's still pretty new code, I have a feeling it's solid - there just isn't that much to it, and the tests I've come up with are pretty thorough. It's nice to have a couple of efficient FIFO queues to choose from.

Skitch 1.0.3 is Out

Wednesday, February 16th, 2011

Skitch.jpg

Today I saw that Skitch 1.0.3 was out with some nice features and bug fixes. I have to say, they have created a very nice product with almost no bug fix updates. Very slick and solid. I'm glad to support them with my dollars. Well worth it.

Tracking Down a Problem I Already Solved

Tuesday, February 15th, 2011

bug.gif

I hate being dense. Really. In the recent past, I realized that I can't really use the compare-and-swap in a multi-threaded environment like this:

  Node  *old = __sync_val_compare_and_swap(&mTail,
                                           mTail,
                                           newbie);

because it's possible that between the marshaling for the args, another thread would get in there and change the value of mTail so that this atomic operation would fail, but the returned value would not be what I expected it to be. It would be the previous value, but the new value would not be newbie, which is what I wanted.

What I was looking for was simply an exchange - I don't care about the current value, I just want to swap what's there now for something new, and give me the old while I give you the new.

No such luck. There's nothing like that in GCC.

So I'm back to the only thing I can do:

  Node   *t = mTail;
  while (!__sync_bool_compare_and_swap(&mTail, t, newbie)) {
    t = mTail;
  }

and in this we are forced to loop to get a consistent set, but in the end it's still just a swap. Sure do wish they had one of those.
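For context, here's a minimal sketch of how that swap loop sits inside a linked FIFO's push() - this is my own reconstruction with a dummy head node, not the actual LinkedFIFO source:

```cpp
#include <cassert>
#include <cstddef>

// Minimal singly-linked node - a stand-in, not the real LinkedFIFO's node
template <typename T>
struct Node {
  T        value;
  Node<T> *next;
  Node( const T & aValue ) : value(aValue), next(NULL) { }
};

template <typename T>
class LinkedFIFOSketch {
  Node<T> *mHead;
  Node<T> *mTail;
public:
  LinkedFIFOSketch() { mHead = mTail = new Node<T>(T()); }  // dummy node
  void push( const T & aValue ) {
    Node<T> *newbie = new Node<T>(aValue);
    // loop until we win the race to swing the tail to the new node
    Node<T> *t = mTail;
    while (!__sync_bool_compare_and_swap(&mTail, t, newbie)) {
      t = mTail;
    }
    // 't' is now exclusively ours - link it to the new tail
    t->next = newbie;
  }
  bool pop( T & aValue ) {
    Node<T> *old = mHead;
    Node<T> *nxt = old->next;
    if (nxt == NULL) return false;   // empty
    aValue = nxt->value;
    mHead = nxt;                     // next node becomes the new dummy
    delete old;
    return true;
  }
};
```

Once a producer wins the compare-and-swap, the old tail node is exclusively its to link, which is what makes the multiple-producer side safe.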

Anyway, I'm miffed at myself for spending several hours on this when I'd already solved it. The wrinkle came in the multi-threaded usage of the LinkedFIFO, and while my test cases worked, I didn't really check that there was no corruption in the data. I added those tests to the app and all of a sudden it was clear I had an issue. It took me about 5 mins to fix, and all was well, but finding the problem took hours.

Sometimes, I'm my own worst enemy. However, I will say that I'm getting far more comfortable with all this stuff the more I use it. Guess there's a silver lining in this after all.

Getting Bitmapped Fonts in Ubuntu 10.04.1

Monday, February 14th, 2011

Ubuntu Tux

I had a few development boxes restaged from CentOS 5 to Ubuntu 10.04.1 as The Shop is moving away from CentOS and to Ubuntu. As it turns out, the code I've been working on has pointed out a bug in the kernel that ships with CentOS 5 related to a lot of very fast malloc and free cycles. But that's not the point. On Ubuntu, I wanted to get the bitmapped font Misc Fixed Semicondensed, and by default, it's not allowed.

But that's only the default. Here's how to enable it.

The key is to remove the font config rule that says "no bitmapped fonts". To do this, we simply:

  $ sudo rm /etc/fonts/conf.d/70-no-bitmaps.conf

This was a link to /etc/fonts/conf.avail/70-no-bitmaps.conf anyway, so it's not like anything was really "lost". Then we need to re-run the font config to pick up this change:

  $ sudo dpkg-reconfigure fontconfig

Now you should have bitmapped fonts for Vim. Excellent.