Archive for the ‘Coding’ Category

Making a New Transport for my Ticker Plants (cont.)

Thursday, February 24th, 2011

Today I've been at it all day - writing more code for my new UDP-based transport. There's a lot that I need to get going, and it's not at all trivial to get these guys working properly when you aren't locking anything. It's been a nasty day of tests and tweaks, but in the end, I have something that delivers bursts of 320,000 msgs, and it does it all pretty efficiently.

Tomorrow, I'll start the live testing.

Making a New Transport for my Ticker Plants

Wednesday, February 23rd, 2011

This afternoon I started work on a new UDP-based delivery system to replace ZeroMQ in my ticker plants. It's going to be as simple as I can make it while still giving me the level of service that I need for my users. Basically, I'm sitting in nice data centers, and the switches are good, so I shouldn't see a lot of packet drops. But there will possibly be some, so I need to plan for that. I'm not exactly sure how to implement the reliability in this system - I've got a few ideas that could all work - but I'll need something for sure.

What I want to start off with is a simple UDP broadcaster and antenna. These will take the messages, serialize them, put them into UDP datagrams, and send them out on the different multicast channels I'd been using for ZeroMQ. I'll use boost ASIO for all this, so it shouldn't have too much overhead - it already works for the incoming data from the exchanges.
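
Just to make it concrete, the broadcaster side might start out as small as this - a sketch only, with the group address, port, and payload as placeholders rather than my real channel layout:

  #include <boost/asio.hpp>
  #include <string>

  using boost::asio::ip::udp;

  int main() {
    boost::asio::io_service  ioSvc;

    // one multicast channel - the address and port are placeholders
    udp::endpoint  channel(
        boost::asio::ip::address::from_string("239.255.0.1"), 30001);
    udp::socket    sock(ioSvc, channel.protocol());

    // a serialized message would go here - this is just a stand-in
    std::string  datagram("serialized message bytes");
    sock.send_to(boost::asio::buffer(datagram), channel);
    return 0;
  }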

On the antenna, I'll have one boost UDP socket for each multicast channel, and then a single io_service thread reading the datagrams off those sockets and into "byte buffer" queues - one per socket. Then we'll have a thread pulling the datagrams off those queues and deserializing them to place them into the conflation queue. Pretty simple model.
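
And a sketch of the antenna side - one socket per multicast channel, all serviced by a single io_service thread. The hand-off to the per-socket byte buffer is stubbed out here, and the group and port are again placeholders:

  #include <boost/asio.hpp>
  #include <boost/bind.hpp>
  #include <iostream>
  #include <string>

  using boost::asio::ip::udp;

  /**
   * One of these per multicast channel - it joins the group and keeps
   * an async read outstanding on its socket.
   */
  class Antenna {
    public:
      Antenna( boost::asio::io_service & anIOSvc,
               const std::string & aGroup, unsigned short aPort ) :
        mSocket(anIOSvc)
      {
        udp::endpoint  listenEp(boost::asio::ip::address_v4::any(), aPort);
        mSocket.open(listenEp.protocol());
        mSocket.set_option(udp::socket::reuse_address(true));
        mSocket.bind(listenEp);
        // join the multicast group for this channel
        mSocket.set_option(boost::asio::ip::multicast::join_group(
            boost::asio::ip::address::from_string(aGroup)));
        startRead();
      }

    private:
      void startRead() {
        mSocket.async_receive_from(
            boost::asio::buffer(mBuffer), mSender,
            boost::bind(&Antenna::onRead, this,
                        boost::asio::placeholders::error,
                        boost::asio::placeholders::bytes_transferred));
      }

      void onRead( const boost::system::error_code & anError,
                   std::size_t aLength ) {
        if (!anError) {
          // this is where the datagram would be handed off to the
          // per-socket byte buffer for the deserializing thread
          std::cout << "got " << aLength << " bytes" << std::endl;
        }
        startRead();
      }

      udp::socket    mSocket;
      udp::endpoint  mSender;
      char           mBuffer[65536];
  };

  int main() {
    boost::asio::io_service  ioSvc;
    Antenna  a(ioSvc, "239.255.0.1", 30001);
    ioSvc.run();   // the single io_service thread for all the sockets
    return 0;
  }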

Lots to write. Better get at it.

Real Problems with ZeroMQ

Tuesday, February 22nd, 2011

It really breaks my heart to see problems with ZeroMQ that aren't being addressed. For instance, the GitHub master doesn't receive messages when using OpenPGM. I've been working with the guys on IRC about this, and it seems that some of the more recent changes really messed this up, and there were no checks on the code to make sure it was still working. Sad, but true.

I've tried to jump-start this several times, but I didn't get very far on any occasion. This puts me in a bind, as I can't use ZeroMQ without OpenPGM, but I'd like to get some of the newer features they are talking about.

But that's not the worst.

I think ZeroMQ is either delaying, re-ordering, or retrying with excessive delay, some of my messages - and it only happens when I'm really hammering the data rate. For example, I know that the ZeroMQ send() method is asynchronous - it buffers up the data and then sends it later. But what happens if that internal buffering gets messed up?

The delivery seems to get worse and worse as the day goes on, and it seems to be based on a zmq::socket_t getting into a "bad state" and never getting itself out. I believe that it's in the receiver, because two apps on the same box have different reception profiles after a time.

In any case, I can't trust it and I can't build the latest code. It's just no longer workable. I have to find something that doesn't include ZeroMQ.

When a Stack Variable Gets Too Big – Go to the Heap!

Tuesday, February 22nd, 2011

This morning I noticed that when I had subscribed to a large portion of the option market data feed, I got into a position where the conflation queue could exceed 131,072 (128k) messages. It's not impossible for a slow client to have this problem, and the safest thing to do is to increase the size of the FIFO queue in the conflation queue to cover all possible cases. So I bumped it up by a factor of 4 to 512k (2^19) messages, and then all of a sudden I started getting seg faults. What?!

I started digging into this and was just plain stunned to see that it really didn't matter what the use case was for the conflation queue - if I had a conflation queue as a stack variable, I was going to seg fault. Right at the beginning of the app.

When I changed it to allocate one on the heap, everything was fine. OK... looks like some limit imposed by the shell - most likely the stack size (ulimit -s) - but nothing I could find pointed out what to change. Also, this was something that everyone would have to know, and that's not a very user-friendly experience.

So I decided to make all the uses of the conflation queue in my code heap-based. It wasn't too hard - change the variable to a pointer, change all the 'dotted' references to 'arrows', and then you're almost there. The last big thing was to make sure that they were initialized and cleaned up properly.
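
A tiny sketch of the before and after - the queue type here is just a stand-in for my conflation queue (a big fixed-size array), but it shows the same stack-vs-heap behavior:

  #include <cstddef>

  // a stand-in for the conflation queue - just a big fixed-size array
  struct Msg { char payload[32]; };

  template <typename T, std::size_t N>
  struct BigQueue {
    T  data[N];
  };

  int main() {
    // As a stack variable, 512k 32-byte slots is ~16MB - well past the
    // typical 8MB stack limit (ulimit -s), so this seg faults at startup:
    //   BigQueue<Msg, 524288>  queue;
    // On the heap, the stack only holds a pointer, so the size of the
    // queue no longer matters to the stack limit:
    BigQueue<Msg, 524288>  *queue = new BigQueue<Msg, 524288>();
    queue->data[0].payload[0] = 'x';   // 'dotted' refs become 'arrows'
    delete queue;                      // ...and clean up properly
    return 0;
  }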

Thankfully, I didn't have all that many places that I had used the conflation queue - three classes in total. It was pretty easy to look at each and come up with a plan for creating these guys and cleaning them up properly. The whole thing probably took me 45 mins - and of that, more than half was spent trying to come up with a way around the seeming limitation.

Important thing to note when creating large items - you may have to go to the heap just because.

Amazing Difference A Character Can Make

Friday, February 18th, 2011

Today I had an amazing debugging session with some code I'd written and thought was in pretty good shape. It took most of the day, but there were several things I needed to figure out in the test case I was given. Basically, it was an extreme edge case - but it was possible, so it needed to be handled properly.

But the final bug was a doozie. The original code looked like this:

  const uint128_t  & ckey = aMessage->getConflationKey();   // holds a reference into aMessage

and in that simple line there was the final bug.

If we had multiple threads hitting the same code, it was possible for these messages to be cleaned up - deleted - out from under us. But what happens when you delete aMessage while you still hold a reference to a value inside it? Well... it goes to poo, that's what. The fix was simple:

  uint128_t  ckey = aMessage->getConflationKey();   // copies the value out of aMessage

One character - the ampersand. That's it. Make a copy as opposed to holding onto a reference. One character. Amazing.

Some days I love this job.

Google Chrome dev 11.0.672.2 is Out

Friday, February 18th, 2011

This morning I noticed that Google Chrome dev 11.0.672.2 was out, and with it quite a few changes. While the major version number doesn't mean a great deal in my experience, the addition of the new V8 engine as well as a few nice Mac-specific features makes this a nice release to have. I'm sure there's more to it than just this, but this is enough.

The Lockless MP/SC Ring FIFO Revisited

Thursday, February 17th, 2011

Today has been an interesting day... it started off with one of my servers having a lot of trouble with my new RingFIFO. Because it was still very early in the day (before the open), I just swapped out the RingFIFO with my LinkedFIFO and everything ran fine today. Still... when I had some time later today, I wanted to dig back into the issues with my RingFIFO and see if I couldn't fix it up and put it back into play for tomorrow.

Turns out, all I needed was a break and a little out-of-the-box thinking. What I had originally used as the 'ring' was a simple template array of values:

  /**
   * We have a very simple structure - an array of values of a fixed
   * size and a simple head and tail.
   */
  volatile T         mData[eCapacity];
  volatile size_t    mHead;
  volatile size_t    mTail;
  volatile size_t    mInsPt;

but the problem with this was that I couldn't see how to atomically move the tail and place a value in the slot while also checking that the tail wasn't crashing into the head. It was just too hard. In my old code, the push() referenced both the head and the tail, and the same was true for the pop(). This was clearly a recipe for disaster.

So I simplified. Significantly. The results are pretty nice.

The first thing I realized is that the head and tail locations are really orthogonal to the validity of the data in the queue. This means that I should really have something that 'tags' the data elements as 'valid' or 'invalid', and then use that to know when I can pop, or if I've overrun the queue.

If the head is pointing to the next good value, then all I need to do is check its 'valid' flag. If it's valid, then I can pop it: move the head up, snatch the old value, invalidate the slot, and we're done.

It's all easily done if we change our data elements to something like this:

  /**
   * In order to simplify the ring buffer access, I'm going to actually
   * have the ring a series of 'nodes', and for each, there will be a
   * value and a valid 'flag'. If the flag isn't true, then the value
   * in the node isn't valid. We'll use this to decouple the push()
   * and pop() so that each only needs to know its own location and
   * then interrogate the node for state.
   */
  struct Node {
    T      value;
    bool   valid;
 
    Node() : value(), valid(false) { }
    Node( const T & aValue ) : value(aValue), valid(true) { }
    ~Node() { }
  };
 
  /**
   * We have a very simple structure - an array of nodes of a fixed
   * size and a simple head and tail.
   */
  volatile Node      mData[eCapacity];
  volatile size_t    mHead;
  volatile size_t    mTail;

Now I can have a far more cleanly separated push() and pop(). This is going to make a significant difference, I hope. I can say this for sure - it's considerably faster in my tests - about twice as fast. Very nice.
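
To make the decoupling concrete, here's a rough sketch of what the separated push() and pop() could look like - the member names follow the code above, but the GCC atomic builtin, the spin-on-overrun policy, and the power-of-two eCapacity are my assumptions, not the actual code (and it leans on x86's strong memory model rather than explicit fences):

  /**
   * Multiple producers claim slots by atomically moving the tail;
   * the single consumer owns the head. Each side only looks at its
   * own index and the node's 'valid' flag.
   */
  void push( const T & aValue ) {
    // claim the next slot - producers race, so this has to be atomic
    size_t  idx = __sync_fetch_and_add(&mTail, 1) & (eCapacity - 1);
    // if the slot is still 'valid' we've wrapped around into un-popped
    // data - a real queue needs a policy here (drop, grow, block);
    // this sketch just spins until the consumer frees the slot
    while (mData[idx].valid) ;
    Node &  slot = const_cast<Node &>(mData[idx]);
    slot.value = aValue;
    slot.valid = true;            // publish the value last
  }

  bool pop( T & aValue ) {
    size_t  idx = mHead & (eCapacity - 1);
    if (!mData[idx].valid) {
      return false;               // next slot is empty - nothing to pop
    }
    Node &  slot = const_cast<Node &>(mData[idx]);
    aValue = slot.value;          // snatch the old value...
    slot.valid = false;           // ...and invalidate the slot
    ++mHead;                      // only the consumer moves the head
    return true;
  }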

Wicked Error with (Thankfully) Simple Solution

Thursday, February 17th, 2011

This morning I got an error that looked like this:

  *** glibc detected *** tpClient: double free or
     corruption (fasttop): 0x0...b0 ***

and after a bit of googling, it appears that this comes from glibc's runtime checking on the calls to free(). It seems that the default behavior is to check the argument to free() to see if it's a double-free. Sounds nice, in theory, but if we're hammering the system, it seems like this might not be a good thing to do.

Thankfully, the solution is easy - don't check. To turn it off, simply say:

  export MALLOC_CHECK_=0

and the checks will not be done. Good enough for me. I'm still going to try out TCMalloc from Google, but that's a previous post. Still don't have it built for x86_64 yet.
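
For reference, glibc gives MALLOC_CHECK_ a few settings, depending on how loud you want the failure to be:

  export MALLOC_CHECK_=0   # silently ignore heap corruption
  export MALLOC_CHECK_=1   # print a diagnostic on stderr and continue
  export MALLOC_CHECK_=2   # call abort() immediately
  export MALLOC_CHECK_=3   # print the diagnostic and abort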

Interesting Thread-Cacheing malloc() Replacement from Google

Thursday, February 17th, 2011

This morning I was looking into the problem with my ticker plant clients and the older kernel shipping with CentOS 5. Basically, when I increased the message rate, we crossed over some threshold on CentOS and started getting a lot of heap corruptions manifesting themselves as malloc() or free() problems in ZeroMQ. On Ubuntu 10.04.1 everything was fine - most likely because its kernel is significantly newer than the one in CentOS 5. So I went on a search for a better malloc() and free().

What I came across was google-perftools. This is pretty amazing stuff. It's a thread-caching replacement for malloc() and free() that is as simple as adding -ltcmalloc to the build line. It's got profiling tools as well, but that's not what interests me as much - it's the amazing speed gains that it provides. The graphs in the paper show about a 4x increase in operations per second when using it.
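
Hooking it in might look like this - the file names and the library path are placeholders for whatever the build actually uses:

  # link tcmalloc into the build...
  g++ -O2 -o tpClient tpClient.cpp -ltcmalloc

  # ...or swap it in without rebuilding
  LD_PRELOAD=/usr/lib/libtcmalloc.so ./tpClient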

It's not conceptually hard - the TCMalloc library grabs a large block of memory from the system and then offers it up to the application. That puts the allocation calls in user space, and the control of the memory there as well. Because the design keeps the smaller blocks in per-thread caches, it's possible to see no locking contention on malloc() and free(), which should be a major boon to me.

I have to get it built by the Unix Admins for Ubuntu 10.04.1 - I've already built the x86_64 RPMs for CentOS 5 and installed them on a test box I have access to, but I really want to start on the Ubuntu boxes. Simple change, should see major improvement. Very exciting.

UPDATE: it's built on all my boxes and ready to go for tomorrow. I'm excited about the possibilities.

Google Chrome dev 10.0.648.82 is Out

Thursday, February 17th, 2011

This morning I saw that Google Chrome dev 10.0.648.82 was out with a stability release update. It's worded a little strangely, but then again, these guys are engineers, not English majors, so you have to cut them a little slack now and again.

Nice to see the improvements. It's getting better and better.