Archive for October, 2010

The Problems of Premature Optimization

Monday, October 18th, 2010

bug.gif

This morning I figured out my problem from late last week - the serialization problem. The logging really was the key, and the trigger was that I was seeing the deserialization of an array of messages starting over. It's like something reset the decoder. Very odd.

Then I looked at the size.

65495.

I about threw myself out the window.

I had assumed that all the messages I was going to receive were single message containers. For this, it makes sense to think of the size as a uint16_t. But what happens when you get a query for 50,000 messages? The response is an array of messages and the size is a lot bigger than 64k. The "starting over" was the key. It was wrapping around the counter and trying to match up the wrapped data to the messages.

Horrible failure.

I went into the code and changed all these uint16_t to uint32_t and I'll be fine. The tests were perfect, and I could get on with the rest of the testing. But this points out the problems with Premature Optimization. I was thinking that there was no need to have a counter/cursor bigger than 64k when all messages were less than 100 bytes. And I was right.

But I didn't think about the problem of dealing with messages that I create and can be much larger than a single UDP datagram. Being in this business for decades doesn't make you immune to this problem. It's the thought process that you have to be careful about. I'm as guilty here as anyone.

OK, maybe I found the problem a little quicker, but it still took the better part of a day, and it was annoying to boot.

Lesson learned: Optimize when you have a performance problem. Not before.

Realized I had a Bug in the Message Serialization

Friday, October 15th, 2010

bug.gif

Today I was doing more testing, more polishing, on my ticker plant and I realized that there's a bug in my message serialization. It wasn't obvious, as it seemed to affect the application more the longer it ran. I was looking into the serialization and deserialization of the variants, which form the backbone of the serialization and deserialization of the ivars in the messages, but that all seemed to be working.

Yet I seemed to be receiving several thousand messages, but only a fraction of them were valid. Something was very wrong.

I started doing a lot of logging, because there was no way to easily catch this in a debugger - it got more pronounced as time went along. So I put the logging in, let it run a while, and crossed my fingers that it was going to point out the problem.

The one "silver lining" of this is that when these kinds of things happen, I end up going back into the code and placing a lot of DEBUG level logging that clearly illustrates what's moving through the system at that point. I'm not talking about dumping byte streams, but there are times for that in the process, I'm talking about the nice, human-readable logging. Like how many messages are being passed in, the byte counts, and even the break-down of how many messages of each kind.

I end up making these logging helper methods that do a lot of this for me so the code looks clean and it's easy to add this kind of behavior in several places - like the server and the client. It makes life a lot easier. And when I make it look nice, leaving it in at the DEBUG level is really no "cost" to the project, and can help debug things later.

The problem today was that just when I was getting close on this issue, I had to leave for the weekend. I know there's not a lot I can do after-hours for a ticker plant - there's no source of ticks anymore - but I wanted to get this solved.

I'll have to live with disappointment.

On the up-side, I think it's in the Summary messages as those are having the problem right at the end of the day. Maybe it's the packing/unpacking of values not matching. I'll have to verify that on Monday.

Slight Disappointment in Boost’s unordered_map

Thursday, October 14th, 2010

Boost C++ Libraries

Well... I won't lie to you... I'm disappointed in the boost unordered_map. I really am. In the STL std::map an iterator is not invalidated unless you're on the element in the std::map that's being deleted. That's nice. You can get an iterator on a map, and as long as your application isn't deleting elements, there's no way you're going to get into trouble. That's nice for a multi-threaded situation where there's one writer and one reader. The reader gets the iterator and the writer just shoves stuff into the map. No problems.

But that's not how the boost unordered_map works. It's not written up in any of the boost docs, but a careful search of the online message boards yields the fact that the iterator on the unordered_map is not valid across any write whatsoever. This is really bad news, but not totally surprising.

I've been battling a serious problem in my ticker plant, and the core dumps I was getting were in the unordered_map, which made no sense to me. But now it makes perfect sense. My reader thread was expecting its iterator to be valid for the scan, and any write was able to blow it up. It didn't happen all the time, but often enough to make it clear what was happening.

Bummer.

I read that the SGI STL hash_map was supposed to preserve iterators like the std::map, but when I tried it, I ran into the same problems as the boost unordered_map. In the end, I had to settle for simple (but fast) spinlocks on the maps as the majority of the time there will be no contention, and I don't want to slow it down that much.

Google Chrome’s Fancy Site-Persistent Zoom

Thursday, October 14th, 2010

GoogleChrome.jpg

OK, color me impressed with Google Chrome. I've been using it for quite a while (except for those few days when I got really sick of Google's duplicity with Motorola) and have been pretty happy with it, but have always wished there were a CSS file - like there is with Safari - that would allow me to "zoom out" all the pages so that they aren't as big as they might normally be. Unfortunately, while there seem to be hints on the net about that, none of the ones I tried worked.

I was left wishing that I could do what I wanted, but realizing that until Chrome got more advanced, I was probably just stuck with what I had. There's supposed to be a fix in Chrome that allows a user style sheet, but every time I tried to get it to work I failed - miserably.

Then this morning I read about a site-specific zoom setting - and it's persistent! I just had to try it. Sure enough... once I went to a site - even if I had multiple tabs open for that site - a zoom change in one affected them all, and survived restart! That's sweet.

What it means is that until I get the global user stylesheet in a way that I can actually get it to work, I at least have the ability to set up my environment and have it look nice after restarts. That's sweet. Very nice.

But guys... really... would it kill you to have the user stylesheet? It's already in WebKit. All you need to do is really use it.

Working Under the Gun

Wednesday, October 13th, 2010

GeneralDev.jpg

Today I've been working a little under the gun. OK... a lot under the gun. Another group is looking to use the ticker plant I'm building and they want to be using it "right now". Well... it's not exactly built right now, but I'm doing my best to provide them with something they can test with. What ensues is a lot of pressure that I usually don't like working under. But... I know these guys, and they'd love to have an excuse not to use it, so I have to suck it up and get things done as fast as possible.

Today was spent resolving a few issues about the ticker plant. The first issue that was a hold-over from yesterday was the fact that the FAST (FIX Adapted for STreaming) decoder for strings was returning more than it was supposed to. I was seeing a trailing asterisk (*) in some of the strings in the position after the last character I was supposed to see. So if the string was a maximum of 5 characters, then I'd see an asterisk in the sixth position.

In order to fix it, I went into my wrapper class and did a little more defensive coding:

  std::string codec::decode_string( fast_tag_t aTag, uint32_t aSize )
  {
    // let's give the decoder a little safe room - it needs it at times
    char    buff[aSize + 8];
    decode(aTag, buff, aSize);
    // make sure they didn't run long (and they do often)
    char  *ptr = &buff[aSize];
    *(ptr--) = '\0';
    // trim off the excess spaces on the right-hand side
    while ((ptr >= buff) && (*ptr == ' ')) {
      *(ptr--) = '\0';
    }
    return std::string(buff);
  }

I needed to give them a little headroom, and then truncate it at the maximum length, and then do a simple right trim of the data. Not hard, but it's amazing that their own decoder has these problems. Yikes.

The next problem with the ticker plant was the CPU usage. When it's just the ticker plant - without the cached ticks - it idles around 10%-20%. With the cache it was up around 70%. I started playing with the cache (it's lockless but uses boost's unordered_map) and saw that we were in another pickle. The general operation of this guy is to replace the old message with the new, and delete the old. But if we had people looking at the old, we'd mess them up something fierce.

I've written this up on another post, but the idea is to allow them to remain valid for as long as the client needs, and then trash them. It's not hard, and I don't think it slows things down much, but it's absolutely vital for proper operation.

Unfortunately, with all this work, it was 3:15 before I could get much more done. At that point, I had no ticks, and I was stuck. Kind of a drag to have to depend on other systems like this. But so it goes. Tomorrow I'll hit the CPU usage harder and see what I can find out for certain.

Preserving Iterators on Fast, Lockless, Caches – Use the Trash

Wednesday, October 13th, 2010

GeneralDev.jpg

This was actually the most fun part of my day. I was worried about how to allow the quick cache's clients to run iterators over the cache data and not get nailed when the data in the cache is updated by the exchange feed. In general, it's hard to imagine how to pull it off. You have a cache that by design doesn't lock or notify anyone, and a reader that also by design is going to be slower than the ticker feed and will, very likely, always be looking at the entire cache and getting into a lot of trouble with its iterators.

I was trying to think of a clever solution to the problem when I had another really nice eureka! moment: What I'd have is a temporary "trash can", and when the client asks to keep things "stable", the cache tosses the old values into the "trash". When the client is done, it'll "throw the trash away", and the cost of the deletes will be on the client's thread.

If I built it with a simple __gnu_cxx::slist - the GNU STL singly-linked list extension - then I'd have a very fast queue. I don't need to have any specific order, just a place to hold these guys until they are no longer needed. So rather than calling delete on the 'old' message, the put() method on the cache will instead do a push_front() to the 'trash' queue. It's fast, clean, and should work wonderfully.

I have two methods that control an atomic boolean. It's a little utility class I wrote that wraps an atomic uint8_t as a simple boolean so it can be toggled/set/read atomically:

  void QuickCache::saveToTrash()
  {
    if (!(bool)mUsingTrash) {
      // first, clean out anything that might be lingering in the trash
      Message  *m = NULL;
      while (!mTrash.empty()) {
        if ((m = mTrash.front()) != NULL) {
          delete m;
        }
        mTrash.pop_front();
      }
      // now set the flag that indicates that the trash is ready to use
      mUsingTrash = true;
    }
  }
 
 
  void QuickCache::takeOutTrash()
  {
    if ((bool)mUsingTrash) {
      // first, set the flag that indicates that the trash is NOT in use
      mUsingTrash = false;
      // next, clean out anything that might be lingering in the trash now
      Message  *m = NULL;
      while (!mTrash.empty()) {
        if ((m = mTrash.front()) != NULL) {
          delete m;
        }
        mTrash.pop_front();
      }
    }
  }

And then in my put() method, when I'd normally have deleted the 'old' message from the cache, I simply:

  // the new one is in the cache - let's dispose of the old one
  if (oldMsg != NULL) {
    if ((bool)mUsingTrash) {
      mTrash.push_front(oldMsg);
    } else {
      delete oldMsg;
    }
    oldMsg = NULL;
  }

I haven't had a real chance to test it, but I'm hoping that tomorrow morning when I get ticks from the exchanges, I'll be able to see that this is going to work. If not, then it's back to the drawing board to figure something else out.

Google Chrome dev 8.0.552.0 is Out

Wednesday, October 13th, 2010

This morning I noticed that Google Chrome dev had a significant version jump to 8.0.552.0, and while the release notes said there are stability fixes, I think the bigger reason is that the beta channel is on 7.0.517.0, or something like that, and they wanted to keep a full major release number ahead of that. I'm not buying that there are major, significant changes that just didn't make it into the release notes.

But that's OK, I just think it would have been a little more honest to be up-front about the renumbering.

Timing is Everything – Especially in Multi-Threaded Apps

Tuesday, October 12th, 2010

Today was spent getting the timing of events finished up. Yesterday, I took a lot of time getting the shutdown working, and then I had to make the movement of messages work. It's pretty funny to look at the code in retrospect - I had just assumed that things would be there when I needed them, and in my development case with hard-coded values, it was. But making it work with the actual timing of events was a lot harder than I had thought.

Basically another day spent on getting the start-up and shutdown right for all the components of the system. Ick.

Shutting Down Multi-Threaded Apps Takes Special Care

Monday, October 11th, 2010

GeneralDev.jpg

Today I spent a lot of time working on getting rid of core dumps when I shut down the ticker plant. This seems very obvious, but after I just did all the work to fold in the configuration and then even more work today to get the authentication token into all the configuration code, it required that I change a lot of little things around. The upshot of that was that the subtle interactions on the shutdown were quite broken and I had to fix them all up again - but not revert to the old system due to the configuration and authentication changes.

Little things I ran into made this a lot harder than it sounds. We have boost's asio to contend with, and the fact that we need to cancel any pending I/O on shutdown. However, cancelling is going to generate an error in the socket communications, and in response to that error we'll try to shut down the socket. This is a nasty loop that makes it very difficult to unravel.

What I had to do was to be very careful about the two different conditions, and if I'm being interrupted, then I need to assume I'm being told to shutdown and not assume there's something wrong I need to attempt to recover from. Thankfully, the error messages are clear enough to make this possible, but it's still a lot of detail work about when things are happening and who's responsible for doing what.

All this took me the better part of the day - with the initial part being trying to get the timing of the initialization of everything right as well. It's like I spent two days getting nowhere, as I didn't advance the codebase one feature - I just made it possible to use the configuration and authentication tools we have to use. That's important, to be sure, but it didn't feel like I got a lot done.

Working Configuration into All Components in Project

Friday, October 8th, 2010

GeneralDev.jpg

I've been working hard all day trying to work the configuration system into all the classes that really should be using it. It was really my fault for not putting the configuration service into place before writing all the code that should use it, but it just wasn't there, and I didn't ask enough questions about how it was going to work to get all the parameters, etc. in place. It just means the job of getting the configuration into the system is a lot harder than it could have been.

Thankfully, and this is a really weak silver lining, I had a good start at what I wanted to use, but I hadn't taken it nearly far enough to make the process simple. It's further complicated by the fact that I need the configuration data at the lowest level components, and that means passing it down, down, down... Not horrible, but it takes a little bit of work to make sure that you put the configuration data in as few a places as possible with maximum coverage.

Not horrible, but not trivial. Just takes a lot of time.