Archive for the ‘Coding’ Category

Pushing Forward with More Codecs

Friday, August 27th, 2010

Today was the final push for the last two codecs I needed to write. I was really hoping to get both done, but the amount of code required for the first was just too much to finish both in a day. I had to send an email saying that I had slipped on this one, and that I'd get back to it on Monday.

Not really hard, just a lot of code to get the messages decoded. Bummer...

Google Chrome dev 7.0.503.0 is Out

Friday, August 27th, 2010

The second nice update on the browser front this morning is that Google Chrome dev is now at 7.0.503.0. This is the jump in major version number now that Chrome 6.x has moved to "beta". While I'm not unhappy with 6.x, I think the major updates the Chrome Team puts into the browser will land in the 7.x branch now, so I'd like to stay on the 'dev' branch as long as it's even reasonably stable. It's good to see them pushing forward.

Handling Fast Market Data Efficiently – Hint: Go Lockless

Thursday, August 26th, 2010

Today I was doing some testing on my latest data codec in my new ticker plant, and I ran across some performance issues that I didn't really like. Specifically, the processing of the data from the UDP feed was not nearly fast enough for me. As time went on, we were queueing up more and more data. Not good. So let's see what we had in the mix that we needed to change...

First, the buffer I was using assumed that a message from the exchange might not arrive completely within a single UDP datagram. Handling that was a nice "luxury", but it's not how the feed actually behaves, and it was costing us processing time. It's better to assume that each UDP datagram is complete, and queue the datagrams up as complete units to process, than to have the buffer "squish" them together into one byte stream and then tokenize the messages by their ending data tags.
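A minimal sketch of that "one datagram, one unit of work" reader, assuming a POSIX UDP socket - the names and the queue interface here are illustrative, not the actual ticker plant code:

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cstdint>
#include <utility>
#include <vector>

struct Datagram {
    std::vector<uint8_t> bytes;   // one complete UDP payload
};

// Any queue type with a push(Datagram&&) will do here.
template <typename Queue>
void readLoop(int sock, Queue& queue) {
    std::vector<uint8_t> buf(65536);          // max possible UDP payload
    for (;;) {
        ssize_t n = ::recv(sock, buf.data(), buf.size(), 0);
        if (n <= 0) continue;                 // real code would handle errors
        // Queue the datagram as a complete unit: no stitching datagrams
        // into a byte stream, no scanning for message-ending tags.
        Datagram d;
        d.bytes.assign(buf.begin(), buf.begin() + n);
        queue.push(std::move(d));
    }
}
```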

That was really quite helpful, because at the same time I decided it was a bad idea to use the mutex/condition variable I had set up to let the one producing thread and one consuming thread access the data. Instead, I grabbed a very simple lockless circular FIFO queue off the web and cleaned it up to use for this UDP datagram buffering. It's easy enough to use - one thread moves the head, and the other moves the tail. As long as updates to the head and tail are visible across cores, and not sitting stale in a CPU's cache, it works without locking. Simple enough.
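For reference, here's a minimal single-producer/single-consumer ring buffer in the spirit of the one described. This is a modern rendering with std::atomic - 2010-era code would more likely have used volatile and explicit memory barriers - and not the actual code I pulled off the web:

```cpp
#include <atomic>
#include <cstddef>

template <typename T, size_t N>   // N must be a power of two
class SpscQueue {
public:
    bool push(const T& item) {    // called by the producer thread only
        size_t h    = head_.load(std::memory_order_relaxed);
        size_t next = (h + 1) & (N - 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;         // full (one slot stays unused so
                                  // full and empty are distinguishable)
        buf_[h] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& item) {           // called by the consumer thread only
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return false;         // empty
        item = buf_[t];
        tail_.store((t + 1) & (N - 1), std::memory_order_release);
        return true;
    }
private:
    T buf_[N];
    std::atomic<size_t> head_{0};   // written only by the producer
    std::atomic<size_t> tail_{0};   // written only by the consumer
};
```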

But once I got rid of the locking and waiting, I had to handle the case where the queue is empty and we need to try again. My first solution was the simplest thing possible: a flat 250 msec wait. When I started testing it, I saw significant pulses in the incoming data because a lot of datagrams arrived while we were waiting. So I got a little smarter.

I added an expanding delay - starting small and building - so that we pick data up quickly after a short lull, but when the market close comes and the feed goes quiet, a few checks later we're only polling a few times a second. That's very reasonable.

I did more tests and finally ended up with a variable scheme that uses no delay at all for the first few empty checks and then starts stretching it out. Very nice.
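The shape of that consumer loop looks something like this - the thresholds and the 250 ms cap are illustrative, not the tuned values:

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Queue needs pop(T&); Handler is whatever processes a dequeued item.
template <typename T, typename Queue, typename Handler>
void consumeLoop(Queue& queue, Handler handle) {
    int misses = 0;
    T item;
    for (;;) {
        if (queue.pop(item)) {
            misses = 0;                   // data is flowing: stay hot
            handle(item);
            continue;
        }
        if (++misses <= 8) continue;      // first few misses: retry at once
        // Then sleep 1, 2, 4, ... ms, capped at 250 ms, so a quiet
        // stretch costs only a few wake-ups per second.
        long ms = std::min(250L, 1L << std::min(misses - 8, 8));
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    }
}
```

The key property is that a single miss costs nothing, so a brief gap in the feed adds no latency; only a sustained lull ever puts the thread to sleep.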

In the end, I had something that emptied far faster than the UDP data source, and that's critical for a ticker plant. There's enough to slow it down later in the processing, so it's essential to start out as fast as possible.

Finally Finished Major Addition to Ticker Plant

Wednesday, August 25th, 2010


Well, it's taken me a few days, but I've finally finished the code in my ticker plant to handle the options data feed. It's a biggie because, instead of using the same ASCII encoding the other exchanges do, this feed switched some time ago to a FAST (FIX Adapted for STreaming) encoded stream to reduce the bandwidth needed to move the data from them to us. That added a new wrinkle: we had to incorporate their FAST decoder implementation (initially), just to get the data into a binary format we could do something with.

Then we had to adapt the code to allow for the fact that some messages from the exchanges - specifically OPRA right now - generate multiple messages to flow downstream. This wasn't hard, but the change touched all the codecs, so it took a little time to get everything right and working properly.
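The interface change amounts to one-in, many-out. A sketch of the idea, with illustrative names rather than the real ones:

```cpp
#include <cstdint>
#include <vector>

struct Message { /* normalized downstream fields elided */ };

class Codec {
public:
    virtual ~Codec() = default;
    // One exchange message in, zero or more downstream messages out:
    // an OPRA packet can fan out into several messages, so decode()
    // appends to a vector rather than returning a single Message.
    virtual void decode(const std::vector<uint8_t>& raw,
                        std::vector<Message>& out) = 0;
};
```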

I got it all finished, compiling cleanly, and looking like it's ready to test. Time to commit it all to git and then get down to the business of testing.

Fun with Exchange Codecs – FIX Adapted for Streaming

Tuesday, August 24th, 2010


Well, it turns out that the ASCII-based protocols NASDAQ and some of the other lower-volume exchange feeds use are fine as far as they go, but OPRA decided it had pushed the limits of the ASCII protocol and chose to adopt this FIX Adapted for STreaming - or FAST - protocol. In a sense, I can see why they'd adopt one rather than write their own, but I've read enough on the net to know that they really didn't adopt it 100% - just the data-compression part.

Basically, the FAST protocol is based on a few ideas:

  • Little to no ASCII to decode - numbers will no longer be represented as ASCII digits; most are now simply integers. In fact, only three data types are allowed: a 32-bit integer, an unsigned 32-bit integer, and a string. With those, and a few decoder tables, you can handle anything an exchange needs.
  • Delta Encoding - some fields are required in every message, but for others the value present is a simple increment; in fact, a field can be absent entirely, with the assumption that its value is simply incremented. This helps a lot. There are also values sent as simple changes from the field's last value, so duplicates can be removed. It's small, efficient, and makes for a compact encoded data stream (a toy sketch of these operators follows this list).
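To make the delta encoding concrete, here's a toy illustration of two FAST-style field operators - copy and increment - where the decoder keeps the previous value for each field and an absent field on the wire means "reuse" or "bump" it. This is my own sketch of the idea, not OPRA's decoder:

```cpp
#include <cstdint>
#include <optional>

struct FieldState {
    int32_t previous = 0;   // per-field state kept across messages
};

// Copy operator: if the field is absent, the previous value is reused.
int32_t decodeCopy(std::optional<int32_t> wire, FieldState& s) {
    if (wire) s.previous = *wire;
    return s.previous;
}

// Increment operator: if the field is absent, the previous value is
// simply incremented (typical for sequence numbers).
int32_t decodeIncrement(std::optional<int32_t> wire, FieldState& s) {
    s.previous = wire ? *wire : s.previous + 1;
    return s.previous;
}
```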

The problem, of course, is that there is now state in the decoder. In general, that isn't bad, but it means I have to completely decode every message I get; the shortcuts I had that would extract just the sequence number, or just the flags for skipping a message, are tossed out the window. I need to decode all the data, and then deal with it.

This took a little while to work into my application, but in the end I had the concept of a decoded message, and that message included the elements I had originally extracted as well as the actual message. Thankfully, this is still pretty fast - OPRA isn't messing around with a lame decoder, as it knows the point of all this is to get more through the system.
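Roughly the shape of that decoded-message concept - the field names here are my guesses, not the real ones:

```cpp
#include <cstdint>

struct OpraMessage { /* fully decoded FAST fields elided */ };

// The pieces the old shortcuts used to peek at now travel alongside
// the fully decoded body.
struct DecodedMessage {
    uint32_t    sequenceNumber;  // previously grabbed by a shortcut
    uint32_t    flags;           // ditto: used to decide whether to skip
    OpraMessage body;            // the complete decoded message
};
```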

I still need to do a lot of tests, and even finish writing my codec for the OPRA data, but at least I've got all the essentials of the FAST decoding working, and should be able to get moving forward again tomorrow with the messages.

Indiana Jones and The Legend of the Lost Codebase

Wednesday, August 18th, 2010


Well... I'm donning the old fedora again, and off in search of the Lost Codebase. It's really quite amazing the skill that some people have to hide code. I'm sure they don't think of it that way - they probably consider it to be exactly where they want it to be - the right spot. But if I can't find it after working in the repository for nearly two months, then it's time to call it "hidden". Yup... hidden. And that means I need to get out the fedora and get exploring.

The first thing I check is, of course, the most obvious - the name of the directory. Clearly, this is a trap, for who in their right mind would put the code in a clearly labeled directory? No, that's the location for some of the code. Maybe. It's hard to tell, as the class files are nearly completely empty, and one would wonder if the code even compiles. I'm not fool enough to fall for that trick - typing 'make' could end up wiping out my entire machine's drive. I'm no fool.

Next, I check the similarly named directories. No luck there, but these aren't nearly as complex, and some of the traps aren't even well constructed. In one there's no Makefile - a dead giveaway, if ever there was one. In another, they foolishly included only a handful of files - too easily scanned, and I can see that what I'm looking for isn't there. In all, a minor detour, but I have no idea where to go next.

Next, I bring out the big guns - I grep for a keyword across the entire source tree. As expected, this yields far too many hits, and I need to filter it down. Doggedly, I wrestle the grep filters into giving me something I can work with. I struggle weeding out the false hits. I finally think I may be onto something, only to have my hopes dashed when it turns out to be a simple comment and not the real code I'm looking for.

It's frustrating, and in the end, I realize I've met my match. I have to back off, regroup, and hope that when the author(s) decide to come in for the day, they have some answers about where they hid the secret directory with the code.

Oh yeah... I even checked for the hidden directories... no luck.

Google Chrome dev 6.0.495.0 is Out

Wednesday, August 18th, 2010

It looks like they have fired up the 'dev' channel again as Google Chrome dev 6.0.495.0 was released this morning. I went back to the dev channel after moving to the beta when it was released a few days ago. I have to say, this has become an incredibly stable platform. It's fast, looks like a Mac app, and it just plain works. Nice.

The release notes indicate that, for the Mac at least, we're getting a fix for the download shelf, and a few fixes for CSS and plug-in handling. Looks good to me.

Swatting Flies is an Annoying Thing to Do

Tuesday, August 17th, 2010


I've been working (still) on getting more exchange feed codecs into the system, and while it's not really hard work, it takes a little thought and a lot of attention to detail. So when I get some kibitzing from those who would love to see me fail, but are too afraid to really stand up to this project, it's like swatting flies - not hard, and they aren't going to do me any harm, but annoying nonetheless.

When it gets bad, I just get up, take a little walk, get a pop, and clear my head. That usually does it. Oh... and getting another feeder done in less than a day makes me feel good. It shows the "flies" that they really might want to take notice of the different way I've put this together. But that's really hoping for too much, I suppose.

Time to get some bug spray.

Google Chrome beta 6.0.472.36 is Out

Tuesday, August 17th, 2010

This morning I noticed that Google Chrome beta 6.0.472.36 was out - still no word on a new 'dev' release, so it appears that for now, they are simply sticking with the 6.0.x branch and not starting anything new for the time being. It seems reasonable that if they aren't making major changes, they can keep the 6.0.x branch moving along from dev to beta to stable. It's only if they have great new ideas that it makes sense to open up the dev branch again.

So it's out there - a few little UI fixes - nothing major.

Once Again – Amazing Progress with a Good Design

Monday, August 16th, 2010


Today I spent all day getting two exchange feeders written and tested. This kind of speed isn't because I can copy/paste very fast - it's because I've got a solid design that lets me leverage the work I've already done and customize it very quickly and easily. Given that the previous developers of these feeds took months to achieve what I've done in a day, that says a lot about the power of the design. The previous one was particularly ill-suited to this task.
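For a sense of the shape - and this is a guess at the structure, not the actual code - the shared machinery lives in a base class, and a new feeder only supplies the exchange-specific hooks:

```cpp
#include <cstdint>
#include <vector>

struct Datagram { std::vector<uint8_t> bytes; };
struct Message  { /* normalized fields elided */ };

class FeedCodec {
public:
    virtual ~FeedCodec() = default;
    // The shared per-datagram pipeline, written once: gap-check the
    // sequence, then hand off to the feed-specific decoder.
    void handle(const Datagram& d, std::vector<Message>& out) {
        uint32_t seq = sequenceNumber(d);
        // (gap detection and recovery logic elided)
        lastSeq_ = seq;
        decode(d, out);
    }
protected:
    // A new feeder only has to fill in these exchange-specific hooks.
    virtual uint32_t sequenceNumber(const Datagram& d) const = 0;
    virtual void decode(const Datagram& d, std::vector<Message>& out) = 0;
private:
    uint32_t lastSeq_ = 0;
};
```

A new feeder is then just a subclass implementing those two hooks against the new wire format - which is how a day is enough.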

So it was a hard day, but I'm getting a lot closer to the point where I'm caught up with all the exchange feeds we have. At that point, I can look to data enrichment, and really start to add value to the data feeds.