Archive for August, 2010

Working on Magic Map Implementation in C++

Tuesday, August 31st, 2010

Today I spent a good bit of time working on extending the variant class I have in my ticker plant to handle byte arrays and then adding in the map key space encoding and decoding. The idea is that map keys really don't need to be just any string - they can be drawn from a limited subset of the ASCII space, and by making it a limited set, we can pack more characters into fewer bytes, and then unpack them once they're on the receiver's machine.

Say we limit the ASCII space to 64 characters - any 64, really - just so long as there's a mapping from the acceptable ASCII characters to the 64 code values, and back again. At that point, we know that we'll be able to store the mapped key space in 6 bits - 2^6 is 64. So if we look at a series of three bytes we can pack four of these characters into that 24-bit space:

  Byte 1:  1 2 3 4 5 6 7 8   ->  Char 1 (bits 1-6), Char 2 (bits 7-8)
  Byte 2:  1 2 3 4 5 6 7 8   ->  Char 2 (bits 1-4), Char 3 (bits 5-8)
  Byte 3:  1 2 3 4 5 6 7 8   ->  Char 3 (bits 1-2), Char 4 (bits 3-8)

The code for the conversion is pretty simple - look for how many even 'blocks' of four characters there are to encode, map them into the limited key space, mash their bits together and pack them into three bytes. The remainder is just a partial pass of the same process.

The decode is simple as well - we just need to have some terminal condition - and that can be a simple 0x3f (all 1s) in the last (terminal) 'character'. So we look at the length of the byte stream, see how many "full sets" of three bytes there are - convert each into four characters, and then based on what's left, we have only a few options. It's pretty simple to decode, and you have the original string back.
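
Just to make the idea concrete, here's a rough sketch of the pack/unpack - not the actual code in the variant class, and the particular 63-character table is something I made up purely for illustration (0x3f is kept back as the terminal marker, and this version just pads the last block out to a full three bytes to keep it short):

  #include <string>
  #include <vector>
  #include <cstring>
  #include <stdint.h>

  // Illustrative key space - any 63 characters will do, as long as the
  // encoder and decoder agree on the table. Code 0x3f (all 1s) is the
  // terminal marker.
  static const char kKeySpace[] =
      "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.";

  static uint8_t toCode( char c ) {
      const char *p = std::strchr(kKeySpace, c);
      // unknown characters just map to slot 0 in this sketch
      return (p != NULL) ? (uint8_t)(p - kKeySpace) : 0;
  }

  // pack four 6-bit codes into each three-byte block, padding the last
  // block with the 0x3f terminator
  std::vector<uint8_t> encodeKey( const std::string & aKey ) {
      std::vector<uint8_t> out;
      for (size_t i = 0; i < aKey.size(); i += 4) {
          uint32_t block = 0;
          for (int j = 0; j < 4; ++j) {
              uint8_t code = (i + j < aKey.size()) ? toCode(aKey[i + j]) : 0x3f;
              block = (block << 6) | code;
          }
          out.push_back((block >> 16) & 0xff);
          out.push_back((block >> 8) & 0xff);
          out.push_back(block & 0xff);
      }
      return out;
  }

  // unpack each three-byte block into four characters, stopping at the
  // terminal 0x3f
  std::string decodeKey( const std::vector<uint8_t> & aBuf ) {
      std::string out;
      for (size_t i = 0; i + 2 < aBuf.size(); i += 3) {
          uint32_t block = ((uint32_t)aBuf[i] << 16) |
                           ((uint32_t)aBuf[i+1] << 8) |
                            (uint32_t)aBuf[i+2];
          for (int j = 3; j >= 0; --j) {
              uint8_t code = (block >> (6*j)) & 0x3f;
              if (code == 0x3f) {
                  return out;
              }
              out += kKeySpace[code];
          }
      }
      return out;
  }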

I needed to get this all in the variant class as the next thing I needed was to add in the update stream that will be sent by the data service in response to updates at the source. This is similar to the kind of updating I did in Java back in InfoShop: there was a HashMap that understood a 'path' concept, and transactions. You could put a map into a transaction, do things to it, and then commit the transaction and get a transaction log that could be sent to remote copies of the same map, and be applied to bring it up to date.

It worked very well for me at the time, and what I'm building now is very similar. There's no transaction this time - the updates are all streams with an action, a path, and an optional value. It's simpler, but for the data sets we're dealing with, it's sufficient.
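
The shape of an update is about as simple as it sounds. Something like this little struct captures it - the names and the action set here are just illustrative, not the actual wire format, and in the real thing the value would be the variant itself rather than a string:

  #include <string>

  // Illustrative only - not the real wire format.
  enum UpdateAction {
      ePut,       // set the value at the path
      eRemove,    // delete whatever is at the path
      eClear      // wipe out everything below the path
  };

  struct MapUpdate {
      UpdateAction action;
      std::string  path;      // e.g. "options/AAPL/bid" - a '/'-separated path
      std::string  value;     // optional; only meaningful for ePut
  };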

I got a lot of the decoding of the update stream done, but I think it's time to put together a client and see how it functions to see what I'm really getting from these servers. There's nothing like the socket-level byte stream to remove all the questions and ambiguity in the specification. So that's going to have to wait for tomorrow.

Case Sensitivity and New Developers

Tuesday, August 31st, 2010

Crazy Lemon the Coder

I believe, though I have no factual data to support this belief, that most developers these days are learning to code on Windows with some IDE - like Eclipse, NetBeans, etc. It's the easiest way to learn to code, I'll grant you, and it's the cheapest platform to use. Heck, you can get a decent Windows laptop at BestBuy for under $800. Peanuts.

But there's a good contingent that seems to have fallen in love with the Free Software movement, and picked up linux to put on their $800 laptop, and are using Eclipse on linux. This is not bad. But it's the clash of these two worlds that often leads to problems.

Then again... it could just be that the developers aren't all that good, and need to be whipped into shape before they're set loose on the codebase.

So here's what happened today: I'm trying to clone this git repository onto a windows box because the way I have the monitors hooked up, it's easiest to see the code from the Windows screen. My first problem was with the git that ships with Cygwin. Turns out, if there's a problem with the repo (as there was - sort-of), the Cygwin git gets all messed up. Maybe in a subsequent release they'll get it fixed, but the consensus at The Shop is that the better solution is to get msysgit from Google Code.

(As an aside, I've also decided to upgrade my Cygwin to 1.7.7 as it seems I was pretty out of date, and that could possibly have contributed to the problems.)

So I clone this repository and I find that I've already got a changed file. Hmmm... that's odd... so I check that file out again... still changed. Very odd.

Turns out, the developer had created two files - one a Java source file, the other a shell script - one named MyTest.java, the other myTest.java. OK... that's got all kinds of wrong written all over it. Case is not the way to distinguish files - not in the multi-platform world. And who makes a shell script have an extension of .java?

When I pointed this out to him, he cleaned it up and there wasn't a problem any longer. But it reminded me that what I take for granted is not exactly universally understood.

Tough Day Full of Avoidable Problems

Monday, August 30th, 2010

Today was a day I'll be glad is over... there were just so many avoidable problems and delays that it makes for a day that really is best forgotten as soon as possible. It started out reasonably well - I was finishing up the work I'd started on Friday. I had really wanted to get it done on Friday, but it took another three hours (roughly), so it would have been silly to really stay and see it through. Also, there was no way I could have tested it Friday evening - and this morning I was able to use the live data to verify that things were working as planned.

But at this point, things were still going pretty well. The code was done, everything tested out, I checked it all in and pushed it to the central repo - pretty nice. But then the avoidable stuff started to bite me.

I needed two machines to test the throughput of ZeroMQ as a reliable multicast distribution system for my tick data. Nothing fancy, but I needed to have some way of replacing 29West as we weren't really using it as a solid middleware - just a multi-channel reliable multicast system. Given its limited usage, I looked at ZeroMQ and thought Hey, if this works, I'm in business! But in order to know if it'll work, I need to actually get the ticker feeds working, put the messages into ZeroMQ, and pick them up on another box. Hence the need for two boxes.
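
For the test itself, the ZeroMQ side is about as minimal as it gets - something along these lines, where the epgm:// multicast group and the interface name are just placeholders and the C++ binding does the rest:

  #include <zmq.hpp>
  #include <cstring>
  #include <string>

  // Publisher - the ticker plant box. epgm:// is ZeroMQ's encapsulated
  // PGM reliable multicast; the interface and group are placeholders.
  void publishTicks() {
      zmq::context_t context(1);
      zmq::socket_t  publisher(context, ZMQ_PUB);
      publisher.bind("epgm://eth0;239.192.1.1:5555");

      std::string tick = "fake tick message";
      zmq::message_t msg(tick.size());
      memcpy(msg.data(), tick.data(), tick.size());
      publisher.send(msg);
  }

  // Subscriber - the second box. Subscribe to everything and pull the
  // messages off as they arrive.
  void consumeTicks() {
      zmq::context_t context(1);
      zmq::socket_t  subscriber(context, ZMQ_SUB);
      subscriber.connect("epgm://eth0;239.192.1.1:5555");
      subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);

      while (true) {
          zmq::message_t msg;
          subscriber.recv(&msg);
          // hand msg.data()/msg.size() off to the rest of the test...
      }
  }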

Well... they got a few boxes with 10Gb ethernet NICs in them to make sure that I didn't have to worry about the NIC being the bottleneck, and they were ready for me to check the boxes out. As per the way things are at The Shop, the standard mechanism for getting to these servers is NXMachine or SSH. Given that I'd be testing and building, I decided to go with NXMachine. It installs pretty easily, and with this simple fix, it should work just fine.

Silly me...

I spent a full morning trying to get the NX Server to work. I knew the client was working, and I knew the server could work, but it wasn't allowing me to get a complete connection. Well... I got the connection, but when I went to actually display the X session on my box, it disconnected me. Very odd.

I tried turning on logging for the NX Server - no luck even when following the directions. I tried re-installing the software - no good, either. I tried different parameters for the client - no good. In the end, I went to my boss and asked him for help. He couldn't get in either, but then realized that maybe it was because GNOME wasn't installed. Specifically, the GNOME Desktop environment.

That was the problem.

When he did a simple:

  $ yum groupinstall "GNOME Desktop Environment"

he got some 124 packages that needed to be installed. It seems that they didn't put the full GNOME install on the box as it was a "server". My previous box had the GNOME Desktop installed prior to me using it - which is why it worked. Had I stuck with a bunch of SSH sessions into the box, it would have been fine. It was just the desktop login that was the problem.

After that was solved, I was able to install boost, log4cpp, ZeroMQ, and a few other things to get this new box to the point that I was able to verify that all the code worked, and that everything compiled and ran.

Lots of grief for something as simple as not having the login desktop stuff installed.

Pushing Forward with More Codecs

Friday, August 27th, 2010

Today was the final push for the last two codecs I needed to write. I was really hoping to get both of them done, but the amount of code required in the first was just too much to get both done in a day. I had to send an email saying that I had slipped on this guy, and that I'd get back to it on Monday.

Not really hard, just a lot of code to get the messages decoded. Bummer...

Google Chrome dev 7.0.503.0 is Out

Friday, August 27th, 2010

The second nice update on the browser front this morning is that Google Chrome dev is now at 7.0.503.0. This represents the jump in the major version number now that Chrome 6.x has been released to "beta". While I'm not unhappy with 6.x, I think the major updates the Chrome Team puts into the browser will be in the 7.x branch now, so I'd like to stay on the 'dev' branch as long as it's even reasonably stable. It's good to see them pushing forward.

Camino 2.0.4 is Out

Friday, August 27th, 2010

This morning was a banner morning for open source browsers. Starting with Camino 2.0.4. Today's update has quite a few nice things about it - an updated Java plugin, fixed Flash issues, latest Gecko engine... it's a good update. Now, I'm still not sure I'd use it day-to-day, but that's because Google Chrome and Safari are the best two I've seen today, but it's nice to see Camino stay up to date and push the rest to not get complacent.

iWork 9.0.4 is Out on Software Updates

Friday, August 27th, 2010

This morning I saw that Apple had an update to iWork '09 (9.0.4) on Software Updates, so naturally I had to get it. While I'm not sure I'll be doing a lot of ePub work, it's nice that they have closed the loop on the iBooks and content creation. Now it's simple to write something, put it in ePub format, and get it into iBooks. Very slick. But then again, that's what we all expect from Apple. Nicely done.

Handling Fast Market Data Efficiently – Hint: Go Lockless

Thursday, August 26th, 2010

Today I was doing some testing on my latest data codec in my new ticker plant, and I ran across some performance issues that I didn't really like. Specifically, the processing of the data from the UDP feed was not nearly fast enough for me. As time went on, we were queueing up more and more data. Not good. So let's see what we had in the mix that we needed to change...

First, the buffer I was using assumed that the messages from the exchange might not arrive completely within a single UDP datagram. That was a nice bit of "luxury", but it doesn't actually happen, and it was costing us time in the processing. It's better to assume that each UDP datagram is complete, and queue them up as complete units to process, than to have the logic in the buffer to "squish" them together into one byte stream, and then tokenize them by the ending data tags.

That was really quite helpful because at the same time I decided that it was a bad idea to use the mutex/conditional I had set up to allow the one producing thread and one consuming thread to efficiently access the data. Instead, I grabbed a very simple lockless circular FIFO queue off the web and cleaned it up to use for this UDP datagram buffering. It's easy enough to use - there's one thread that moves the head, and another that moves the tail. Simple. As long as the head and tail aren't cached on the CPUs, it'll work without locking. Simple enough.
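
The one I grabbed has more care in it (cache-line padding, memory barriers) than I'll show here, but the shape of it is roughly this - one index owned by the producer, one by the consumer, nothing shared for writing:

  #include <cstddef>

  // Single-producer/single-consumer ring buffer sketch. Only the
  // producer ever writes mHead, only the consumer ever writes mTail,
  // so no lock is needed - this leans on x86 ordering, 2010-style.
  template <typename T, size_t N>
  class SPSCQueue {
  public:
      SPSCQueue() : mHead(0), mTail(0) {}

      // producer thread only
      bool push(const T & value) {
          size_t next = (mHead + 1) % N;
          if (next == mTail) return false;   // full - caller decides what to do
          mSlots[mHead] = value;
          mHead = next;                      // publish after the write
          return true;
      }

      // consumer thread only
      bool pop(T & value) {
          if (mTail == mHead) return false;  // empty
          value = mSlots[mTail];
          mTail = (mTail + 1) % N;
          return true;
      }

  private:
      T                mSlots[N];
      volatile size_t  mHead;   // next free slot (producer's index)
      volatile size_t  mTail;   // next slot to read (consumer's index)
  };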

But when I get rid of the locking/waiting, I have to handle the case where the queue is empty and we need to try again. My solution there was to start simple and put in a flat 250 msec wait. When I started testing this, I saw that there were significant pulses in the incoming data because a lot of datagrams arrived while we were waiting. So I got a little smarter.

I added an expanding delay - starting small and building - so that we catch new data quickly when the gap is short, but when the close comes and the feed goes quiet, we back off to only a few checks a second. That's very reasonable.

I did more tests and finally ended up with a variable scheme that had no delay for the first few empty checks and then started stretching the wait out. Very nice.
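
The final scheme boils down to something like this - the exact counts and the cap are just the knobs I've been tuning, so treat them as placeholders:

  #include <unistd.h>   // usleep()

  // Spin-then-back-off: the first few empty polls cost nothing, then
  // the sleep stretches out toward a cap of a few checks per second.
  class BackoffTimer {
  public:
      BackoffTimer() : mMisses(0) {}

      void reset() { mMisses = 0; }              // call on every successful pop

      void wait() {                              // call on every empty pop
          ++mMisses;
          if (mMisses <= 10) return;             // spin - data is probably coming
          useconds_t delay = (mMisses - 10) * 1000;   // grow by 1 msec per miss
          if (delay > 250000) delay = 250000;         // cap at 250 msec
          usleep(delay);
      }

  private:
      unsigned  mMisses;
  };

The consumer loop just calls reset() every time it pops something off the queue and wait() every time it comes up empty.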

In the end, I had something that emptied far faster than the UDP data source, and that's critical for a ticker plant. There's enough to slow it down later in the processing, so it's essential to start out as fast as possible.

Finally Finished Major Addition to Ticker Plant

Wednesday, August 25th, 2010

Well, it's taken me a few days, but I've finally finished the code in my ticker plant to handle the options data feed. It's a biggie because instead of doing the same ASCII encoding that the other exchanges do, they switched some time ago to a FAST (FIX Adapted for STreaming) encoded stream to reduce the bandwidth needed to move the data from them to us. This just added a new wrinkle as we had to incorporate their FAST decoder implementation (initially), just to get the data into a binary format that we could do something with.

Then we had to adapt the code to allow for the fact that some messages from the exchanges, specifically OPRA right now, generate multiple messages to flow downstream. This wasn't hard, but it was in all the codecs, so it took a little time to get it all right and working properly.

I got it all finished, compiled correctly, and looking like it's ready to test. Time to commit it all to git and then get to the business of testing.

Apple Security Update 2010-005 is on Software Updates

Wednesday, August 25th, 2010

This morning there was another round of security updates from Apple - covering PHP, PDF, etc. It's not a lot, but hey, when there's an exploit in the wild, you have to fix them, and I'm going to make sure I get the very latest updates. Just makes good sense.