Archive for the ‘Coding’ Category

Creating a National Best Bid/Offer Engine

Tuesday, November 2nd, 2010

Today I started, then stopped, then restarted work on a National Best Bid/Offer engine for my ticker plant system. Basically, the rules are pretty simple: you keep track of the ticks for an instrument by exchange, keeping the latest bid/ask data for each exchange. What you might end up with after a few ticks is something like this:

  Exchange   Bid Price   Bid Size   Ask Price   Ask Size
  A          12.50       25         14.00       20
  C          12.25       20         14.00       25
  D          12.50       20         13.75       20

The rules say that you look at all the bid prices - pick the largest one, and then sum all the bid sizes for those exchanges where the price is the "best bid". In this case, the best bid is 12.50 and the sizes are (25 + 20) = 45. For the best ask (offer), you look at the minimum asking price, and sum up the sizes that match that value. In this case, the price is 13.75 and the size is a simple 20.
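
In code, that rule is just a two-pass scan over whatever per-exchange quotes have been cached. Here's a minimal sketch of the bid side, assuming the quotes sit in simple parallel arrays with one slot per exchange letter and a zero price meaning "no quote" - the names are just for illustration:

  // two-pass NBBO bid recompute over the cached per-exchange quotes
  void recomputeBid(const float bidPrice[26], const uint16_t bidSize[26],
                    float & nbboBidPrice, uint16_t & nbboBidSize) {
    // pass 1: find the best (highest) bid price across all exchanges
    nbboBidPrice = 0.0;
    for (int i = 0; i < 26; ++i) {
      if (bidPrice[i] > nbboBidPrice) {
        nbboBidPrice = bidPrice[i];
      }
    }
    // pass 2: sum up the sizes at that best price
    nbboBidSize = 0;
    for (int i = 0; i < 26; ++i) {
      if ((bidPrice[i] > 0.0) && (bidPrice[i] == nbboBidPrice)) {
        nbboBidSize += bidSize[i];
      }
    }
    // the ask side is the mirror image: lowest non-zero price, then sum
  }

For the table above, this works out to a best bid of 12.50 x 45, and the mirror-image scan on the ask side gives 13.75 x 20.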

So, there are a lot of ways to accomplish this. One is just to have a cache of all the ticks - organized by instrument, by exchange, and then when a new one comes in you toss out the old, recompute the new NBBO, and then if something has changed from the last time you calculated the NBBO, you send out a new message to the downstream clients. Pretty simple.

The problem with this is that while it's easy, it's a guaranteed two-pass system because you first need to find the best price (of either kind), and then sum all the sizes at that price. Not ideal. Also, there's the possibility that a Quote message from one of the exchange services will have a bid from one exchange and an offer from another. So it's not really easy to keep the entire message - and I don't really need it. I only really need the data.

So what if I just kept the data?

  struct ExchangeData {
    // bids and asks are separate and organized by exchange
    float      bidPrice[26];
    uint16_t   bidSize[26];
    float      askPrice[26];
    uint16_t   askSize[26];
    // ...and this is the NBBO that's generated from it
    float      nbboBidPrice;
    uint16_t   nbboBidSize;
    float      nbboAskPrice;
    uint16_t   nbboAskSize;
    // ...and a lock to make sure it's all changed atomically
    boost::detail::spinlock   dataMutex;
  };
  typedef struct ExchangeData exch_data_t;

If I have this structure - one for each instrument, in some simple data structure (like, say, a trie) - then I can forgo retaining the messages themselves and just copy the data out of each message into the structure.

But wait, there's more.

If I'm careful about what I do, I can update the NBBO values without having to make a two-pass update. The key realization is that there already exists, before each tick, an NBBO, and if I just update it by subtracting out the old data for that exchange and adding in the new, the values will always stay up to date. No need for the two-pass calculation.

The code for just the bid looks something like this:

  if ((bidExch != '\0') && (bidPrice > 0.0) && (bidSize > 0)) {
    // get the exchange as an index into the arrays
    uint8_t    idx = bidExch - 'A';
    // see if we need to remove the old bid from the NBBO
    if (myExchData->bidPrice[idx] == myExchData->nbboBidPrice) {
      myExchData->nbboBidSize -= myExchData->bidSize[idx];
      newNBBO = (myExchData->bidSize[idx] > 0);
    }
    // remember this exchange's new bid so the next tick can subtract it out
    myExchData->bidPrice[idx] = bidPrice;
    myExchData->bidSize[idx] = bidSize;
    // see if we have a *new* NBBO bid, or maybe just an update
    if (bidPrice > myExchData->nbboBidPrice) {
      myExchData->nbboBidPrice = bidPrice;
      myExchData->nbboBidSize = bidSize;
      newNBBO = true;
    } else if (bidPrice == myExchData->nbboBidPrice) {
      myExchData->nbboBidSize += bidSize;
      newNBBO = true;
    }
  }

and with this code, I should be able to maintain a decent NBBO engine with very little latency through the system. I still need to write a few things and test it all out, but it's a really promising start, and I'm excited about the possibilities.

Sweet idea.

UPDATE: This logic isn't going to work. What if I fall out of the NBBO? This code will remove the contribution, but if I'm the only one, then we're left with a bad price and a zero size. I need to fix this. But I still think it's salvageable. The only case when we will have to scan is when there's only one exchange in the BBO, and that one is out. Then, we have to find the next best, and that's likely a scan.
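
One way to handle that case - just a sketch of the salvage idea, not code that's in the plant - is to notice, after the bid-update block above has run, that the removal left the NBBO size at zero, and only then fall back to a scan of the per-exchange bids for the next best level:

  // fallback for when the updated exchange was the only one at the NBBO and
  // its new bid is worse: rescan the per-exchange bids for the next best level
  // (the ask side is the mirror image with a minimum scan)
  if (myExchData->nbboBidSize == 0) {
    myExchData->nbboBidPrice = 0.0;
    for (uint8_t i = 0; i < 26; ++i) {
      if (myExchData->bidPrice[i] > myExchData->nbboBidPrice) {
        // new best level - restart the size at this exchange's size
        myExchData->nbboBidPrice = myExchData->bidPrice[i];
        myExchData->nbboBidSize = myExchData->bidSize[i];
      } else if ((myExchData->bidPrice[i] > 0.0) &&
                 (myExchData->bidPrice[i] == myExchData->nbboBidPrice)) {
        // another exchange at the same level - add its size
        myExchData->nbboBidSize += myExchData->bidSize[i];
      }
    }
    newNBBO = true;
  }

Since the scan only fires when the sole exchange at the BBO backs off, the common path stays single-pass.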

Gearing Up for Full-Up Tests

Monday, November 1st, 2010

Today I was working on the Enrichment Server part of the Ticker Plant system and was making pretty good headway when the Chief Architect came up and asked me what it would take to get a complete, full-up system going. "Well... I'm pretty close... I'd say a few days," I thought. "Good! Let's try for the end of the week." Sounds OK to me.

There were a few things I needed to get back into the source code for the tests. We'd made a significant change to the messages - ripping out the old 64-bit security ID in favor of the std::string security key - and the group that was going to use this needed the security ID for trading reasons. So I had to add that back in, but only in the Client code, as it makes no sense to slow down the upstream systems for the few that need this feature.

So I started digging into my git repo to find the check-in where I took it out. I have to say, for a "universal" client, gitk is not bad at all. It's pretty decent for being platform neutral. I was easily able to find the check-in and then look at the diffs and see what I needed to add back in.

It took a little bit, but it was not really hard. Just a little time.

Unfortunately, the Broker and its infrastructure are not up, so I can't test. But when I can, it'll be nice to make sure all this works. Then I just need to find memory for my box as it's got far too little to be effective for a complete SIAC feed... but that's a task for the management types who control the money.

Google Chrome dev 8.0.552.23 is Out

Sunday, October 31st, 2010

I noticed today that Google Chrome dev 8.0.552.23 is out with some fixes for the plugins as well as disabling the accelerated 2D canvas. I'm not sure I'm thrilled with the disabling of the 2D acceleration, but I'm guessing they didn't like the consequences of having it on, and that they'll turn it back on soon enough.

Finally Gettin’ Some Love from the TickerPlant

Friday, October 29th, 2010

It's been a tough morning but I'm finally starting to get a little love from my latest cut of the ticker plant. It's taken a while to refactor the code to get rid of the 64-bit security ID in favor of the std::string security key. It's good, in that it's going to be a lot faster in the end, but getting there has been more than a little challenging.

But finally, things are starting to pay off. Good enough.

I still need to see what I can do to make the generation of the security key from the exchange message faster, but that's going to be targeted code in just a few places in the system, which means it's much easier to do. For now, I'm just happy things are running smoothly and solidly.

Whew!

Finding Errors that Aren’t Really There – Harder than it Sounds

Friday, October 29th, 2010

I spent several hours this morning trying to find bugs in my ticker plant that simply weren't there. What I mean to say is that I was changing the trie from three levels to four, and at the same time radically simplifying the node structure and tree walking, and all of a sudden my unit tests were failing. What?!

OK, I made a bunch of changes, let's see what really changed, and start to figure out the problem.

The first step was to look at the failing tests and see what the problem was. The first indicator was that the size() methods were failing. What I'd done was to create a common node superclass so that I could have just two classes: leaf and branch. So I took:

  struct LeafNode {
    family_t                 *family;
    family_map_t             others;
    boost::detail::spinlock  othersMutex;
  };
  typedef struct LeafNode leaf_t;
 
  struct BranchNode {
    family_t         *family;
    leaf_t           kids[47];
  };
  typedef struct BranchNode branch_t;
 
  struct RootNode {
    family_t         *family;
    branch_t         kids[47];
  };
  typedef struct RootNode root_t;

where the branch and root are different because of the type of their children, to the more uniform:

  struct Node {
    family_t         *family;
  };
  typedef struct Node node_t;
 
  struct LeafNode : Node {
    family_map_t     others;
    boost::detail::spinlock  othersMutex;
  };
  typedef struct LeafNode leaf_t;
 
  struct BranchNode : Node {
    node_t           kids[47];
  };
  typedef struct BranchNode branch_t;

so that I could have several branches from the root to the leaf. This was a good idea - but it wasn't exactly how I had it originally implemented. In the cut-n-paste refactoring, I ended up leaving the family pointer in the branch class, so it shadowed the one inherited from the node superclass. This had the added benefit of not generating any errors or warnings in GCC, but when I constructed the subclass, the superclass's family was set just fine, and then the subclass's shadowing copy threw a junk non-NULL pointer in for family.

This took me far longer than I had hoped to find. I was tracking down the constructor calls, seeing the family set to NULL by the superclass, and then all of a sudden it was showing up as junk in the derived class. Finally, I was looking at the header file and noticed that I hadn't removed the family member from the branch class. Fixed that, and off we go.
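
For the record, here's a tiny repro of that kind of shadowing mistake - hypothetical names, not the actual ticker plant classes:

  #include <cstdio>

  struct Node {
    void *family;
    Node() : family(NULL) { }     // superclass dutifully initializes family
  };

  struct BranchNode : Node {
    void *family;                 // oops - shadows Node::family, never initialized
  };

  int main() {
    BranchNode b;
    // b.Node::family is NULL, but the shadowing b.family holds whatever junk
    // happened to be there - and GCC compiles this without complaint
    printf("%p vs. %p\n", b.Node::family, b.family);
    return 0;
  }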

Then things started working... but they seemed to be going very slowly. In fact, it seemed to be going horribly slowly. So I started to look at what I was doing and see what might be causing the hold up.

Was it the looping in the tree navigation? Was it the use of subscripts for the elements of the tree path? Neither seemed likely. But just in case, I refactored the code again to have a more direct path with direct variables. No difference.

Crud.

Then I realized that I was doing four things in each pass of the loop, and given that, the times were actually better than what I'd had in the past. I took out the "straight-through" code, went back to the looping version, and things got even a little better.

So... two mistakes in one day for not paying attention. That's how a professional does it. Yeah... right.

Incorporating the Better Map into the Ticker Plant

Thursday, October 28th, 2010

I spent a good part of the rest of today incorporating the new instrument map I created into the existing classes that I had in the ticker plant. It wasn't terribly difficult; I did have to add a remove() method so that I could use the map in the ConflationQueue, but that was pretty easy. Once I got the code all ready to roll I realized that the supporting infrastructure (the Broker) wasn't up yet - it was undergoing a hardware upgrade.

Bummer.

By the time it was up, it was well after 3:15, and the market data was all gone. Shucks. I really wanted to test it today but I just didn't have the chance. So it goes... I'm still glad I'm using the Broker, as I think it's a great piece of code and a wonderful way to glue things together, but external dependencies of any kind can come back to bite you in the rear at the most inopportune times.

So it'll be tomorrow before I get some real answers, but I'm hoping that the inclusion of the much faster msg::imap is going to cut down on conflations and increase the messages per second to the client.

Building A Better Map (cont.)

Thursday, October 28th, 2010

This morning I finished my "better map" for the instrument messages for my ticker plant. I had a few typos because I did a bit of copy/paste to get the three levels coded up as quickly as possible, and there was a lot of coding due to the extensive use of pointers to save space.

In any case, I got it to compile and then ran through a few simple tests. Fixed a few typos, got the tests to work, and then decided to try a speed test. The code was going to make 1000 messages, store them in the map, and time the results. The nature of the map was such that the put() method returned the old Message pointer for that instrument so that you could dispose of it, or make note of it, etc.

So the test loop looked like this:

  // 'm' is the new instrument map being timed
  msg::Message         *oldie = NULL;
  msg::message::Print  *print = NULL;
  uint64_t  startTime = msg::TransferStats::usecSinceEpoch();
  for (int i = 0; i < 1000; ++i) {
    print = new msg::message::Print("S:IBM", startTime, (100 + i), 1.50, (10 + i), 'T', 0);
    oldie = m.put(print);
    if (oldie != NULL) {
      delete oldie;
      oldie = NULL;
    }
  }
  uint64_t  elapsedTime = msg::TransferStats::usecSinceEpoch() - startTime;
  log.info("updated the S:IBM Print 1000 times in %ld usec", elapsedTime);

The response was more than I could have hoped for:

  28-Oct-10 08:26:30.08 INFO imap : updated S:IBM Print 1000 times in 816 usec

Wow! That's better than one update per microsecond! That's fast. Very fast.

Now I get to work it into the code and see how it impacts the overall performance. I know it'll be better than the conflation keys I've been using, and that's a great boost for the ticker plant.

Nice!

Using ZeroMQ with Clojure (Java) on Mac OS X

Thursday, October 28th, 2010

I was reading the ZeroMQ IRC channel this morning and they talked about the changes to their web site, so I decided to check them out. They had a post about using ZeroMQ with Clojure that looked worth a read, and what I found was a wonderful article about using it with Java on Mac OS X.

I know that we're not targeting Mac OS X - yet, but it's nice to know that there's help out there if and when we start to move there - or I move there myself. Very nice.

Building A Better Map

Wednesday, October 27th, 2010

When I got the 128-bit integer done and the string-based security key put into my ticker plant, I was not totally surprised to see that the performance was bad. I mean really bad. It's not something to hack a finger off for, but it was bad. So I knew I had to come up with something to get rid of the need for the 128-bit integers and keep the speed of the processing up.

Along those lines, there were several other things I could do, and still need to do, to the ticker plant and how it parses the incoming data from the exchange. I need to make the parsing of the data a lot faster; it's gotten to the point that I've got it right - now I need to get it fast.

Still... the biggie is the map of messages for each instrument. It was the reason for the 128-bit integer, and it needs to be handled. So I'm trying a trie - but only to a point. I looked at the data, and if we encode A-Z, a-z, 0-9, and a little punctuation, we can do it all in 47 values. So if I map the ASCII symbol name with this scheme, I can have 47 bins for the first character, 47 for the second, and 47 for the third - 47³ = 103,823 entries. At a minimum of a pointer, that's still under 1MB for everything in three characters. If I add a map at the 'leaf node' for those symbols in excess of three characters, it'll slow down a bit, but the locks will be almost non-conflicting, and things should be pretty speedy.
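
The exact 47-character alphabet and its encoding aren't spelled out here, but the mapping is just a small, cheap function from a symbol character to a bin index. Here's one hedged guess at what it might look like - the case-folding of letters and the particular punctuation characters are my assumptions, not the actual ticker plant code:

  // a sketch of the symbol-character encoding - assumes case-folded letters
  // (26), digits (10), and eleven punctuation characters to fill out 47 bins
  static inline uint8_t encode(char c) {
    if ((c >= 'A') && (c <= 'Z')) return (c - 'A');           //  0..25
    if ((c >= 'a') && (c <= 'z')) return (c - 'a');           //  0..25 (folded)
    if ((c >= '0') && (c <= '9')) return (26 + (c - '0'));    // 26..35
    switch (c) {                                              // 36..46
      case '.': return 36;
      case '-': return 37;
      case '+': return 38;
      case '/': return 39;
      case ':': return 40;
      case '_': return 41;
      case '$': return 42;
      case '#': return 43;
      case '@': return 44;
      case '=': return 45;
      default : return 46;    // everything else lands in the last bin
    }
  }

Something like that turns the first three characters of a symbol into the three bin indices for the levels of the trie.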

The core of the map will be a Family where we'll hold the message for the underlying along with a map of the call options and another for the put options. These maps will be keyed on the expiration date and strike, and will be protected by spinlocks. But since they are only for these maps, the likelihood that we'll encounter a lock contention is very small. It'll still be very fast.

  struct Family {
    Message                  *stock;
    option_map_t             calls;
    boost::detail::spinlock  callsMutex;
    option_map_t             puts;
    boost::detail::spinlock  putsMutex;
  };
  typedef struct Family family_t;
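
As a rough illustration of how small the locked region is, an option update against a Family might look something like the sketch below. The option_key_t type (expiration date plus strike) and the convention of returning the old message for disposal are my assumptions, mirroring the put() behavior of the map itself:

  // a sketch only - option_key_t and option_map_t are assumed to be a key
  // struct and a std::map<option_key_t, Message *> respectively
  msg::Message *updateOption(family_t *fam, const option_key_t &key,
                             msg::Message *msg, bool isCall) {
    option_map_t            &book = isCall ? fam->calls : fam->puts;
    boost::detail::spinlock &mtx  = isCall ? fam->callsMutex : fam->putsMutex;
    // the spinlock is held only long enough to swap the pointers
    boost::detail::spinlock::scoped_lock  lock(mtx);
    msg::Message  *old = book[key];
    book[key] = msg;
    return old;    // caller disposes of the previous message, if any
  }

Since the lock guards only the pointer swap in one Family's map, two threads would have to be updating options on the same underlying at the same instant to ever collide.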

The leaf node of the component will have one of these for those symbols that are exactly three characters long, and a map of these for those that are longer.

One level up, we'll have an array of these leaf nodes - one for each possible encoded third character in the symbol name, and a Family for those symbols that end in two characters.

We keep going like this, and at the top level we have an array of these "trees" - one per message type. In this way, we make it almost impossible to have lock contention as we're forking the data very early, and once past a certain level, you have the pointer for the current level, and that's not going to change on you.

Tomorrow morning I'll finish off the last bits and start the testing. I hope this thing is going to give me the kind of speed I need.

Google Chrome dev 8.0.552.18 is Out

Wednesday, October 27th, 2010

I saw that Google Chrome dev 8.0.552.18 was released, and it appears that it's just a round of stability and polish features. The release notes are pretty sparse:

The Chrome Dev channel has been updated to 8.0.552.18 for all platforms. This release addresses a number of stability and polish issues found in the previous release. It also has moved the work on 3D acceleration back to about:flags.

Now, I like that about:flags reports the GPU-based acceleration, as I've been wondering if it's active for my build, so that's nice, but the rest is really more polish. Nice, but not earth-shaking new features.