Building A Better Map

Professor.jpg

When I got the 128-bit integer done, and the string-based security key put into my ticker plant I was not totally surprised to see that the performance was bad. I mean really bad. It's not something to hack a finger off for, but it was bad. So I knew I had to come up with something to get rid of the need for the 128-bit integers and keep the speed of the processing up.

Along those lines, there were several other things I could do, and still need to do to the ticker plant and how it parses the incoming data from the exchange. I need to handle parsing of the data a lot faster, as it's getting to the point that I've got it right, now I need to get it fast.

Still... the biggie is the map of messages for each instrument. It was the reason for the 128-bit integer, and it needs to be handled. So I'm trying a trie - but only to a point. I looked at the data, and if we encode A-Z, a-z, 0-9, and a little punctuation, we can do it all in 47 values. So if I map the ASCII symbol name with this scheme, I can have 47 bins for the first character, 47 for the second, and 47 for the third - 473 = 103,823 entries. At a minimum of a pointer, that's still under 1MB for everything in three characters. If I add a map at the 'leaf node' for those symbols in excess of three characters, it'll slow down a bit, but the locks will be almost non-conflicting, and things should be pretty speedy.

The core of the map will be a Family where we'll hold the message for the underlying along with a map of the call options and another for the put options. These maps will be keyed on the expiration date and strike, and will be protected by spinlocks. But since they are only for these maps, the likelihood that we'll encounter a lock contention is very small. It'll still be very fast.

  struct Family {
    Message                  *stock;
    option_map_t             calls;
    boost::detail::spinlock  callsMutex;
    option_map_t             puts;
    boost::detail::spinlock  putsMutex;
  };
  typedef struct Family family_t;

The leaf node of the component will have one of these for those symbols that are exactly three characters long, and a map of these for those that are longer.

One level up, we'll have an array of these leaf nodes - one for each possible encoded third character in the symbol name, and a Family for those symbols that end in two characters.

We keep going like this and at the top level we have an array of these "trees" - one per message type. In this way, we make it almost impossible to have a lock contention as we're forking the data very early, and once past a certain level, you have the pointer for the current level, and that's not going to change on you.

Tomorrow morning I'll finish off the last bits and start the testing. I hope this thing is going to give me the kind of speed I need.