Identifying, Sorting, Classifying a Ton of Messages
Today I started trying to consolidate the 300+ messages in The Magic Schoolbus into a few reasonable categories: OPRA messages (tons of them, space is critical, data format very rigid), Price Messages (a little looser, but still important and small), and everything else. The rest of the messages are well suited to a self-describing message format like JSON, or more likely BSON: they are very flexible, have a variable number of components, and don't need to get shot around the network all the time.
The Really Wasteful
Take, for instance, the Holiday Calendar. This is just like every other Holiday Calendar I've ever seen: give it a date (or default to today), and it'll give you all the trading holidays for the next 'n' months. Very simple data structure. Even simpler when all you're talking about are US Equities and their options - you don't even need to tell it which exchange you're asking about as they are all the same.
But here's what The Magic Schoolbus does: Every minute it will publish a list of all holidays for the next ten years and those that are registered for this data will receive it. Over, and over again. Every minute. The format is pretty simple as well. There's the basic header of the message (far too verbose and general) but the payload of the message looks like this:
struct {
    uint16_t      modifiedBy;       // trader ID
    char          today[9];         // YYYYMMDD
    uint8_t       numHolidays;      // # of holidays
    Holidays_NEST holidays_nest[];
} HolidayCalendar;
where Holidays_NEST looks like:
struct {
    char    holidayDate[9];   // YYYYMMDD
    uint8_t holidayType;      // 1=no trading; 2=half day
} Holidays_NEST;
Now, even if we put aside the other problems with this content, a date takes 9 bytes when 2 would do (as a uint16_t, given a compact encoding such as a day count from a fixed epoch). In fact, we could compress the entire message to look like this:
struct {
    uint16_t modifiedBy;    // trader ID
    uint16_t today;         // YYYYMMDD
    uint8_t  numHolidays;   // # of holidays
    uint16_t holidays[];    // tYYYYMMDD
} HolidayCalendar;
where the 't' is the type of day and the date immediately follows. A simple mask gets us what we need, and the size comparison (counting the bytes of each field, with no padding) is:
old size = 12 + n * 10
new size = 5 + n * 2
and for a typical year we have, say, 7 holidays, and with ten years that makes n = 70:
old size = 12 + 70 * 10 = 712
new size = 5 + 70 * 2 = 145
savings: roughly 80%
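To make the "simple mask" concrete, here's a minimal sketch of one way the packed uint16_t could work. The bit layout is my assumption, not anything the existing system does: the top 2 bits hold the holiday type and the low 14 bits hold a day offset from some agreed epoch (14 bits covers about 44 years of days). The names holiday_pack, holiday_type, holiday_day and the two size helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (hypothetical, not from the existing system):
 *   bits 15..14  holiday type (1 = no trading, 2 = half day)
 *   bits 13..0   day offset from an agreed epoch */
#define HOLIDAY_TYPE_SHIFT 14u
#define HOLIDAY_DAY_MASK   0x3FFFu

/* Pack a holiday type and a day offset into one 16-bit value. */
static uint16_t holiday_pack(uint8_t type, uint16_t day_offset)
{
    assert(type <= 3u);                      /* must fit in 2 bits  */
    assert(day_offset <= HOLIDAY_DAY_MASK);  /* must fit in 14 bits */
    return (uint16_t)(((uint16_t)type << HOLIDAY_TYPE_SHIFT) | day_offset);
}

/* Recover the holiday type from the top 2 bits. */
static uint8_t holiday_type(uint16_t packed)
{
    return (uint8_t)(packed >> HOLIDAY_TYPE_SHIFT);
}

/* Recover the day offset by masking off the type bits. */
static uint16_t holiday_day(uint16_t packed)
{
    return (uint16_t)(packed & HOLIDAY_DAY_MASK);
}

/* Payload sizes in bytes, using the per-field byte counts from the text. */
static size_t old_payload_size(size_t n) { return 12u + n * 10u; }
static size_t new_payload_size(size_t n) { return 5u + n * 2u; }
```

With this layout, holiday_type(holiday_pack(2, 359)) gives back 2 and holiday_day() gives back 359, and the two size helpers reproduce the 712 and 145 byte figures for n = 70.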
It's just stunning how bad some of these messages are.
The Horrible Congestion
Look again at the Holiday Calendar - it's sending this data out every minute. Why? Because the designers believed that this was the only way the data was going to get delivered to the client. What about a data cache/data service? They even have a cache server in the architecture - but it holds all the messages sent and as such, it's not nearly as efficient as a more customized data service.
So I need to do something here - basically, stop the insanity of sending all this data all the time. I need to have the client get it when it requests it and when it fundamentally changes. This means something a lot more intelligent and flexible than read from the database, make a monster message, send it, repeat.
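One way this could look, purely as a sketch: the data service keeps a version number alongside the calendar, bumps it only when the contents really change, and a client request carries the last version it saw. Everything here (CalendarService, calendar_update, calendar_needs_send) is a hypothetical name I made up for illustration, and the transport is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical data-service state: one calendar plus a version counter
 * that moves only when the calendar contents actually change. */
typedef struct {
    uint32_t version;        /* 0 means "nothing loaded yet" */
    uint8_t  numHolidays;
    uint16_t holidays[255];  /* packed entries, at most UINT8_MAX of them */
} CalendarService;

/* Store a freshly loaded calendar. Returns true (and bumps the version)
 * only if it differs from what is already held, i.e. only when
 * subscribers would need to be told about a change. */
static bool calendar_update(CalendarService *svc,
                            const uint16_t *holidays, uint8_t n)
{
    assert(svc != NULL && (holidays != NULL || n == 0));
    if (svc->version != 0 && svc->numHolidays == n &&
        (n == 0 ||
         memcmp(svc->holidays, holidays, n * sizeof holidays[0]) == 0)) {
        return false;  /* same data as before: nothing to publish */
    }
    if (n > 0) {
        memcpy(svc->holidays, holidays, n * sizeof holidays[0]);
    }
    svc->numHolidays = n;
    svc->version++;
    return true;
}

/* Client request: the payload is worth sending only when the version the
 * client already has is not the current one. */
static bool calendar_needs_send(const CalendarService *svc,
                                uint32_t client_version)
{
    assert(svc != NULL);
    return svc->version != client_version;
}
```

The point of the design is that reloading the same holidays from the database every minute becomes a no-op on the wire: calendar_update() returns false, and a client that already holds the current version gets nothing back.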
The Task
It's huge. I have to look at all the used messages and then try to see what can be combined into a nice, compact format for sending at high speed to a lot of clients, and what can be more free-form and possibly even skip the 29West sending in the first place.
It's a monster job. But it's gotta be done. The reason this is in such a horrible state is because no one has taken it upon themselves to do this until now. It's ugly, and it's painful, but it's got to be done.