Struggling With Efficient Exchange Data Decoding

Today I spent a lot of time trying to come up with a really clean way to parse the exchange data. It's not as simple as it seems. More precisely, it's not simple to build something that doesn't take hundreds of lines of code made up of structs, if/then/else blocks, and switch statements.

Typically, you get a message with a fixed header, and that header encodes the full "type" of the message. You then lay another struct over the message, which reveals the remaining fields you can pick off. For each kind of message, you could be looking at four to ten different "patterns", and therefore four to ten structs. That adds up to a lot of code just to parse out the data.

Couple that with the switch statements needed to decide which struct to apply, and the code gets very large, very fast. The one upside of this scheme is that the code executes very quickly. So... I wanted something just as fast, but far more compact in the 'lines of code' category.
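For comparison, here is a rough sketch of that conventional approach. The message layouts, the 'T'/'Q' category codes, and every name below (`msg_header_t`, `trade_msg_t`, `quote_msg_t`, `decoded_t`, `decode`) are hypothetical. I made them up for illustration; they don't come from any real exchange spec. The sketch also assumes every field is fixed-width ASCII. Even with only two patterns, the struct-plus-switch pattern starts to pile up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts, for illustration only. Every field is fixed-width
 * ASCII, so these char-only structs need no padding and can be laid
 * directly over the raw message bytes. */
typedef struct {
  char category;        /* 'T' = trade, 'Q' = quote (assumed codes) */
  char seq[8];          /* ASCII sequence number */
} msg_header_t;

typedef struct {
  msg_header_t hdr;
  char symbol[6];
  char price[10];
} trade_msg_t;

typedef struct {
  msg_header_t hdr;
  char symbol[6];
  char bid[10];
  char ask[10];
} quote_msg_t;

/* What the caller gets back, whichever struct was used. */
typedef struct {
  char   symbol[7];
  double price;         /* trade price, or the bid for a quote */
} decoded_t;

/* Convert a fixed-width, non-terminated ASCII field to a double. */
static double field_to_double(const char *p, size_t len) {
  char tmp[32];
  if (len >= sizeof tmp) len = sizeof tmp - 1;
  memcpy(tmp, p, len);
  tmp[len] = '\0';
  return strtod(tmp, NULL);
}

/* Read the header, choose the matching struct, then pick off its fields.
 * Each new message pattern means another struct and another case. */
static int decode(const char *raw, decoded_t *out) {
  const msg_header_t *hdr = (const msg_header_t *)raw;
  switch (hdr->category) {
  case 'T': {
    const trade_msg_t *t = (const trade_msg_t *)raw;
    memcpy(out->symbol, t->symbol, sizeof t->symbol);
    out->price = field_to_double(t->price, sizeof t->price);
    break;
  }
  case 'Q': {
    const quote_msg_t *q = (const quote_msg_t *)raw;
    memcpy(out->symbol, q->symbol, sizeof q->symbol);
    out->price = field_to_double(q->bid, sizeof q->bid);
    break;
  }
  default:
    return -1;          /* unknown message type */
  }
  out->symbol[6] = '\0';
  return hdr->category;
}
```

The sketch assumes the caller has already checked that `raw` is long enough for whichever struct the header selects.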

What I decided to try was a "tagging" scheme where I don't attempt to make complete sense of all the data in the exchange record, but simply indicate where each value is, and how to decode it. For example, if I create the struct:

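  #include <stdint.h>

  typedef struct {
    char       type;   /* 'L' = long, 'S' = string, 'D' = double */
    uint16_t   pos;    /* offset of the value in the record */
    uint16_t   len;    /* width of the value */
  } variable_tag_t;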

where the fields are type, position and length, then I can create an array of tags that indicates where the values I need are located in the data stream:

  static const variable_tag_t tradeTags[] = {
    { 'L', 0, 10 },
    { 'S', 10, 15 },
    { 'D', 25, 8 }
  };

and I can read this as: a long int at position 0 that's 10 characters long, a string starting at position 10 for 15 characters, and a double at position 25 for 8. It's not bad, and I see a lot of promise in this idea.
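A minimal sketch of a decoder that walks such a tag array might look like this. It assumes every field is fixed-width ASCII text, so the 'D' field is read as 8 characters of text rather than a binary double. `decoded_value_t`, `decode_tags`, and the 64-character cap are hypothetical choices for illustration. The point is that the single switch on the tag type is the only switch, no matter how many message layouts there are.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char       type;   /* 'L' = long, 'S' = string, 'D' = double */
  uint16_t   pos;    /* offset of the value in the record */
  uint16_t   len;    /* width of the value */
} variable_tag_t;

/* Hypothetical holder for one decoded value (strings capped at 63 chars). */
typedef struct {
  char type;
  union {
    long   l;
    double d;
    char   s[64];
  } v;
} decoded_value_t;

/* Decode each tagged field of `data` into out[i]. Returns the number of
 * fields decoded, or -1 if a tag is unknown or falls outside the record. */
static int decode_tags(const variable_tag_t *tags, size_t ntags,
                       const char *data, size_t datalen,
                       decoded_value_t *out) {
  for (size_t i = 0; i < ntags; i++) {
    const variable_tag_t *t = &tags[i];
    char tmp[64];
    if ((size_t)t->pos + t->len > datalen || t->len >= sizeof tmp)
      return -1;
    memcpy(tmp, data + t->pos, t->len);   /* copy the raw field ... */
    tmp[t->len] = '\0';                   /* ... and NUL-terminate it */
    out[i].type = t->type;
    switch (t->type) {
    case 'L': out[i].v.l = strtol(tmp, NULL, 10);   break;
    case 'D': out[i].v.d = strtod(tmp, NULL);       break;
    case 'S': memcpy(out[i].v.s, tmp, t->len + 1);  break;
    default:  return -1;
    }
  }
  return (int)ntags;
}

static const variable_tag_t tradeTags[] = {
  { 'L', 0, 10 },
  { 'S', 10, 15 },
  { 'D', 25, 8 }
};
```

If an exchange sends binary doubles instead, the 'D' case would need to copy (and possibly byte-swap) the 8 bytes into a `double` rather than call `strtod`.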

I can create constructors on the messages that take one of these arrays along with the data from the exchange, and use the tags to extract the ivars from that data. This way, the order and count of the tags are fixed by the message's needs, while the location and size of each value are dictated by the specifics of the exchange.
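Here is one way the "constructor" idea could be sketched in plain C. The `trade_t` type, `tag_text`, `trade_init`, and the two exchange layouts are all hypothetical, and the records are assumed to be fixed-width ASCII. The trade always asks for sequence, symbol, and price, in that order. Only the positions and widths in each exchange's tag array differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char       type;   /* 'L' = long, 'S' = string, 'D' = double */
  uint16_t   pos;    /* offset of the value in the record */
  uint16_t   len;    /* width of the value */
} variable_tag_t;

/* Hypothetical trade message; its "ivars" are fixed by what a trade needs. */
typedef struct {
  long   sequence;
  char   symbol[16];
  double price;
} trade_t;

/* Copy one tagged field into a NUL-terminated buffer; 0 on success. */
static int tag_text(const variable_tag_t *t, const char *data, size_t datalen,
                    char *buf, size_t buflen) {
  if ((size_t)t->pos + t->len > datalen || t->len >= buflen) return -1;
  memcpy(buf, data + t->pos, t->len);
  buf[t->len] = '\0';
  return 0;
}

/* "Constructor": tags[0..2] must describe sequence, symbol, and price, in
 * that order. The type codes are checked so a mismatched table is caught. */
static int trade_init(trade_t *tr, const variable_tag_t tags[3],
                      const char *data, size_t datalen) {
  char buf[64];
  if (tags[0].type != 'L' || tags[1].type != 'S' || tags[2].type != 'D')
    return -1;
  if (tag_text(&tags[0], data, datalen, buf, sizeof buf)) return -1;
  tr->sequence = strtol(buf, NULL, 10);
  if (tag_text(&tags[1], data, datalen, tr->symbol, sizeof tr->symbol)) return -1;
  if (tag_text(&tags[2], data, datalen, buf, sizeof buf)) return -1;
  tr->price = strtod(buf, NULL);
  return 0;
}

/* Two hypothetical exchanges carrying the same trade in different layouts:
 * same tag order and count, different positions and widths. */
static const variable_tag_t exchangeATradeTags[] = {
  { 'L', 0, 10 }, { 'S', 10, 15 }, { 'D', 25, 8 }
};
static const variable_tag_t exchangeBTradeTags[] = {
  { 'L', 16, 8 }, { 'S', 0, 6 }, { 'D', 6, 10 }
};
```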

It seems like a decent idea, but I'm going to have to build several more messages and look at a few more exchange message definitions to make sure it's really going to work out. I certainly like the compactness of the scheme. Some will argue that it's all hard-coded numbers, and that's not good. But how is this any different from a bunch of structs? Both are hard-coded definitions of how the data is organized in the exchange record; this one just happens to be more direct.
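To make the "structs vs. tags" point concrete: if the record is described with a char-only overlay struct, `offsetof` and `sizeof` recover exactly the numbers in `tradeTags`. The `trade_record_t` struct and the `TAG_FOR` macro are hypothetical. The equivalence also assumes a struct containing only `char` arrays, which mainstream compilers lay out without padding.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  char       type;   /* 'L' = long, 'S' = string, 'D' = double */
  uint16_t   pos;    /* offset of the value in the record */
  uint16_t   len;    /* width of the value */
} variable_tag_t;

/* Hypothetical overlay struct describing the same trade record. */
typedef struct {
  char sequence[10];
  char symbol[15];
  char price[8];
} trade_record_t;

/* Build a tag from a struct member: the struct's hard-coded layout
 * turns directly into the tag's hard-coded position and length. */
#define TAG_FOR(kind, rec, member)                  \
  { (kind),                                         \
    (uint16_t)offsetof(rec, member),                \
    (uint16_t)sizeof(((rec *)0)->member) }

static const variable_tag_t tradeTagsFromStruct[] = {
  TAG_FOR('L', trade_record_t, sequence),   /* { 'L', 0, 10 }  */
  TAG_FOR('S', trade_record_t, symbol),     /* { 'S', 10, 15 } */
  TAG_FOR('D', trade_record_t, price)       /* { 'D', 25, 8 }  */
};
```

Either way the layout comes down to a handful of hard-coded numbers. The tag table just states them directly instead of implying them through member declarations.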

We'll have to see how things work out.