Archive for the ‘Coding’ Category

Struggling With Efficient Exchange Data Decoding

Wednesday, August 4th, 2010

Today I spent a lot of time trying to come up with a really nice way to parse the exchange data. It's not as simple as it seems. I should say that it's not simple to make something that doesn't take hundreds of lines of code comprised of structs, if/then/else and switch statements.

Typically, you're going to get a message with a fixed header. In that header, the full "type" of the message will be encoded. Then you lay another struct on the message and it reveals the other fields you can pick off. With each of these messages, you could be looking at anywhere from four to ten different "patterns", and therefore structs. This makes for a lot of code just to parse out the data.

Couple that with the switch statements to know what struct to apply, and the code gets very large, very fast. One upside to this scheme is that the execution of this code is very fast. So... I wanted something that was just as fast, but was far more compact in the 'lines of code' category.

What I decided to try was a "tagging" scheme where I don't attempt to make complete sense of all the data in the exchange record, but simply indicate where each value is, and how to decode it. For example, if I create the struct:

  typedef struct {
    char       type;
    uint16_t   pos;
    uint16_t   len;
  } variable_tag_t;

where the fields are type, position and length, then I can create an array of tags that indicates where the values I need are located in the data stream:

  static variable_tag_t  tradeTags[] = {
    { 'L', 0, 10 },
    { 'S', 10, 15 },
    { 'D', 25, 8 }
  };

and I can read this: I have a long int at position 0 that's 10 characters long, a string starting at position 10 for 15 characters, and a double at position 25 for 8. It's not bad, and I can see a lot of good with this idea.

I can create constructors on the messages that take one of these arrays and the data from the exchange and use it to extract the ivars from the data. This way, the order and count of tags is fixed by the message's needs, but the location and size of each value is dictated by the specifics of the exchange.
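
As a rough sketch of what such a constructor might look like (not the final code: the Trade struct, its member names and the fieldText() helper are all made up for illustration, and I'm assuming the fields are space-padded ASCII), the tag order is fixed by the message while the positions and widths come from the table:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Same layout as the tag struct above: type code, position, length.
struct variable_tag_t {
  char           type;
  std::uint16_t  pos;
  std::uint16_t  len;
};

// Exchange-specific table: where each value lives in the record.
static const variable_tag_t  tradeTags[] = {
  { 'L', 0, 10 },
  { 'S', 10, 15 },
  { 'D', 25, 8 }
};

// Hypothetical helper: copy one field out of the record and trim the
// trailing space padding. Checks the tag type the caller expects.
static std::string fieldText( const char *data, std::size_t length,
                              const variable_tag_t & tag, char expected )
{
  assert(tag.type == expected);
  assert(static_cast<std::size_t>(tag.pos) + tag.len <= length);
  std::string  text(data + tag.pos, tag.len);
  std::size_t  last = text.find_last_not_of(' ');
  return (last == std::string::npos) ? std::string() : text.substr(0, last + 1);
}

// Hypothetical message: it always wants a long, a string and a double,
// in that order, no matter which exchange the record came from.
struct Trade {
  long         size;
  std::string  symbol;
  double       price;

  Trade( const char *data, std::size_t length, const variable_tag_t *tags ) :
    size(std::strtol(fieldText(data, length, tags[0], 'L').c_str(), NULL, 10)),
    symbol(fieldText(data, length, tags[1], 'S')),
    price(std::strtod(fieldText(data, length, tags[2], 'D').c_str(), NULL))
  {
  }
};
```

This version copies each field into a std::string to keep the sketch short. A fast decoder would want to parse in place, but the shape of the idea is the same: one small table per exchange message, and one constructor per message of mine.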

It seems like a decent idea, but I'm going to have to make several more messages and look at a few more exchange message definitions to make sure that it's really going to work out. I certainly like the compactness of the scheme. Some will argue that it's all hard-coded numbers, and that's not good - but how much different is this than a bunch of structs? Either is a hard-coded definition of the data organization in the exchange data. This just happens to be more direct.

We'll have to see how things work out.

Building Up my C++ Variant Class

Tuesday, August 3rd, 2010

Today I spent quite a bit of time really fleshing out my variant class. I needed to have a lot more functionality in the code as it was going to be an integral part of the ticker plant I'm working on. I needed to write a bunch of tests, and each new test uncovered either a compiler issue, like needing a new version of a method for a different use case, or a real bug in the code, which I had to fix.

Overall, it was a pretty good day, but it was all spent making the class a lot more useful to the developer that would be using it. Which, of course, is me.

Scrapped Boost Variant – Wrote My Own

Monday, August 2nd, 2010

Today I messed around with the boost::variant problem I'd been dealing with lately trying to get the code to compile and work properly. Finally, after about five hours, I gave up. It's simply too hard to get working, and even if I did, the maintenance costs of dealing with these kinds of compiler errors would be far too high for a junior developer.

So I took a very different tack: I simply wrote my own. Honestly, it wasn't all that hard. I took out the definition of the boost::variant, and in its place I put a simple union:

  private:
    tVariantType      mType;
    union {
      std::map<std::string, variant>    *mMapValue;
      std::list<variant>                *mListValue;
      std::string                       *mStringValue;
      int64_t                           mIntValue;
      double                            mDoubleValue;
      uuid_t                            *mUUIDValue;
      bool                              mBoolValue;
      error_t                           *mErrorValue;
    };

and then all the setters cleared out the old value and replaced it with the new. It's something I've written before, and so I knew a lot of the pitfalls to avoid. But it's not as nice as a stack-based template version. When I change values I'm hitting the heap for new space. While this isn't horrible for a lot of applications, it kills performance when you're trying to do something really fast.
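
To make the "clear out the old value and replace it with the new" pattern concrete, here is a cut-down sketch. It is not the real class: mini_variant, its enum and its accessors are hypothetical names, and it only handles three of the types in the union above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, cut-down variant: a type tag plus an anonymous union.
class mini_variant {
  public:
    enum type_t { eNull, eInt, eDouble, eString };

    mini_variant() : mType(eNull) { mIntValue = 0; }
    ~mini_variant() { clear(); }

    // Every setter releases whatever was held before storing the new value.
    void set( std::int64_t aValue ) { clear(); mType = eInt; mIntValue = aValue; }
    void set( double aValue ) { clear(); mType = eDouble; mDoubleValue = aValue; }
    void set( const std::string & aValue )
    {
      // Copy first, so passing in our own string value is still safe.
      std::string  *copy = new std::string(aValue);
      clear();
      mType = eString;
      mStringValue = copy;
    }

    type_t type() const { return mType; }
    std::int64_t intValue() const { assert(mType == eInt); return mIntValue; }
    const std::string & stringValue() const { assert(mType == eString); return *mStringValue; }

  private:
    // Copying is disabled to keep the sketch short.
    mini_variant( const mini_variant & );
    mini_variant & operator=( const mini_variant & );

    // Free any heap-backed value and fall back to the null state.
    void clear()
    {
      if (mType == eString) {
        delete mStringValue;
      }
      mType = eNull;
      mIntValue = 0;
    }

    type_t  mType;
    union {
      std::int64_t  mIntValue;
      double        mDoubleValue;
      std::string   *mStringValue;
    };
};
```

One pitfall with overloaded setters like these: a call such as set(42) is ambiguous between the integer and double versions, so the caller has to say which one is meant, as in set(std::int64_t(42)). The string setter is also where the heap cost shows up, since every new string value is a new allocation.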

Still... this is far easier to understand, and once we have it all buttoned-up, there's no real chance of a leak, and it's a solid way to handle the variant problem.

Once I got the main part written, I was able to attack the serialization and de-serialization schemes for this guy - based on the work of another group who has defined the scheme we'll be using. It's decently flexible, and should be really nice to use across the board.

Still lots of testing to do, but I'll get to that tomorrow.

Google Chrome dev 6.0.472.14 is Out

Monday, August 2nd, 2010

This morning I noticed that Google Chrome dev 6.0.472.14 was released with the exact same release notes as 6.0.472.11 - "UI tweaks and no Flash loading". OK, I can see it's not a really exciting time in the Chrome group, but I wonder why they are doing all these releases and not fixing the Flash loading. Probably won't ever know...

Trying to Get Boost Variant Working

Friday, July 30th, 2010

Once more unto the boost... this time to try and get the boost::variant working. In my particular application, I've got a self-defined data stream that can include:

  • Map - all keys are up to 256-character strings, values are variants
  • List - all elements are variants
  • Integer
  • Double
  • String
  • Boolean
  • NULL
  • UUID
  • Date
  • Error - which is a boost::tuple of a UUID, an integer, and a variant

and with the recursion in the definition with the map and list, I wanted to try the boost make_recursive_variant capabilities.

I sure do wish boost had better docs, because even getting to the point that the code compiles was an all-day affair. Primarily due to two lines of code:

  void variant::set( std::map<std::string, variant> & aMap )
  {
    mValue = aMap;
  }

and:

  void variant::set( std::list<variant> & aList )
  {
    mValue = aList;
  }

In theory, and in practice, this should set the value of the ivar mValue, the boost::variant, to the map and list, respectively. But I got the most insane compiler errors I've ever seen. Oddly enough, when I do:

  void variant::set( std::string & aValue )
  {
    mValue = aValue;
  }

everything works just fine. So it's clearly something about the recursive definition in the code. Possibly it's the const-ness of something, but I tried all the permutations I could think of. Nothing worked.

Finally, I tried this:

  void variant::set( std::map<std::string, variant> & aMap )
  {
    boost::get< std::map<std::string, variant> >(mValue) = aMap;
  }

and it compiled, but when I called this code and the value in the variant was not already a map, this threw a "bad cast" boost exception. Very understandable... I'm saying give me the existing map, and then set this guy there, but there's no existing map.

Exceptionally frustrating, but I'll have to hit it again on Monday.

Coda Notes is Out

Thursday, July 29th, 2010

One of the interesting demos at WWDC 2010 was in the Safari Extensions talk where Cabel of Panic fame showed something they hacked up called Coda Notes. It's a way to annotate a web page in Safari and then email it to your support staff, etc., and they can see all your annotations. This is super helpful for the web developer, and while I'm not sure I'll use it a lot, I certainly want to keep a handle on it. It looks just too amazingly powerful.

Awesome work.

Google Chrome dev 6.0.472.11 is Out

Thursday, July 29th, 2010

This morning I noticed that Google Chrome dev 6.0.472.11 was out with a very small set of release notes:

This release contains

  • UI tweaks and clean up
  • Additional stability fixes

Known Issues

I'm glad I'm not trying to use PDFs with Chrome. In any case, it's nice to stay up to date.

Amazing Use of calloc in The Magic Schoolbus

Wednesday, July 28th, 2010

I ran across this today and I simply could not believe what I was seeing. It's right up there on Daily WTF - or should be, anyway. First, a little set-up...

This code is part of an incoming exchange data decoder. The Exchange will send messages on UDP multicast, and it's up to us to grab them, decode them, place them in our message formats, and pass them on to all waiting listeners. What's important to realize is that these decoders are supposed to be efficient and fast. After all, they are decoding hundreds of thousands of messages a second. It's a lot of data.

So... the exchange dictates its message format, and as in the olden days of the mainframe, almost all the data is in fixed-length ASCII records. Specifically, the integer for the size of an order might be 8 characters and look like this:

  120.....

where the '.'s are spaces. Eight characters total, in ASCII format for the number 120. Simple. Not very efficient, but simple.

Since these fixed-length records will be end-to-end, there are no terminating NUL characters to make it easier to parse - you have to know what you're looking for. Well, the code I saw started with this method:

  /*
   * Convert unsafe char array string to int.
   */
  inline int uatoi(const char *nptr, size_t l)
  {
    int rv = 0;
    char *nullTermStr = cmalloc2(l + 1);
    if (nullTermStr == NULL) {
      errno = ENOMEM;
      return INT_MAX;
    }
 
    memcpy(nullTermStr, nptr, l);
    rv = atoi(nullTermStr);
    free(nullTermStr);
 
    return rv;
  }

where:

  /*
   * Malloc2 that returns a char pointer.
   */
  inline char *cmalloc2(size_t size)
  {
    return (char *) malloc2(size);
  }

and:

  /*
   * Malloc memory and initialize it to zero
   */
  inline void *malloc2(size_t size)
  {
    if (size < 0)
      return NULL;
 
    void *ptr = calloc(size, 1);
 
    return ptr;
  }

OK... this is really quite stunning. You want to parse a (char *) and so you duplicate it, by calling a useless method and then calloc with a repeat count of 1, parse it and then free the memory. And this is fast? For upwards of 10 fields a message, hundreds of thousands of times a second?

When I re-wrote the functionality it was decidedly simpler:

  /*
   * Convert unsafe char array string to int.
   */
  inline int uatoi( char *nptr, size_t width )
  {
    char    hold = nptr[width];
    nptr[width] = '\0';
    int     retval = atoi(nptr);
    nptr[width] = hold;
    return retval;
  }

Sure, I had to "lose" the const in the signature because I was modifying the data as I parsed it, but hey - it's a message from some data source - that's OK. It's also the same "logic" of using atoi() in the decoding. But now I'm not calling something to create some memory and then copying it, and destroying it. I can't believe they didn't look at all this before. It's incredible!
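
For comparison, here is a sketch of another way to go that keeps the const and never writes to the buffer, which also sidesteps touching the byte just past the last field of a record. It is not the code that went into the decoder: the name fixedWidthToInt is made up, it assumes space-padded decimal ASCII, and it does no overflow checking.

```cpp
#include <cassert>
#include <cstddef>

/*
 * Convert a fixed-width, space-padded ASCII field to an int without
 * copying or modifying the data. Like atoi(), it stops at the first
 * character that is not a digit, but it never reads past 'width'.
 * No overflow checking is done in this sketch.
 */
inline int fixedWidthToInt( const char *field, std::size_t width )
{
  assert(field != NULL);

  std::size_t  i = 0;
  // skip any leading padding
  while ((i < width) && (field[i] == ' ')) {
    ++i;
  }
  // optional sign
  bool  negative = false;
  if ((i < width) && ((field[i] == '-') || (field[i] == '+'))) {
    negative = (field[i] == '-');
    ++i;
  }
  // accumulate digits until the field ends or a non-digit shows up
  int  value = 0;
  for ( ; (i < width) && (field[i] >= '0') && (field[i] <= '9'); ++i) {
    value = value * 10 + (field[i] - '0');
  }
  return negative ? -value : value;
}
```

Because the loop is bounded by the width, a field that runs right up against the next one (say, four digits followed immediately by more digits) still decodes correctly, and there's no memory allocated, copied or freed along the way.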

I know there comes a time when people don't look at the code anymore and just think the whole thing is too complicated... but really... guys... let's try a little harder. This is a horrible performance penalty for parsing. It should have been looked at long ago.

I guess I'm the one that decided to really look at it.

BBEdit 9.5.1 is Out

Wednesday, July 28th, 2010

This morning I noticed that BBEdit 9.5.1 was released and while it didn't fix the one problem I had with it, it did seem to fix a lot of bugs other people had with it.

My problem is that when you open a C++ header file, you can, with a click of a widget on the window, open up the corresponding implementation file. Previously, this allowed you to open it up in the same window, but now it seems to respect the global defaults of opening it up in a new window. In general, I want new files in new windows, but when it's the header/implementation file pair, it's a different story.

It'd be easy to add, I'm sure, but I'm guessing they aren't going to change it back. I've sent them an email once, and I'll send it to them again this morning.

UPDATE: I sent along the description of the issue. We'll see if they respond. I'd really like to have that feature back.

[7/29] UPDATE: they don't have this, but they do have something that's almost as good: If you set the default to "Open in Front Window" you can get the file opening up where you need it. Problem is everything will be opening there. The way around that is the View menu: "Move to New Window". I put in the keyboard shortcut command-option-O. Now if there's a file you want "split out", it's a single keystroke. Not bad. Not what I wanted, but it's not bad.

Upgraded to Git 1.7.2 on MacBook Pro – Pro Git Tips

Wednesday, July 28th, 2010

I saw a nice tweet from GitHub this morning about some really nice pro git tips, and at the top of the page, it points out that some of these features require git 1.7.2. That got me thinking about what version I was on because I know I updated a little while ago. So again, I did the magic:

  $ git --version
  git version 1.7.0.3

and it looked like I needed to spend a few minutes getting up to date.

I'm a fan of the Mac OS X Git installer on Google Code as it's a clean package to download and get installed.

Took only a few minutes and now:

  $ git --version
  git version 1.7.2

Perfect!

I'll need to get this installed on my other boxes in my office tonight.