Archive for July, 2010

Trying to Get Boost Variant Working

Friday, July 30th, 2010

Once more unto the boost... this time to try and get the boost::variant working. In my particular application, I've got a self-defined data stream that can include:

  • Map - all keys are up to 256-character strings, values are variants
  • List - all elements are variants
  • Integer
  • Double
  • String
  • Boolean
  • NULL
  • UUID
  • Date
  • Error - which is a boost::tuple of a UUID, an integer, and a variant

and because the map and list make the definition recursive, I wanted to try Boost's make_recursive_variant capabilities.

I sure do wish Boost had better docs, because just getting the code to compile was an all-day affair, primarily due to two lines of code:

  void variant::set( std::map<std::string, variant> & aMap )
  {
    mValue = aMap;
  }

and:

  void variant::set( std::list<variant> & aList )
  {
    mValue = aList;
  }

In theory, this should set the value of the ivar mValue, the boost::variant, to the map and the list, respectively. But I got the most insane compiler errors I've ever seen. Oddly enough, when I do:

  void variant::set( std::string & aValue )
  {
    mValue = aValue;
  }

everything works just fine. So it's clearly something about the recursive definition. Possibly it's the const-ness of something, but I tried every permutation I could think of, and nothing worked.

Finally, I tried this:

  void variant::set( std::map<std::string, variant> & aMap )
  {
    boost::get< std::map<std::string, variant> >(mValue) = aMap;
  }

and it compiled, but when I called this code and the value in the variant was not already a map, it threw a "bad cast" (boost::bad_get) exception. Very understandable: I'm saying "give me the existing map and put this one there," but there's no existing map.

Exceptionally frustrating, but I'll have to hit it again on Monday.
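The post leaves this unresolved, so what follows is only a hedged sketch of the general shape, using C++17's std::variant as a swapped-in stand-in for boost::make_recursive_variant (Boost isn't needed to show the idea). The type name Value and its members are hypothetical. It shows two things: assigning a whole map or list simply switches the variant's active alternative, and std::get<T>() on the wrong alternative throws std::bad_variant_access, much like boost::get throwing bad_get. One possible cause of the compiler errors, which I have not confirmed against the original code, is a type mismatch: with make_recursive_variant, the map stored inside the variant has the underlying boost::variant as its mapped type, not a wrapper class that happens to be named variant, so std::map<std::string, variant> and the map type held by mValue may not be the same type.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>
#include <string>
#include <variant>

// Hypothetical recursive value type (a stand-in, not the post's class).
// The recursion goes through std::list and std::map, which keep their
// elements on the heap, so Value still has a finite size.
struct Value {
  using List = std::list<Value>;
  using Map  = std::map<std::string, Value>;
  using Storage = std::variant<std::monostate,   // plays the role of NULL
                               bool, int64_t, double, std::string,
                               List, Map>;
  Storage data;   // default-constructs to std::monostate

  // Plain assignment replaces whatever alternative was held before,
  // so no existing map or list is required.
  void set(const Map & aMap)           { data = aMap; }
  void set(const List & aList)         { data = aList; }
  void set(const std::string & aValue) { data = aValue; }

  // By contrast, std::get<Map>(data) = aMap would throw
  // std::bad_variant_access unless data already held a Map.
  bool isMap() const  { return std::holds_alternative<Map>(data); }
  bool isList() const { return std::holds_alternative<List>(data); }
};
```

The standard only guarantees incomplete element types for std::vector, std::list, and std::forward_list, so the std::map member relies on what mainstream standard libraries accept in practice rather than on a formal guarantee.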

Coda Notes is Out

Thursday, July 29th, 2010

One of the interesting demos at WWDC 2010 was in the Safari Extensions talk, where Cabel of Panic fame showed something they hacked up called Coda Notes. It's a way to annotate a web page in Safari and then email it to your support staff, etc., and they can see all your annotations. This is super helpful for the web developer, and while I'm not sure I'll use it a lot, I certainly want to keep a handle on it. It looks amazingly powerful.

Awesome work.

Safari 5.0.1 is on Software Updates

Thursday, July 29th, 2010

This morning I saw that Safari 5.0.1 is out on Software Update, and even though it has just a few security patches, it's nice to stay clear of the bad people out there bent on doing harm.

Google Chrome dev 6.0.472.11 is Out

Thursday, July 29th, 2010

This morning I noticed that Google Chrome dev 6.0.472.11 was out with a very small set of release notes:

This release contains

  • UI tweaks and clean up
  • Additional stability fixes

Known Issues

I'm glad I'm not trying to use PDFs with Chrome. In any case, it's nice to stay up to date.

Amazing Use of calloc in The Magic Schoolbus

Wednesday, July 28th, 2010

I ran across this today and I simply could not believe what I was seeing. It's right up there on Daily WTF - or should be, anyway. First, a little set-up...

This code is part of an incoming exchange data decoder. The exchange sends messages over UDP multicast, and it's up to us to grab them, decode them, place them in our message formats, and pass them on to all waiting listeners. What's important to realize is that these decoders are supposed to be efficient and fast. After all, they are decoding hundreds of thousands of messages a second. It's a lot of data.

So... the exchange dictates its message format, and as in the olden days of the mainframe, almost all the data is in fixed-length ASCII records. Specifically, the integer for the size of an order might be 8 characters and look like this:

  120.....

where the '.'s are spaces. Eight characters total, in ASCII format for the number 120. Simple. Not very efficient, but simple.

Since these fixed-length records are packed end-to-end, there are no terminating NUL characters to make parsing easier; you have to know what you're looking for. Well, the code I saw started with this method:

  /*
   * Convert unsafe char array string to int.
   */
  inline int uatoi(const char *nptr, size_t l)
  {
    int rv = 0;
    char *nullTermStr = cmalloc2(l + 1);
    if (nullTermStr == NULL) {
      errno = ENOMEM;
      return INT_MAX;
    }
 
    memcpy(nullTermStr, nptr, l);
    rv = atoi(nullTermStr);
    free(nullTermStr);
 
    return rv;
  }

where:

  /*
   * Malloc2 that returns a char pointer.
   */
  inline char *cmalloc2(size_t size)
  {
    return (char *) malloc2(size);
  }

and:

  /*
   * Malloc memory and initialize it to zero
   */
  inline void *malloc2(size_t size)
  {
    if (size < 0)
      return NULL;
 
    void *ptr = calloc(size, 1);
 
    return ptr;
  }

OK... this is really quite stunning. You want to parse a (char *), so you duplicate it by going through two pointless wrappers to calloc() (with an element size of 1), parse the copy, and then free the memory. And this is fast? For upwards of 10 fields a message, hundreds of thousands of times a second?

When I rewrote the functionality, I made it decidedly simpler:

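  /*
   * Convert unsafe char array string to int.
   * Temporarily NUL-terminates the field in place, so nptr[width]
   * must be a writable byte inside the message buffer.
   */
  inline int uatoi( char *nptr, size_t width )
  {
    char    hold = nptr[width];    // save the byte just past the field
    nptr[width] = '\0';            // terminate the field for atoi()
    int     retval = atoi(nptr);
    nptr[width] = hold;            // restore the original byte
    return retval;
  }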

Sure, I had to "lose" the const in the signature because I modify the data as I parse it, but hey, it's a message from some data source, so that's OK. It still uses atoi() for the decoding, just like the original. But now I'm not allocating memory, copying the field into it, and freeing it for every single field. I can't believe they didn't look at all this before. It's incredible!
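If giving up the const is a concern, one alternative (my own sketch, not the decoder's code) is to skip atoi() entirely and accumulate the digits straight out of the fixed-width field. The name parseFixedInt is hypothetical, and the sketch assumes a field is optional leading spaces, an optional sign, digits, then space padding, with no overflow checking. It never allocates and never writes to the buffer, so it is also safe on the last field of a message, where writing nptr[width] could touch memory past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical const-preserving parser for a fixed-width, space-padded
// ASCII integer field. Reads at most `width` bytes; never writes.
inline int parseFixedInt(const char *field, std::size_t width)
{
  std::size_t i = 0;
  // skip leading space padding
  while (i < width && field[i] == ' ') ++i;
  // optional sign
  bool negative = false;
  if (i < width && (field[i] == '-' || field[i] == '+')) {
    negative = (field[i] == '-');
    ++i;
  }
  // accumulate digits until the first non-digit (e.g. trailing padding)
  int value = 0;
  while (i < width && field[i] >= '0' && field[i] <= '9') {
    value = value * 10 + (field[i] - '0');
    ++i;
  }
  return negative ? -value : value;
}
```

For example, parseFixedInt("120     ", 8) returns 120. Unlike atoi(), it treats only spaces as padding, which matches the record format described above but would need adjusting for tabs or other whitespace.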

I know there comes a time when people don't look at the code anymore and just think the whole thing is too complicated... but really... guys... let's try a little harder. This is a horrible performance penalty for parsing. It should have been looked at long ago.

I guess I'm the one that decided to really look at it.

New Trackpad Drivers from Apple for Recent MacBook Pros

Wednesday, July 28th, 2010

This morning Apple dropped new drivers for the new Magic Trackpad on Software Update and, as a consequence, added inertial scrolling for existing MacBook Pro users like me. I installed it primarily because I'm a guy who believes in staying up to date, but when I started to use the inertial scrolling I was simply blown away.

This is what I love about the iPhone, and now it's on my Mac. This makes it so much less effort to scroll around things. Very nice. Very slick.

There is only one Apple. Long live the King.

BBEdit 9.5.1 is Out

Wednesday, July 28th, 2010

This morning I noticed that BBEdit 9.5.1 was released, and while it didn't fix the one problem I have with it, it did seem to fix a lot of bugs other people had with it.

My problem is that when you open a C++ header file, you can, with a click of a widget on the window, open up the corresponding implementation file. Previously, this allowed you to open it up in the same window, but now it seems to respect the global defaults of opening it up in a new window. In general, I want new files in new windows, but when it's the header/implementation file pair, it's a different story.

It'd be easy to add, I'm sure, but I'm guessing they aren't going to change it back. I've emailed them about it once; I'll send it to them again this morning.

UPDATE: I sent along the description of the issue. We'll see if they respond. I'd really like to have that feature back.

[7/29] UPDATE: they don't have this, but they do have something that's almost as good: if you set the default to "Open in Front Window", the file opens up where you need it. The problem is that everything will open there. The way around that is the View menu's "Move to New Window". I gave it the keyboard shortcut command-option-O, so now if there's a file you want "split out", it's a single keystroke. Not bad. Not what I wanted, but not bad.

Upgraded to Git 1.7.2 on MacBook Pro – Pro Git Tips

Wednesday, July 28th, 2010

I saw a nice tweet from GitHub this morning about some really nice pro git tips, and at the top of the page, it points out that some of these features require git 1.7.2. That got me thinking about what version I was on because I know I updated a little while ago. So again, I did the magic:

  $ git --version
  git version 1.7.0.3

and it looked like I needed to spend a few minutes getting up to date.

I'm a fan of the Mac OS X Git installer on Google Code as it's a clean package to download and get installed.

Took only a few minutes and now:

  $ git --version
  git version 1.7.2

Perfect!

I'll need to get this installed on my other boxes in my office tonight.

Cool Method for Milliseconds Since Midnight for C/C++

Tuesday, July 27th, 2010

I was working with exchange data today and realized that "microseconds since epoch" wasn't really a great timestamp; in fact, the exchanges use midnight as the reference point, and only to the millisecond. The problem was that I didn't have a simple method to give me the time relative to midnight, and I didn't find anything really helpful on Google either.

Then it hit me, like a flash, the way all great insights do: pass the seconds from gettimeofday() to localtime_r(). That gives you a single time "instant" from which you can pick off the hours, minutes, and seconds, and then add in the milliseconds by dividing the microseconds by 1000.

Like this:

  uint32_t TransferStats::msecSinceMidnight()
  {
    /*
     * This is really interesting. I need to get the msec since midnight,
     * and the cleanest way I could think to do this was to get the
     * timeval struct and then take the tv_sec component of it and pass
     * it to localtime_r to get the hour, min, sec parts of the time and
     * then piece it all together again. Kinda slick.
     */
    // get the time now as the (sec + usec) since epoch
    struct timeval tv;
    gettimeofday(&tv, NULL);
    // now take the seconds component and get the current hour, min, sec
    struct tm   now;
    localtime_r(&tv.tv_sec, &now);
    // now let's calculate the time since midnight...
    return (tv.tv_usec/1000 + ((now.tm_hour * 60 + now.tm_min) * 60
                                + now.tm_sec) * 1000);
  }
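To make the arithmetic easy to check, here is a sketch that splits the calculation into a pure helper taking a struct tm and a microsecond count, with a thin wall-clock wrapper on top. Both function names are hypothetical, and the wrapper assumes a POSIX system (gettimeofday() and localtime_r()).

```cpp
#include <cassert>
#include <cstdint>
#include <sys/time.h>
#include <time.h>

// Pure helper: milliseconds since local midnight, given broken-down local
// time and the sub-second microseconds. No clock access, so it is testable.
inline uint32_t msecSinceMidnight(const struct tm & local, long usec)
{
  return static_cast<uint32_t>(
      ((local.tm_hour * 60 + local.tm_min) * 60 + local.tm_sec) * 1000
      + usec / 1000);
}

// Wall-clock wrapper following the same gettimeofday() + localtime_r()
// approach described above (POSIX only).
inline uint32_t msecSinceMidnightNow()
{
  struct timeval tv;
  gettimeofday(&tv, nullptr);
  struct tm now;
  localtime_r(&tv.tv_sec, &now);
  return msecSinceMidnight(now, static_cast<long>(tv.tv_usec));
}
```

For 09:30:15 plus 250000 microseconds, the helper gives 34215250. One caveat: this measures local wall-clock time since midnight, so on a daylight-saving changeover day it will not equal the milliseconds that have actually elapsed since midnight.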

Fleshing Out a few Concrete Messages for My Ticker Plant

Monday, July 26th, 2010

Today I spent most of the day trying to flesh out the infrastructure for a few concrete message classes of exchange data in my new ticker plant. The initial work I'd been doing was focused on simple message types, and just about anything would work there - so I used my simple 'Hello' and 'GoodBye' TCP handshaking messages, and they worked fine. But now, I've got people wanting to test components that require the sequence number from exchange messages, and that means I needed a lot more infrastructure than I had built up.

The first thing was the idea of the conflation, or 'compression', key on the messages. This is something that I know I'm going to need in the client code, as the current system has historically handled slow consumers very poorly. I settled on a simple uint64_t as the conflation key type and added a very nice little string hash method to get decently diverse integer values from strings:

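  // FNV-1a style string hash: xor in each byte, then multiply.
  // Note: with a 64-bit size_t the product is not truncated to 32 bits,
  // so results differ from reference 32-bit FNV-1a values.
  static const size_t    kInitialFNV = 2166136261U;   // 32-bit FNV offset basis
  static const size_t    kFNVMultiple = 16777619;     // 32-bit FNV prime
  size_t Message::hash( const std::string & aString ) const
  {
    size_t   hash = kInitialFNV;
    size_t   len = aString.length();
    const char *p = aString.data();
    for (size_t i = 0; i < len; ++i) {
      // read bytes as unsigned so chars above 0x7F aren't sign-extended
      hash = hash ^ static_cast<unsigned char>(*(p++));
      hash *= kFNVMultiple;
    }
    return hash;
  }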

From what I've read, this Fowler-Noll-Vo (FNV-1a style) algorithm is pretty good at spreading strings across the integer space. It's not something I'm going to count on as unique, but it's decent enough for some message types.
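Since the conflation key is a uint64_t, a 64-bit hash may be a better fit. Below is a sketch of the standard 64-bit FNV-1a (offset basis 14695981039346656037, prime 1099511628211) together with a hypothetical ConflatingQueue illustrating one common meaning of conflation for a slow consumer: keep at most one pending update per key and overwrite it with the newest. The queue is an illustration only, not the ticker plant's actual design, and because hash values can collide, two different strings could get conflated together, which is exactly why the key shouldn't be treated as unique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Standard 64-bit FNV-1a; bytes read as unsigned char.
inline uint64_t fnv1a64(const std::string & s)
{
  uint64_t hash = 14695981039346656037ULL;   // 64-bit offset basis
  for (unsigned char c : s) {
    hash ^= c;
    hash *= 1099511628211ULL;                // 64-bit FNV prime
  }
  return hash;
}

// Hypothetical conflating queue: at most one pending message per key.
// A newer message for a waiting key replaces it in place (keeping its
// original queue position), so a slow consumer only sees the latest value.
template <typename Msg>
class ConflatingQueue {
 public:
  void push(uint64_t key, const Msg & msg) {
    auto it = mPending.find(key);
    if (it != mPending.end()) {
      it->second = msg;              // conflate: overwrite the stale update
    } else {
      mOrder.push_back(key);
      mPending.emplace(key, msg);
    }
  }

  bool pop(Msg & out) {
    if (mOrder.empty()) return false;
    uint64_t key = mOrder.front();
    mOrder.pop_front();
    auto it = mPending.find(key);
    out = it->second;
    mPending.erase(it);
    return true;
  }

  std::size_t size() const { return mOrder.size(); }

 private:
  std::deque<uint64_t> mOrder;                   // arrival order of keys
  std::unordered_map<uint64_t, Msg> mPending;    // latest message per key
};
```

For example, pushing updates for "IBM", "AAPL", then "IBM" again leaves two entries, and the first one popped is the newest IBM update.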

Now that I had something to put into the getConflationKey() for my existing messages, I started with the exchange messages. I needed to make a base message for exchange data, and then a faux price message that I could use for testing. This is going to be something that is very close to a "real" message, but will be strictly used for testing.

Why? Because I can control everything about it and it'll never change. This is the kind of test framework we need for the performance testing and comparison runs. True, we'll also need real-world exchange data for some additional testing, but that doesn't negate the value of this kind of test framework.

Then I realized that I needed to make a final decision on how to integrate the "parsing" of the external data into msg::Message instances. I decided to go with a very simplistic API. The MessageFactory has three basic methods: create(), to turn external data into our messages; extractSequenceNumber(), to extract just the sequence number from said external data; and masquerade(), the inverse of create(), which attempts to make external data from a message.
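The post gives the three method names but not their signatures, so the following is only a guess at the shape of such an interface. Everything beyond the names create(), extractSequenceNumber(), and masquerade() is an assumption: the std::string payload, the return types, and the toy FauxPriceMessage with its space-separated "seq symbol price" format are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <sstream>
#include <string>

namespace msg {

// Hypothetical stand-in for the ticker plant's message base class.
class Message {
 public:
  virtual ~Message() = default;
};

// Hypothetical shape of the three-method factory; signatures are guesses.
class MessageFactory {
 public:
  virtual ~MessageFactory() = default;
  // external bytes -> our message; nullptr if the data can't be parsed
  virtual std::unique_ptr<Message> create(const std::string & aData) = 0;
  // pull only the sequence number out of the external bytes
  virtual uint64_t extractSequenceNumber(const std::string & aData) = 0;
  // inverse of create(): render a message back into external bytes
  virtual bool masquerade(const Message & aMsg, std::string & aData) = 0;
};

// Toy faux price message and its factory, for illustration only.
struct FauxPriceMessage : Message {
  uint64_t    seq = 0;
  std::string symbol;
  double      price = 0.0;
};

class FauxPriceFactory : public MessageFactory {
 public:
  std::unique_ptr<Message> create(const std::string & aData) override {
    auto m = std::make_unique<FauxPriceMessage>();
    std::istringstream in(aData);
    if (!(in >> m->seq >> m->symbol >> m->price)) return nullptr;
    return m;
  }

  uint64_t extractSequenceNumber(const std::string & aData) override {
    // the sequence number is the leading field in this toy format
    return std::strtoull(aData.c_str(), nullptr, 10);
  }

  bool masquerade(const Message & aMsg, std::string & aData) override {
    auto p = dynamic_cast<const FauxPriceMessage *>(&aMsg);
    if (p == nullptr) return false;   // not a message this factory knows
    std::ostringstream out;
    out << p->seq << ' ' << p->symbol << ' ' << p->price;
    aData = out.str();
    return true;
  }
};

}  // namespace msg
```

For example, create("42 IBM 128.5") yields a FauxPriceMessage, and masquerade() turns it back into the same string, which is the round-trip property the inverse relationship implies.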

I then had to put this all in place and write the faux price message and its data conversion class. It was a lot of coding and I didn't quite get it all done today, but I should be able to finish it up tomorrow.