Fantastic Difference in Keys for boost::unordered_map

Boost C++ Libraries

Today I spent most of the day on a few issues - the biggest of which was the problem that a restart lost those Option calculation results that I was working so hard to retain. What I needed to do was to persist them to the redis cache server. No way around it. The problem was, I had nothing written for this, and a few other little testing bugs came up and I had to fix them as well.

The persistence system is modeled after all the other save/load persistence methods I have in the code. I have a save state method, and a load state method, and all I needed to do was to put the save method where I was saving other state data, and put the loader after I'd loaded up the instruments and before I continued on with the initialization of the system. In theory, not too bad.

In practice, I had to do quite a bit of detail work, but that was expected. It's non-trivial to persist a class that's not been fitted for persistence, but it's all straightforward. It just took time. No… the problems were after I had that code working, and I went to test it.

The original data structure I used to hold this Option calculation results was a boost::unordered_map:

  struct Results {
    uint32_t         volatility;
    msg::greeks_t    greeks;
    uint32_t         impVolatility;
    msg::greeks_t    impGreeks;
    // …and more fields like this…
  };
  typedef struct Results results_t;
 
  typedef boost::unordered_map<msg::ng::secID_t, results_t> ResultsMap;

The secID_t is a SecurityID - a 128-bit number that packs all the critical instrument identification information into one nice little package. This makes sense as a key in the map because it's unique, easy to use, and the core of a lot of look-ups in the rest of the system. Sounds great.

When I made the deserialization method for the data coming out of the redis server, it looked very simple:

  bool unpack( const std::string & aBuffer, uint32_t & aPos, ResultsMap & aValue )
  {
    bool       error = false;
 
    // first, let's get the number of pairs in the map
    int32_t   sz = 0;
    if (!msg::ng::unpack(aBuffer, aPos, sz)) {
      error = true;
    } else {
      msg::ng::secID_t  id;
      results_t         res;
      // now, for each pair, read them in and set them
      for (int32_t i = 0; i < sz; ++i) {
        id.extractFrom(aBuffer, aPos);
        res.deserialize(aBuffer, aPos);
        aValue[id] = res;
      }
    }
    return !error;
  }

I get the size of the map, then read in each pair and save it in the map. Simple.

Well… simple - yes. Fast? No.

The initial tests had this at 3 minutes for 88,000 pairs. That's horrible results. I did some profiling and it's in the one line:

        aValue[id] = res;

That makes no sense. Boost's unordered_map is the fastest around. I don't understand what's happening. So I did more digging. The next thing I found was that there wasn't an exposed hash function for boost to use. So I added it:

  namespace msg {
  namespace ng {
  size_t hash_value( const secID_t & anID );
  }
  }

and implement it simply as:

  namespace msg {
  namespace ng {
  size_t hash_value( const secID_t & anID )
  {
    return anID.hash();
  }
  }
  }

I expected this to speed things right up. The actual results? Same slowness. Amazing!

Maybe it's the security ID? Maybe it's be better with a Security Key - a std::string version of the same thing? Maybe boost is a lot more efficient dealing with string keys, even though they aren't more space efficient? Worth a try… we can't leave it at a 3 minute start-up.

So I went to:

  typedef boost::unordered_map<std::string, results_t> ResultsMap;

and then:

  bool unpack( const std::string & aBuffer, uint32_t & aPos, ResultsMap & aValue )
  {
    bool       error = false;
 
    // first, let's get the number of pairs in the map
    int32_t   sz = 0;
    if (!msg::ng::unpack(aBuffer, aPos, sz)) {
      error = true;
    } else {
      std::string  key;
      results_t    res;
      // now, for each pair, read them in and set them
      for (int32_t i = 0; i < sz; ++i) {
        if (msg::ng::unpack(aBuffer, aPos, key)) {
          res.deserialize(aBuffer, aPos);
          aValue[key] = res;
        }
      }
    }
    return !error;
  }

With these changes, the insertion of the values into the map went from 3 minutes to 61 msec! That's a factor of almost 3000x! Wow! OK… now I know. Don't use security IDs as keys in the boost::unordered_map. It's just not even close to fast enough.