Adding a Flexible Sized Key Templated Trie to DKit

DKit Laboratory

Today I spent a good chunk of time writing a 64-bit key trie as the next basic component of re-writing the exchange feeds in a lot better component library than I had previously written. The purpose of the trie is to have a lockless storage mechanism that has very fast insertion and access methods. The last version I'd written was based on a specific element being stored - a pointer to a Message. In this version I wanted to be able to store any template type - including pointers, and handle the disposal as needed.

I've made adjustments like this for the dkit::pool where I handled plain-old-datatypes as well as pointers, and disposed of the pointer values properly. I needed to be able to do the same thing for the trie.

The second improvement this time around is to stop trying to make the trie do everything associated with processing of its contents. In the past, I'd had the trie scan the Messages for symbol limits, values, etc. This was very inefficient, as it was all in the trie, and therefore very specific. I needed to move away from this and use functors that could be passed into the trie, and then operated on each of the valid elements in the trie. This is a far better scheme as it allows the user to make any processing on the Nodes in the trie, and this is far more flexible than trying to implement the methods in the trie itself.

The solution to this was to define a class within the templates trie so that I could create functors like this:

  class counter : public dkit::trie<blob *, dkit::uint64_key>::functor
  {
    public:
      counter() : _cnt(0) { }
      virtual ~counter() { }
      virtual bool process(
          volatile dkit::trie<blob *, dkit::uint64_key>::Node & aNode )
      {
        ++_cnt;
        return true;
      }
      uint64_t getCount() { return _cnt; }
    private:
      uint64_t     _cnt;
  };

They are easy to create, and you can apply them to all the elements in the trie with a simple call:

    counter    worker;
    m.apply(worker);

The final improvement was to switch from loop-based indexing into the trie to recursion-based accessing. There are a few reasons for this - but the most is flexibility and code simplicity. When you have a fixed key length, and you use looping to traverse the trie, you have a fixed series of nested for loops. This seems all well and good, but it makes for a very fixed structure and the code isn't all the easy to read.

By going with a better class structure for building the trie, we're able to take advantage of recursion, and then the only parameter to the "depth" of the trie, and the corresponding size of the key, is the number of branches used in the construction of the tree in the trie. If we put the parameter in the creation method then there's a single point where we stop creating branches, and instead make the leaf nodes. From this point, the same code that traverses a 64-bit key trie works for a 128-bit trie.

At least insofar as the movement in the trie.

Once I got it all working, the next thing was to look at the recursion vs. looping design. I wanted to make sure that I had the most performant design. What I found was that the recursion was about 25% faster than the looping structure. I'm guessing it's due to tail-recursion optimization by the compiler, but I'm not positive. I repeated the tests, and the difference is real, that's good enough for me.

Once it was all done, I wondered how hard it might be to make the key size another parameter in the template. Well… since we're using recursion, and therefore, the size of the key space is only used in one space, the solution was pretty simple. Start by defining the different sizes we'll accept:

  namespace dkit {
  enum trie_key_size {
    uint16_key = 2,
    uint32_key = 4,
    uint64_key = 8,
    uint128_key = 16,
  };
  }      // end of namespace dkit

and then we convert this to the test we need in the branch creation:

  enum {
    eLastBranch = (N - 2)
  };

In the code, we then have:

  virtual volatile Node *getOrCreateNodeForKey( const uint8_t aKey[],
                                                uint16_t aStep )
  {
    volatile Node   *n = NULL;
 
    // get the index we're working on (re-used a few times)
    uint8_t     idx = aKey[aStep];
    Component   *curr = __sync_or_and_fetch(&kids[idx], 0x0);
    if (curr == NULL) {
      // create a new Branch or Leaf for this part of the trie
      bool    createBranch = true;
      if (aStep < eLastBranch) {
        curr = new Branch();
      } else {
        curr = new Leaf();
        createBranch = false;
      }
      // throw a runtime exception if we couldn't make it
      if (curr == NULL) {
        if (createBranch) {
          throw std::runtime_error("[Branch::getOrCreateNodeForKey] "
                   "Unable to create new Branch for the trie!");
        } else {
          throw std::runtime_error("[Branch::getOrCreateNodeForKey] "
                   "Unable to create new Leaf for the trie!");
        }
      }
      // see if we can put this new one in the right place
      if (!__sync_bool_compare_and_swap(&kids[idx], NULL, curr)) {
        // someone beat us to it! Delete what we just made...
        delete curr;
        // ...and get what is there now
        curr = __sync_or_and_fetch(&kids[idx], 0x0);
      }
    }
 
    // now pass down to that next branch the request to fill
    if (curr != NULL) {
      n = curr->getOrCreateNodeForKey(aKey, (aStep + 1));
    }
 
    // return what we have dug out of the tree
    return n;
  }

This little test allows us to place the key size as a parameter to the template, and that makes it very easy to make different sized keys. For convenience, I added in the convenience methods to deal with the different sized keys from the contained values. It's not strictly necessary, but it'll make using the template class a lot nicer.