Multiple-Producer/Single-Consumer Lockless FIFO Queue

Professor.jpg

I've been working on trying to get more speed out of the components of my ticker plants as they run fine during the day, but the crush at the open is still something that gets ahead of them. To that end, I wanted to make my conflation queue lockless. The trie that holds the messages is lockless - using just the GCC built-ins for compare-and-swap. But the STL std::deque was certainly not thread-safe, and updating the two together was something I left in the lock as well.

Then I got to thinking about the order of operations in the conflation queue, and I became convinced that I could order things properly and it would be possible t have a lockless conflation queue if I had a lockless FIFO queue that allowed for multiple-producers. As is is now, the only lockless FIFO queue I have is a nice, fixed-size, circular FIFO that doesn't even use compare-and-swap. But that wasn't going to be good enough.

So I started writing. My goal: lockless, multiple-producer/single-consumer, FIFO queue using compare-and-swap. It's all sitting in this gist.

There are a few things I'm really pleased about this class. First, it's a nice, general template class, and the problem with my circular FIFO class is that it uses the volatile keyword to synchronize data between the CPUs. This is nice, as it doesn't use any compare-and-swap operations, but it makes it very hard to put your own classes into the queue. It's best suited for simple values like pointers, etc. Nice and useful, but to have something like this is even better.

The one section of code that I wish were a little better is in the pop() call:

  1. /**
  2.   * This method updates the passed-in reference with the value on the
  3.   * top of the queue - if it can. If so, it'll return the value and
  4.   * 'true', but if it can't, as in the queue is empty, then the method
  5.   * will return 'false' and the value will be untouched.
  6.   */
  7. bool pop( T & anElem )
  8. {
  9. bool error = false;
  10.  
  11. // see if the head is NULL - i.e. empty queue
  12. if (__sync_bool_compare_and_swap(&mHead, NULL, mHead)) {
  13. error = true;
  14. }
  15.  
  16. /**
  17.   * This is interesting... when we pop, we need to see if the
  18.   * head and tail are pointing to the same thing - the only
  19.   * element in the queue. If they are, then the best thing to
  20.   * do is to NULL-out the tail, so push() can work, and then
  21.   * quickly swap out the head so it's NULL as well. Then we
  22.   * have time to do what we need.
  23.   *
  24.   * However, if there are several elements in the queue, then
  25.   * we only need to remove the head and process it.
  26.   */
  27. if (!error) {
  28. // get what I *think* is the head (should be)
  29. Node *h = mHead;
  30. /**
  31.   * If we have the head == tail, then we have only one item
  32.   * in the list. In order to make this a clean break, NULL
  33.   * out the tail so new push() calls can start building the
  34.   * list again.
  35.   */
  36. __sync_bool_compare_and_swap(&mTail, h, NULL);
  37. // now let's move the head to the next in the list
  38. while (!__sync_bool_compare_and_swap(&mHead, h, mHead->next)) {
  39. h = mHead;
  40. }
  41.  
  42. // get the value out of the node and into the arg
  43. if (h != NULL) {
  44. anElem = h->value;
  45. // at this point we're done with this node - delete it
  46. delete h;
  47. h = NULL;
  48. } else {
  49. error = true;
  50. }
  51. }
  52.  
  53. return !error;
  54. }

In lines 29-40 we are trying to deal with the fact that there is one, and only one, value in the queue. This means that the head and the tail point to the same Node. What I wanted to do was to NULL out the head and tail, but save the Node for later extraction. Since I can't atomically swap two variables, I had to do the one that I believe is more critical - the tail, as that's effecting the push() calls. I then very quickly NULL out the head and grab that value for processing.

I may come back to this soon, but for now all my tests look good and I'm very happy with the results. I'm able to have a lockless conflation queue which tests out very nicely, and with this as a general component, I have a lot more flexibility in what I use in the tighter performance sections of the code.

Good day's work.

[2/14] UPDATE: I've been doing a lot of reading on queues like this and it turns out that if you start with an 'empty' Node, the code for push() and pop() gets a lot easier. The complete class is still here, but this is the new push():

    /**
     * This method takes an item and places it in the queue - if it can.
     * If so, then it will return 'true', otherwise, it'll return 'false'.
     */
    bool push( const T & anElem )
    {
      bool    error = false;
 
      // create a new Node with the provided value
      Node   *me = new Node(anElem);
      if (me == NULL) {
        error = true;
      } else {
        /**
         * We need to add the new value to the tail and then link it
         * back into the list. Not too bad.
         */
        // put in the new tail, and get the old one
        Node  *oldTail = mTail;
        while (!__sync_bool_compare_and_swap(&mTail, oldTail, me)) {
          oldTail = mTail;
        }
        // OK, make sure that the list remains intact
        if (oldTail != NULL) {
          oldTail->next = me;
        }
      }
 
      return !error;
    }

and pop() becomes:

    /**
     * This method updates the passed-in reference with the value on the
     * top of the queue - if it can. If so, it'll return the value and
     * 'true', but if it can't, as in the queue is empty, then the method
     * will return 'false' and the value will be untouched.
     */
    bool pop( T & anElem )
    {
      bool    error = false;
 
      // if the next guy is NULL, we're empty
      if (__sync_bool_compare_and_swap(&(mHead->next), NULL, NULL)) {
        error = true;
      } else {
        // move the head to the next Node and get the value
        Node  *oldHead = __sync_val_compare_and_swap(&mHead, mHead, mHead->next);
        anElem = mHead->value;
        // if there is an oldHead (should be), delete it now
        if (oldHead != NULL) {
          delete oldHead;
        }
      }
 
      return !error;
    }

Of important note is that we're now only touching the mTail in the push() method, and only touching the mHead in the pop(). This is much cleaner than the previous implementation, and in addition, there are no more "spin to replace" loops as we don't care in this scheme. Much cleaner.