Using ZeroMQ’s Zero Copy Message Constructor


This morning I was looking at some code that was left over from last week, and was causing a lot of problems with the CentOS build of my ticker plants. The problem was in the zmq::message_t's data() method call, and I was trying to dig into the ZeroMQ code to see what was happening there that might be causing the problems on CentOS and not on Ubuntu.

Nothing really popped out. But at the same time, my existing transmitter code looks something like this:

  try {
    // make a ZMQ message of the right size
    zmq::message_t  msg(aPayload.size());
    // ...copy in the data we need from the payload
    memcpy(msg.data(), aPayload.data(), aPayload.size());
    // ...and WOOSH! out it goes
    aTopic.socket->send(msg);
  } catch (std::exception & e) {
  }

It's a design rule that you need to create a new ZMQ message for each send, and while it's possible to copy messages, I hadn't really dug into it that much up to now. But this morning I decided to see if I could use the zero-copy version of the message constructor to cut down on heap allocations and therefore bring a little stability to the CentOS build.

It turns out to be pretty easy. In my code, I've got one 'buffer' - a std::string that I clear out on every message send and fill with the serialized form of the outgoing message. I'm already passing this into the sending method as aPayload, and copying the data out of it into the new message.

For the zero copy to work, we need to have a function that can be called to "free" the memory passed into the ZMQ message's constructor. Since I've got this one buffer, and I'm reusing it over and over again, all this "free" method needed to do was exactly nothing. I just needed to satisfy the ZeroMQ contract.

So I made a static do-nothing method:

  static void payloadRecycle( void *data, void *hint );

and then changed my code to look like:

  try {
    // make a ZMQ message with the payload's data
    zmq::message_t  msg((void*)aPayload.data(), aPayload.size(), payloadRecycle);
    // ...and WOOSH! out it goes
    aTopic.socket->send(msg);
  } catch (std::exception & e) {
  }

At this point, the zmq::message_t is going to create its structures around this buffer, send out the message, and then call my "free" function. I'll do nothing, and then we'll return to the loop where I clear out this buffer and do it all again. Very slick!
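
A minimal sketch of what that do-nothing method could look like. The class name Transmitter is hypothetical (the real class isn't shown here); the only thing taken from ZeroMQ is the callback shape, void (void *data, void *hint).

```cpp
#include <cassert>
#include <string>

// Hypothetical sender class; only the callback matters for this sketch.
class Transmitter {
public:
    // Same shape as ZeroMQ's free function: void (void *data, void *hint).
    static void payloadRecycle(void *data, void *hint);
};

// The buffer is owned and reused by the caller, so there is nothing to
// release here. Both arguments are deliberately ignored.
void Transmitter::payloadRecycle(void *data, void *hint) {
    (void)data;
    (void)hint;
}
```

A no-op like this is only safe as long as the buffer stays alive and unchanged until the message has really left the process.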

Unfortunately, this didn't help the CentOS build: the error messages appeared to be coming from this call, and they didn't stop. Ubuntu, however, is running strong. Can't figure that out, but I'm switching to Ubuntu anyway, and so it really doesn't matter.

Still... I got rid of the buffer creation and copy, and that's got to help the performance a little.

UPDATE: I had a problem on the receiver side of things. It seems that when I do the zero-copy, ZMQ misses the size of the payload by one or two bytes either side of the actual payload. When I use the memcpy() code, it's just fine. Sounds like a bug in the ZMQ code to me. I'll see if I can find out if they have extensively tested that code.

[9:22] UPDATE: the problem is more subtle - the fact is that the ZMQ method, send(), is not synchronous. It will return as soon as it can, and queue the message to be sent later. I, on the other hand, assumed that the message was gone by the time send() returned, and so I was refilling the buffer with the next message while ZMQ was still pointing at it. Not good. The problem could be solved with a string buffer pool, but then we're copying into that, and the message still needs to be constructed.
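
The failure is easy to reproduce without ZeroMQ at all. The sketch below uses a hypothetical FakeAsyncSocket whose send() only records the pointer and length, the way a zero-copy send does, and whose flush() reads the bytes later. None of these names come from ZeroMQ, and a plain char array stands in for the reused std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// What a queued zero-copy message holds: a pointer, not a copy.
struct PendingMessage {
    const char *data;
    std::size_t size;
};

// Hypothetical stand-in for a socket that sends asynchronously.
class FakeAsyncSocket {
public:
    // Returns immediately; the bytes are NOT read yet.
    void send(const char *data, std::size_t size) {
        assert(data != nullptr);
        mQueue.push_back(PendingMessage{data, size});
    }

    // Later, the "I/O thread" reads whatever the pointers refer to now.
    std::vector<std::string> flush() {
        std::vector<std::string> wire;
        for (const PendingMessage &m : mQueue) {
            wire.emplace_back(m.data, m.size);
        }
        mQueue.clear();
        return wire;
    }

private:
    std::vector<PendingMessage> mQueue;
};

// Sends two messages through one reused buffer, as in the broken version.
std::vector<std::string> sendTwoWithSharedBuffer() {
    FakeAsyncSocket socket;
    char buffer[16] = {0};

    std::memcpy(buffer, "AAAA", 4);
    socket.send(buffer, 4);

    // Refill the buffer before the first message has been written out.
    std::memcpy(buffer, "BB", 2);
    socket.send(buffer, 2);

    return socket.flush();
}
```

The first message goes out as "BBAA" instead of "AAAA": the length is right, but the bytes under it changed while the message was still queued.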

I suppose that if I put the string pool in the "filling" part, it'd work as we're only going to fill one string, and that would save a copy. But for now, I know why it's not working, and I'm OK with the old code.
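
One way to make the zero-copy path safe, sketched here under assumptions rather than taken from the real code: give every outgoing message its own heap-allocated std::string, pass that string as the hint, and let the free function delete it. FakeMessage, wrapPayload() and deliver() are hypothetical stand-ins for the message and the transport; only the void (void *data, void *hint) callback shape comes from ZeroMQ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Same shape as ZeroMQ's free-function callback.
typedef void (*FreeFn)(void *data, void *hint);

int gReleased = 0;  // demo-only counter of released payloads

// 'hint' carries the std::string that owns the bytes behind 'data'.
void payloadRelease(void *data, void *hint) {
    (void)data;
    delete static_cast<std::string *>(hint);
    ++gReleased;
}

// Hypothetical stand-in for a zero-copy message.
struct FakeMessage {
    void *data;
    std::size_t size;
    FreeFn ffn;
    void *hint;
};

// Moves the serialized payload into a string owned by the message.
FakeMessage wrapPayload(std::string payload) {
    std::string *owned = new std::string(std::move(payload));
    return FakeMessage{const_cast<char *>(owned->data()), owned->size(),
                       payloadRelease, owned};
}

// Stand-in for the transport: read the bytes, then release the buffer.
std::string deliver(FakeMessage &msg) {
    std::string wire(static_cast<const char *>(msg.data), msg.size);
    msg.ffn(msg.data, msg.hint);
    msg.data = nullptr;
    msg.hint = nullptr;
    return wire;
}
```

With cppzmq, the same data pointer, size, callback and hint would presumably go to the zmq::message_t constructor that takes the hint as an optional fourth argument. The cost is one heap allocation per message, which is what a pool of strings would be there to avoid.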

At least I know what's happening and what it takes to really fix it.