Tracking Down Nasty Memory Issue – Patience is a Virtue (cont.)


This morning has been very enlightening on ZeroMQ. Very exciting stuff. Before leaving yesterday, I had put together a test app for the ZeroMQ guys to check out, and I posted the following test results as I varied the value of ZMQ_RATE:

Rate       ZMQ_RATE   Initial mem   Final mem
10 Mbps    10000      7 MB          18 MB
50 Mbps    50000      7 MB          73 MB
200 Mbps   200000     7 MB          280 MB

The data was pretty compelling. The effect ZMQ_RATE had on the memory footprint of the same data source was staggering. Thankfully, I put it all together in a nice email to the mailing list and I got a great hit from Martin S.:

Isn't it just the TX buffer? The size of PGM's TX buffer can be computed as ZMQ_RATE * ZMQ_RECOVERY_IVL. The messages are held in memory even after they are sent to allow retransmission (repair) for the period of ZMQ_RECOVERY_IVL seconds.
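
Plugging his formula into my numbers lines up pretty well. Here's a quick back-of-the-envelope sketch (my own, not something from the list); ZMQ_RATE is in kilobits per second, and I'm assuming the default ZMQ_RECOVERY_IVL, which I believe is 10 seconds, was in effect for the runs above:

  // Rough TX buffer estimate per Martin's formula: ZMQ_RATE * ZMQ_RECOVERY_IVL.
  // The 10 sec recovery interval is my assumption of the library default.
  #include <cstdio>

  int main() {
    const double recovery_sec = 10.0;                     // assumed default
    const long   rates[]      = { 10000, 50000, 200000 }; // ZMQ_RATE values tested
    for (int i = 0; i < 3; ++i) {
      double mb = rates[i] * 1000.0 / 8.0 * recovery_sec / 1e6;
      std::printf("ZMQ_RATE=%ld -> ~%.0f MB held for repair\n", rates[i], mb);
    }
    return 0;
  }

That works out to roughly 12 MB, 62 MB, and 250 MB of retained data on top of the ~7 MB baseline, which is right in the neighborhood of the growth in the table above.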

So I added the following to the ZMQ transmitter's code:

  static int64_t     __rate = 50000;      // ZMQ_RATE: rate cap, in kilobits/sec
  static int64_t     __recovery = 1;      // ZMQ_RECOVERY_IVL: repair window, in seconds
  static int64_t     __loopback = 0;      // ZMQ_MCAST_LOOP: 0 disables multicast loopback

  // cap the PGM rate, shrink the recovery window, and disable multicast loopback
  top->socket->setsockopt(ZMQ_RATE, &__rate, sizeof(__rate));
  top->socket->setsockopt(ZMQ_RECOVERY_IVL, &__recovery, sizeof(__recovery));
  top->socket->setsockopt(ZMQ_MCAST_LOOP, &__loopback, sizeof(__loopback));
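
For context, here's roughly how those calls sit in the transmitter's setup, written against the 2.x C++ binding. This is a minimal sketch, not my actual code: the interface, multicast group, and port are placeholders, and I've left ZMQ_MCAST_LOOP out of it (see the update below). As I understand it, these options only apply to subsequent connects, so they have to go in before the connect:

  // Minimal PGM publisher sketch - placeholder endpoint, not my real config.
  #include <zmq.hpp>
  #include <cstring>

  int main() {
    zmq::context_t ctx(1);
    zmq::socket_t  pub(ctx, ZMQ_PUB);

    static int64_t __rate     = 50000;   // ZMQ_RATE in kilobits/sec
    static int64_t __recovery = 1;       // ZMQ_RECOVERY_IVL in seconds
    pub.setsockopt(ZMQ_RATE, &__rate, sizeof(__rate));
    pub.setsockopt(ZMQ_RECOVERY_IVL, &__recovery, sizeof(__recovery));

    // options only take effect on subsequent connects, so connect last
    pub.connect("epgm://eth0;239.192.1.1:5555");

    const char *tick = "hello";
    zmq::message_t msg(std::strlen(tick));
    std::memcpy(msg.data(), tick, std::strlen(tick));
    pub.send(msg);
    return 0;
  }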

Then I started running the tests again.

The results were amazing:

Rate       ZMQ_RATE   Initial mem   Final mem
50 Mbps    50000      7 MB          11 MB
200 Mbps   200000     7 MB          32 MB

This was exactly what I was looking for! ZMQ_RECOVERY_IVL can't go below 1 sec, but for me even that's too much. If a receiver isn't already up and ready to get ticks, a one-second recovery window means holding onto several hundred, if not several thousand, messages. I'd be fine making it 0.5 sec, but Martin says one second is the underlying resolution of OpenPGM.
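
For what it's worth, the same back-of-the-envelope with a 1 second window squares with the new numbers (again my own sketch, not anything from the list):

  // Retained repair data with ZMQ_RECOVERY_IVL = 1 sec at the 200 Mbps rate.
  #include <cstdio>

  int main() {
    const long rate_kbps = 200000;                      // ZMQ_RATE
    double mb = rate_kbps * 1000.0 / 8.0 * 1.0 / 1e6;   // one second of data
    std::printf("~%.0f MB held for repair\n", mb);      // prints ~25 MB
    return 0;
  }

About 25 MB on top of the ~7 MB baseline, which is right where the 32 MB in the table above landed.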

Not bad. I'll take it. What a great morning!

[12/7] UPDATE: the option:

  static int64_t     __loopback = 0;
 
  top->socket->setsockopt(ZMQ_MCAST_LOOP, &__loopback, sizeof(__loopback));

is a massive red herring. It's not about the loopback interface (my reliable multicast URLs are all targeted at specific NICs); it's about being able to receive on the same box as the sender. I was trying to figure out why things "broke", and it was only when I took this call out that things worked again. The docs on this one are dangerously worded... leave it out.
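
For completeness, the same-box receiver I'm testing with is nothing more than a SUB socket on the same epgm endpoint, with ZMQ_MCAST_LOOP left alone at its default. A minimal sketch (placeholder endpoint again, 2.x C++ binding):

  // Minimal same-box subscriber sketch - note there is no ZMQ_MCAST_LOOP call
  // at all; leaving it at the default is what keeps same-box receive working.
  #include <zmq.hpp>
  #include <cstdio>

  int main() {
    zmq::context_t ctx(1);
    zmq::socket_t  sub(ctx, ZMQ_SUB);
    sub.setsockopt(ZMQ_SUBSCRIBE, "", 0);          // take every message
    sub.connect("epgm://eth0;239.192.1.1:5555");   // same group as the sender

    zmq::message_t msg;
    while (sub.recv(&msg)) {
      std::printf("got %zu bytes\n", msg.size());
    }
    return 0;
  }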