Very Non-obvious Memory Leak (cont.)

Today I spent nearly the entire day trying to find the remaining memory leak(s) in my ticker plant. Again, only a few were being affected, and again I focused on the code they exclusively have. Once again, this was a complete waste of time, as the problem wasn't in the exclusive code but was a very odd little bug in the shared code.

When I'd struck out after several hours of testing, I decided to start at one end and sweep the code with a fine-tooth comb. Starting at the incoming UDP feed, I cut the rest of the app off and checked the memory usage. Stable. Good. Now let's add in the next step. Leak? Ha... let's fix him. Continue until we have the leak spotted.
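One way to put a number on "stable" at each stage is to sample the process's peak resident set size before and after a run. This is only a sketch of that idea, not the check actually used here: the helper names peakRss() and grewSince() are made up for illustration, and it assumes a POSIX system with getrusage().

```cpp
#include <sys/resource.h>

// Peak resident set size of this process, or -1 on error.
// Units are platform-dependent: kilobytes on Linux, bytes on macOS.
long peakRss() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}

// True if the peak has risen by more than `slack` units since `baseline`.
// Hypothetical helper: call it after letting one pipeline stage run a while.
bool grewSince(long baseline, long slack) {
    return peakRss() - baseline > slack;
}
```

Because this is a peak value it only ever goes up, so it can show that a stage keeps growing but not that memory was later given back.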

The first problem may not have been a leak, but it was an unreliable memory usage pattern. The UDP datagrams came in on the boost asio socket, and I placed each datagram into a std::string, paired with the time as microseconds since epoch in a simple std::pair. This was then placed into a simple staging queue. Something like this:

  typedef std::pair<uint64_t, std::string> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

where the CircularFIFO is a single-producer, single-consumer, lockless, circular FIFO buffer for fast pushes and pops of the data coming off the wire.
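The CircularFIFO class itself isn't shown here, so the following is only a rough sketch of how a single-producer, single-consumer lockless ring buffer is commonly built. The name SpscRing, the push()/pop() signatures, and the keep-one-slot-empty convention are all assumptions for illustration, not the real class.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for CircularFIFO; the real implementation is not shown.
template <typename T, std::size_t N>
class SpscRing {
public:
    // Producer thread only. Returns false when the ring is full.
    bool push(const T& item) {
        const std::size_t head = mHead.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == mTail.load(std::memory_order_acquire)) {
            return false;  // full: one slot stays empty to tell full from empty
        }
        mSlots[head] = item;
        mHead.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    bool pop(T& out) {
        const std::size_t tail = mTail.load(std::memory_order_relaxed);
        if (tail == mHead.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = std::move(mSlots[tail]);
        mTail.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T mSlots[N];
    std::atomic<std::size_t> mHead{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> mTail{0};  // next slot to read (consumer-owned)
};
```

In this sketch a ring declared with N slots holds at most N - 1 items, and with by-value std::string elements each slot keeps whatever buffer its string last held until the slot is overwritten.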

The problem with this design is that we have to create the std::string every time anyway, and the storage of this structure is very unpredictable. What I decided to do was to stop holding the strings by value in the queue, hold pointers to strings on the heap instead, and change the structure:

  typedef std::pair<uint64_t, std::string *> TaggedDatagram;
 
  CircularFIFO< TaggedDatagram, 100000 >   mStaging;

Now each entry is basically two 64-bit ints, and the heap reclaims the memory as each string is deleted. This was a nice addition, but it wasn't the final problem.
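With raw pointers in the queue, somebody has to delete each string once it has been consumed. Here's a minimal sketch of that hand-off, using a plain std::queue as a single-threaded stand-in for the staging FIFO. The stage() and drain() functions are hypothetical names for illustration, not code from the ticker plant.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>

typedef std::pair<std::uint64_t, std::string *> TaggedDatagram;

// Single-threaded stand-in for the staging FIFO (assumption for illustration).
typedef std::queue<TaggedDatagram> StagingQueue;

// Producer side: copy the bytes off the wire into a heap string, stage the pointer.
void stage(StagingQueue& q, std::uint64_t usecSinceEpoch,
           const char* data, std::size_t len) {
    q.push(TaggedDatagram(usecSinceEpoch, new std::string(data, len)));
}

// Consumer side: take ownership of each string, use it, then delete it.
// Returns the total number of payload bytes processed.
std::size_t drain(StagingQueue& q) {
    std::size_t bytes = 0;
    while (!q.empty()) {
        TaggedDatagram item = q.front();
        q.pop();
        bytes += item.second->size();
        delete item.second;  // forgetting this would be a leak of its own
    }
    return bytes;
}
```

The queue slots stay a fixed, small size, and each datagram's storage goes back to the heap as soon as the consumer is done with it.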

It was at the end of the day, but thank goodness I found it. It turns out that the ZeroMQ send() method was the culprit. Normally, ZeroMQ has been very nice for me. Why these messages caused problems, I have no idea, but it's the one method and nothing else.

I know they are working on a new version (2.1) with the latest OpenPGM included, and that will be nice to see. Tomorrow morning when they are all online, I'll ask what the story is on the release of 2.1. Until then, I'll deal with the smaller, but still annoying, leaks.

Whew!

[11/24] UPDATE: I talked to the ZeroMQ guys this morning and they say the release of 2.1 is scheduled for this week. Nice. I'll get it early next week and try it out.