Tracking Down an Annoying Boost ASIO Problem


In the midst of working on the changes necessitated by the Broker's re-write, I ran into a very nasty little problem. I am trying to do things quickly, and my test cases are often far harsher than any real-world use is going to be, but they have served me well, and this morning they were pointing out a problem.

If I created new updater instances for each request, and deleted them after they were no longer needed, I ended up with a very fast create-use-delete lifecycle. This led to a segmentation fault in boost's io_service - specifically, in its run() method. The core dumps were of little to no help whatsoever, and I was left trying to diagnose the problem from my end.

If I didn't delete them right away, but threw them into a pool, and still created new ones, only clearing out the pool at the end of the application, then everything was fine. It seemed like it was just the short-lifecycle connections that were the problem. Very nasty.

The seg faults weren't on anything related to boost asio, either. They were on the line right after the context switch that followed the closing of the socket connection. I spent hours debugging the code to find that guy.

I came to the conclusion that there was something in the io_service that wasn't getting a "chance" to handle the socket closing before I deleted the updater. So I changed my code ever so slightly. Originally I had:

  if (si->second.pool.size() >= MAX_UPDATERS_IN_POOL) {
    cLog.debug("[recycle] the pool is full - deleting the updater");
    delete anUpdater;
  }

and I changed it to:

  if (si->second.pool.size() >= MAX_UPDATERS_IN_POOL) {
    cLog.debug("[recycle] the pool is full - deleting the updater");
    anUpdater->mChannelID.clear();
    anUpdater->disconnect();
    // we need to let the io_service have a go at it
    boost::system::error_code  err;
    mIOService.poll_one(err);
    // ...and now we can successfully delete the updater
    delete anUpdater;
  }

The difference was stunning. No more crashes, and the code was rock solid every time I ran it. Amazing. I'm going to have to remember this. It's like a little context switch for the io_service so it can detect the close of the socket before the updater is deleted.
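To make that hypothesis concrete, here is a toy model in plain C++ of what may have been going on. It is only a sketch under assumptions: ToyService, ToyUpdater and recycle() are made-up names, the queue is a stand-in and not Boost.Asio, and it assumes that disconnecting leaves exactly one handler queued that still points at the updater.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an io_service: just a FIFO of pending handlers.
// This is NOT Boost.Asio; it only models "work queued for later".
class ToyService {
public:
    void post(std::function<void()> handler) {
        mPending.push_back(std::move(handler));
    }

    // Run at most one queued handler and report how many ran (0 or 1).
    std::size_t poll_one() {
        if (mPending.empty()) {
            return 0;
        }
        std::function<void()> handler = std::move(mPending.front());
        mPending.pop_front();
        handler();
        return 1;
    }

    std::size_t pending() const { return mPending.size(); }

private:
    std::deque<std::function<void()>> mPending;
};

// Hypothetical updater whose disconnect() leaves one handler queued
// that still refers to the updater through `this`.
class ToyUpdater {
public:
    ToyUpdater(ToyService& service, std::vector<std::string>& log, std::string name)
        : mService(service), mLog(log), mName(std::move(name)) {}

    void disconnect() {
        mService.post([this] { mLog.push_back("close handled for " + mName); });
    }

private:
    ToyService& mService;
    std::vector<std::string>& mLog;
    std::string mName;
};

// Same order as the fix: disconnect, give the service one turn, then delete.
// Returns the number of handlers that ran before the delete.
std::size_t recycle(ToyService& service, ToyUpdater* updater) {
    assert(updater != nullptr);
    updater->disconnect();
    std::size_t ran = service.poll_one();
    delete updater;
    return ran;
}
```

In this model, deleting the updater before the poll_one() call would leave a queued handler holding a dangling `this`, which is the kind of failure I suspect was behind the crashes. One caveat: a single turn only covers a single handler. Closing a real socket with several asynchronous operations outstanding completes each of them with an operation_aborted error, so more than one turn might be needed in that case.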

Several things to finish on the re-write, but it's getting close now. Nice.