Odd Timeout Bug with Boost Async Timeout

One of the things I've been noticing is that my shotgun approaches to the requests to the Broker are all timing out, and then successfully retrying. Very odd. Why fail, and then immediately succeed? Now that I have my ticker plants able to keep up with the feed, I wanted to take a little time and figure this out.

I started logging all the bytes in my test code, which I copied from the working ticker plant, and then saw the problem: every one of the requests/connections was being torn down before the data could be processed. All the data was there - it just wasn't being processed. Very odd. So I added logging in the different places where the connection could be invalidated, and ran it again.

A timeout!? Really? These requests are taking less than a second, and the timeout is set for 25 seconds. What's going on here? And then I started to really look at the code I had:

  void MMDClientUpdater::asyncReadTimeout( const boost::system::error_code & anError )
  {
    // we need to alert the user that the timeout occurred
    if (mTarget != NULL) {
      mTarget->fireUpdateFailed("asynchronous read timeout occurred");
    }
 
    // now remove this updater from the pool
    mBoss->removeFromUpdaters(mChannelID);
  }

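Unless you've done a lot with boost asio and timeouts, this looks fine. The handler is called when the timer fires, and I'm able to respond to it. But that's not quite the whole story. It turns out the same handler is also called when the timer is cancelled, and in that case the error code is set to boost::asio::error::operation_aborted. My code ignored the error code, so every cancel was being treated as a timeout. We really need to have: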

  void MMDClientUpdater::asyncReadTimeout( const boost::system::error_code & anError )
  {
    // a cancelled timer also calls this handler, with operation_aborted - skip those
    if (anError != boost::asio::error::operation_aborted) {
      // we need to alert the user that the timeout occurred
      if (mTarget != NULL) {
        mTarget->fireUpdateFailed("asynchronous read timeout occurred");
      }
 
      // now remove this updater from the pool
      mBoss->removeFromUpdaters(mChannelID);
    }
  }

Now we have something that works. I was simply missing that a cancel was going to be "seen" as an error by the handler. I updated the code and everything started working without the failures. Why was the retry working? Because the retry is made as a synchronous call, so it doesn't go through the async timeout at all. Funny.

Glad to get that solved.