Tricky Timing Bug with Inherited Threads

This afternoon a co-worker pointed out a problem with a threaded app he was testing. It used a data feed component I'd written, and it was segfaulting on shutdown. He was having a hard time figuring out why it was crashing, and I was having a hard time figuring out what was resetting the state.

The thread model I was using is a simple class that runs a process() method over and over, catching exceptions, etc., until process() returns a "stop" flag. Pretty simple stuff. Users of the thread object can also tell it to stop:

  Thread::setTimeToDie(true);

and the next time it's ready to call process(), it bails out and stops. But it wasn't acting that way in the tests.
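The thread model described above can be sketched roughly as follows. This is a minimal reconstruction, not the original class: only `Thread`, `process()` and `setTimeToDie()` come from the post; `run()`, `isTimeToDie()` and the `Counter` subclass are hypothetical, and the stop flag is assumed to be a `std::atomic<bool>` so that other threads can set it safely.

```cpp
#include <atomic>
#include <cassert>
#include <exception>

// Hypothetical reconstruction of the thread model: run() calls process()
// repeatedly until process() asks to stop or setTimeToDie(true) is called.
class Thread {
public:
    virtual ~Thread() = default;

    // Ask the loop to exit before its next call to process().
    void setTimeToDie(bool die) { timeToDie_.store(die); }
    bool isTimeToDie() const { return timeToDie_.load(); }

    // Body of the worker thread.
    void run() {
        while (!timeToDie_.load()) {
            bool stop = false;
            try {
                stop = process();  // true means "stop"
            } catch (const std::exception&) {
                // Assumption: swallow the exception and keep looping;
                // the post only says exceptions are caught.
            } catch (...) {
            }
            if (stop) break;
        }
    }

protected:
    // One unit of work; return true to end the loop.
    virtual bool process() = 0;

private:
    std::atomic<bool> timeToDie_{false};
};

// Hypothetical subclass: counts iterations and stops itself after `limit`.
class Counter : public Thread {
public:
    explicit Counter(int limit) : limit_(limit) {}
    int count() const { return count_; }

protected:
    bool process() override { return ++count_ >= limit_; }

private:
    int limit_;
    int count_ = 0;
};
```

Because the flag is checked only at the top of the loop, a `setTimeToDie(true)` call takes effect after the current `process()` call returns, which matches the "next time it's ready to call process()" behavior.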

I subclassed this Thread for my data feed class, and when I detected that the parent thread was to stop, I stopped some processing sub-threads. The structure is pretty simple: the main thread was handling supervision, a Boost.Asio thread was handling the incoming data, and the processing sub-threads were taking that raw data and converting it (in two steps) for use downstream. The sub-threads were supposed to detect when the parent Thread was told to die, and then die themselves.
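A rough sketch of the intended shutdown path: each processing sub-thread polls the parent's flag between units of work and exits once it sees the flag set. `Parent`, `subThreadLoop()` and `runAndStop()` are hypothetical names, the Boost.Asio thread is left out, and a short sleep stands in for the real two-step conversion.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Minimal stand-in for the parent Thread's stop flag (hypothetical).
struct Parent {
    std::atomic<bool> timeToDie{false};
    bool isTimeToDie() const { return timeToDie.load(); }
};

// A processing sub-thread: poll the parent's flag between units of work.
// The sleep stands in for converting one chunk of raw data.
void subThreadLoop(const Parent& parent, std::atomic<int>& exited) {
    while (!parent.isTimeToDie()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    exited.fetch_add(1);  // this sub-thread saw the signal and is leaving
}

// Start `n` sub-threads, signal the parent to die, and join them.
// Returns how many sub-threads observed the signal and exited.
int runAndStop(int n) {
    Parent parent;
    std::atomic<int> exited{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back(subThreadLoop, std::cref(parent), std::ref(exited));
    }
    parent.timeToDie.store(true);  // the supervisor decides to shut down
    for (auto& t : workers) t.join();
    return exited.load();
}
```

This only works as long as the flag stays `true` until every sub-thread has polled it; nothing here ties the sub-threads' polling to the moment the parent itself exits.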

What appeared to be happening was that the sub-threads weren't getting the message, or at least weren't getting it in time. Very odd. If I told the Thread to stop, the sub-threads didn't stop. If I told the Thread to stop, did a little shutdown processing, and finally told the Thread to stop again, then things shut down.

There appears to be a nasty timing problem here, and I don't want to leave it like this, but I'm out of time today. For now it's working, but very oddly.

[3/31] UPDATE: Interesting point... this morning, on my walk to the train, I saw it. When the Thread stops, it resets the "time to die" flag to false. The sub-threads weren't seeing the flag because they weren't checking it fast enough: the main thread died, reset its "die" flag, and the sub-threads just didn't think there was any reason to stop. The fix was easy: don't reset the flag until the thread is started again.
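The bug and the fix can be reproduced deterministically with a hypothetical model: `resetOnExit` selects between clearing the flag when `run()` returns (the bug) and clearing it when `run()` next starts (the fix). `hasStarted()` and `slowSubThreadSeesStop()` are invented for the demonstration; the latter plays a sub-thread that only polls the flag after the parent thread has been joined, i.e. one that wasn't checking fast enough.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model of the parent Thread's shutdown behavior.
class Thread {
public:
    explicit Thread(bool resetOnExit) : resetOnExit_(resetOnExit) {}

    void setTimeToDie(bool die) { timeToDie_.store(die); }
    bool isTimeToDie() const { return timeToDie_.load(); }
    bool hasStarted() const { return started_.load(); }

    void run() {
        if (!resetOnExit_) timeToDie_.store(false);  // fix: clear at start of the next run
        started_.store(true);
        while (!timeToDie_.load()) {
            std::this_thread::yield();  // stand-in for calling process()
        }
        if (resetOnExit_) timeToDie_.store(false);   // bug: clear on the way out
    }

private:
    bool resetOnExit_;
    std::atomic<bool> timeToDie_{false};
    std::atomic<bool> started_{false};
};

// Simulates a slow sub-thread that only checks the parent's flag after the
// parent thread has already finished. Returns whether it saw the stop request.
bool slowSubThreadSeesStop(bool resetOnExit) {
    Thread parent(resetOnExit);
    std::thread t(&Thread::run, &parent);
    while (!parent.hasStarted()) std::this_thread::yield();
    parent.setTimeToDie(true);    // ask the parent to stop
    t.join();                     // the parent has now exited
    return parent.isTimeToDie();  // the sub-thread's late poll
}
```

`slowSubThreadSeesStop(true)` returns `false` (the stop request is lost) and `slowSubThreadSeesStop(false)` returns `true`. One trade-off of clearing the flag at start: a `setTimeToDie(true)` issued before `run()` begins would be discarded, which is why the sketch waits on `hasStarted()` before signaling.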