When Two Wrongs Make a Right – Finding a Nasty Bug
I spent all day today tracking down the most devilish bug in my code - yup, right there in my code. I didn't see it right away because this code hasn't changed in several days, and it had been working perfectly for quite a while. The trick was that it unknowingly depended on another bug, which was fixed yesterday evening, and once that bug was fixed, mine became a real bug. Figuring this out was a painful and laborious task.
Here's how it used to 'work': I had two locator services on two boxes (call them the red box and the blue box), and each service was hosted by a Broker.
The client would contact one of the locator services at random. Most likely it would be the one on the same box the client was running on, but there was no guarantee of this. For the sake of example, let's say it hits the red box. The locator service is asked "can you handle this symbol?", and if it can, it responds to the client immediately. If it can't, it asks the Broker to list all services that start with the same name, and then asks each of them whether it can handle the symbol.
The red locator asks the blue locator, and since the symbol has to be handled by one or the other, the blue locator answers pretty quickly. So where's the bug? Well... the first bug was that when two services have the same name, the Broker should 'prefer' the one on the same host as the client, and until yesterday evening's fix, it didn't. Preferring the local service minimizes network traffic and keeps things "local" as much as possible. You can see it coming, can't you?
With the preference set to local services, the red locator asks the Broker for all similarly named services and gets - you guessed it: itself! That places it in an infinite loop of asking itself. But with Boost.Asio there's only one thread processing things, and that one thread can't send the request and receive it at the same time, so the whole thing locks up.
So the fix was simple: don't just ask for all similarly named services - make sure you exclude yourself! With this one-line fix, everything worked again. It just took a whole day of hunting for where the problem was to find the one line that would do the trick. Ick.