Today I've once again been shown the critical importance of good comments. To set up the problem and it's resolution let's look at what was before the problem, and see how a good set of comments (by me) could have avoided a complete day of frustration.
We start with the statement of the problem:
Each exchange feed comes on both A and B lines where the same data is on both lines, but they arrive at the datacenter through different paths so that should one be lost, the other has the same data to carry on.
The problem comes in when you try to arbitrate these two lines in your code. You need to look at key sequence numbers and see which one has arrived first for the next number in the sequence, and then there are always the special cases where the exchange sends a reset sequence number message and you have to deal with that. So we need to carefully arbitrate between these two streams of messages to make sure we get the first copy of each message as soon as it's available.
The problem is that I want all this to be multi-threaded and lockless. It's that last part, combined with the fact that I don't want to waste threads, that brings us to the solution I arrived at. Basically, I had the 'primary' channel (A channel) processing thread doing all the arbitration, and the 'secondary' channel (B channel) processing thread sending it's results to a queue that would be read by the 'primary' thread. This merging of the two data streams was a pretty nice idea - I just needed to work out the processing of the 'non-primary' feed's queue with the primary processing thread.
What I had done was to make a method to process all the pending messages:
bool UDPExchangeFeed::processPendingMessages()
{
bool error = false;
Message *msg = NULL;
while (mQueue.pop(msg)) {
if (msg != NULL) {
if (!processMessage(msg)) {
error = true;
}
}
}
return !error;
}
This seemed fine and worked great for quite a while. Then another developer looked at the code and said "Hey, this is going to starve the primary channel!", and because I didn't have a really good comment as to why I chose to do it this way, I said "Wow... yeah, I can see that." and so we changed it to look like this:
bool UDPExchangeFeed::processPendingMessages()
{
bool error = false;
Message *msg = NULL;
if (mQueue.pop(msg)) {
if (msg != NULL) {
if (!processMessage(msg)) {
error = true;
}
}
}
return !error;
}
so that we're doing a 1:1 of the primary and the secondary. Which sounds fair, and a better solution, but the problem is that it really isn't, and it takes a little real-world thinking to get there.
Two feeds decode messages in the same time - one then transmits one, and the other just pushes the message on a queue. Clearly, the second one is going to process messages a little bit faster than the first. This means that it may take 10 or 20 messages, but sooner or later, there will be two message in the queue to process, and only one is going to be processed. Repeat.
Pretty soon, the queue overflows. After all, we're talking upwards of 50,000 msgs/sec. It doesn't take too many to get out of hand. So how to fix it?
Always empty the second queue. It's not unfair, it's the equalizer. Most times it's only going to have one message in it in the first place. But when it has two (or three), it's best to clear them all out right then, rather than to let them sit there and build up. So my initial implementation was really the better one - but I hadn't put this level of documentation with it so as not to be seen as wrong.
Now there's a four paragraph comment on that little method just so we're clear about why it's doing what it's doing, and it's not a bug.