Added Auto-Flipping to Ticker Plants
Recently we had a serious problem with one of our exchange feeds. At 2:00 pm, we simply stopped getting the 'A' side of several feeds. Because of the way I'd changed the Ticker Plants, this amounted to a complete halt in ticks. Very bad. But not a bug.
Interestingly enough, the system we have in the Ticker Plants is a consequence of trying to be as accurate as possible. If we have two sides of the same feed, A and B, and they are both supposed to arrive at about the same time and rate, how do you arbitrate between them to get the most complete feed without sending down any duplicates? Well… the first idea I had was to look at each side and take the most up-to-date message. That works OK until one of the feeds gets ahead of the other, or has a skip, and then one side is showing message 100 while the other is showing 200.
I'd take 200 and then look for 201, totally skipping the fact that messages 100 through 199 are on the other side, just waiting to be used.
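To make the skip concrete, here is a minimal sketch of that first "take the most up-to-date message" idea. It assumes each side is just a list of sequence numbers and that anything at or below the last sequence sent is a duplicate; the function name `naive_arbitrate` is hypothetical and not from the Ticker Plants code.

```python
from collections import deque


def naive_arbitrate(side_a, side_b):
    """Hypothetical sketch: always take the head message with the highest
    sequence number, and only send messages newer than the last one sent."""
    a, b = deque(side_a), deque(side_b)
    last_seq = 0
    out = []
    while a or b:
        # choose whichever non-empty side is furthest ahead
        q = max((d for d in (a, b) if d), key=lambda d: d[0])
        seq = q.popleft()
        if seq > last_seq:  # anything older is treated as a duplicate
            out.append(seq)
            last_seq = seq
    return out


# Side A is at 100, side B has jumped ahead to 200.
print(naive_arbitrate([100, 101, 102], [200, 201]))
```

Once 200 is sent, messages 100 through 102 on side A all look like "old" duplicates and are dropped, which is exactly the hole described above.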
The solution, then, was to treat one side as the "preferred side" and use the other only to "fill in the gaps". This is great because it doesn't skip over blocks of messages, but if your preferred side goes down, you're dead. Even if the full message stream is arriving on the other side, it's ignored, because that side is only there to fill in the gaps.
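A rough sketch of the preferred-side approach, under the same simplifying assumption that messages are plain sequence numbers. Here the preferred side drives the output and the backup is consulted only when a jump on the preferred side reveals a gap; `preferred_arbitrate` and its parameters are hypothetical names for illustration.

```python
def preferred_arbitrate(preferred, backup, next_seq):
    """Hypothetical sketch of 'preferred side' arbitration.

    The preferred side drives everything; the backup side is only used to
    fill a gap that the preferred side itself exposes."""
    backup_seqs = set(backup)
    out = []
    for seq in preferred:
        if seq < next_seq:
            continue  # already sent: a duplicate
        # a jump on the preferred side reveals a gap; fill what we can
        for missing in range(next_seq, seq):
            if missing in backup_seqs:
                out.append(missing)
        out.append(seq)
        next_seq = seq + 1
    return out, next_seq


# Preferred side skips 3 and 4; the backup fills them in.
print(preferred_arbitrate([1, 2, 5, 6], [1, 2, 3, 4, 5, 6], 1))
# Preferred side is silent: the backup's messages are never used.
print(preferred_arbitrate([], [1, 2, 3], 1))
```

The second call shows the failure mode: with nothing arriving on the preferred side, no gap is ever detected, so nothing is sent even though the backup has every message.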
As an aside, this brought to a head something that has been a real sore spot for me in the recent reorgs at The Shop: the need to "check with the manager" before fixing something. I estimated that this "auto-flipping" would take a few days. Not bad, and the result would be smart enough to always pick the right side. A far superior product. But it wasn't seen that way. It was seen as something that needed to be scheduled, and planned, and blah, blah, blah. I wanted to scream!
This was my project, and I wanted to take a few days. That should have been the end of it. Period. End of story. If I'm too late on too many things, then fire me. But after 35 years of doing this, I'm experienced enough to know what's really important and what's optional and can be scheduled.
Exceptionally frustrating.
So I ended up getting the "OK", and today was the coding.
The solution I came up with was to look at the number of queued messages on the non-preferred side whenever the preferred side was empty. If that number was large, say 100,000, then it's pretty clear that the preferred side is in trouble, so we switch sides. I needed to do a little more than this to make the switch clean and as atomic as possible, since multiple threads are involved in this code, but it wasn't too hard.
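Here is one way the flip check could look. This is a minimal sketch, not the actual Ticker Plant code: `SidedFeed`, `check_flip`, and `FLIP_THRESHOLD` are hypothetical names, the 100,000 default is the "say 100,000" figure from above, and a single lock stands in for whatever the real code does to keep the check-and-swap atomic across threads.

```python
import threading
from collections import deque

FLIP_THRESHOLD = 100_000  # hypothetical default, taken from the "say 100,000" figure


class SidedFeed:
    """Hypothetical sketch: two queues (A and B) plus a 'preferred' side.

    If the preferred queue is empty while the other queue has piled up
    more than `threshold` messages, the preferred side is assumed dead
    and the sides are swapped."""

    def __init__(self, threshold=FLIP_THRESHOLD):
        self.queues = {"A": deque(), "B": deque()}
        self.preferred = "A"
        self.threshold = threshold
        self._lock = threading.Lock()

    @staticmethod
    def _other(side):
        return "B" if side == "A" else "A"

    def push(self, side, msg):
        # called by whichever thread receives messages for `side`
        with self._lock:
            self.queues[side].append(msg)

    def check_flip(self):
        """Swap the preferred side if it looks dead; return True on a flip."""
        # hold the lock so no push can slip in between the check and the swap
        with self._lock:
            pref_q = self.queues[self.preferred]
            backup_q = self.queues[self._other(self.preferred)]
            if not pref_q and len(backup_q) > self.threshold:
                self.preferred = self._other(self.preferred)
                return True
            return False


feed = SidedFeed(threshold=3)  # tiny threshold just for the demo
for seq in range(5):
    feed.push("B", seq)  # side A has gone quiet
print(feed.check_flip(), feed.preferred)
```

Checking the queue depth only when the preferred side is empty means a short hiccup on one side doesn't cause a flip; it takes a sustained backlog on the other side before the sides are swapped.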
In the end, it was a nice change, and I really liked knowing the Ticker Plants would now protect themselves from this kind of outage in the future.
Now if fixing the management issues were only this easy...