Fixed Auto-Flipping on Exchange Feeds


This morning we had an unusual situation with the exchange feeds: one of our providers had some lines go down. Let's forget for a minute that I used to do this job with the telcos and know exactly how they respond, and that the idea of a line being down for a few hours, let alone a day, is something I almost laugh at. OK… I really did laugh at this.

But like I said, let's forget about these facts…

No, today was a unique opportunity for me to test my auto-flipping logic on the exchange feeds, because for some feeds we lost the A side and for others we lost the B side. So I should have seen some groups of feeds on A and others on B. What I saw was that nothing flipped, so I dug into why.

Well… it turns out there were a few mistakes on my part. I had originally been using:

  bool UDPExchangeFeed::drainAllPendingMessages()
  {
    bool                 error = false;
    msg::DecodedMessage  *pkg = NULL;
    while (mPackages.peek(pkg)) {
      if (pkg == NULL) {
        mPackages.pop(pkg);
        continue;
      }
      deliverMessages(*pkg);
    }
    return !error;
  }

the idea being that if I ran into a NULL in the queue, I'd pop it and skip ahead. Otherwise, I'd deliver the messages in the package and continue. Hold on a sec… there's my first mistake. A non-NULL package gets delivered, but I never pop it off the queue, so the next peek hands me the very same package!

Yes, friends, I had an infinite loop, and that was what was stopping my flipping from happening. I needed to have something like this:

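  bool UDPExchangeFeed::drainAllPendingMessages()
  {
    bool                 error = false;
    msg::DecodedMessage  *pkg = NULL;
    // peek() leaves the package on the queue, so every pass must pop()
    while (mPackages.peek(pkg)) {
      if (pkg != NULL) {
        deliverMessages(*pkg);
      }
      // always remove the head, NULL or not, so the loop makes progress
      mPackages.pop(pkg);
    }
    return !error;
  }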

where it's clear that I only do something when the peek is non-NULL, but I always pop off that top element to continue.

The next problem I found wasn't so much a logic issue as a use-case issue. The trigger I was using to decide when to flip sides was the size of the incoming datagram queue. The problem is that if the decoders are working, that queue is almost always going to be very small. The decoded packages queue is in play as well. So let's add the two sizes together and use the sum as the trigger. Looking much better now.
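As a rough sketch of that combined trigger: the struct and function names below are made up for illustration and are not the real feed code. I'm also assuming the trip level is compared against the backlog of whichever side is being considered for the flip.

```cpp
#include <cstddef>

// Hypothetical snapshot of one side (A or B) of an exchange feed.
struct SideBacklog {
  std::size_t rawDatagrams;     // datagrams received but not yet decoded
  std::size_t decodedPackages;  // packages decoded but not yet delivered
};

// Total undelivered work on a side: both queues count, because a healthy
// decoder keeps the datagram queue near zero while packages pile up.
inline std::size_t totalBacklog(const SideBacklog &side)
{
  return side.rawDatagrams + side.decodedPackages;
}

// True when the combined backlog has reached the trip level.
inline bool backlogTripped(const SideBacklog &side, std::size_t tripLevel)
{
  return totalBacklog(side) >= tripLevel;
}
```

With a trip level of 50,000, a side holding 10 raw datagrams and 49,990 decoded packages trips the flip, whereas looking at the datagram queue alone would never get there.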

The final issue was really one of size. What happens when I have a trip level of 50,000 messages and a feed that doesn't produce that many in 5 minutes? I get stale data. That's no good. What I need to do is detect when there's a long period of inactivity on the preferred side and there's something on the other side to use. To figure this out, I put a little counter on the loop to count how many "preferred side is empty - wait" passes I'd had. If it was enough, say 75, and there's something on the other side, even if it's not 50,000 messages, flip over, because this side isn't producing anything right now.
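Here's a minimal sketch of how the trip level and the idle counter could work together. The class name, method name, and the decision to reset the counter after a flip are all assumptions for illustration; the 50,000 and 75 values are just the numbers mentioned above, passed in by the caller.

```cpp
#include <cstddef>

// Hypothetical helper that decides, once per loop pass, whether to flip.
class FlipDecider {
public:
  FlipDecider(std::size_t tripLevel, unsigned maxIdlePasses)
    : mTripLevel(tripLevel), mMaxIdlePasses(maxIdlePasses), mIdlePasses(0) {}

  // Call once per pass with the current backlog on each side.
  // Returns true when the caller should flip to the other side.
  bool shouldFlip(std::size_t preferredBacklog, std::size_t otherBacklog)
  {
    if (preferredBacklog > 0) {
      mIdlePasses = 0;                       // preferred side is producing
    } else if (mIdlePasses < mMaxIdlePasses) {
      ++mIdlePasses;                         // "preferred side is empty - wait"
    }

    // Case 1: the other side has piled up a full trip level of work.
    const bool tripped = (otherBacklog >= mTripLevel);
    // Case 2: preferred side has been quiet long enough and the other
    // side has at least something to offer.
    const bool stale = (mIdlePasses >= mMaxIdlePasses) && (otherBacklog > 0);

    if (tripped || stale) {
      mIdlePasses = 0;                       // start fresh after a flip
      return true;
    }
    return false;
  }

private:
  std::size_t mTripLevel;
  unsigned    mMaxIdlePasses;
  unsigned    mIdlePasses;
};
```

A decider built as `FlipDecider(50000, 75)` flips right away when the other side reaches 50,000, and flips on the 75th consecutive empty pass when the other side has even a handful of messages. The counter stops counting once it reaches the limit, so a long quiet spell with nothing on either side can't overflow it.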

With this, I get the behavior I was originally looking for. We flip when we have data and it doesn't take a long time to do it. I don't miss a lot, and we have a nicely self-adjusting system. Good news that this came up today.