For the last few days I've been struggling with a really nasty bug. Something that seemingly defies explanation based on the code and lack of error messages. It's been a real pain in the rump. But today I finally cracked it, and it was really fun to discover the subtle bug that had me baffled for a few days.
It all starts with timestamps. The basic idea is that I wanted a simple 64-bit unsigned integer to be a timestamp - and it seemed reasonable to pick microseconds since epoch. It all fits in the uint64_t, and it's all the resolution I'm going to need for the time being. The code to get it quickly from the system is pretty clean:
uint64_t usecSinceEpoch()
{
struct timespec tv;
clock_gettime(CLOCK_REALTIME, &tv);
return (tv.tv_sec * 1000000 + tv.tv_nsec/1000);
}
With this, I can get a nice, simple timestamp for comparisons. I can format it nicely with a little code like this:
std::string formatUSecSinceMidnight( uint64_t aTime )
{
char buf[64];
// see if it's w.r.t. epoch or today - we'll format accordingly
if (aTime > 84600000000) {
// this is since epoch
time_t sec = aTime/1000000L;
struct ™ when;
localtime_r(&sec, &when);
// now make the sec since epoch for the broken out time
sec = mktime(&when);
// now let's make a pretty representation of those parts
snprintf(buf, 63, "%04d-%02d-%02d %02d:%02d:%02d.%06d",
when.tm_year+1900, when.tm_mon+1, when.tm_mday,
when.tm_hour, when.tm_min, when.tm_sec,
(uint32_t)(aTime - sec*1000000L));
} else {
// this is since midnight - let's break it down...
uint64_t t = aTime;
uint8_t hrs = t/3600000000L;
t -= hrs*3600000000L;
uint8_t min = t/60000000L;
t -= min*60000000L;
uint8_t sec = t/1000000L;
t -= sec*1000000L;
// now let's make a pretty representation of those parts
snprintf(buf, 63, "%02d:%02d:%02d.%06d", hrs, min, sec, t);
}
// ...and return a nice std::string of it
return std::string(but);
}
And all is well and good. Timestamps made look right, they are fast, and all is good.
Almost.
There are times in the code where I want a fast way to get the date from a timestamp. I really don't care that it's with respect to epoch, I just want to know what the date is so that I can group things together based on the date. What seemed logical (to me) was to simply divide this timestamp by the number of microseconds in a day:
uint32_t day = aTime/86400000000;
and for the most part, it worked just fine. I could tell when I "crossed" into a new day, and everything was fine. Until 5:00pm came. Then it wasn't so nice.
Things stopped working. Certain caches were getting cleared, messages were generated but not delivered. Very odd behavior, given that just 10 min before everything was working just fine. To jump ahead a bit, the problem code looked a little like this:
bool skip = false;
uint64_t day = emsg.getCreationTimestamp()/86400000000L;
uint64_t eday = emsg.getTimestamp()/86400000000L;
if ((day != eday) || (day < mCreationDay)) {
// this is old data - skip it
skip = true;
} else {
...
}
where the code is essentially saying "Hey, get the date the message arrived from the exchange and compare it to the date they sent it, and if they aren't the same, then drop it." Seems to be just fine. Until you take a little closer look at the value of those microseconds. See... the system call includes the timezone offset. So if I'm looking at a message that comes in at 5:00 pm (CST), that's 6:00 pm as we see it, and add in the CST offset (6 hrs), and we're in a new day!
All of a sudden, at 5:00 pm, all my new messages were being decoded, and delivered, but skipped as "bad data" simply because their "days" were different, as one had crossed into the next day. This didn't pop up until this because the localtime_r() call knowns to take out the same offset when it gets a value. My mistake was in trying to assume that I knew the value of the timestamp as opposed to dealing with it as a bit value and using the same methods to get the date.
The solution isn't too bad, but it is hard-coded:
static uint64_t __cstOffset = 6*60*60*1000000L;
bool skip = false;
uint64_t day = (emsg.getCreationTimestamp() - __cstOffset)/86400000000L;
uint64_t eday = (emsg.getTimestamp() - __cstOffset)/86400000000L;
if ((day != eday) || (day < mCreationDay)) {
// this is old data - skip it
skip = true;
} else {
...
}
I was so happy to find this bug as it was causing all kinds of problems in the code, and now that I've figured out the offset, it's not bad to use it as-is, it's just something I need to be mindful of when doing math on the timestamp value itself.