Hard to Find a Good Fine-Grained Timer on Linux
I'm looking at some logs from a client of my ticker plant. I'm trying to iron out the last few bugs in the system, and I'm coming across data that makes no sense whatsoever. Basically, every 10 sec, my client prints stats on the messages it's received: messages received per second (on average), messages conflated, the size of the conflation queue, and the min, max, and average delay per message, measured from the exchange's timestamp on the message to the time it's received by my client code.
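For context, the delay numbers accumulate per message roughly like this (a simplified sketch, not the actual client code; the struct and function names are made up for illustration):

    #include <stdint.h>

    /* Per-interval stats, reset every 10 sec (illustrative sketch only). */
    struct interval_stats {
        uint64_t msgs;       /* messages received this interval            */
        int64_t  min_delay;  /* usec, exchange timestamp -> client receive */
        int64_t  max_delay;
        int64_t  sum_delay;  /* for computing the average                  */
    };

    /* Called once per message with its measured delay in microseconds. */
    static void record_msg(struct interval_stats *s, int64_t delay_usec)
    {
        if (s->msgs == 0 || delay_usec < s->min_delay) s->min_delay = delay_usec;
        if (s->msgs == 0 || delay_usec > s->max_delay) s->max_delay = delay_usec;
        s->sum_delay += delay_usec;
        s->msgs++;
    }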
What I'm seeing is something like this:
Time        | Messages (/sec) | Conf Msgs (/sec) | Conf Queue (msgs) | Max Delay (msec) |
13:30:47.01 |          4083.4 |            550.6 |                 0 |            838.6 |
13:30:57.01 |          6561.7 |            547.6 |                24 |          13459.0 |
13:31:07.01 |          5170.2 |            797.5 |                 0 |            410.7 |
My concern is the max delay, the last column, for the middle time slice. It's showing that with the conflation queue nearly empty (just 24 messages), this interval had a maximum delay from the source of 13 sec. That's longer than the sampling interval itself! It's as if the message was hiding upstream!
I initially suspected the time base I was using. Originally, I was using gettimeofday() as it's pretty universal, and calculating the microseconds since epoch with:
struct timeval tv; gettimeofday(&tv, NULL); return ((int64_t)tv.tv_sec * 1000000 + tv.tv_usec);
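Spelled out as a self-contained helper, that's roughly the following (the function name is just for illustration; the cast keeps the multiply in 64 bits even where time_t is 32-bit):

    #include <stdint.h>
    #include <stddef.h>
    #include <sys/time.h>

    /* Microseconds since the Unix epoch, via gettimeofday(). */
    static int64_t usec_since_epoch_gtod(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
    }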
Then I read on stackoverflow that this isn't a very good timebase, and I switched to the supposedly more stable clock_gettime():
struct timespec ts; clock_gettime(CLOCK_REALTIME, &ts); return ((int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000);
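And the equivalent helper for the clock_gettime() version (again, the name is mine; on older glibc you have to link with -lrt to get clock_gettime()):

    #include <stdint.h>
    #include <time.h>

    /* Microseconds since the Unix epoch, via clock_gettime(CLOCK_REALTIME). */
    static int64_t usec_since_epoch_rt(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    }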
I'm not sure this is going to be a lot better, since I think there's still a problem lurking in my code, but it's a start. Getting a good timer is important when you're timing things this close.
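One quick sanity check worth doing, just as a suggestion, is to ask the kernel what resolution it actually claims for the clock:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec res;

        /* clock_getres() reports the clock's claimed resolution. */
        if (clock_getres(CLOCK_REALTIME, &res) == 0)
            printf("CLOCK_REALTIME resolution: %ld s %ld ns\n",
                   (long)res.tv_sec, res.tv_nsec);
        return 0;
    }

On a modern kernel with high-resolution timers this typically comes back as 1 ns, so the clock's granularity almost certainly isn't what's producing a 13-second outlier, which points back at something upstream or in my own bookkeeping.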