Finding More Speed in the Unlikeliest of Places


Today I've spent a ton of time with the uint128_t trie to get it as fast as possible. My goal was to beat the old system, and eventually, I think I have. But getting there was a painful process with plenty of battles along the way.

When I started testing the uint128_t trie, I noticed that access was significantly slower than the old character-based trie I had been using. Sure, the old one wasn't really going to work for what I needed, but still... a 10x speed hit wasn't what I was looking for. So I dug into it.

Turns out the new trie is amazingly fast. I mean just blindingly fast. Good for me. The problem was in generating the conflation key that I use to store the messages in the trie. Really... it was around 7 ms to generate the key and then only 0.01 ms to actually put it in the trie. Yikes!
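
Separating the two costs is really just a matter of timing the phases on their own. Here's a minimal sketch of that kind of measurement; buildConflationKey() and triePut() below are made-up stand-ins for the real calls, not the actual code, and the 128-bit type is assumed to be the GCC/Clang builtin.

```cpp
#include <chrono>
#include <cstdio>

typedef unsigned __int128 key_t128;   // assuming a GCC/Clang-style 128-bit int

// Placeholder stand-ins so the timing loops compile - swap in the real calls.
static key_t128 buildConflationKey(const char *msg) {
    key_t128 k = 0;
    while (*msg) k = (k << 1) ^ (unsigned char)*msg++;
    return k;
}
static void triePut(key_t128, const char *) { /* real trie insert goes here */ }

int main() {
    const char *msg  = "sample-message-id";   // stand-in payload
    const int  iters = 100000;

    auto t0 = std::chrono::steady_clock::now();
    key_t128 key = 0;
    for (int i = 0; i < iters; ++i) key = buildConflationKey(msg);
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) triePut(key, msg);
    auto t2 = std::chrono::steady_clock::now();

    auto us = [](auto a, auto b) {
        return (long long)std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    std::printf("key gen : %lld us total\n", us(t0, t1));
    std::printf("trie put: %lld us total\n", us(t1, t2));
    return 0;
}
```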

So I had to keep digging. Turns out I was doing a lot more copying than I had to do, and by 'had to do' I mean 'designed into the API'. I was using STL containers in places where I could have had the methods work directly on (char *) arrays and then wrapped those in STL-based versions for backward compatibility. This netted me a considerable speed improvement, but I'm still not happy.
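
The shape of the change looks roughly like this. It's a sketch with made-up names, not the actual API, but it shows the idea: a raw (char *) fast path that does no copying, with a thin std::string overload kept around so the old callers still compile.

```cpp
#include <cstddef>
#include <string>

typedef unsigned __int128 key_t128;   // assuming a GCC/Clang-style 128-bit int

// Fast path: works straight off the wire buffer - no allocation, no copies.
key_t128 makeConflationKey(const char *buf, std::size_t len) {
    key_t128 key = 0;
    for (std::size_t i = 0; i < len && i < 16; ++i) {
        key = (key << 8) | static_cast<unsigned char>(buf[i]);
    }
    return key;
}

// Backward-compatible wrapper: the STL-flavored signature the old callers use.
inline key_t128 makeConflationKey(const std::string &s) {
    return makeConflationKey(s.data(), s.size());
}
```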

I also dug into the payload pools and found that I could do a little better job there as well. Again, no major change in the code, but every little bit is going to help in this guy. I've got more feeds than I have CPUs, and it's getting to be a little nasty to worry about the context switching.
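
For the payload pools, the win is mostly about recycling buffers off a free list instead of going back to the allocator for every message. Here's a rough sketch of that kind of pool; the names and sizes are made up, not what's actually in the code.

```cpp
#include <cstddef>
#include <vector>

struct Payload {
    char        data[512];   // made-up fixed-size message buffer
    std::size_t used = 0;
};

// Hand back recycled buffers instead of hitting new/delete per message.
class PayloadPool {
public:
    Payload *acquire() {
        if (free_.empty()) return new Payload();   // grow only when we must
        Payload *p = free_.back();
        free_.pop_back();
        p->used = 0;
        return p;
    }
    void release(Payload *p) { free_.push_back(p); }   // recycle, don't free
    ~PayloadPool() { for (Payload *p : free_) delete p; }
private:
    std::vector<Payload *> free_;
};
```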

In the end, I changed a lot of little things, but got the speed close to what it was. I'll have to wait and see what happens on the open tomorrow morning to be sure. I'm concerned, but cautiously optimistic.