Archive for the ‘Coding’ Category

Still Trying to Get More Speed in the Ticker Plants

Tuesday, January 18th, 2011

This morning's tests didn't go any better... once again, I was overrun like the guys at the Alamo. Pretty depressing. I've been on this for quite a while now, and it seems that I'm making no real progress.

Once again, today was spent making a ton of little changes to the code in hopes of finding the killer bottleneck. One of the biggies was making versions of the message processing methods that take a container reference, as opposed to passing the new containers back to the caller. If I'm making a lot of messages and constantly passing them back on the stack, that's got to take more time than passing in a reference to the container and just adding the messages to it.

I left the old "pass back" methods in the code, where I just create a container, pass it to the new method, and then return it to the caller. It's a clean and easy way to have one version of the method while supporting two different calling schemes.
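
Something like this is the shape of it - the names here are made up and not the actual plant code, but it shows both calling schemes living side by side:

  #include <cstddef>
  #include <vector>

  // Hypothetical message type standing in for the real tick messages.
  struct Message { };

  class Processor {
  public:
      // New style: the caller owns the container and we just append to it -
      // nothing gets built up and handed back on the stack.
      void processDatagram(const char *buf, std::size_t len,
                           std::vector<Message> &out)
      {
          // ...decode 'buf' and append each decoded message...
          (void)buf; (void)len;
          out.push_back(Message());
      }

      // Old "pass back" style kept for the existing callers: make a container,
      // hand it to the new method, and return it by value.
      std::vector<Message> processDatagram(const char *buf, std::size_t len)
      {
          std::vector<Message> out;
          processDatagram(buf, len, out);
          return out;
      }
  };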

There were a lot of other little things - like speeding up the creation of the security ID by not packing the strike mantissa with Google's VarInt encoding. It's not slow, per se, it's just not as fast as blasting down 4 bytes with memcpy(). And when you're doing this 50,000 times a second, every little bit helps.
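
Roughly the difference between the two encodings - just a sketch, not the real SecurityID code, and it assumes the buffer has room:

  #include <cstddef>
  #include <cstring>
  #include <stdint.h>

  // Fixed-width version: one unconditional 4-byte copy, no per-byte branching.
  inline std::size_t appendStrikeMantissa(char *buf, std::size_t pos, uint32_t mantissa)
  {
      std::memcpy(buf + pos, &mantissa, sizeof(mantissa));
      return pos + sizeof(mantissa);
  }

  // VarInt-style version for contrast: 7 bits per byte, a test on every byte.
  // Not slow, but not as fast as the memcpy() above.
  inline std::size_t appendVarInt(char *buf, std::size_t pos, uint32_t value)
  {
      while (value >= 0x80) {
          buf[pos++] = static_cast<char>((value & 0x7f) | 0x80);
          value >>= 7;
      }
      buf[pos++] = static_cast<char>(value);
      return pos;
  }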

We'll have to see tomorrow what the effects are. I'm feeling good about my chances, but I've been here before, and felt good before. I just have to keep at it. Never give up.

Hard Struggle Integrating Trie into QuickCache

Friday, January 14th, 2011

In my continuing quest for more speed in the tick processing of my Ticker Plants, I decided today to integrate the 128-bit trie I'd created for the security ID into the QuickCache - the container of messages that each Ticker Plant keeps so clients can obtain the "last known" messages. The reason for this is that the existing QuickCache was not respecting the different exchanges on the Quote messages, and it needed to. Retrofitting that behavior into the QuickCache turned out to be a major engineering feat.

The differences come down to the different storage organizations. In the old quasi-trie, I had the messages organized first by their message type (Quote, Trade, etc.), then by their underlying name, and finally by their instrument type - stock, option, etc. It's in the 'name' part of the storage where I'd used the trie - one level for each of the first four characters in the underlying's name. This had a lot of really nice benefits - the messages were stored by family, so I could respond easily to requests for an entire family, which come up quite often.

When I switched to the 16-byte (128-bit) trie - a complete lockless trie keyed on the entire security ID - I lost the ability to easily find messages by their message type, instrument type, etc., because it's all encoded into the security ID. While it's possible to create a "mask" of values to "scan" for in the trie, that makes for some horrifically bad code, and in the end, it was easier to take the hit on performance and be more conventional in the scanning.

But what a hit. I have to scan every value in the trie to make sure I find all the values in a family... or the first alphabetical underlying... or the last. It's very straightforward, but not really efficient. The name-based trie made many of these operations much simpler. But its limitations were too much to overcome - it still had locks on the option maps, and in the end, it just wasn't going to be fast enough for the data stream at peak loads.
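
The family scan ends up looking something like this - the types and the visit() callback are my assumptions, not the real trie's API, but it gets the flavor across:

  #include <cstring>
  #include <vector>
  #include <stdint.h>

  // Stand-ins for the real types: a 16-byte security ID and the cached message.
  struct SecurityID { uint8_t bytes[16]; };
  struct Message;

  // With the name-based trie, "everything for IBM" was a sub-tree lookup.
  // With one flat 128-bit trie keyed on the whole security ID, the conventional
  // answer is to visit every entry and test it. This functor would be handed to
  // an assumed trie.visit(collector) call that walks all populated slots.
  struct FamilyCollector {
      const char             *family;   // e.g. "IBM"
      std::vector<Message *>  hits;

      // Assumes the underlying's name sits in the leading bytes of the ID.
      bool matches(const SecurityID &id) const {
          return std::memcmp(id.bytes, family, std::strlen(family)) == 0;
      }

      void operator()(const SecurityID &id, Message *msg) {
          if (matches(id)) hits.push_back(msg);
      }
  };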

Still... getting the new 16-byte trie into the QuickCache took a lot of work. But in the end, all the tests passed with flying colors, and I had a fix for a problem that had been giving me grief. Glad that's done.

Getting Into the Details for Efficiency's Sake

Thursday, January 13th, 2011

Today I had another unsuccessful day testing my ticker plants against the open and the OPRA feeds. They were pumping out far more data than my code could process. It was working, but with all the changes required for the necessary functionality, I was falling behind, and it wasn't a great feeling.

I spent the day looking at things like making ivars protected and using them directly, as opposed to passing values to protected setter methods. A stretch, yeah, but that's what today has been about - trying to find out what's up with the speed, and why it was working fine before and is now really pretty bad.
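
It's about as small a change as it sounds - the names here are illustrative, not the real ticker plant classes:

  // Let subclasses touch the instance variable directly on the hot path
  // instead of going through a setter on every tick.
  class MessageHandler {
  protected:
      // Directly visible to subclasses - no call per update.
      long mLastSequence;

  public:
      MessageHandler() : mLastSequence(0) { }
      virtual ~MessageHandler() { }

      // The setter stays for external callers; the hot paths in the
      // subclasses just assign to mLastSequence themselves.
      void setLastSequence(long seq) { mLastSequence = seq; }
      long lastSequence() const      { return mLastSequence; }
  };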

I'll have to try these on Monday and see if they make any difference.

Finding More Speed in the Unlikeliest of Places

Wednesday, January 12th, 2011

Today I've spent a ton of time with the uint128_t trie to get it as fast as possible. My goal was to beat the old system, and eventually, I think I have. But getting there was a painful process that had a lot of battles along the way.

When I started testing the uint128_t trie, I noticed that access was significantly slower than with the old character-based trie I had been using. Sure, the old one wasn't really going to work for what I needed, but still... a 10x speed hit is not what I was looking for. So I dug into it.

Turns out the new trie is amazingly fast. I mean just blindingly fast. Good for me. The problem was in generating the conflation key that I use to store the messages in the trie. Really... it was around 7 ms to generate the key and then only 0.01 ms to actually put it in the trie. Yikes!

So I had to keep digging. Turns out I was doing a lot more copying than I had to - and by 'had to' I mean 'designed into the API'. I was using STL containers in places where it was possible to make the methods work on raw (char *) arrays and then wrap those in versions that take the STL containers for backward compatibility. This netted me a considerable speed improvement, but I'm still not happy.
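
The shape of the change, sketched with made-up names - the raw (char *) version does the work, and the std::string overload is just the backward-compatible wrapper:

  #include <cstddef>
  #include <cstring>
  #include <string>
  #include <stdint.h>

  class ConflationKey {
  public:
      // Core version - writes straight into the caller's buffer, no copies.
      static std::size_t build(const char *symbol, std::size_t symLen,
                               uint16_t exchange, char *dest)
      {
          std::memcpy(dest, symbol, symLen);
          std::memcpy(dest + symLen, &exchange, sizeof(exchange));
          return symLen + sizeof(exchange);
      }

      // Backward-compatible wrapper - pays for a std::string construction,
      // so the hot path skips it and calls the (char *) version directly.
      // Assumes the symbol fits in the local buffer.
      static std::string build(const std::string &symbol, uint16_t exchange)
      {
          char        buf[64];
          std::size_t len = build(symbol.data(), symbol.size(), exchange, buf);
          return std::string(buf, len);
      }
  };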

I also dug into the payload pools and found that I could do a little better job there as well. Again, no major change in the code, but every little bit is going to help in this guy. I've got more feeds than I have CPUs, and it's getting to be a little nasty to worry about the context switching.

In the end, I changed a lot of little things, but got the speed close to what it was. I'll have to wait and see what happens on the open tomorrow morning to be sure. I'm concerned, but cautiously optimistic.

Google Chrome dev 10.0.634.0 is Out

Wednesday, January 12th, 2011

V8 Javascript Engine

Today I noticed that Google Chrome dev 10.0.634.0 was out, so I upgraded. The big thing in this release seems to be the update of the V8 engine to 3.0.6.1. I'm still a little sore about Google's stated plans to remove H.264 support from Chrome, but that just makes it easier to see that Google really is going through a bad spell, and maybe in a year or ten they'll pull out of it. But I don't know... it took IBM more than a decade, and Texas Instruments never really recovered.

I sure hope the bad folks making bad decisions in Google get moved out. It's sad to see a great bunch of engineers used to do very bad things. Sad.

Everyone Back in the Pool!

Tuesday, January 11th, 2011

Today was spent working a lot on the efficiency of the ticker plants - most notably on removing the excessive use of malloc() and free() (or new and delete) and replacing it with some pooled containers for the processing of the UDP datagrams and the payloads of the ZeroMQ messages. I'm not really sure how inefficient Linux is at dealing with small allocations, say less than 1kB, but it can't be good to have 20,000/sec flipping through the system.

It's really pretty simple, given that I'm dealing with reasonably small container sizes. I just needed an STL list of some kind, a simple spinlock to protect it, an alloc() method to get one from the pool (or create one if the pool is empty), and a recycle() method to put it back in the pool if there's room, or delete it if not. Not rocket science.
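
A minimal sketch of the idea - I'm assuming a pthread spinlock here, and the real pool has a bit more to it, but this is the whole trick:

  #include <cstddef>
  #include <list>
  #include <pthread.h>

  template <class T>
  class Pool {
  public:
      Pool(std::size_t maxSize = 1024) : mMaxSize(maxSize) {
          pthread_spin_init(&mLock, PTHREAD_PROCESS_PRIVATE);
      }
      ~Pool() {
          while (!mFree.empty()) { delete mFree.front(); mFree.pop_front(); }
          pthread_spin_destroy(&mLock);
      }

      // Hand out a recycled instance if we have one, otherwise make a new one.
      T *alloc() {
          T *obj = NULL;
          pthread_spin_lock(&mLock);
          if (!mFree.empty()) { obj = mFree.front(); mFree.pop_front(); }
          pthread_spin_unlock(&mLock);
          return (obj != NULL) ? obj : new T();
      }

      // Put it back if there's room in the pool, otherwise let it go.
      void recycle(T *obj) {
          pthread_spin_lock(&mLock);
          bool keep = (mFree.size() < mMaxSize);
          if (keep) mFree.push_back(obj);
          pthread_spin_unlock(&mLock);
          if (!keep) delete obj;
      }

  private:
      std::list<T *>     mFree;
      std::size_t        mMaxSize;
      pthread_spinlock_t mLock;
  };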

In fact, I made a special StringPool just for the ZeroMQ messages, as they are held in simple STL std::string objects. The bonus of all this is that I don't have to have a "receive buffer" and then copy the data from it into the container before pushing onto the stack. I can allocate one from the pool, have boost ASIO put the data directly into the container, and then simply transfer the pointer onto the stack.
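
Roughly like this - the StringPool here is just the Pool sketch above with std::string, and handOff() and the sizes are stand-ins for the real plumbing, but the boost ASIO calls are the standard ones:

  #include <cstddef>
  #include <string>
  #include <boost/asio.hpp>
  #include <boost/bind.hpp>

  // (Pool<std::string> is the pool sketched just above, playing the StringPool.)
  class UdpReader {
  public:
      UdpReader(boost::asio::io_service &io, unsigned short port) :
          mCurrent(NULL),
          mSocket(io, boost::asio::ip::udp::endpoint(boost::asio::ip::udp::v4(), port))
      {
          startReceive();
      }

  private:
      void startReceive() {
          mCurrent = mPool.alloc();          // pooled std::string - no new buffer
          mCurrent->resize(1500);            // room for one MTU-sized datagram
          mSocket.async_receive_from(
              boost::asio::buffer(&(*mCurrent)[0], mCurrent->size()), mSender,
              boost::bind(&UdpReader::onReceive, this,
                          boost::asio::placeholders::error,
                          boost::asio::placeholders::bytes_transferred));
      }

      void onReceive(const boost::system::error_code &err, std::size_t bytes) {
          if (!err) {
              mCurrent->resize(bytes);       // trim to what actually arrived
              handOff(mCurrent);             // the pointer moves on - no copy
          } else {
              mPool.recycle(mCurrent);
          }
          startReceive();
      }

      // Placeholder: the real code pushes the pointer onto the processing
      // queue and the consumer recycles it; here we just recycle it.
      void handOff(std::string *msg) { mPool.recycle(msg); }

      Pool<std::string>              mPool;
      std::string                   *mCurrent;
      boost::asio::ip::udp::socket   mSocket;
      boost::asio::ip::udp::endpoint mSender;
  };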

Far simpler. Very sweet, in fact.

Now that it's done, it has to work better even if it doesn't spec out to be any faster - cutting out all those trips into the system for malloc() and free() has to be making a positive difference. Good enough.

Working on Ticker Plant Efficiency

Monday, January 10th, 2011

Today I noticed that I had a serious problem with the CPU usage of my ticker plants. Specifically, the OPRA channels were using 3x or 4x the CPU they had been before I made the last round of changes. Not good. Many of the ticker plants were above 100% CPU utilization, and when you have that many plants on an 8-core box, it's not good to have four or more of them each taking more than a complete CPU. Yikes.

So today was spent trying to find where the problems were. I guess I knew where they were - in the code that changed - but that code was for the conflation queue, it needed to be changed, and the resulting code is just awfully slow.

I have a feeling it's in the boost unordered_map, but I'm going to try to make it work, if possible, for the alternative is a nasty bit of code - another trie, and this time a very large one. I don't want to have to do that.

Nice Day of Cleaning Up Outstanding Issues

Friday, January 7th, 2011

Today was a day of cleaning up a lot of little nagging issues on the ticker plants. I spent time fiddling with the timing of the lockless queue polling, the IRC responses, the exchange mappings... all those things that I needed to get to eventually, and today was the day to "clean the decks". Not bad.

Nice day, too. It's great to get that sense of completion.

Finally Getting New Broker Changes into Codebase

Thursday, January 6th, 2011

Ringmaster

Today was a long one... very long. But it was for a good cause. Today I finally got the bulk of the direct dial Broker code working in my apps. It wasn't easy, and most of that was probably my fault. I had looked at the code for my TCP clients and proxies and copied that as a starting point. What I failed to realize was that the Broker code needed entirely different handshaking on the connections, and while it would appear to work, it was really a hopeless mess.

No more. It's looking pretty good and I'm hoping to get the final touches on it in the morning.

Un-Ignoring Files and Directories in Git

Thursday, January 6th, 2011

I had a nasty little problem the other day, and I couldn't figure it out until I really started googling the problem and saw that the solution was (of all places) in the man pages. Here's the set-up for the problem.

I have a git repo and at the top level it looks like this:

  Makefile
  README
  bin/
  doc/
  java/
  lib/
  logs/
  src/
  tests/

and because the lib directory contains nothing but generated files (the .so libraries), it makes sense to have a top-level .gitignore file looking like this:

  *.swp
  *.swo
  *.tgz
  lib
  logs
  __dist
  core.*

But then when we look at the next level, we see that things get complicated:

  Makefile
  README
  bin/
  doc/
  java/
    Makefile
    build.xml
    classes/
    dist/
    lib/
    peak6/
    tests/
  lib/
  logs/
  src/
  tests/

And in the java/lib/ directory we have third-party jars that we want included in the repo. The problem is, git looks at the top-level .gitignore, sees the line with lib in it, and therefore the lib in the java directory gets ignored as well.

But there's a fix. Un-ignore the java/lib. How to do that?

Make the .gitignore file in the java directory look like this:

  classes
  dist
  !lib
  *.classes
  *.swp
  *.swo

and it's the inclusion of the !lib line that's telling git "Hey, don't ignore this guy".

The more I use git, the more impressed I am with it. Subversion made me copy the ignore patterns into each directory; git is nicer, with the directory structure providing the inheritance, and with this I can be as selective as I need. Sweet.