Archive for January, 2008

More Evils of Optimization and Performance Tuning

Tuesday, January 8th, 2008

This morning I came in to find that the CPU usage on the server's kernel is at an all-time high! OK, thankfully, it's still running, but it's not running the way I had expected it to be this morning. I had done too much optimization, and that led to backups in the production system. Unfortunately, there's no way to test the server the way it's used in production other than to put it into production. There are clients all over the globe, and it's just not possible to load the dev server to the same level. So what looked like it was working very well in development wasn't working so well in production.

Not a big surprise, but it's something I need to go back and mess with. Basically, the system was balanced - no one component was that much faster than the others. When I made the pub/sub system faster, that allowed the price ticks to move faster, and the clients had to keep up. Well... what if they're on the other side of the globe and the WAN line just doesn't give them the bandwidth they need to keep up? Yup, backups.

It's not horrible, as the server becomes a self-regulating system, but it's annoying that there's so little I can do to check these kinds of interactions.

So today I'm backing out a few of the changes to the pub/sub system and putting in a few more subtle ones. We'll put in an index to the primary feeder queue so it's faster to look up duplicates (if it's configured that way)... we'll also have a little better look at the outgoing client queues now that they hold the instruments again (back to a few large queues)... all these will make the code more understandable, if not quite as high-performance, but in the end, it'll balance the system better.
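To give a rough idea of that first change, here's a minimal sketch of a feeder queue with a side index for duplicate checks. The names (FeederQueue, PriceTick) are made up for the example, and the locking is left out - it's the shape of the idea, not the real code.

```cpp
#include <deque>
#include <set>
#include <string>

// Hypothetical price tick - just enough fields for the example.
struct PriceTick {
    std::string instrument;
    double      price;
};

// Feeder queue with a side index so duplicate checks don't have to
// scan the queue itself. Locking is omitted to keep the sketch short.
class FeederQueue {
public:
    explicit FeederQueue(bool checkDuplicates)
        : mCheckDuplicates(checkDuplicates) {}

    // Returns false if the instrument is already queued and the
    // duplicate check is configured on.
    bool push(const PriceTick& tick) {
        if (mCheckDuplicates && !mIndex.insert(tick.instrument).second)
            return false;                      // already in the queue
        mQueue.push_back(tick);
        return true;
    }

    bool pop(PriceTick& tick) {
        if (mQueue.empty()) return false;
        tick = mQueue.front();
        mQueue.pop_front();
        mIndex.erase(tick.instrument);         // keep the index in sync
        return true;
    }

private:
    bool                  mCheckDuplicates;
    std::deque<PriceTick> mQueue;
    std::set<std::string> mIndex;              // O(log n) duplicate lookup
};
```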

I'm still going to try to get the tick rate up, but I may be able to do that with these changes installed. We'll have to see how the tests go.

UPDATE: OK... I'm getting the code back in balance. There are a lot of things I can do to make it work less and that's what I'm focusing on at this time. It's looking much better in development, but the acid test is, as always, production tomorrow.

Here’s Why I Don’t Like Performance Tuning

Monday, January 7th, 2008

I've made some pretty dramatic improvements in the server's price tick processing in the last few days. Problem is, it leaves me feeling... well... unsatisfied. The problem is that there's no end to it. Maybe I made it faster, but there's still a bottleneck, and it would be nice to remove that, if possible. It's a never-ending stream of minute fixes, each making it a little faster, but not enough faster to justify the time spent on it.

So I have to stop when it's really not done - it's just done enough. That's what leaves me feeling very unsettled. I'm a very completion-oriented person. I like to create and fix things, but performance tuning is just getting a little bit more out of something that's already working. Oh, it's nice to see a factor of four or more, but after those first few big gains, the next gains are almost always much smaller, and they keep getting smaller the longer you work on the code.

I've just got to come up with a self-imposed criterion for stopping this work. Then, when I reach it, I can emotionally let go of this phase of the project and move on to the next. Good plan.

Getting the Most out of the Server Price Flow

Monday, January 7th, 2008

Today I noticed that my previous efforts to speed up the tick flow through the server helped, but there were still times when the CPU usage climbed to those levels that looked like there was a large look-up being done. I checked the logs, and at about the same time those were happening, I was getting large ClientProxy queue sizes. This makes sense - if the outgoing queues to the clients start to grow, the search time for a duplicate (they must be unique) would go up, and that would slow down the injection rate, which in turn would make the queue grow even more.

So... I needed to make sure I was using the most efficient storage possible. I did a little reading and found that std::set is implemented as a red-black tree, and find() uses that to get to the elements as quickly as possible. No room for improvement there, unfortunately. But I couldn't give up; there had to be a solution.

And while it wasn't earth-shaking, the solution was fun. Rather than try to get more efficient data storage, I realized that maybe I was asking it to do the wrong thing. All I needed was to make sure that each Instrument appears only once on each ClientProxy queue. Why not leave it up to the Instrument, then? Rather than have a few large lists in the ClientProxys, have a lot of small lists in the Instruments.

Elegant. Simple. Re-phrase the question.

So what I did was to put a std::set on each Instrument with a protecting mutex on it, and then make the ClientProxy tell each instrument that it's going on the queue and coming off the queue. That way the same effect occurs, but the ClientProxy can have a simple queue that doesn't have to look for duplicates.
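Very roughly, the shape of it looks like this - hypothetical names, with pthread locking standing in for whatever the real code uses:

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <pthread.h>

class ClientProxy;   // forward declaration

// Each Instrument tracks which ClientProxy queues it is currently on.
class Instrument {
public:
    Instrument()  { pthread_mutex_init(&mMutex, NULL); }
    ~Instrument() { pthread_mutex_destroy(&mMutex); }

    // Returns true if the instrument was not already on this proxy's queue.
    bool markQueued(ClientProxy* proxy) {
        pthread_mutex_lock(&mMutex);
        bool added = mQueuedOn.insert(proxy).second;
        pthread_mutex_unlock(&mMutex);
        return added;
    }

    void markDequeued(ClientProxy* proxy) {
        pthread_mutex_lock(&mMutex);
        mQueuedOn.erase(proxy);
        pthread_mutex_unlock(&mMutex);
    }

private:
    pthread_mutex_t        mMutex;     // small lock, held very briefly
    std::set<ClientProxy*> mQueuedOn;  // small set - one entry per client
};

// The ClientProxy's queue no longer has to search for duplicates.
class ClientProxy {
public:
    void enqueue(Instrument* inst) {
        if (inst->markQueued(this))        // only add it once
            mQueue.push_back(inst);
    }

    Instrument* dequeue() {
        if (mQueue.empty()) return NULL;
        Instrument* inst = mQueue.front();
        mQueue.pop_front();
        inst->markDequeued(this);
        return inst;
    }

private:
    std::deque<Instrument*> mQueue;     // plain FIFO - no duplicate scan
};
```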

This change keeps the server from getting into those periods of large queue sizes. This is good in that the entire system responds better to the prices. Also, it means we don't have those conditions where the CPU usage rises and slows down the tick processing. It's a great solution all the way around.

MarsEdit v2.0.5 Update

Monday, January 7th, 2008

Looks like there's been an update on MarsEdit for a security problem. At about the same time I noticed that WordPress is at 2.3.2, but HostMonster doesn't have it available in its application support tool. No matter, I don't use the 'drafts' section of WordPress at HostMonster - I use MarsEdit.

Lots of things seem to be updating today... Cool.

Adium is 1.2

Monday, January 7th, 2008

I can't think of a better poster child for great development on the Mac than Adium. I've been using it ever since I got an Intel Mac as my old multi-protocol chat client, Fire, didn't work properly on the Intel platform. As I was looking for a solution, I noticed that Adium was looking to be the preferred solution for most Mac folks.

I downloaded it, got all my accounts into it, fiddled with the preferences to get it looking and working like I wanted, and have been happily using it ever since.

Today they released 1.2, and the release notes indicate a ton of bug fixes and localization changes. Not that I had that many problems with it before, but it's nice that they are continuing to work on it. If you need something like this, get it - you won't be sorry.

Sometimes Caching Isn’t the Best Strategy

Friday, January 4th, 2008

Today I was blown away by a performance boost I got from not caching resultant data. When you look at it, it's very logical why it's faster, but it took me an hour or so to get to the point of even questioning the cache as a bottleneck in the processing.

For the past day or so, I've been looking at the speed at which my server processes incoming price events (ticks). It's a lot better than it used to be, but there were times when it took minutes to get a price through the server. It didn't seem to be a locking problem. It was just that some ticks took much longer than others to process. So I started digging.

The first place to look was in the processing of the incoming prices. The prices come from the feeder apps, go onto a compression queue (new data for an instrument replaces what's already queued, but the queue order is retained), and then are pulled off by a thread and sent through the system. My first thought was that the queue was holding onto the prices longer than needed - primarily because the processing thread wasn't getting back around to the queue quickly enough. I thought it might be in the thread synchronization of the pushing and popping threads, but after a little bit of experimentation, I realized that this wasn't the case. The problem was elsewhere.
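For the curious, a compression (conflating) queue is a pretty simple structure. This is just a minimal sketch with invented names, assuming prices are keyed by instrument and leaving the locking out:

```cpp
#include <deque>
#include <map>
#include <string>

// A compression (conflating) queue: a newer price for an instrument
// already in the queue replaces the stored data, but the instrument
// keeps its original position in line. Locking omitted for brevity.
class CompressionQueue {
public:
    void push(const std::string& instrument, double price) {
        std::map<std::string, double>::iterator it = mLatest.find(instrument);
        if (it != mLatest.end()) {
            it->second = price;            // update in place, keep queue order
        } else {
            mLatest[instrument] = price;
            mOrder.push_back(instrument);  // first time seen - take a spot in line
        }
    }

    bool pop(std::string& instrument, double& price) {
        if (mOrder.empty()) return false;
        instrument = mOrder.front();
        mOrder.pop_front();
        price = mLatest[instrument];
        mLatest.erase(instrument);
        return true;
    }

private:
    std::deque<std::string>       mOrder;   // arrival order
    std::map<std::string, double> mLatest;  // most recent price per instrument
};
```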

Next, I turned my attention to the updating of the prices. There's nothing in the code that would block the updates so drastically. I looked and looked further downstream, but in the end, this was just not where the problem was.

Finally, I started looking at the general CPU usage of the kernel process - the kernel of the server, not the OS. This led me back to something I've known about for months and just haven't gotten around to fixing: when there are clients attached to the server, the CPU usage goes way up - primarily due to the messaging to those clients of what's changed in the server. Unfortunately, I couldn't get a good handle on why this was happening - until this morning.

There are two parts of the client communication with the server: the outgoing notification that a particular instrument has changed, and the request/response with the client for the complete detailed information about a particular instrument. I 'short circuited' each of these in turn, and found that the CPU usage was really attributed to the notifications and not the queries. I was a little surprised. Then I started thinking about the details of the notifications.

Originally, when the server would update an instrument's data, it would ask each connected client if they were interested in hearing about this guy's update. This then migrated to having a client proxy on the server side ask the question, which was an important step, but it was still a little pokey. The next step was to have the client proxy remember the answer for each instrument - going on the idea that once an instrument is 'interesting', it's very unlikely that it will stop being 'interesting' - so we cached the 'interesting instruments' and first checked if the updated instrument was in that list before doing the more thorough check.

The list was a simple STL std::set of strings, but when the client had been connected for a while, it was possible that there would be literally thousands of 'interesting instruments' in the list. Scanning this list each time seemed to be the problem, so that's where I went to work.

The goal was to make checking the individual instrument so fast and easy that there would be no way a cache was going to be faster than simply checking this guy each time. The first step was to realize that all the clients that connect to the server really are looking for all positions, so I made a few methods on the instrument class that would simply and very quickly tell me if there were any positions on that guy, and not bother with the more complete checking that was done when we wanted only a subset of positions from the server.

The second modification was putting an STL std::set where I had a simple list - which meant a linear search - for those instruments we wanted to be notified of even with no positions on them. When I put in these changes to the system and restarted, I was amazed to see that fully 66% of the CPU usage of the 'connected' server was gone. Amazing.
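Put side by side, the notification check went from something like the first form to something like the second - hypothetical names again, not the real class interfaces:

```cpp
#include <algorithm>
#include <list>
#include <set>
#include <string>

// Before: a cached list of 'interesting' instruments, searched linearly
// on every update - and possibly thousands of entries long.
bool interestedBefore(const std::list<std::string>& interesting,
                      const std::string& symbol) {
    return std::find(interesting.begin(), interesting.end(), symbol)
           != interesting.end();
}

// After: no cache at all. Ask the instrument directly - a cheap flag
// check - and keep the explicit watch list in a std::set so the lookup
// is O(log n) instead of a linear search.
class Instrument {
public:
    Instrument(const std::string& symbol, int positionCount)
        : mSymbol(symbol), mPositionCount(positionCount) {}

    const std::string& symbol() const { return mSymbol; }
    bool hasPositions() const { return mPositionCount > 0; }  // cheap check
private:
    std::string mSymbol;
    int         mPositionCount;
};

bool interestedAfter(const Instrument& inst,
                     const std::set<std::string>& watchList) {
    return inst.hasPositions()
        || watchList.find(inst.symbol()) != watchList.end();
}
```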

The CPU usage of the kernel now very clearly tracks the incoming price events. It's at least 2 to 3 times faster for prices through the server, and it's simpler, smaller (no cache of instruments), and can now move more prices than ever before. Really very cool.

Cube Life Boundaries

Friday, January 4th, 2008

I know I work in a little cardboard and steel box. It's a cube. Same as a million other cubes in the business world, but this one is mine. I've recently realized that just because it looks like mine doesn't mean it really is mine - or at least the people I work with seem to think it's as much theirs as mine.

Boundaries. Plain and simple.

If I go to talk to someone in their home, I don't re-arrange the furniture because I think it looks better one way than another. I don't even tell them I think they could do better with the couch, or chair in this location. I am a gracious guest and keep my thoughts to myself. After all, I could be totally missing the point of the sunlight in the room, or a heating vent, or any of a hundred different things that would make the choice they made the right choice.

At work it's the same thing. I don't walk into a co-worker's cube and decide that their phone is in the wrong place, or their papers should be stacked here. I try not to even go into the cube itself. They're barely big enough for a person in a chair, there's no need to walk into these tiny places and start poking, moving, and disturbing things.

But there are many that don't believe as I do.

There are people that simply walk into my cube, decide that they need to grab a pad of paper, borrow a pencil and start writing. They don't even ask. While I understand that they don't personally mean to piss me off, they're doing a good job of it and I want to tell them Hey, jerk! Do I go into your house and move your stuff? Take what I want without asking? But I can't. If I do that I'll be branded a trouble-maker, and while I wouldn't mind this branding, I know that there are folks that will be very glad that I keep quiet and simply put things back and get another pad of paper.

But I wish they'd just stay out of my bloody cube!

Adding Identifiers to the Server

Thursday, January 3rd, 2008

Today I put the finishing touches on the addition of SEDOL, ISIN, and CUSIP identifiers to the server. While this isn't rocket science, it's nice that they were added so easily. All I needed to do was pull them out of the instrument master database; a little bit of code turned them into variables and sent them to the clients.
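In spirit, the server-side change amounts to little more than carrying three new fields on the instrument data sent to clients - the type and field names here are invented for the example:

```cpp
#include <string>

// Hypothetical slice of the instrument data the server publishes -
// three identifier sets, each populated from the instrument master
// when the instrument is loaded.
struct InstrumentIdentifiers {
    std::string sedol;   // no one identifier set has full coverage,
    std::string isin;    // so carry all three and let the client use
    std::string cusip;   // whichever one it needs
};
```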

The Hong Kong users wanted one of these, but I knew that no single identifier set has 100% coverage, so it was nice that all three were available in the instrument master. I've worked hard to make sure that I can phase things in without having to make a "big switch" where multiple systems change on the same day - today I put the changes in the server, tomorrow I can roll out the changes to the middle-tier and clients. Nice.

Interesting Year-End Changes

Thursday, January 3rd, 2008

There seems to always be something waiting in the wings. For the last few days I've been noticing a problem with one of my apps and while I was confident I hadn't made any changes, it was possible that it was the exchanges, or the software that I use to pseudo-normalize the data from the exchanges. It's only happened in 2008, so I sent an email to my contact in the group who maintains the code I use and asked him if anything changed.

Interestingly enough, things had. But he didn't find them right away. Eventually, he did see that the new backup policy they implemented in 2008 wasn't properly returning the configurations for the users. This was the source of the problems I'd been seeing.

Great that he found the problem and fixed it, but I'm wondering how many other users of this code were getting bad results and just not knowing about it. It's always important to watch what you're running.

Heck of a First Day Back

Wednesday, January 2nd, 2008

It's been all I had hoped today would be - busy with lots of things to do, no time to sit around and wonder, and all good things to deliver to the users. I've gotten reports today of data issues that I've fixed in the market data server, as well as things I noticed yesterday evening when I got a night-time call from the Hong Kong folks about F/X rates not being properly set. There were a lot of things to overcome - most of them non-obvious to the users - but it was a lot of fun hammering at them until they cracked open and were solved.

Never give up. Never surrender. - great line from a very funny movie, and it's the way to approach these problems. You know you're going to figure it out, so let's just do it! In the end, it's been a great day. At this point I'm simply waiting on people to get back to me and waiting for others to do things for me. Not a lot I can do until they come through for me, but it's been a heck of a good day up 'til now.

Don't get me wrong, there were a lot of things that greeted me upon my return that weren't really pleasant. For instance, the network problems I've been having have simply been moved to another machine - it's clearly the wiring or the switch, but it's still taking one of my machines down a lot more than it should. Thankfully, I can control which machine this is and that's making it a lot more reasonable, but it's still a pain after all I've been through on this.

Also, there are problems that the users have been telling others about for a while, and no one has mentioned them to me. I can't fix what I don't know is a problem. Simple as that. When I found out about it, I was able to turn out fixes for the problems pretty quickly, but there are limitations to the data sources, and even I can't get blood from a turnip.

So all in all, it's been a really good day.