Archive for the ‘Coding’ Category

Refactoring Out the TBB concurrent_vector

Wednesday, February 29th, 2012


This morning I came in to see that some of the exchange feeds on one of the staging boxes of mine hadn't shut down properly. When the exchange test data flooded in, it made a mess, and that was no good at all. The only code that seemed to matter was a simple iterator on the TBB concurrent_vector. I've had issues with this code before - and always moved away from it in favor of a simple std::vector and a mutex of some sort. Here was another case of the exact same thing.

Now I'm not saying that the concurrent_vector is a mess, but I think that it, along with the concurrent_map, is a little trickier than normal to work with. The iterators have built-in locks, and that makes it very easy to write dodgy code. I think that's what happened, but I can't prove it.

It's far easier to use a simple std::vector and then a TBB spin_rw_mutex_v3 to protect it. Virtually all the access to the vector is read-only: there's only really one method that adds to it, and another that removes from it. Those are easy write locks, and they happen on start-up and shutdown. Easy.

The rest of the time, the r/w mutex will only see uncontended read locks, which is essentially a no-op, and that's fine with me. The refactoring was easy because the vector operations are the same, and most (say 80%) of the use cases are simple iterations over the vector's contents. All I needed to do was to put the scoped locks in the right place, and we were ready to go.

In the end, this is just as clean, probably faster, and much better understood. Good move.
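For anyone wanting to see the shape of it, here's a minimal sketch of the pattern. To keep it self-contained it uses the C++17 std::shared_mutex as a stand-in for the TBB spin_rw_mutex, and the FeedList class and its method names are made up for the example, not taken from the real code:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical list of feed names guarded by a reader/writer lock.
// std::shared_mutex stands in for tbb::spin_rw_mutex in this sketch.
class FeedList {
public:
    // Write path (start-up): exclusive lock while the vector grows.
    void add(const std::string& name) {
        std::unique_lock<std::shared_mutex> lock(mMutex);
        mFeeds.push_back(name);
    }

    // Write path (shutdown): exclusive lock while an entry is erased.
    bool remove(const std::string& name) {
        std::unique_lock<std::shared_mutex> lock(mMutex);
        for (auto it = mFeeds.begin(); it != mFeeds.end(); ++it) {
            if (*it == name) {
                mFeeds.erase(it);
                return true;
            }
        }
        return false;
    }

    // Read path: the shared lock is held for the whole iteration,
    // so many readers can walk the vector at the same time.
    std::size_t countStartingWith(char c) const {
        std::shared_lock<std::shared_mutex> lock(mMutex);
        std::size_t hits = 0;
        for (const std::string& feed : mFeeds) {
            if (!feed.empty() && feed[0] == c) {
                ++hits;
            }
        }
        return hits;
    }

private:
    mutable std::shared_mutex mMutex;
    std::vector<std::string> mFeeds;
};
```

The important part is that the scoped lock lives as long as the iteration does. A reader never sees the vector change underneath it, and the two writers simply wait their turn.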

Tracking Down a Tricky Problem

Tuesday, February 28th, 2012


I just finished spending a good hour tracking down a nasty little problem with the logic I had for creating new instruments on the fly. The problem turned out to really be me, and my preconceived notions about what the problem really was, but that's typically the case. The underlying problem was that I was thinking that the first new message for an instrument wasn't creating the underlying, but in fact, it was. That explained why I was seeing no errors.

No… the real problem was that I wasn't properly handling the case when I found it. It was made, but then the next time, I tried to find it, and it was missing - or so the code thought. In reality, I had failed to really detect that I'd found it, and act accordingly.

It's almost a coding standard in my mind now - for every 'if' statement, there had better be an 'else' clause. It would have saved me this headache, and when I saw it, it was clear that I was missing the else, and what to put in it when the value wasn't NULL.
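As a rough illustration of the rule (not the actual code, which isn't shown here), this is what a find-or-create looks like with both branches written out. The Instrument struct, the InstrumentCache class, and the messages counter are all hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical instrument record, just enough for the example.
struct Instrument {
    std::string symbol;
    int messages = 0;   // how many messages we've seen for this symbol
};

// Hypothetical cache that creates instruments on the fly.
class InstrumentCache {
public:
    // Returns the entry for 'symbol', or nullptr if we've never seen it.
    Instrument* find(const std::string& symbol) {
        auto it = mInstruments.find(symbol);
        return (it == mInstruments.end()) ? nullptr : &it->second;
    }

    // Handles one message for 'symbol', creating the instrument if needed.
    Instrument* onMessage(const std::string& symbol) {
        Instrument* inst = find(symbol);
        if (inst == nullptr) {
            // not found: create it, and count this first message
            inst = &mInstruments[symbol];
            inst->symbol = symbol;
            inst->messages = 1;
        } else {
            // found: the branch that's easy to forget to write
            ++inst->messages;
        }
        return inst;
    }

    std::size_t size() const { return mInstruments.size(); }

private:
    std::map<std::string, Instrument> mInstruments;
};
```

Without the else, the second message for a symbol falls through with nothing done, and there's no error to tell you about it.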

Glad that's over. It was painful.

Refactoring Like a Bandit to Fix a Bug

Tuesday, February 28th, 2012


This morning I noticed that I had a problem with the initial volatilities for the options in my Greek Engine. Because the users want me to carry over the calculated values from yesterday's close to this morning, you can end up with a really odd situation: the job that computed the volatilities could have changed their values overnight, and now the new volatilities are different from the old. We can't replace the old with the new, as that would make the calculated results look bad. We can't ignore the new and stick with the old, either (but that's just what we were doing).

What we needed to do was to load up the new, and leave the old as an output value of the calculations - just like the quote and spot values. This meant that I needed to refactor a good chunk of code and place a new ivar in the Instrument - the volatility, right next to the historical volatility. I then converted them from double values to uint32_t so they are a lot easier to handle, and put in the setters and getters that allowed me to update them as needed - even from the StaticData object that's reading in updates from the database.

All told, it was a good chunk of code in three major classes, but when I built it all and ran it, everything worked as you'd expect. Now there's an "output vol" and the instrument vol, so you can see when they differ, but the client gets the old value until the new value is "active" with a calculation.
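Here's a stripped-down sketch of the idea, under a few loud assumptions: the fixed-point scale (kVolScale), the method names, and the calculate() stand-in are all invented for illustration. The real Instrument has far more in it:

```cpp
#include <cassert>
#include <cstdint>

// Assumed fixed-point scale. How the real code maps a double
// onto a uint32_t is not something this sketch knows.
constexpr double kVolScale = 1.0e6;

// Converts a volatility such as 0.25 to its fixed-point form.
inline std::uint32_t toFixed(double vol) {
    return static_cast<std::uint32_t>(vol * kVolScale + 0.5);
}

// Hypothetical slice of an instrument: the input vol can be updated at
// any time, but clients read the output vol until a calculation runs.
class Instrument {
public:
    // New value loaded from the database (the input to the next calc).
    void setVolatility(std::uint32_t vol) { mVol = vol; }
    std::uint32_t getVolatility() const { return mVol; }

    // Value carried over from the last calculation (yesterday's close).
    void setOutputVolatility(std::uint32_t vol) { mOutputVol = vol; }
    std::uint32_t getOutputVolatility() const { return mOutputVol; }

    // Stand-in for the real calculation: once it runs, the input
    // vol becomes "active" and shows up as the output vol.
    void calculate() { mOutputVol = mVol; }

private:
    std::uint32_t mVol = 0;
    std::uint32_t mOutputVol = 0;
};
```

Until calculate() runs, getOutputVolatility() keeps returning the carried-over value, which is exactly what the client should see.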

It's clean, and I like it a lot more than what I had. I'm just sad it took me this long to find it.

It’s Hard for Me to Know When to Draw the Line

Monday, February 27th, 2012


Today has been a really hectic day, with a lot of issues in testing brought up by someone that's a decent guy - kinda like a beer-drinking frat boy: lovable, but you'd never want him dating your sister, and ultimately pretty useless. I'm getting partial sentences from him about bugs, he's clearly very frustrated with the process, and I believe he's closed himself off from learning another thing about this system. It's funny… the same thing that made him a useful tester - being able to find bugs because he gave no thought to what he was doing - is really his personal undoing. He's really frustrated. It shows.

I'm trying to cut him slack. I know he's capable of doing more than he is, but at the same time, every time he acts like an angry frat boy, it's hard to have patience for him. Really hard.

I have said many times - "This is hard. I know it. It's hard, but you can do it." - only to pump him up enough to get through the next 15 mins and then have him come crashing back to earth even harder than before. He seems to have no patience for the learning process, or at least no interest in what it takes to learn in a place like this. There isn't time to spend several hours with him and take him back to programming basics. He's got a little of the basics, but not enough, and he wants to know more, but he's got no foundation to base it on.

It's not easy. This is clearly over his head, and he's being given an opportunity to move out of the simple QA role, but it's up to him. And in my way of looking, he's not making it. But it's not because of his ability - or lack of it, it's his attitude. He gets angry as I try to explain something to him. I can see he's angry, and I ask him if he's interested in listening. He says "No, I hate this", and walks off.

OK, choice made, ignorance retained. It's his choice.

But at some point, I simply have no more patience for this. I just don't. But it's hard for me to know when to draw the line. I know people that would have had stern words with him already. It's a zero tolerance policy for them when it comes to willful ignorance. But to me, I don't want to make it harder on him than it already is. I'm hoping that when he has the patience, he'll listen, and it'll sink in. But I'm beginning to have my doubts.

In the end, I don't know that it'll matter. In the end, I think he'll self-select and that will be that. It's his choice, after all.

Boost Shared Pointers to the Rescue!

Monday, February 27th, 2012

Boost C++ Libraries

Once again, I have found a perfect use for the boost::shared_ptr, and it's saved me a ton of grief. I've been working to refactor the exchange feed recorders, and as I've been doing this, I started getting stability problems in my StringPool class. Basically, I have a simple class that has an alloc() method that returns a (std::string *), and then allows you to recycle them when you are done with them. It's used in the exchange feeds, but I've been having issues when moving to the new format of append writing in the recorders.

So what to do?

Well… really, the problem is simple. I have a buffer that I fill, and rather than passing that to a write thread, and getting another, why don't we create a copy of what we have, clear out what we're using and just start over? The copy operation isn't bad, and if we use the boost::shared_ptr, we don't have to worry about it going out of scope on me, and it's easy to pass into the thread.

It's just about as clean as I can imagine. Simple. Clean. Get rid of the StringPool, have just a std::string and then when ready to fire off the write, make a new string smart pointer and use it. Sweet.

  block->pack(buff);
  if ((buff.size() >= HIGH_WATER_MARK) ||
      ((block->when > (lastSaved + saveInt)) && (buff.size() > 0))) {
    // grab the last saved time for the next interval
    lastSaved = block->when;
    // get the timestamp for the Beginning Of Buffer…
    uint64_t    bob;
    memcpy(&bob, buff.data(), sizeof(bob));
    // now let's fire off the thread and write this out…
    boost::shared_ptr<std::string>  done(new std::string(buff));
    boost::thread  go = boost::thread(&UDPExchangeRecorder::write, this,
                                      bob, block->when, done,
                                      isPreferred(aConnection));
    go.detach();
    // clear out the buffer that we're using…
    buff.clear();
  }

and then in the write method, it's very easy to use:

  void UDPExchangeRecorder::write( uint64_t aStartTime, uint64_t anEndTime,
                                   boost::shared_ptr<std::string> aBuffer,
                                   bool aMaster )
  {
    // make sure we have something to do…
    if (aBuffer->empty()) {
      return;
    }
    ...
  }

When the write method is done, the shared pointers will be dropped, and the memory freed. Easy, clean, and very stable. This cleared up all my issues.

Did a Lot of Code Cleanup Today

Friday, February 24th, 2012

Code Clean Up

Today I spent a good bit of time going through a co-worker's code and cleaning it up to be something that I'm OK with in the code base of the project. It's something that I'm used to doing, and while some will think it's the ultimate in micro-management, it's really not. I'm not asking him to do it - I'm the one doing all the work. I hope that he takes just a minute or two to look at what I've done and learn from it, but that's totally optional on his part. I can hope he'll do it, but I'm not planning on him doing it.

But I simply cannot leave this code in as-is. It's just starting to go into production, and to leave code that's poorly designed, poorly commented, and missing the coding standards at this point in time is just giving in to the worst of entropy in this project. I have to hold it together as long as I can because there will come a day I have to leave it, and then I can do nothing to prevent this kind of slide.

It's not bad, as a job, it's just something you have to get in the right frame of mind to do.

Design By Committee Never Works

Friday, February 24th, 2012


It's sad that I don't have a lot of nice things to say about work these days. Very sad. And one of the very saddest things is that I find myself in this current mode of Design by Committee, and it's just crazy. The problem really originates with the idea that the Big Boss wants to make a group of highly-skilled, high-power developers, that can work together and get things done. This model is very anti-committee of any kind. It's almost the best of the Cowboy coder. It's good people making good decisions, communicating when they need to, for what they need, but not wasting any time.

It's a dream job, to me. And they sold me on it.

But it's not come to pass. Rather, it was close, but we've drifted so far away from that in such a short period of time that it's like a distant memory. And what I'm living now is as bad a place of micromanagement as I can remember being in.

So we have the users - several different groups. And they all are competing with each other to get things done. This was, and is, very inefficient, and so to solve that, the business put one guy in charge, and all business requests go through him. It's his job to make sure that the different business groups are on-board. He's the one guy we need to go to to get answers. And unfortunately, he's not checking in with some of the groups.

This is brought to my attention by my manager, who used to run the tech for one of these other groups. He's a nice guy, but he's got some views on how to run projects that I find more than a little stifling, and while I've tried to talk to him, I've given up of late, as it's just not doing any good.

So we have communication problems. We have misrepresentation of users' needs due to that. We have poor management styles. We have bad testing procedures. In short, the only thing I can think that we're doing right is… OK… give me a sec… Hmmm… well… I can't think of a thing we're doing right. All that's going right is being done in spite of this place.

And if I had to point to one thing - it's the communication. It's so bad, nothing really has a chance. Holy Cow!

Google Chrome dev 19.0.1049.3 is Out

Friday, February 24th, 2012

This morning I saw that Google Chrome dev 19.0.1049.3 was out, and with it, a few fixes, a new V8 JavaScript engine, and a few Mac GPU fixes that I'm not sure affect me, but they certainly can't hurt. I'm always glad to see the progress.

Numbers is an Amazing Tool

Thursday, February 23rd, 2012

Today I've been tracking the disk space used by my feed recorders with the mis-configured drive arrays. It's important that I don't blow out the drives before I move the software into production because this data is very important to the high-frequency trading operations, and we have to have at least one reliable copy of the data.

So I tracked the disk usage for the three boxes: prod, dev, and backup prod. The first, prod, was running the old code that had no issues, while the other two were running the new code that had issues with the disk space. But, after the reconfiguration, the problem appears to have been solved. I then set up crontabs on each box to send me the output of df -k, and then I started tracking it.

I put it all in Numbers, and here you have it:

ArcSvr Disk Usage

It's just an amazing tool. Sure, you can do all this in Excel, but not with the Mac style, and that's just as big a part of this as the data. It's just plain fun!

Fantastic Side-Effect of Auto-Flipping Feeds

Thursday, February 23rd, 2012

High-Tech Greek Engine

Today I was running my Greek Engine, and with the fixes I'd made yesterday for the auto-flipping, I was totally surprised to see the feeds oscillating! Today, we have both A and B sides, so for the first time since the auto-flipping code worked, there was a situation where it shouldn't flip, but it was. It took me about a half second to realize what was happening, and to start laughing. I was being too efficient on the flipping, and when I'd drain a side, I'd quickly see that the other side had something to process, and I'd flip right over.

Then the other side would get ahead, because the non-preferred side is doing less work, and we'd drain the preferred side and flip again. It was an oscillation that was just no good. So I started tuning the code. The initial flipping thresholds were just too easy to hit. So I started by increasing the time a side must be empty. I did several tests, and finally settled on a decent value that has the side clear for about 15 seconds. If a side hasn't received a message in 15 sec, and there are other messages to process, then flip. That seemed reasonable.

I also realized that comparing the combined datagram queue and pending decoded message queues of the non-preferred side to the decoded messages of the preferred wasn't really fair. So in order to do an apples-to-apples comparison, I changed the size comparison to be the size of the pending decoded messages and I think that's going to help a little, but not much.
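To make the rule concrete, here's a small sketch of the flip test as described: the preferred side has been quiet for the threshold time, and the other side has more pending decoded messages than the preferred side does. The SideState struct, the shouldFlip() function, and the kQuietTime constant are names I'm making up for this example. Only the 15 second figure comes from the tuning above:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Hypothetical snapshot of one side (A or B) of a feed.
struct SideState {
    Clock::time_point lastMessage;   // when this side last received a message
    std::size_t pendingDecoded;      // decoded messages waiting to be processed
};

// How long the preferred side must be quiet before we consider flipping.
constexpr std::chrono::seconds kQuietTime(15);

// Returns true if we should flip from the preferred side to the other side.
// Only decoded-message counts are compared, so it's apples-to-apples.
inline bool shouldFlip(const SideState& preferred, const SideState& other,
                       Clock::time_point now) {
    const bool preferredQuiet = (now - preferred.lastMessage) >= kQuietTime;
    const bool otherHasMore = other.pendingDecoded > preferred.pendingDecoded;
    return preferredQuiet && otherHasMore;
}
```

The time threshold is what stops the oscillation: a side that was drained a moment ago is not "quiet" yet, so a brief lull on the preferred side no longer triggers a flip.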

In the end, I had something that no longer oscillated - but it did do something quite unexpected: It "found" the faster side! It took me a minute to realize what I was seeing, but then it was obvious: if one side can really deliver messages faster than the other, and we're getting consistently empty on the preferred side, but messages on the non-preferred, then we'd be a lot better off flipping, and using the faster side.

Now this speed can come from a lot of things, but most likely it's just the electrical path, or the bandwidth of routes and switches in the path, but one thing is certain, there are very few times that the two sides are identically equal. What this change has done is to make sure that when there's a significant difference, we'll find it, and use it.

It's stuff like this that makes me glad I'm doing this kind of work. Finding unexpected benefits of a code change is really great. Too often it's the other way, but today I came out a winner. Excellent!