Archive for February, 2012

Added Postgres Failover Code to Greek Engine

Wednesday, February 29th, 2012

One of my favorite things is to work with databases in code. Persistence and database hits are a blast as they get you a place to save stuff that, if you design it right, you can view from just about any tool on the planet. Can't say the same for Redis or MongoDB. My Greek Engine gets its instrument data from a local replica of a master postgres database, and should the local copy fail - or be down - it should auto-reconnect to the master and just function off that one. If he's dead… well… that's when it's time to get serious about getting things working.

The first thing I needed to do was to consolidate all the database activity into as few places as possible. Thankfully, I had a simple execute() method that did about 90% of what I needed. It just took a few minutes to make that the only way to hit the database, and then I could focus on making that a little more fault-tolerant.

The idea is simple, really: put it in a retry loop, limit the number of retries, and then for each retry, hit The Broker for the correct database connection parameters to use. If the Broker is wrong, then I'm in real trouble, but it's not, so I'm OK. (Famous last words.)
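
The retry idea can be sketched roughly as below. This is only a sketch under assumptions: ConnParams, the lookup callback standing in for the Broker, and the run callback standing in for the real postgres call are all hypothetical names for illustration, not the engine's actual code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical connection parameters, as a broker might hand them out.
struct ConnParams {
  std::string  host;
  int          port;
};

// Retry wrapper: on every attempt, ask the (hypothetical) lookup callback for
// the connection parameters to use, then try to run the statement. Gives up
// after aMaxRetries attempts and reports the failure to the caller.
bool executeWithFailover( const std::string & aSQL,
                          const std::function<ConnParams()> & aLookup,
                          const std::function<void(const ConnParams &,
                                                   const std::string &)> & aRun,
                          int aMaxRetries = 3 )
{
  for (int attempt = 1; attempt <= aMaxRetries; ++attempt) {
    // always re-fetch the parameters - they may now point at the master
    ConnParams  cp = aLookup();
    try {
      aRun(cp, aSQL);
      return true;
    } catch (const std::exception &) {
      // real code would log the error, the host, and the attempt number,
      // then fall through to the next attempt
    }
  }
  return false;
}
```

The point of asking for the parameters inside the loop is that the caller never has to know which database it ended up on. If the first attempt fails against the replica, the next lookup can hand back the master.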

Add a little logging, remove some error codes, and we're ready to go. It really didn't take me all that long, and the results are much better. When, and if, the database goes down, we'll fail over to the master. When we get the local copy up, we can issue an IRC command to repeat the process, and the local one will again be used. Simple. Clean.

Great.

Refactoring Out the TBB concurrent_vector

Wednesday, February 29th, 2012

This morning I came in to see that some of the exchange feeds on one of the staging boxes of mine hadn't shut down properly. When the exchange test data flooded in, it made a mess, and that was no good at all. The only code that seemed to matter was a simple iterator on the TBB concurrent_vector. I've had issues with this code before - and always moved away from it in favor of a simple std::vector and a mutex of some sort. Here was another case of the exact same thing.

Now I'm not saying that the concurrent_vector is a mess, but I think that it, along with the concurrent_map, is a little trickier than normal to work with. The iterators have built-in locks, and that makes it very easy to write dodgy code. I think that's what happened, but I can't prove it.

Far easier to use a simple std::vector and then a TBB spin_rw_mutex_v3 to protect it. Virtually all the access to the vector is read-only: there's really only one method that adds to it, and another that removes from it. Those are easy write locks, and happen on start up and shutdown. Easy.

The rest of the time, the r/w mutex will be essentially a no-op, and that's fine with me. The refactoring was easy because all the vector operations are the same, and most (say 80%) of the use cases are simple iterators on the vector's contents. All I needed to do was to put the scoped locks in the right place, and we're ready to go.
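
The pattern looks roughly like this. To keep the sketch free of a TBB dependency, std::shared_mutex (C++17) stands in for the TBB spin_rw_mutex, and the FeedList class and its method names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical registry of feed names: a plain std::vector guarded by a
// reader/writer lock. std::shared_mutex is a stand-in for TBB's r/w mutex.
class FeedList {
  public:
    // write lock - only happens at start up
    void add( const std::string & aName ) {
      std::unique_lock<std::shared_mutex>  lock(mMutex);
      mFeeds.push_back(aName);
    }

    // write lock - only happens at shutdown
    void remove( const std::string & aName ) {
      std::unique_lock<std::shared_mutex>  lock(mMutex);
      mFeeds.erase(std::remove(mFeeds.begin(), mFeeds.end(), aName),
                   mFeeds.end());
    }

    // read lock - the common case: a simple iteration over the contents
    template <typename Fn>
    void forEach( Fn aFn ) const {
      std::shared_lock<std::shared_mutex>  lock(mMutex);
      for (const std::string & f : mFeeds) {
        aFn(f);
      }
    }

    // read lock - many readers can hold this at the same time
    std::size_t size() const {
      std::shared_lock<std::shared_mutex>  lock(mMutex);
      return mFeeds.size();
    }

  private:
    mutable std::shared_mutex   mMutex;
    std::vector<std::string>    mFeeds;
};
```

Keeping the vector private and funneling every access through a method that takes the lock is what makes this easy to reason about: there is no iterator that can escape the scope of its lock.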

In the end, this is just as clean, probably faster, and a lot better understood. Good move.

Tracking Down a Tricky Problem

Tuesday, February 28th, 2012

I just finished spending a good hour tracking down a nasty little problem with the logic I had for creating new instruments on the fly. The problem turned out to really be me, and my preconceived notions about what the problem really was, but that's typically the case. The underlying problem was that I was thinking that the first new message for an instrument wasn't creating the underlying, but in fact, it was. That explained why I was seeing no errors.

No… the real problem was that I wasn't properly handling the case when I found it. It was made, but then the next time, I tried to find it, and it was missing - or so the code thought. In reality, I had failed to really detect that I'd found it, and act accordingly.

It's almost a coding standard in my mind now - For every 'if' statement, there had better be an 'else' clause. It would have saved me this headache, and when I saw it, it was clear that I was missing the else, and what to put in it when the value wasn't NULL.
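
As an illustration of the shape of the fix (not the real code), here's a find-or-create with both branches spelled out. The Instrument struct, the map, and what happens in the else are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the real instrument class.
struct Instrument {
  std::string  symbol;
  int          updates;
};

// Returns a pointer to the instrument, or nullptr if it's not in the book.
Instrument *find( std::map<std::string, Instrument> & aBook,
                  const std::string & aSymbol )
{
  std::map<std::string, Instrument>::iterator  it = aBook.find(aSymbol);
  return (it == aBook.end()) ? nullptr : &(it->second);
}

// Handles one message for a symbol, creating the instrument if needed.
Instrument *handleMessage( std::map<std::string, Instrument> & aBook,
                           const std::string & aSymbol )
{
  Instrument  *inst = find(aBook, aSymbol);
  if (inst == nullptr) {
    // not there yet - create it on the fly
    Instrument  fresh = { aSymbol, 1 };
    inst = &(aBook.insert(std::make_pair(aSymbol, fresh)).first->second);
  } else {
    // already there - the branch that's easy to forget
    ++(inst->updates);
  }
  return inst;
}
```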

Glad that's over. It was painful.

Refactoring Like a Bandit to Fix a Bug

Tuesday, February 28th, 2012

This morning I noticed that I had a problem with the initial volatilities for the options in my Greek Engine. Because the users want me to carry over the calculated values from yesterday's close to this morning, you can end up with a really odd situation: the job that computed the volatilities could have changed their values overnight, and now the new volatilities are different than the old. We can't replace the old with the new, as that would make the calculated results look bad. We can't ignore the new, and stick with the old (but that's just what we were doing).

What we needed to do was to load up the new, and leave the old as an output value of the calculations - just like the quote and spot values. This meant that I needed to refactor a good chunk of code and place a new ivar in the Instrument - the volatility, right next to the historical volatility. I then converted them from double values to uint32_t so they are handled a lot easier, and then put in the setters and getters that allowed me to update them as needed - even from the StaticData object that's reading in updates from the database.

All told, it was a good chunk of code in three major classes, but when I built it all and ran it, everything worked as you'd expect. Now there's an "output vol" and the instrument vol, so you can see when they will be different, but the client gets the old value until the new value is "active" with a calculation.
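
A stripped-down sketch of the two-value idea follows. Everything here is an assumption for illustration: the class layout, the method names, and especially the fixed-point scale used to store a volatility in a uint32_t, since the real encoding isn't described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// ASSUMED fixed-point scale: 1,000,000 units per 1.0 of volatility.
const uint32_t  VOL_SCALE = 1000000;

// Hypothetical, minimal instrument holding both volatility values.
class Instrument {
  public:
    Instrument() : mVolatility(0), mOutputVolatility(0) { }

    // new value from the database - does NOT change what clients see
    void setVolatility( double aValue ) { mVolatility = toFixed(aValue); }
    double getVolatility() const { return toDouble(mVolatility); }

    // carried-over value from the previous close - what clients see
    void setOutputVolatility( double aValue ) {
      mOutputVolatility = toFixed(aValue);
    }
    double getOutputVolatility() const { return toDouble(mOutputVolatility); }

    // a calculation makes the new volatility "active" as the output value
    void calculate() {
      // the real greeks work would go here, using mVolatility
      mOutputVolatility = mVolatility;
    }

  private:
    static uint32_t toFixed( double aValue ) {
      return static_cast<uint32_t>(std::llround(aValue * VOL_SCALE));
    }
    static double toDouble( uint32_t aValue ) {
      return static_cast<double>(aValue) / VOL_SCALE;
    }

    uint32_t  mVolatility;        // latest value loaded from the database
    uint32_t  mOutputVolatility;  // value used in the last calculation
};
```

With this split, a static data update can call setVolatility() at any time, and the client-facing value only moves when calculate() runs.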

It's clean, and I like it a lot more than what I had. I'm just sad it took me this long to find it.

Relocating FileMerge with Xcode 4.3

Tuesday, February 28th, 2012

Today I was thinking about FileMerge because I was talking to someone about diffing files, and FileMerge is the best tool I've seen for that. But I remembered that with Xcode 4.3, the entire /Developer directory was removed. So where did it go? Seems I had to go on a hunt.

The obvious places were out - /Applications, and that's about it these days with the App Store. So I got serious:

  $ find /Applications -name FileMerge.app
  /Applications/Xcode.app/Contents/Applications/FileMerge.app
  $

What? They put an entire Applications directory under the Xcode.app bundle? Why do that? Well… at least I can get at it and run it. It'd be nice if LaunchBar was able to see inside the app bundles, but I can force that on a custom scan. The next question was the command-line opendiff command.

Turns out, there's a command to "refocus" the location of the Developer Tools:

  $ sudo /usr/bin/xcode-select -switch /Applications/Xcode.app/Contents/Developer

After this, the opendiff command works exactly as you'd expect. Good enough. They were lost (for a bit), but now they are found.

Lost My TimeMachine Drive This Morning

Tuesday, February 28th, 2012

This morning I plugged my SimpleTech 1TB ProDrive into my MacBook Pro, and it wasn't there. I took apart the case, there are no loose wires, it's just a dead drive. Sad, but true. It's really sad to me because I've been moving that data from drive to drive since the beginning of TimeMachine from Apple. That's several years of data. All gone. Sad.

To be honest, it's not something that I used all the time. I did use it to restore from a dead laptop drive, and I'll be sure to get another tonight, so I can get it all backed up in the morning, but I didn't really go back in time all that much. I have all my source code in cvs or git, so I'm not losing anything there, but it's still the thought of all that data - now gone.

So I'm looking at the LaCie and the G-Tech drives, and while I like that the LaCie drives are wildly popular on the reviews, I haven't had a lot of luck with them, and I see I'm not alone. There are a lot of folks who say the G-Tech isn't all that great, and there was talk of a series of bad drives, but I'm guessing that's over.

It's all going to come down to what's at the Apple Store. I know they have both the LaCie and the G-Tech, but I think I'm leaning towards G-Tech: they have a lot of FireWire 800, which I want to get back to for the speed, and I've heard several good things from friends about the G-Tech drives. I think it's worth a shot.

I'm very sad that there's no way to get that data off the drive. In very real terms, it's not worth much, but it would have been nice to have replaced the drive last weekend, and gotten all the data off. But that's OK. I got a lot of good out of it, and now it's time to move on.

It’s Hard for Me to Know When to Draw the Line

Monday, February 27th, 2012

Today has been a really hectic day, with a lot of issues in the testing brought up by someone that's a decent guy - kinda like a beer-drinking frat boy: lovable, though you'd never want him dating your sister, but ultimately pretty useless. I'm getting partial sentences from him about bugs, he's clearly very frustrated with the process, and I believe he's closed himself off from learning another thing about this system. It's funny… the same things that made him a useful tester - able to find bugs because he gave no thought to what he was doing - are really his personal undoing. He's really frustrated. It shows.

I'm trying to cut him slack. I know he's capable of doing more than he is, but at the same time, every time he acts like an angry frat boy, it's hard to have patience for him. Really hard.

I have said many times - "This is hard. I know it. It's hard, but you can do it." only to pump him up enough to get through the next 15 mins and then have him come back to earth crashing even harder than before. He seems to have no patience for the learning process, or at least no interest in what it takes to learn in a place like this. There isn't time to spend several hours with him and take him back to programming basics. He's got a little of the basics, but not enough, and he wants to know more, but he's got no foundation to base it on.

It's not easy. This is clearly over his head, and he's being given an opportunity to move out of the simple QA role, but it's up to him. And in my way of looking, he's not making it. But it's not because of his ability - or lack of it, it's his attitude. He gets angry as I try to explain something to him. I can see he's angry, and I ask him if he's interested in listening. He says "No, I hate this", and walks off.

OK, choice made, ignorance retained. It's his choice.

But at some point, I simply have no more patience for this. I just don't. But it's hard for me to know when to draw the line. I know people that would have had stern words with him already. It's a zero tolerance policy for them when it comes to willful ignorance. But to me, I don't want to make it harder on him than it already is. I'm hoping that when he has the patience, he'll listen, and it'll sink in. But I'm beginning to have my doubts.

In the end, I don't know that it'll matter. In the end, I think he'll self-select and that will be that. It's his choice, after all.

Boost Shared Pointers to the Rescue!

Monday, February 27th, 2012

Once again, I have found a perfect use for the boost::shared_ptr, and it's saved me a ton of grief. I've been working to refactor the exchange feed recorders and as I've been doing this, I started getting stability problems in my StringPool class. Basically, I have a simple class that has an alloc() method that returns a (std::string *), and then allows you to recycle them when you are done with them. It's used in the exchange feeds, but I've been having issues when moving to the new format of append writing in the recorders.

So what to do?

Well… really, the problem is simple. I have a buffer that I fill, and rather than passing that to a write thread, and getting another, why don't we create a copy of what we have, clear out what we're using and just start over? The copy operation isn't bad, and if we use the boost::shared_ptr, we don't have to worry about it going out of scope on me, and it's easy to pass into the thread.

It's just about as clean as I can imagine. Simple. Clean. Get rid of the StringPool, have just a std::string and then when ready to fire off the write, make a new string smart pointer and use it. Sweet.

  block->pack(buff);
  if ((buff.size() >= HIGH_WATER_MARK) ||
      ((block->when > (lastSaved + saveInt)) && (buff.size() > 0))) {
    // grab the last saved time for the next interval
    lastSaved = block->when;
    // get the timestamp for the Beginning Of Buffer…
    uint64_t    bob;
    memcpy(&bob, buff.data(), sizeof(bob));
    // now let's fire off the thread and write this out…
    boost::shared_ptr<std::string>  done(new std::string(buff));
    boost::thread  go = boost::thread(&UDPExchangeRecorder::write, this,
                                      bob, block->when, done,
                                      isPreferred(aConnection));
    go.detach();
    // clear out the buffer that we're using…
    buff.clear();
  }

and then in the write method, it's very easy to use:

  void UDPExchangeRecorder::write( uint64_t aStartTime, uint64_t anEndTime,
                                   boost::shared_ptr<std::string> aBuffer,
                                   bool aMaster )
  {
    // make sure we have something to do…
    if (aBuffer->empty()) {
      return;
    }
    ...
  }

When the write method is done, the shared pointers will be dropped, and the memory freed. Easy, clean, and very stable. This cleared up all my issues.

Did a Lot of Code Cleanup Today

Friday, February 24th, 2012

Today I spent a good bit of time going through a co-worker's code and cleaning it up to be something that I'm OK with in the code base of the project. It's something that I'm used to doing, and while some will think it's the ultimate in micro-management, it's really not. I'm not asking him to do it - I'm the one doing all the work. I hope that he takes just a minute or two to look at what I've done and learn from it, but that's totally optional on his part. I can hope he'll do it, but I'm not planning on him doing it.

But I simply cannot leave this code in as-is. It's just starting to go into production, and to leave poorly designed, poorly commented code that misses the coding standards at this point in time is just giving in to the worst of entropy in this project. I have to hold it together as long as I can because there will come a day I have to leave it, and then I can do nothing to prevent this kind of slide.

It's not bad, as a job, it's just something you have to get in the right frame of mind to do.

Design By Committee Never Works

Friday, February 24th, 2012

It's sad that I don't have a lot of nice things to say about work these days. Very sad. And one of the very saddest things is that I find myself in this current mode of Design by Committee, and it's just crazy. The problem really originates with the idea that the Big Boss wants to make a group of highly-skilled, high-power developers that can work together and get things done. This model is very anti-committee of any kind. It's almost the best of the Cowboy coder. It's good people making good decisions, communicating when they need to, for what they need, but not wasting any time.

It's a dream job, to me. And they sold me on it.

But it's not come to pass. Rather, it was close, but we've drifted so far away from that in such a short period of time that it's like it was a distant memory. And what I'm living now is as bad a place of micromanagement as I can remember being.

So we have the users - several different groups. And they all are competing with each other to get things done. This was, and is, very inefficient, and so to solve that, the business put one guy in charge, and all business requests go through him. It's his job to make sure that the different business groups are on-board. He's the one guy we need to go to to get answers. And unfortunately, he's not checking in with some of the groups.

This is brought to my attention by my manager, who used to run the tech for one of these other groups. He's a nice guy, but he's got some views on how to run projects that I find more than a little stifling, and while I've tried to talk to him, I've given up of late, as it's just not doing any good.

So we have communication problems. We have misrepresentation of users' needs due to that. We have poor management styles. We have bad testing procedures. In short, the only thing I can think that we're doing right is… OK… give me a sec… Hmmm… well… I can't think of a thing we're doing right. All that's going right is being done in spite of this place.

And if I had to point to one thing - it's the communication. It's so bad, nothing really has a chance. Holy Cow!