Archive for the ‘Coding’ Category

The Victim of Bad Device Drivers

Tuesday, February 7th, 2012

bug.gif

I've been trying to deal with a dodgy disk array for a few weeks now. It's a consequence of the recent floods in Thailand: we were unable to get the high-capacity drives to build the 2TB array for the server, so an old email server's drive array was pressed into use, and it's been a bit dodgy to say the least.

To be fair, I'm glad we had the old array to press into service. If I had been forced to wait for the estimated 3 months, that would certainly have been worse. But I still have to say that bad device drivers are a pain, and I would really like them fixed.

So here's what's been happening… I come in in the morning and I see the mount point for this drive array is there in the filesystem, but all the jobs referencing it are failing. Wonderful. So I try to take a look at it:

  $ cd /plogs/Engine/dumps
  $ ls
  ls: cannot open directory .: Input/output error

No amount of un-mounting and re-mounting will work as the OS simply cannot see the drive array. We have to reboot the box and then it comes back online.
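A health check could at least catch this state before the morning jobs start failing. Here's a minimal sketch of the kind of probe a cron job or monitor could run; the function name and path are mine, not from any real tooling:

```cpp
#include <cassert>
#include <cerrno>
#include <dirent.h>
#include <string>

// Hypothetical probe: try to open a directory on the array. When the
// driver drops the array out from under the OS, opendir() fails with
// EIO, just like the `ls` above. Returns false on any failure to open
// (EIO, ENOENT, ...), true when the directory can actually be read.
bool mountIsHealthy(const std::string &aPath) {
  DIR *dir = opendir(aPath.c_str());
  if (dir == nullptr) {
    return false;  // errno holds the reason (EIO in the failure above)
  }
  closedir(dir);
  return true;
}
```

A monitor calling this every minute could page someone (or kick off the reboot) as soon as the array disappears, instead of letting recorders silently fail overnight.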

The problem with this approach is that I've got a ton of exchange feed recorders running on this box, and it's the only backup we have to production. If we miss recording one of these feeds, then it's gone as the exchanges aren't in the business of replaying their entire day just because we had a hardware problem.

So I'm trying to get a few things done - the first is getting a real backup for the recorders running in a second datacenter. The second is getting this drive array working properly on Ubuntu 10, hopefully with a kernel update that's in the offing. It is a decent array. I like it. But it's got to work first, and then I'll be happy.

Finished the Sync Start to the Greek Engine

Tuesday, February 7th, 2012

High-Tech Greek Engine

This afternoon I put the final touches on the sync start to the greek engine. Basically, when we restart the greek engine, it's possible that we're going to miss messages from the exchange because we're down/restarting. This option allows the app to recognize that it might have missed messages, hit the archive server, and ask it for any messages in that time frame. If there are some, we'll work them into the message stream.
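The splice itself can be sketched pretty simply. This is my own illustration of the idea, not the engine's actual code - assume each message carries an exchange sequence number, `lastSeen` is the last one recorded before the restart, and `archived` stands in for the archive server's response:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical message carrying an exchange sequence number.
struct Msg { uint64_t seq; /* payload elided for the sketch */ };

// Splice any archived messages that fall in the gap between the last
// sequence number seen before the restart and the first one on the
// live feed, then continue with the live stream.
std::vector<Msg> spliceGap(uint64_t lastSeen,
                           const std::vector<Msg> &archived,
                           const std::vector<Msg> &live) {
  std::vector<Msg> out;
  const uint64_t liveStart = live.empty() ? UINT64_MAX : live.front().seq;
  for (const Msg &m : archived) {
    // keep only messages we missed: after our last, before the live feed
    if (m.seq > lastSeen && m.seq < liveStart) out.push_back(m);
  }
  out.insert(out.end(), live.begin(), live.end());
  return out;
}
```

The point of the range test is that anything the archive has at or past `liveStart` would be a duplicate of what the live feed is already delivering.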

It's a very nice feature to have as it means that a mid-day crash or reboot is not going to lose anything. But it's a huge load on the archive server, and that server really hasn't been hit all that hard yet, so the testing is going to be a really big part of this. Fair enough - it's going to take some time to work out the kinks, but at least now we have what we think we need, and it's up to testing to either confirm or deny those beliefs.

It'll be nice to get this tested and into the main codebase. It's been a ton of work to get all the pieces working and working well.

Scraping Logs vs. Exposed Stats APIs

Tuesday, February 7th, 2012

Ringmaster

I spent the morning today exposing another Broker service for my greeks engine - this one for stats on the running process. In the last few days, the operations folks, who have had months to decide what support tools they need, have put a halt on the deployment to production of my greek engine because they now need to have these stats for monitoring. Currently, they are running a script on the output of an IRC bot that's hitting the engine, but that parser bot depends on getting data in a specific format, and that's brittle, and doesn't allow us to expand the logging on IRC. So I built the better solution this morning.

It's all based on maps of maps, and I just put the data in what I felt made sense. It's organized by feeds and then the general engine, and within feeds, there are the stock feeds and the option feeds, and so on until you get all the data as values of leaf nodes in the maps. It's pretty simple. The only real issues were that there were several metrics they wanted to see that I hadn't put in the code, and that the original author had failed to make proper getters for the data, which meant I had to write those before I could get at it.
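The shape of the structure is easy to show. This is only an illustration of the maps-of-maps layout - the key names and metric values here are made up, not the engine's real ones:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Three levels of nesting: top-level sections ("feeds", "engine"),
// then sub-sections (stock vs. option feeds), then the metrics
// themselves as leaf values.
typedef std::map<std::string, uint64_t> Leaf;
typedef std::map<std::string, Leaf>     Section;
typedef std::map<std::string, Section>  Stats;

Stats buildStats() {
  Stats s;
  s["feeds"]["stocks"]["msgsReceived"]    = 1842210;   // illustrative
  s["feeds"]["options"]["msgsReceived"]   = 99123404;  // illustrative
  s["engine"]["general"]["calcQueueDepth"] = 12;       // illustrative
  return s;
}
```

A client walks the tree top-down, so operations can pull one section or the whole snapshot without the engine needing to know what they care about.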

Not bad, but it took time.

The testing went really well, and they should be able to gather the stats they want at their convenience. Not bad.

As a personal aside, it really makes me wonder why it is that this is coming up right now, and why it's a show-stopper? I mean if it's a show-stopper, why wasn't it stated months ago at the beginning of testing? I think the reality is that it's not that critical, but the folks are starting to panic a bit, and are looking for the usual suspects to slow things down, or try to make this new system fit the same mold as the previous one.

It's kinda disappointing.

Smartest Way to Speed Up: Just Do Less

Monday, February 6th, 2012

High-Tech Greek Engine

I spent the vast majority of my day today trying to make this one client application of my greek engine a lot faster. I mean a lot faster. Friday afternoon, I was running some tests on this usage pattern, and realized that the client really was seeing some massive delays in getting data from my engine when dealing with very large, very active families. Using SPY as the example, there are some 2500 derivatives on SPY, and calculating their data and returning it to the caller was taking from 1800 to 2200 msec. That's a long time. The problem was magnified because all they wanted was three of the 2500 options, and they had to wait for all 2500.

Not good.

So Friday I jotted down a few ideas to try today and spent the first few hours doing just that. Each one was a little better, but I was still looking at 1300 msec, and that's just too long. I needed to chop out an order of magnitude or two. So I started doing the profiling. What was it that was taking so long?

Well… it's the calculations. That's no surprise, but it's a real bottleneck too. We can't really afford to make the calculations tie up multiple threads. That'd kill the box with some 50 clients each needing multiple threads for their calcs. Not good. I tried to look at other things, but in the end, it always came back to the calculations.

Along the way, however, I did come up with a few really fun optimizations. I was able to look at a continually updating profile of the instrument and use those values to 'seed' the request, but the updates from the market were just so frequent, it was impossible to stay ahead of the updates. It was a real problem.

So I did what I should have done first - go and talk to the coders writing the client app.

I found out that all they really wanted were the implied vols and they only wanted two or three options in each call. Well… now that's very interesting. That's a use-case that I hadn't expected. The reason it's very interesting is that the implied vols can be calculated independently of each other, which means that by telling me you're interested in only the implied vol calculations, I can look at the three options you're asking for, and calculate just them. Sweet.

I had to work into the API the idea of the type of calculation, but we had something pretty much like that already in the API - it just needed a simple extension. And then I had to get the different type handled in the code. In the end, it wasn't too bad, and the time savings were amazing!
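The shape of the extension looks something like this. It's a sketch of the idea, not the engine's actual API - the enum, function names, and the placeholder solver are all mine:

```cpp
#include <map>
#include <string>
#include <vector>

// The request now carries a calculation type; for implied-vol-only
// requests we compute just the options named in the call instead of
// running the full family through the calculation pipeline.
enum CalcType { ALL_GREEKS, IMPLIED_VOL_ONLY };

// Stand-in for the real implied-vol solver - returns a fixed value
// so the sketch is self-contained.
double computeImpliedVol(const std::string & /*option*/) {
  return 0.25;
}

std::map<std::string, double>
handleRequest(CalcType aType, const std::vector<std::string> &aOptions) {
  std::map<std::string, double> results;
  if (aType == IMPLIED_VOL_ONLY) {
    // Implied vols are independent of one another, so we only have to
    // touch the instruments the caller actually asked for.
    for (const std::string &opt : aOptions) {
      results[opt] = computeImpliedVol(opt);
    }
    return results;
  }
  // ALL_GREEKS would fall through to the full-family path (elided).
  return results;
}
```

The win comes entirely from the independence of the calculation: three solver calls instead of a 2500-instrument family recalculation.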

The 1800 msec went to 20 msec. That's something that's more than fast enough for what we need. All because I listened to what the client specifically needed. Simple way to be faster? Just do less.

Excellent.

Updated Git to 1.7.8.4 on My MacBook Pro

Monday, February 6th, 2012

gitLogo.gif

This morning I thought that git on my MacBook Pro might be a little behind the times. I don't honestly think there's a huge difference from 1.7.4.1 to 1.7.8.4, but you never know, and it's simple using the Mac OS X installer. Just download it, double-click, and it's ready to go.

It's nice to see:

  $ git --version
  git version 1.7.8.4

Nice. Love it when things "just work".

Interesting… I just noticed that Mac OS X 10.7.3 comes with git - and it makes perfect sense that it does. Xcode uses git now, so the OS - or at least the developer tools - would have to have it. So it's not necessary for me to worry about updating this any more. It's nice to have a secondary source, should Apple decide to drop its support, but I'm guessing that's not going to happen anytime soon.

Interesting stuff…

Exchange Timezones Hit Me Again

Friday, February 3rd, 2012

bug.gif

This morning it was brought to my attention that my ticker plants were showing the open on VIX, quoted out of the CBOE, as something different than the legacy feeds were showing. All my different feeds (dev, staging and prod) showed the same number, so I let the QA guy find out what the problem was. I had no idea where it was coming from. I wasn't even sure I was wrong.

So he asked one of the legacy developers, and sure enough, his numbers matched Yahoo! Finance. So it looked like there was something to this. So I started looking into the trade feed. Thankfully, I have a nice, stable, feed recorder and query service already going, so it was just a matter of giving it the right parameters and it would pull up the files, uncompress them, decode them, search them, and deliver me the results.

I have to admit, this isn't the first time I've wished I had a web interface for this, but alas, I don't.

So I looked at the feed, and realized that at 8:30 am, the trades arrived, but they were not marked as 'valid' trades. This is odd because all Index "trades" are valid if they arrive after the open. And it was, after all, after the open.

So I looked at the code - and sure enough, there was the problem. The CBOE is the one exchange I listen to that's located in CST as opposed to EST. That means that the time of the "open" for the CBOE is 8:30 am, and not 9:30 am, like the NYSE, PHLX, etc. This is something I'd planned for, but hadn't remembered to use in this particular exchange codec.

The fix was simple - use the CST open, and all would be fine. Unfortunately, that means that the data for today for the indexes from CBOE is messed up, but at least it'll be right for Monday. Just all the little data things that need to be fixed up… it's getting to be fewer, but I'm sure there's still a lot to find.
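The gist of the fix can be sketched like this. The exchange names are real, but the functions and the table are my illustration, not the codec's actual code - the point is just that the "is it after the open?" test has to use the exchange's local clock:

```cpp
#include <string>

// Trades on a feed are stamped in the exchange's local time, so the
// open has to be expressed in that same local clock: 8:30 am for the
// CBOE (Central time), 9:30 am for the Eastern exchanges. Both are the
// same wall-clock instant - 9:30 am Eastern.
struct OpenTime { int hour; int minute; };

OpenTime localOpen(const std::string &aExchange) {
  if (aExchange == "CBOE") return { 8, 30 };  // Central time
  return { 9, 30 };                           // NYSE, PHLX, etc. (Eastern)
}

// True when (hour, minute) in the exchange's local clock is at or
// after that exchange's open.
bool isAfterOpen(const std::string &aExchange, int hour, int minute) {
  OpenTime o = localOpen(aExchange);
  return (hour > o.hour) || (hour == o.hour && minute >= o.minute);
}
```

With the original bug, a CBOE trade at 8:30 local was compared against a 9:30 open and failed the test - which is exactly why the index "trades" were never marked valid.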

Google Chrome dev 18.0.1025.3 is Out

Friday, February 3rd, 2012

Well… it's only been a few days since 18.0.1025.1 was released, but I guess the Google Chrome team realized that there were a few outstanding issues that warranted a new release and a few ticks in the version number. Nice that they all appear to be fixes for crashing bugs… way to keep on the crashers, guys.

This is Why Codecs Have to be Strictly Controlled

Thursday, February 2nd, 2012

bug.gif

OK… so I'm working on some nice little features on the greek engine today - adding a few nice IRC commands to the system, and making things just a little bit nicer for the support staff, and we start getting these odd problems. In some cases, the response time from the engine is wildly varying, and in other cases, the memory footprint is far too big. All very odd, seemingly unrelated, but all timed to happen today.

So I started looking at yet another problem - one of the clients to the engine is sending a Close Channel message when it wasn't needed. That, in and of itself, is not the problem, but on closer inspection, the contents of the message are alarming:

[asyncRead] a close channel for unknown channel was received
[asyncRead] 58 af cb a6 3b 49 db 41 03 84 4d 11 c4 5e 55 d1 05 45 12
            X…;I.A..M..^U..E.

The second line of the error message is the binary contents of the close channel message, which should contain an 'X', followed by the 16-byte channel ID, followed by an encoded variant value. In this case, the 'E' means it is an error, and by definition that means that a varint-encoded number follows, and after that, another variant that is the "reason". The value after the 'E' is intended to be the numeric error code, and the variant is meant to hold the message or messages that accompany it.

But as you can see, there's the 'E', and a varint-encoded value, and then nothing. In my decoder, I look at the next byte and try to decode it. If that happens to be a String, or a Map, I can go off into lala land and decode a GB or two. Not good.
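This is the general pattern the defense takes. The sketch below is a generic base-128 varint reader with a bounds check, not the shop's actual variant codec - but it shows the difference between a decoder that trusts the buffer and one that refuses to run past the end of a truncated message:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode a base-128 varint starting at aPos, advancing aPos past it.
// Each byte contributes its low 7 bits; the high bit set means more
// bytes follow. The bounds check is the whole point: a truncated
// message raises an error instead of reading whatever bytes come next.
uint64_t decodeVarint(const std::vector<uint8_t> &aCode, size_t &aPos) {
  uint64_t value = 0;
  int shift = 0;
  while (true) {
    if (aPos >= aCode.size()) {
      throw std::runtime_error("truncated varint - malformed message");
    }
    uint8_t byte = aCode[aPos++];
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) break;  // high bit clear: last byte
    shift += 7;
  }
  return value;
}
```

Without the check, a decoder that lands on a stray byte that looks like a String or Map header will happily "decode" as much of the rest of memory as the length field tells it to.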

The solution? Well… there are two: we have to get the app (or app writer) that generates this malformed error to correct their mistake, and until this person can be identified, we have to protect our decoder against this kind of problem and put in a simple test:

  // Guard against a truncated message: only try to deserialize the
  // "reason" variant if there are actually bytes left in the buffer
  if (aPos < aCode.size()) {
    mErrorValue->get<1>().deserialize(aCode, aPos);
  }

The real problem with this is that someone created a codec that encodes data improperly. To what extent? I have no idea, but this kind of thing is capable of bringing down a whole lot of servers and clients. I'm lucky it's not been a lot worse. But it underscores the need to have a group of people that control these critical components, and not allow just "anyone" to fiddle with them. The risk and consequences are just too great.

I wish I had faith that this will be the catalyst to stop all this, but I have serious doubts about it. I'm certainly going to try to get it to create some change. It needs to happen, and it needs to happen now.

Bloomberg Gets a Pretty New Face

Thursday, February 2nd, 2012

GeneralDev.jpg

Yesterday, I heard about Bloomberg's new Open API initiative. It's a new .Net, C++, Java, and C API that is "Open" for all to use. The catch is that all the data you'd want to get is really still exceptionally expensive, but that's Bloomberg, eh? The last time I used a Bloomberg API it was the Bloomberg Server API, which was a mild modification on the old Bloomberg Terminal API that came with every Bloomberg Terminal - Windows and Solaris, going far, far back into the past.

I've just briefly scanned these docs, and it's a new API alright. Much easier to deal with, and hopefully far easier to decode the data once it's returned from Bloomberg. I like that they are trying to really make it easier to use - both in the pub/sub and the req/resp modes. It's an improvement.

Heck, almost anything is an improvement.

Still, the kicker is the cost of the data. When last I looked, it was still some of the most expensive data around. I mean outta sight prices. I don't think it's gone down in the last two years, but I could be wrong.

Yet I can't blame them. They have a nice gig - they have a great reputation on the street for their data, and so they can charge a ton and use that to keep away the riffraff. It's working for them, and who am I to give them grief. Sure… I'd love to build a system off this for the Mac and build in all the bells and whistles, but that's a really hard sell as the data is so expensive and all the online brokerages are giving their data away - with decent tools.

Still… if I hit the lotto, I'm all over this.

A Letter to a Dear Friend

Thursday, February 2nd, 2012

This morning I was thinking about the particular situation I find myself in at work. Interestingly enough, the one guy that I thought could really give me great advice is one of my oldest friends - Bret from grad school. I've known Bret since 1980 - that's more than 31 years now. We've worked together, laughed together, and lived a long time together.

So this morning, I wrote to him to ask his advice:

I've been struggling here at work for the last few months - amid some massive re-orgs (yes, multiple massive re-orgs in that time), and in the midst of all this, I thought of the one person that I could really trust to give me some solid advice - you.

So here's what I'm struggling with: When I hired on here at The Shop about 2 yrs ago it was all about who I was going to be working with, and how we were going to be developing, and no more crap for HR… all the things that after a long stint at First Chicago, then UBS, I was happy to hear. It started out great, and my manager was just made partner, so it seemed like it was going to be great for a long time.

Then things changed. My manager, Clive, was put in charge of all IT for The Shop. Everything. And it's changed Clive. We no longer work together. For a while, I found someone that reminded me a lot of you - funny, easy to laugh, good coder, thoughtful. A really nice guy to work with. And while it was a little team of the two of us, it was great.

Then Clive decided that his view of IT needed to change, and that guy is now managing the group I'm in - a group of 14 people.

Out the window goes the "who" I work with. Now I'm working with regular (which is to say, junior) guys that are dolts in comparison.

Out the window goes the "how" I work. Now things can't be released unless we have a meeting about it, and it's perfectly acceptable to leave bugs in production until that time. There are times they will have to check to see if it's OK to fix a bug - priorities are important, after all.

Out the window goes everything that I once liked about this place.

And so I'm asking you: How do you do it?

How do you work with people, systems, organizations, etc. that are clearly more like Roman galleys than places for creative people to work? It's not that I mind hard work, it's the conditions under which it's produced. Maybe I'm just fooling myself that a place like this Shangri-La even exists, but I'd like to think it does. But maybe that's my problem.

Maybe I need to just accept that people that want my effort, my energy, my work really aren't interested in my best work - they would be happy with 80% - if they get to choose the terms under which it's given.

Anyway, I'm hoping that you have some words of advice for me. Something that I can use to re-adjust my thinking, to re-align my sights - to get to a place that I don't dread coming to work.

Anything you have would be really helpful.

I'm hoping he's got some good advice for me. Stay tuned.

[2/13] UPDATE: I wasn't disappointed… his letter was right on target and it got me to thinking about what I need to do:

Hmmm, well, I think I should tell you a story. This is how my thinking has changed during the last 6 months of my last job. It has to do with all that's happened before but took a form I could articulate last year.

I started working for Avocent in 2008. It was a new team building a pretty cool product. Long story short, it was the best team I'd ever been part of. Best is terms of mutual respect, fun, and actual quality and quantity of output. Then we were bought buy a much bigger company. Things changed like black and white. One day when I was thinking about my options a light bulb went off. Every job I've ever had started out hopeful and for varying lengths of time was pretty rewarding. But something always happened to change that. What I realized was not that things always change. It was that *I* have been wrong every time about my estimation of the longevity of the job. Every time. On that day I made two decisions. Or rather two changes in my thinking. One is that I don't care one wit about the longevity prospects of a job opportunity I'm considering. Everyone tries to sell you and the vast potential of whatever they are selling. Now what I'm about to say will sound harsher than I really think in general (I mean I've not turned into a hopeless cynic, far from it) but to the job salesman I say bullshit. But really it's my desire to assume more than I should that I call bullshit on. Here's the deal. I've been wrong EVERY time. It's not that I didn't have educated assumptions, I believe I did. Doesn't matter. There are too many factors that can change. I NEVER saw the purchase coming by a company that was both large and insane at the same time. So, to be clear, I'm not jaded, I just don't consider longevity to be a factor. I just want to know if the work is interesting. If things change I'll look again. But I said I made two decisions. The second I'm still working out in real life. Since I can't count on others for long term job satisfaction, my goal has changed. I used to want to find a job that was "interesting" (there are many dimension to what "interesting"means). 
What I realized is the reality that I could continue this path of going from job to job (really meaning from employer to employer) as things change, to I could seek to become independent of that rat race. The best word I have for what my goal is right now is independence. There are just way too many ways today to make your own path and divorce yourself from the work you want to do and a bunch of other factors (where you live, who you work with, etc.).

I guess in answer to your question of how I do it, I don't think I do really. I've always moved on. That takes time sometimes, but the mental switch flips pretty easily and hasn't ever flipped back. In the meantime, be yourself, advocate the quality you expect. THAT is hard and I've failed many times but that's the standard to measure against. Remaining true, that is. This has been a bit of a ramble. There's probably more to say, so feel free to call anytime. I mean it. I'm living this out everyday right now, so talking this stuff through would be helpful to me too. It's been good for me to reflect on this as I've typed this much to you so far.

Take care and let me know how things go.

He's dead right, and I knew it before he even wrote back. The problem is me and my expectations. I need to lower them. Way, way, lower. When I was new here, and had lower expectations, things were a lot better, but as I started doing more work here, they rose on the hopes that things were really going to be great. Big mistake of mine.

Focus on the things that are important to me. That's the ticket. It's not important that I'm a convert to the cause, I just need to be a solid, good, hard worker, and that's always going to happen. It's when I think they have the same vision as I do that things go sour. I just need to keep a respectful distance. It's not easy for me, but it's important.

Thanks, old friend. I knew I could count on you!