Archive for November, 2007

Crazy First Day Back

Tuesday, November 20th, 2007


Today was the first day back from the long weekend where we went to Philadelphia for Liza's marathon. It was an amazingly crazy day that had more than its share of issues and problems that I had to overcome. Not a fun day - not by a long shot.

First, I had more of the 'NFS not found' errors on my Sun box. This was supposed to have been cleared up with the new networking and cables, but it wasn't. It was made infinitely worse today because I had problems with the production MarketData server, and without my Sun box to develop/debug on, I was in the dark. They came and replaced the network card in the box and reconfigured it to use the new NIC as opposed to the one on the motherboard. That took about an hour from the time I started 'yelling' about the problem and its impact on production, so that's not too bad, but the fact that it went out at all, after I'd told them to replace the NIC weeks ago, is a little troubling. But I hope this is it for this problem.

Next, there was the problem that I needed that box to solve: the database that I get symbol mappings from was not returning data as it should have. If I went in with SQSH, the data looked to be there, but when I went in from the program through the SQLAPI++ libraries, it wasn't. After a bit I gave up, pointed my code at the primary database (as opposed to the read-only replicant), and it worked perfectly. I had seen a few database issues early this morning, but thought they were all fixed up. I was wrong.

Later this afternoon, I saw that the read-only database was working again, and I'll switch production back at the end of the day. I try to be a good corporate citizen as much as possible, and using the read-only database is something they want me to do. I'll comply as long as it's working, but when it fails, it's back to the primary - the users have to come first.

After I got that fixed up, I had to sort out problems with my server and the EQSVal libraries. It turns out that with the most recent version of the libraries (v7.3.24b) you have to explicitly set things that were defaults in previous versions. This isn't horrible, and the fix was simple - figuring out what needed to be done was the problem. Thankfully, the EQSVal guys were helpful, and once I generated a test datafile for them they were able to tell me the issue and what to do.

After that, I had to make several changes to the web-based editor for my server. Nothing major, but there were some edge conditions that were showing up as records in lists that should not have been there. The Perl processing and filtering was about as good as I could do in a single pass, so I had to add a second pass over the data to clean it up further. This will cost a little more at execution time, but it'll keep the records cleaner and therefore make things less confusing for the users. Good things to do.

Finally, I got the most recent version of a driver from the internal price source folks at the Bank. I needed this because their rules for mapping Reuters data to what we needed couldn't be made exchange-specific enough to really hit the points we needed hit. With this update, my contact Todd is able to make the rules as specific as necessary, and that's great. I'm hoping these work out in test over the next few days so we can roll them out to production as soon as possible. These are tickers the users have been wanting more smarts in, but that we haven't been able to deliver due to the limited rules on Todd's side of things.

That's about it. It's been a monster of a day. I'm looking forward to calming down and taking it easy for a bit.

Advanced Hero Support

Monday, November 19th, 2007

Well... Liza ran her second marathon this weekend (in Philadelphia) and did a great job. Things worked out from all the lessons we learned from Chicago - take the water, take the phone, be at specific spots - all worked well. The weather wasn't as nice as it could have been - low 40's and falling throughout the race, drizzle to rain, and a ton of wind, but she did it. Ran every step. Her half-marathon split was good, and she understandably slowed on the second half. Finished in about 4:53 - which is great for a second effort.

Don't know if she's going to be doing more of these, but it's certain that we have the procedure down. It's great to watch her run and great to cheer her on.

Fixing up EQSVal Problems

Thursday, November 15th, 2007


Today I finally figured out the problem I have been slugging out for a few days. Actually, there were a few - none of them of my creation, but legacy issues that I have to deal with in order to get the server making a little forward progress.

The first was the fact that the Bank's valuation library, EQSVal, deals in dates that are ints and I had assumed that the original author of the conversion code had checked to make sure that the conversion was done properly. After all, the data looked right. In C/C++ the standard for converting dates to ints is to compute the number of days since 1/1/1970 - the epoch for all Unix systems. And that's what the server's conversion code was doing - converting the dates based on that reference date. Interestingly enough, the EQSVal library authors have decided to reference the same date as Microsoft Excel. Why? I can't possibly tell you because it makes no sense to me, but that's not the point, I suppose. The point is that they reference the first day of the last century - 1/1/1900. This made all my dates off by 70 years - some 25,567 days.

I saw this problem and realized that the easiest thing to do was to add a new method on MMDate called daysSince1900(), so as not to confuse the other uses of the toInt() method that returns the date with respect to the Unix epoch. That fixed the first of the two problems. I'm also convinced that when I was trying the latest production version of the EQSVal code, I was getting what they classified as operating system errors most likely from doing the reverse of what I was doing - subtracting 25,567 from the '1900'-based date and then using the Unix tools to try to convert it to a Unix time struct. Getting negative numbers might really be a mess, and could easily have been the issue.
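
The MMDate change itself lives in the server's C++ code, so there's nothing worth pasting here, but as a quick sanity check on the offset itself, a throwaway scratch program along these lines (Java, class name purely illustrative - not anything from the server) reproduces the 25,567-day gap between the two reference dates:

    import java.util.Calendar;
    import java.util.GregorianCalendar;
    import java.util.TimeZone;

    // Throwaway check (not the server's C++ MMDate code) that the gap between
    // the Unix epoch (1/1/1970) and the Excel-style reference date (1/1/1900)
    // really is 25,567 days: 70 years x 365, plus 17 leap days (1900 isn't one).
    public class EpochOffsetCheck {
        public static void main(String[] args) {
            TimeZone utc = TimeZone.getTimeZone("UTC");

            Calendar ref1900 = new GregorianCalendar(utc);
            ref1900.clear();
            ref1900.set(1900, Calendar.JANUARY, 1);

            Calendar ref1970 = new GregorianCalendar(utc);
            ref1970.clear();
            ref1970.set(1970, Calendar.JANUARY, 1);

            long msPerDay = 24L * 60L * 60L * 1000L;
            long days = (ref1970.getTimeInMillis() - ref1900.getTimeInMillis()) / msPerDay;

            System.out.println("Days between 1/1/1900 and 1/1/1970: " + days);   // 25567
        }
    }

In other words, daysSince1900() only has to shift the old Unix-epoch day count by that constant.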

So... one down, one to go.

The second problem I found was the quote date for the stock 'products' going into the model. The original code was using the driver date - which is the quote date offset by the settlement offset for options. This was a typo in the code, and it was leading to problems in the model. EQSVal checks the volatility curves (volatility, skew and kurtosis) to see that the first point in each curve is not before the quote date for the evaluation. Because the stock product was using the driver date, and the first point in each curve was only a business day away, it was advancing past the start of the curve and flagging an error. Most of the time this wouldn't be an issue, but near the expirations it's going to happen, as the volatility curves are calculated on those expirations.

With both of these problems solved, the values were calculating properly and no flags were being raised. Good enough. Not easy to find, but I'm glad I stuck with it before I left for the weekend.

UPDATE: it turns out that the EQSVal libraries allow me to specify 1/1/1970 as the reference date they use for the offset. In general, then, the difference should not have mattered, had we set the offset right. For now we'll leave it, but it's nice to know there's a place we can control the offset itself.

Slugging it Out

Wednesday, November 14th, 2007


There are those days that make you love the Job. There are those days that make you hate it. But for the most part, it's just that - a Job, and there's nothing particularly exciting about it and nothing particularly aggravating about it either. Today is one of those days.

I'd rather be doing something other than what I'm doing - which is debugging other groups' libraries in my code without their source or support. It has to be done, because there isn't a real alternative, and it's not horrible because I have decent ways around most of the limitations I've found in their code, but it's still just plain old grunt work. Nothing fancy or glamorous, but the bread-and-butter of the development job.

It's not fast, because a significant change requires a restart that takes about 20 minutes, so I try to make each change significant - and therefore informative. It's going to get solved; it has to get solved. But it's not really something I'm going to put on my resume. It's just the job.

Sometimes You Have to Debug Another System’s Code

Tuesday, November 13th, 2007


Today, actually yesterday evening, a trader had a problem with some of the data my app was showing. Now I don't generate this data - I just display it, but as is often the case, if it's connected with your app, you get at least partial blame for it not being right. I understand this, and it's something that, to a large degree, I agree with. The problem comes in when the guys feeding me the data won't (or can't) fix the data coming to me in a timely manner. Then I have to get into two modes: Defensive Systems Development, and being someone else's debugger.

Today was one of those days.

After coming up with a plan to avoid the problem in the short term, I set out on the task of making my system capable of filtering out the bad data from the upstream source. It wasn't something I was planning on doing today, but it wasn't bad, and it allowed me to add another level of flexibility to my flagship middle-tier. Not bad really. But in the middle of trying to get all these changes in for stopping and starting the flow of updates from my main collector/distributor, there was a flurry of emails about the upstream system - a database - and trying to find the problem there so that we could stop it at the source.

I'm all for fixing things at the source, but I soon realized that they weren't seeing the problem. So I had to jump into the fray and try to figure it out myself. I'll admit it was a tricky little problem in a stored procedure. Basically, a table that was originally presumed to hold only one day's worth of data was, for a few hours, holding two days' worth. That shouldn't have been a problem, because when they changed it to hold multiple days of data, they changed the stored procedures to look at just one day's data in that table. Or so we thought.

In actuality, it was not looking at just one day's data, because the where clause on the defining cursor was not limiting the primary table's data to just one day. This meant that for a time, two days' data was being used in the cursor. This would have been fine if it weren't for another bug in the stored procedure that didn't limit an update statement to a single day. Now the bug is evident - two days were being looked at in the cursor, both days were being updated at the same time in the update statement, and one of those days was not really complete.

That meant that the incomplete day was getting used and updating the complete day and that was the bug. Once I figured that out, I told the manager of the project, and when I had convinced him, told him what he had to do to fix it. He then had one of his guys fix the stored procedure and we should be fine.

I still finished my enhancements to the filtering on the data sources because I think it's a good thing to have. But it's probably no longer necessary. We have it fixed at the source and that's going to be all we need.

Getting ctags Working in BBEdit 8.7

Monday, November 12th, 2007


I was messing with Vim's ctags support today on my Mac and wondered if either SubEthaEdit or BBEdit had support for ctags like Vim does. I dug into it, and in fact BBEdit can support ctags - but unfortunately not the ctags that ships with Mac OS X 10.4; you need Exuberant Ctags from sourceforge.net. They talk extensively about it in Chapter 14 of the BBEdit manual. You need to download, build and install the ctags app:

    cd ctags-5.7
    ./configure --prefix=/usr/local
    make
    sudo make install

In order to have it not conflict with the existing ctags I did the following:

    cd /usr/local/bin
    sudo mv ctags ectags

I also renamed the man page to match the new command:

    cd /usr/local/man/man1
    sudo mv ctags.1 ectags.1

and to make sure this man page gets picked up by the existing MANPATH - if you don't want to add /usr/local/man to your MANPATH, you can symlink it into a directory that's already there:

    cd /usr/local/share/man/man1
    sudo ln -s /usr/local/man/man1/ectags.1 .

Then I created a simple alias that calls it with the right arguments to generate the tags file BBEdit needs:

    alias ectags 'ectags --excmd=number --tag-relative=no --fields=+a+m+n+S -R'

So that in the Makefiles for BKit and CKit I can add the target:

    CTAGS = ectags  --excmd=number --tag-relative=no  --fields=+a+m+n+S -R
    ...
    tags:
        @ $(CTAGS) `pwd`/src

And this way I can then automatically pick up the tags in both projects. It's nice that CVS doesn't try to update the file called 'tags', and placing it at the top of the source tree allows BBEdit to find it - as well as Vim, if I'm using that instead.

In the end, this is a really nice little addition. I'm a little surprised that the BBEdit folks didn't include it as part of the BBEdit distribution so that you wouldn't have to download and build it yourself. They know Mac OS X 10.4 doesn't come with it, and that their users are going to need it, so I can't quite figure out why they didn't include it. But they didn't. Easy enough to download and build.

UPDATE: I also found that there are 'Jump' and 'Jump Back' menu commands that I've got hot keys set up for. This makes it very easy to jump to a function definition based on the ctags, then back to where I was, and then back to the definition. Very nice. Gotta love BBEdit for this.

More Vendor Madness – When will it end?

Monday, November 12th, 2007


Today we had a meeting with a group of technical sales consultants from a Vendor for a very expensive package that we have purchased and that I've been asked to integrate and get working. This meeting was about price feeds into this system, and how the current product's abilities are weak, to say the least. They rely on Reuters exclusively, but with a Reuters infrastructure, you (the client of Reuters) can push data back into the RMDS feed, if you so choose. You just become another 'exchange' to Reuters. But that's not a realistic alternative for us.

What we need to do is to get at the Vendor's price feed service and inject prices from my CacheStation based on our rules. These prices are what's necessary - as opposed to what's possible from Reuters, and so I'm not really in favor of simply faking Reuters RMDS with our source - I want a new source that can co-exist with the Reuters feed and then we can configure the application to take certain prices from one feed and other prices from another. Sounds very reasonable.

What's amazing to me is that in this meeting we had close to a dozen people - of which two were really necessary - the lead Vendor representative and myself. Everyone else was at least redundant, and often (as it turned out), painfully in the way. I've worked with the feeds, I know what needs to be done, I understand the basics of the Vendor's price service, and all I need to do is to get them to do a few things to meet me half-way on this and we'll all be fine.

But as it turned out, people that had no idea what was needed or what was happening had to be in this meeting. Why? Politics. What a waste. This is something I hate - don't waste my time if you're going to have people unrelated to the issue blathering on about this or that... get to the point, or confess that you have no idea what is going on and get on with it.

In the end, you'll come back to me - or you won't. If you don't, then you got it figured out without me - Good for you! If you come back to me (as you have time and again), then it's because you have no clue about what to do, and you want me to Just do it.

And it seems this insanity - I can think of no better term for it - is going to continue before there is any hope that it will get better. Tomorrow we're having a meeting about the first hardware purchase to run a version that they've never deployed and haven't even finished. How can they possibly know? It's a guess, so let's guess and move on. Chances are, we're not going to overspend - this is a massive system and it's going to need more than we're throwing at the problem now. But they have to have these meetings with people that have no real clue about what they are doing. It's amazing.

Someday this will end... it'll end the day that they realize that this has to work, or it's a waste of millions of dollars. When that day comes, they have traditionally come to me and said 'Make it work!', and at that point, the meetings stop. Unfortunately, that's typically ridiculously close to the drop-dead date for implementing this thing, so I'm under a lot of pressure to get it working in a very short time.

But that's the day it'll end. I just wish I could sit all this out until that day. It's the kind of thing that gives me a headache - and reminds me why I don't sit in the manager/owner chair any more. It's just no fun.

Why Good Java Coders Need to Know C/C++

Friday, November 9th, 2007


I was sitting here today working on the bulk push() and pop() for the BKit queues and a fellow developer stopped by to inform me that I wasn't using the best performing locking schemes in my implementations of the queues. While I accept that there are possibly ways to squeeze a few percent out of mine in certain conditions, these aren't custom-designed queues, they are general purpose utility classes that can't make assumptions about their use-cases in their implementations.

But that wasn't at all clear to this developer that stopped by my cube.

And that wasn't the only thing he didn't get.

Good Java coders are first and foremost good developers. Period. That means they understand how the code is executed in the machine - where things might merely seem unimportant (or, for that matter, important), and where they really are.

Case in point: high-performance queues. If you're in a process where you have many threads processing data off a single queue, and the processing time per item is small, then you are guaranteed to have locking contention on the queue - by definition. There's no way around it. The solution I've seen work time and again is to remove things from the queue in bulk: the processing time per trip goes up, and the locking contention goes down, because the number of threads hitting the queue at the same time goes way down.
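
To make that concrete, here's a minimal sketch of the consumer side of that pattern - nothing from BKit, just the standard java.util.concurrent classes with hypothetical names - where each thread makes one trip to the shared queue per batch instead of one per item:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;

    // Hypothetical consumer illustrating the bulk-dequeue idea: block for the
    // first item, then drain whatever else is waiting, up to a batch limit.
    // One lock acquisition per batch instead of one per item.
    public abstract class BatchWorker<T> implements Runnable {
        private final BlockingQueue<T> queue;
        private final int batchSize;

        protected BatchWorker(BlockingQueue<T> queue, int batchSize) {
            this.queue = queue;
            this.batchSize = batchSize;
        }

        public void run() {
            List<T> batch = new ArrayList<T>(batchSize);
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    batch.add(queue.take());              // wait for the first item
                    queue.drainTo(batch, batchSize - 1);  // grab the rest in one shot
                    for (T item : batch) {
                        process(item);
                    }
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // exit cleanly on shutdown
            }
        }

        protected abstract void process(T item);          // the per-item work
    }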

But to this developer, the answer was Java 5's optimistic locking. Yes, folks, assume you'll get the lock and then fail and retry if you don't. Well, that's a wonderful idea in a situation that's already got locking contention problems. What was he thinking? The answer was, he simply wasn't. He was thinking that Java's way of doing things was so easy that under the covers it really didn't have to do any low-level locking for the push() or the pop().
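
For the record, the kind of thing he had in mind looks roughly like this - a textbook compare-and-swap retry loop, sketched here as a Treiber-style stack (hypothetical code, not his and not mine). Every thread assumes it will win the race on the head pointer and retries when it doesn't, which is exactly the work that multiplies when the structure is already under heavy contention:

    import java.util.concurrent.atomic.AtomicReference;

    // Textbook CAS-based ('optimistic') stack: push/pop assume they'll win the
    // race on the head pointer and spin through retries when they don't.
    public class CasStack<T> {
        private static class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<Node<T>>();

        public void push(T value) {
            Node<T> node = new Node<T>(value);
            Node<T> current;
            do {
                current = head.get();
                node.next = current;
            } while (!head.compareAndSet(current, node));   // lost the race - retry
        }

        public T pop() {
            Node<T> current;
            Node<T> next;
            do {
                current = head.get();
                if (current == null) {
                    return null;                            // empty
                }
                next = current.next;
            } while (!head.compareAndSet(current, next));   // lost the race - retry
            return current.value;
        }
    }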

So I had to walk him through it. It finally dawned on him that his statements were crud, and when it did, I could see him deflate right in front of my eyes. But for those 15 minutes when he was sure I was missing the point, I wanted to yell at him: Are you a total idiot? Don't you know any language other than Java? And folks, unfortunately, the answer to that is No. If you don't have any more experience in developing than Java, it's hard to see that it's not the end-all-be-all.

Adding the Bulk push()/pop() Methods to BKit Queues

Friday, November 9th, 2007


Today I was helping a developer refactor his code so that it would be more CPU-intensive for a shorter time, as opposed to "tailing off" as it ran. The problem is this: his app needed to read a large amount of data (30,000 to 50,000 records) per file, for several files. But not all files had the same length. This meant that his original design, where each file processor was a thread, had the limitation that as some threads finished, 'less' of the machine was dedicated to the task. He wanted to fix that.

The easiest way to fix that is to have all processing go through a single queue - as opposed to a queue per file. This is not without its limitations, however, as now this one queue becomes the bottleneck: the synchronization needed to push() and pop() items is under contention from all the threads. The refactoring had a few issues, but the real problem remained - the queue operations needed to be done in bulk.

So, I added the ability to push a Collection of objects onto each queue (LIFO and FIFO), as well as popping off a bunch of values (up to a provided limit). This is the only way to achieve balance between the processing threads and the locking on the queue. The end result is that he now has another parameter to tune - the batch size of the pops. If it's 50 to 100 he should be in fine shape and not have to worry about performance hits due to excessive locking contention, but I'll let him fiddle with that and find the optimal value for his process.
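
The actual BKit classes aren't reproduced here, but a minimal sketch of the idea (hypothetical class name, with a plain synchronized LinkedList standing in for the real implementation) looks something like this - the point being that the bulk push() and pop() take the lock once per batch rather than once per element:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.LinkedList;
    import java.util.List;

    // Hypothetical FIFO queue sketch (not the actual BKit class) showing the
    // bulk operations: one monitor acquisition covers the whole batch.
    public class SimpleFIFOQueue<T> {
        private final LinkedList<T> elements = new LinkedList<T>();

        public synchronized void push(T value) {
            elements.addLast(value);
        }

        // Bulk push - a whole Collection under a single lock.
        public synchronized void push(Collection<? extends T> values) {
            elements.addAll(values);
        }

        public synchronized T pop() {
            return elements.isEmpty() ? null : elements.removeFirst();
        }

        // Bulk pop - up to 'limit' values under a single lock.
        public synchronized List<T> pop(int limit) {
            List<T> batch = new ArrayList<T>();
            while (!elements.isEmpty() && batch.size() < limit) {
                batch.add(elements.removeFirst());
            }
            return batch;
        }
    }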

The Compromises We Make – and Those We Don’t

Friday, November 9th, 2007


Today I ran into a problem that's somewhat sensitive to me. Because I'm sitting in this cube, and could be doing a lot of other things, I realize that there are certain trade-offs I've made to be here. I'm not in a managerial role, but had I wanted to stay in one, teaching or owning my own company would have afforded me all the opportunity I wanted for that. No... I didn't want that. I like doing, not deciding for a living.

But every now and then there comes along a situation that makes me rather upset with the people I work for. I'm capable of doing the management, and as such, require very little management from above. My management appreciates this and, when it suits them, they take full advantage of it.

Today I wanted to get licenses for Tibco for several projects we're working on. This is a replacement of either free or home-grown messaging systems, so there's no obvious feature that this will bring that we don't have now. Yes, it'll be soft factors - industry standard, higher performance, scalability, etc. but today - right now, there's no problem that this addresses. Yet it's a $100,000 expense. Too much for casual spending.

So rather than trust me that it's needed, and will be useful, they want me to make the drawings and slides to show how it'll fit into the next phase of our architecture. If we had run into a problem, they'd never want to see this... they'd want to know what fixes it. But because there's no immediate problem to solve, it's not as important as other things.

Understood... then let's just drop it. I'll wait until it's critical again and then I'll remind you that I mentioned this and you'll say "Well... why didn't you... Oooo!" and we'll do it then. But now, I have to do this for a group of people that either really don't care, or really don't need to care. It's infrastructure stuff, folks. Why spell it out to the web developers how the trade processing will change with a new messaging system? They might care, but when they need to know, they'll ask.

I guess it comes down to this: if they think it's too expensive, just say so, and we'll limp along until it's critical; then we'll say "Here's the solution" and get it then. Doing it this way is just part of the reason that I don't own my own company any more - I don't like doing this stuff. And for the vast majority of the time, they don't want me doing this kind of thing - they want me fixing problems or enhancing the products I'm responsible for.

But I have to go through the whole dog and pony show, and let them take it to the Governing Council and then have project charts made, etc. It's turned from a simple little replacement project that'd take a few days to something that will drag out for months. Yucch!