Archive for the ‘Coding’ Category

Where Relational Databases Just Fall a Bit Short

Wednesday, April 28th, 2010


Today I've run into a problem where I'm really not at all pleased with the solution I have. It works, but that's a far cry from what I want to see. It's best described as brute force, and that, while sometimes necessary, is almost never a phrase you'd use to describe an elegant solution. So here's the set-up:

I have volatility data for a series of instruments - OK, let's simplify it and make them options. We have the components of the option: underlying, expiration, strike, and then the "meat" of the data - the volatility. If I put this into a SQL table, it looks a lot like this:

  CREATE TABLE Volatility (
    acquired        datetime     NOT NULL,
    product         VARCHAR(42)  NOT NULL,
    expiration      datetime     NOT NULL,
    strike          FLOAT        NOT NULL,
    volatility      DOUBLE,
    generationTime  datetime,
    CONSTRAINT volatilityPK PRIMARY KEY (acquired, product, expiration, strike)
  )

where I added in the acquired field to hold when I received the data, and the generationTime to be the time the source sends me as the instant the data was generated there. It's pretty standard.

But I ran into a wrinkle. There were multiple sources sending me data for the same options so I was getting a bunch of primary key violations. That's not good, so the solution is to make the key unique by adding the 'source' as identified by the portfolio it's coming from. So now the table looks like this:

  CREATE TABLE Volatility (
    acquired        datetime     NOT NULL,
    portfolio       VARCHAR(128) NOT NULL,
    product         VARCHAR(42)  NOT NULL,
    expiration      datetime     NOT NULL,
    strike          FLOAT        NOT NULL,
    volatility      DOUBLE,
    generationTime  datetime,
    CONSTRAINT volatilityPK PRIMARY KEY (acquired, portfolio, product, expiration, strike)
  )

But now we're getting to see the real problem: size. Let's look at a single record. In order to hold the "meat" of the table - the volatility, we need to have a primary key that's:

  • acquired - 8 bytes
  • portfolio - average of 8 bytes
  • product - average of 6 bytes
  • expiration - 8 bytes
  • strike - 4 bytes

for a total of approximately 34 bytes to hold an 8-byte volatility value. That's a ton of overhead.

The problem only gets worse when you look at the enormity of the data. I've got roughly 120,000 instruments and if I'm sampling them every 10 sec, that's 6 times a minute, 360 times an hour, or upwards of 3600 times a day (which works out to about a 10-hour day). That's 432,000,000 rows a day or about 18,144,000,000 bytes for the data (34 bytes of key plus 8 bytes of volatility per row), of which a whopping 14,688,000,000 bytes is the key. That's almost 17GB of data a day - just to hold the volatility data.
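To sanity-check those numbers, here is a small sketch that redoes the arithmetic. The class and method names are made up for illustration, and the 10-hour day is an assumption inferred from 3600 samples a day at 360 an hour. Like the figures in the text, it ignores generationTime and any per-row storage overhead the database adds.

```java
// Hypothetical helper that reproduces the storage estimate from the text.
public class VolatilityStorageEstimate {
    static final long INSTRUMENTS = 120000L;      // from the text
    static final long SAMPLES_PER_HOUR = 360L;    // one sample every 10 sec
    static final long HOURS_PER_DAY = 10L;        // assumption: 3600 / 360
    static final long KEY_BYTES = 8 + 8 + 6 + 8 + 4;  // acquired, portfolio, product, expiration, strike
    static final long VALUE_BYTES = 8;            // the volatility itself

    static long rowsPerDay() {
        return INSTRUMENTS * SAMPLES_PER_HOUR * HOURS_PER_DAY;
    }

    static long keyBytesPerDay() {
        return rowsPerDay() * KEY_BYTES;
    }

    static long totalBytesPerDay() {
        return rowsPerDay() * (KEY_BYTES + VALUE_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("rows per day:        " + rowsPerDay());
        System.out.println("key bytes per day:   " + keyBytesPerDay());
        System.out.println("total bytes per day: " + totalBytesPerDay());
        // Express the total in units of 2^30 bytes to compare with "almost 17GB".
        System.out.printf("total in GB (2^30):  %.1f%n", totalBytesPerDay() / (double) (1L << 30));
    }
}
```

Roughly four out of every five bytes stored are key, which is the whole complaint in a nutshell.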

If we held it in a 'map of maps' data structure, the only thing we'd need is the data that's actually changing: the acquired and volatility. That could represent a huge savings in memory. But the cost is that I can't "mirror" this data to a back-end SQL database, and can't use relational algebra to get data out.
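For a rough idea of what that 'map of maps' could look like, here is a minimal sketch. Everything in it is hypothetical: the class name, the method names, the nesting order, and the choice of epoch milliseconds for the times are all assumptions made for the example, not the structure actually in use. The point is only that the portfolio, product, expiration, and strike are stored once per instrument, and each new sample adds just an acquired time and a volatility.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory store; names and layout are illustrative only.
public class VolatilityStore {
    // portfolio -> product -> expiration -> strike -> (acquired -> volatility)
    // Times are epoch milliseconds. Using Double as a map key assumes strikes
    // always arrive as exactly the same value for the same option.
    private final Map<String, Map<String, Map<Long, Map<Double, NavigableMap<Long, Double>>>>> root =
        new HashMap<>();

    // Record one sample; the four key fields are only stored the first time.
    public void put(String portfolio, String product, long expiration, double strike,
                    long acquired, double volatility) {
        root.computeIfAbsent(portfolio, k -> new HashMap<>())
            .computeIfAbsent(product, k -> new HashMap<>())
            .computeIfAbsent(expiration, k -> new HashMap<>())
            .computeIfAbsent(strike, k -> new TreeMap<>())
            .put(acquired, volatility);
    }

    // Most recently acquired volatility, or null if the instrument is unknown.
    public Double latest(String portfolio, String product, long expiration, double strike) {
        NavigableMap<Long, Double> series = series(portfolio, product, expiration, strike);
        return (series == null || series.isEmpty()) ? null : series.lastEntry().getValue();
    }

    // Number of samples held for one instrument.
    public int sampleCount(String portfolio, String product, long expiration, double strike) {
        NavigableMap<Long, Double> series = series(portfolio, product, expiration, strike);
        return (series == null) ? 0 : series.size();
    }

    // Walk down the nested maps, giving up as soon as a level is missing.
    private NavigableMap<Long, Double> series(String portfolio, String product,
                                              long expiration, double strike) {
        Map<String, Map<Long, Map<Double, NavigableMap<Long, Double>>>> byProduct = root.get(portfolio);
        if (byProduct == null) return null;
        Map<Long, Map<Double, NavigableMap<Long, Double>>> byExpiration = byProduct.get(product);
        if (byExpiration == null) return null;
        Map<Double, NavigableMap<Long, Double>> byStrike = byExpiration.get(expiration);
        if (byStrike == null) return null;
        return byStrike.get(strike);
    }

    public static void main(String[] args) {
        VolatilityStore store = new VolatilityStore();
        store.put("PORT1", "IBM", 1000L, 120.0, 10L, 0.25);
        store.put("PORT1", "IBM", 1000L, 120.0, 20L, 0.27);
        System.out.println(store.latest("PORT1", "IBM", 1000L, 120.0));
    }
}
```

The trade-off is the one already mentioned: lookups are by key only, so anything resembling a relational query has to be hand-written against the maps.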

I've thought of trying to duplicate this in SQL by having two tables:

  CREATE TABLE Instrument (
    instrumentID    IDENTITY     NOT NULL,
    portfolio       VARCHAR(128) NOT NULL,
    product         VARCHAR(42)  NOT NULL,
    expiration      datetime     NOT NULL,
    strike          FLOAT        NOT NULL,
    CONSTRAINT instrumentPK PRIMARY KEY (instrumentID)
  )
 
  CREATE TABLE Volatility (
    acquired        datetime     NOT NULL,
    instrumentID    INTEGER      NOT NULL,
    volatility      DOUBLE,
    generationTime  datetime,
    CONSTRAINT volatilityPK PRIMARY KEY (acquired, instrumentID)
  )

where I have a table of instruments, and those all have unique IDs, and then I'm using that ID, a simple integer, to represent the corresponding fields in the old Volatility table. It's all possible, and it's closer to the "normalized database" you'd expect to see, but the problem comes in when you look at the other factor: access speed.
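The idea of swapping four descriptive fields for one small integer can be sketched in code as well. This is only an illustration under stated assumptions: InstrumentRegistry and idFor are hypothetical names, and the IDs here are handed out by the application, whereas in the two-table schema the database's identity column would assign them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Instrument table: one integer ID per
// unique (portfolio, product, expiration, strike) combination.
public class InstrumentRegistry {
    private final Map<String, Integer> ids = new HashMap<>();

    // Return the existing ID for this instrument, or assign the next one.
    public synchronized int idFor(String portfolio, String product,
                                  long expiration, double strike) {
        // Composite key built from the four fields; assumes '|' never
        // appears in a portfolio or product name.
        String key = portfolio + '|' + product + '|' + expiration + '|' + strike;
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    public static void main(String[] args) {
        InstrumentRegistry registry = new InstrumentRegistry();
        System.out.println(registry.idFor("PORT1", "IBM", 1000L, 120.0));
        System.out.println(registry.idFor("PORT2", "IBM", 1000L, 120.0));
        System.out.println(registry.idFor("PORT1", "IBM", 1000L, 120.0));
    }
}
```

With an ID like this, each Volatility row needs only the acquired time and the integer as its key, which is where the size savings would come from.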

If I'm generating 432 million rows a day, then it's not going to be too long before it's virtually impossible to do a join on that table. Even a month of data, and we're talking about holding years, would generate nearly 9 billion rows in that table.

Nah... there's no way we can do a SQL JOIN on that guy. We have to use the de-normalized data, and then just live with it. Crud. Not great, but it's the best I can do right now.

BBEdit 9.5 is Out

Tuesday, April 27th, 2010

As if Transmit 4 wasn't enough for one day, the Bare Bones crew released BBEdit 9.5 today with an impressive list of fixes and new features. What I like the most, however, is the "in-line" Find box that you can activate just like in Safari:

BKIRCProtocol.java

I've been wanting to have this feature in BBEdit for a long time. It's one of the most useful and clever additions to an editor I've seen. Up to now, it was in SubEthaEdit, which is nice, but still not to the point that it supplants BBEdit as my main code writing editor on the Mac.

There are a ton more things to explore in BBEdit 9.5, and I've only had it running for a few minutes, but it's just an incredible day for new software on the Mac. Amazing.

Quietly Banging Away… Trying to Stay Hidden

Monday, April 26th, 2010


Today was a day like a lot of the recent days have been: quiet... almost too quiet in the group. I think it has a lot to do with the "incident" as I've come to call it, and while I'm probably only partially right, my recent excursions at the end of the day and even the middle of the day, along with the number of closed-door meetings I've had with senior partners, probably make it pretty easy to put two and two together to see what's going on. It's not ideal by any means, but that issue is out there, what I've done in response is out there too, and we just have to deal with it.

I'm not really happy about the situation, but I'm not going to sit here and continue to deal with the kind of management issues I've had to deal with when there's a viable alternative out there. That's just silly. I'm not sure what's going to come of this, but it's going to change things, that's for sure.

So in the meantime, I'm doing my best to just do a bang-up job of getting new things done and old things fixed up. Heads-down coding. I get the obvious benefit that it's clear I'm still working for the good of the organization, while gaining the added benefit of staying in my little corner of the pod and not getting involved in too many conversations.

One thing I did get a little involved in was after I came in and caught up on my email, I vented a bit to my teammates about the things that transpired in my absence (on Friday). Basically, two things that I really, really wished hadn't happened:

  • I got called about memory additions to some machines - Why? They can do this. I'm not needed. I don't need to be called, emailed, texted just to be told they are adding memory to some boxes. It was my day off - which, it seems, is simply something that doesn't exist for me in this place. I just needed to be home to help out Liza, and I got electronic "buckshot". Enough already.
  • Steve, a co-worker, pointed out jQuery to Ralph - this was the equivalent of giving Guy Fawkes C4 and remote-controlled detonators. It's going to turn into a flurry of little, cosmetic, GUI changes to the web pages when we'd already looked at it, and decided that for the first round of this web system, it wasn't worth the effort. Clearly, we left Steve off the email list. My mistake.

Don't get me wrong, a JavaScript library like jQuery is nice, but there are a lot of libraries - some more focused on the UI components, some more all-encompassing like jQuery. Rather than just pick one, I wanted to have a really good idea about where we were headed with this web site and then look at what we needed help with for the second cut of the pages. Yeah, it's a lot of work, but not having put together a complete AJAX web app before, I had no idea what we needed, and didn't want to pick wrong.

So I had to say "Please don't do that. There's a reason we haven't, and now the monster is loose." Sure, it didn't do a bit of good, but I had to say it so that I can at least put a little bit of a lid on the situation. We have so many things that just don't work, or need to get working, that messing with pretty pop-ups and fades just doesn't really rise to the top of the TODO list.

Don't think it made any difference, but I had to voice my concerns on these issues because of all that's been building up in this cemetery of a pod - post-incident. It's just a big, long sigh...

Decided to Pick up Books on Groovy and Clojure

Friday, April 23rd, 2010

Given the way things have been heading at The Shop lately, I thought it would be a good idea to pick up a book on Groovy and another on Clojure. I didn't know nearly enough about them to be efficient and proficient with them, and I have a feeling that if things continue to develop as they have, I'll be needing them sooner, rather than later.

I have heard a little about Clojure, and it's an interesting take on solving the concurrency problem with software. I can see why it's popular, and it'll be interesting to see it in action, but Groovy was a complete mystery to me.

After the first few pages, I can see it's a Ruby-style of language with extensions to any and all objects. Looks interesting. I haven't done a lot with Ruby, and as Groovy is Java-based, it should work well into what I do. We'll have to see.

Anyway... picking up new tools is always fun.

Upgraded to Git 1.7.0.3 on MacBook Pro

Thursday, April 22nd, 2010


I was reading a few tweets from GitHub today and saw that they had a few new features that need Git 1.6.6+, and it got me to wondering: What version am I on, and what's the latest? I thought I was on 1.6.5.something, but I wasn't sure. So I decided to check. Simple enough.

Because I had tried the "build it yourself" for Mac OS X 10.3.9, I knew that with 10.5, I'd used the git-osx-installer that's hosted at Google Code. I looked there and saw that the 'current' version was 1.7.0.3, and I did a simple:

  $ git --version
  git version 1.6.5.2

OK... we have some work to do.

I got the package from Google Code, and then simply installed it. I'd done it before, so I didn't have to worry about the MANPATH and other things, I just needed to update it.

Now I get:

  $ git --version
  git version 1.7.0.3

and I can take advantage of the new features on GitHub.

I just need to remember to update my machines at home as well. Gotta do that tonight.

UPDATE: Interesting... according to the Git website, the "current" version is 1.7.0.5. That's OK, I'm a lot more current with 1.7.0.3 than 1.6.5.2.

Getting Spun Up Again

Tuesday, April 20th, 2010

Today the pod is still as quiet as a cemetery - primarily, I believe, because of the "issue" between my manager, Ralph, and me that I wrote about yesterday. Looking back at it, there was bound to come a time when we were going to have a clash like this. If he's convinced that he can code better than I can (and that was what this was really all about), then it was going to come up sooner or later. I'm glad it's out there, but it does tend to put a damper on the group dynamic for a while.

This morning I'm trying to get spun up and excited about being here, for as long as I'm here - or maybe as long as Ralph's my manager here, I'm open to all kinds of ideas now. Anyway, I'm doing some of the work Ralph asked me to do this month, and it was actually a little fun, which is great, as that's just what I needed.

Then I started updating the release notes for my web app and things dipped again.

One of the problems voiced by me as well as Ralph's managers is that he's asking me to build things that no one asks for. He's really doing speculative development, which isn't bad if it's small things like workflow changes, or helpful features, but when it's wholesale new data sets and features, it can cause problems. Such was the case this morning.

I have written a new component to the web app, and it's based on data that we can't possibly hold right now. We've had a database on order for months, memory on order for weeks, and it's taking a long time to get these things in house and installed. While I don't resent any of that, I do resent the idea that Ralph expects me to write the new features on hardware we don't have.

It's not really a compliment, it's an annoyance. It's arrogance on his part, because I'm telling him not to push, but he's saying "do it anyway". The result is, of course, that I'm doing the best I can, but on very limited data sets, so that when he looks at my work, it's all about "now switch it to this one... now this one..." in order to get the value without having to do his job of tracking down the acquisition of resources.

But I try to let that pass because, in the end, he's made my life easier for having made it clear I can't work with him long-term.

But when it rains, it pours.

His next brilliant idea is marketing.

He believes that we need to work to get more folks in The Shop using the application we have developed. Now technically, he agrees that the only customers we have are very satisfied with the work we've done, but that's not good enough. Still, if someone outside our customer group asks for something, I do my best to provide it to them. Again, that's not enough. His next goal is to get users who don't need our application to use it.

He wants us to market the application to internal users.

Now if this mattered to our group's existence, I'd say "OK, let's get out the mass-mailers", but it doesn't. In fact, Ralph has already been accused of doing too much with the platform by his managers. We have already gotten rave reviews by everyone, but still that's not enough.

I'm convinced it's about visibility and recognition. I can see that Ralph can't get enough of either, but I have no idea what happened in his past to make him this way. What I can say is that I'm all for new users using my stuff. I think that's great. But I've done marketing. I'll do some, but that's not what I'm going to want to do for several hours every week. I'd rather be talking to our real users and seeing if I can help them.

It's hard to get excited about your job when you're in a situation like this.

The Deciding Factor

Monday, April 19th, 2010

There are a lot of times in my life when I'm really on the fence about something. I want to get the new 17" Unibody MacBook Pro with the quad cores and the 512GB SSD, but we're trying to sell our house and move to Naperville, and I think maybe I should hold off until we're moved, and see how things look then. On the fence.

I've been on the fence for the last several months about what to do about my position in The Shop. I've entertained the idea of leaving, and I've talked to the management about my concerns that might move me to make that decision. I've asked for relief - in the form of working at home for a while, and that was rejected as I was considered too vital to the team to allow the remote work option. Again, on the fence.

There's a lot to like about The Shop. There are plenty of people I enjoy in a lot of different spots. There's a new pseudo-CTO that has promised a "cut 10%" policy that promises to get rid of the deadwood that's a significant problem here. He's also understanding of the fact that there are serious problems in such fundamental things as the market data, and is working to address them. There's a lot to like, and a reason to think it's going to steadily get better.

But then there's the one really significant downside: I work directly for Ralph.

On a normal day, Ralph is a micromanaging, untrusting, manager that believes he's always the smartest one in the room. And while I agree with these words, they were first spoken to me by Ralph's managers. It's not a good situation, but when Ralph is busy in meetings, or quietly working on something else, it's easy to forget he's sitting six feet to my right. But when he chooses to assert his management over me, it can get very uncomfortable.

I was raised with a serious work ethic, and respect for those over you. Not that I would hold doors for Ralph if I saw him out on the street - but in the workplace, he's the manager, and the organization has made that decision, and deserves my obedience to that decision. If I don't like it, I am always free to pick up and leave. That's my choice.

So I'm again on the fence. If Ralph is quiet, it's not too bad to do my job and get some decent measure of satisfaction, and go home at night feeling that I offered a great service to the company for the salary they paid me. Even when Ralph is asking me to check on a few things, or add a few things, it's something I can handle. But when he goes off on one of his micromanaging binges, it's hard to keep my mouth shut.

Such was the case today.

Today, he was trying to find out the reason some group's end-of-day P/L wasn't matching what we had, and wanted to have me make some changes to the system we have that generates our numbers. I tried to explain that the difference between what we had and what he wanted was a design decision - specifically, there were a few "prices" we could use for a future, the discrepancy came down to two of them, we were using one, and the change would switch us to the other.

The problem would be that this would impact a lot of people, and the reason that the one was chosen in the first place (by the original developer) is now totally lost because he doesn't remember, and didn't comment why he did it. I was trying to explain that I didn't know why the system was the way it was, and Ralph got pretty testy.

"Can you send me the code?" he asked. 'Sure', I thought... with absolutely no coding experience in Java (limited to Matlab and Excel), he's got no chance to actually figure this out. I'll send him the files out of the SVN repository - even sending him different revisions of the file to show what I meant: the change was made, but the "why" was missing.

After a few minutes, I hear over my shoulder: "Bob, what's the code doing here?" I turn to see that he's got the latest code up and focusing on the location I highlighted in the email with the links to same. I got a little frustrated, but said nothing. I rolled over, and I'm guessing it showed on my face, or in my tone, but after about 30 seconds of me explaining the code to someone that doesn't know an object from a chair, he said rather loudly: "Hey! I need you to calm down!"

"Ralph" I said, "I'm frustrated because I feel that you don't trust me."

"Do you know the answer to the problem?" He asked me. "No? Then I think I might, so I wanted to look at the code."

Now this is a guy that's got no idea of what Java is. He doesn't understand references, objects, or links. He's well intentioned, but clueless in this regard. I try to explain a few things to him and he's not getting it. I have to explain several times over. It's getting out of hand.

In the end, he tells me to remove several lines of code (I suggested we not do that), and even get rid of error messages that he says are not errors at all. I realize that Ralph has gone off the deep end, and it's not worth having any further conversations with him. I agree to do what he's asked, do it and am done with it.

After about 15 mins, I realized this was a blessing in disguise. Ralph got me off the fence. He made it a very black and white decision for me: stay here and work with him, or go to a place that's actually nice to work.

No question.

When I made that decision, I was actually happy that the blow-up happened. Yeah, it was noisy, and yeah, it was out there in front of the entire group, but in the end, he made it possible for me to make a solid decision.

Thanks, Ralph. You made it easy on me.

DataGraph 2.2 is Out

Monday, April 19th, 2010

This morning I saw that DataGraph 2.2 was out, and while I haven't been pushing that too much, it's nice to see the significant improvements that he's been making in the application and framework as time goes on. It's also very interesting to me that a significant portion of his libraries are in C++. Interesting.

Anyway, great to see the improvements.

A New First: Being Asked to Not Use Boolean Algebra

Wednesday, April 14th, 2010

Crazy Lemon the Coder

Today I had the very uncomfortable first of a co-worker coming up to me and telling me that they couldn't understand the code I'd written. It contained boolean algebra, and while it's not as easy as '1 + 1', it's not a lot harder to anyone that's been actually trained in the field of programming.

The code in question was using bit-mapped flags like the following:

  public static final long ERROR_404          = (1L << 0);
  public static final long ERROR_408          = (1L << 1);
  public static final long ERROR_500          = (1L << 2);
  public static final long CONNECTION_REFUSED = (1L << 3);
 
  // set up which errors you don't want to log
  long  ignoreErrs = ERROR_404 | CONNECTION_REFUSED;
 
  try {
    // ...snip...
  } catch (FileNotFoundException fnfe) {
    if ((ignoreErrs & ERROR_404) == 0) {
      log.error("Got an Error 404 (FileNotFoundException)!");
    }
  }

where it's clear that we don't need a full integer for each of the errors I wish to not log. Just a bit. Which makes perfect sense - so long as the number of errors you're trapping for is no more than the 64 bits in a Java long datatype. But in our case, it was just fine - I think we had a total of five errors to look for.

So my co-worker was clearly flustered about the code that allowed for the selective logging of each error type. They didn't understand that the code snippet:

  if ((bunchOfFlags & aMask) != 0) {
    // ...blah...
  }

is just saying: "If the aMask bit is set in the bunchOfFlags". It's a simple true/false condition, that (unfortunately) Java won't allow to be written:

  (bunchOfFlags & aMask)

because it says it can't convert a long to a boolean. Ugh... OK, fine. I'll put in the explicit comparison against 0 (!= 0 here, == 0 in my "don't log" test) and Java will be happy. Still... I'll admit it's a little trickier when you're working with "negative logic" like what errors not to log, but really... is it all that hard?
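One way to make the intent easier to read without giving up the bit flags is to wrap the test in a named method. This is just a sketch: the ErrorFlags class and the isSet and shouldLog helpers are hypothetical names invented for the example, not part of the code under discussion.

```java
// Hypothetical wrapper around the bit-mapped flags described in the text.
public class ErrorFlags {
    public static final long ERROR_404          = 1L << 0;
    public static final long ERROR_408          = 1L << 1;
    public static final long ERROR_500          = 1L << 2;
    public static final long CONNECTION_REFUSED = 1L << 3;

    // True when at least one bit of mask is also set in flags.
    // For a single-bit mask this means "that bit is set".
    static boolean isSet(long flags, long mask) {
        return (flags & mask) != 0;
    }

    // Negative logic in one place: log an error unless it is on the ignore list.
    static boolean shouldLog(long ignoreErrs, long error) {
        return !isSet(ignoreErrs, error);
    }

    public static void main(String[] args) {
        // set up which errors you don't want to log
        long ignoreErrs = ERROR_404 | CONNECTION_REFUSED;

        System.out.println("log 404? " + shouldLog(ignoreErrs, ERROR_404));
        System.out.println("log 500? " + shouldLog(ignoreErrs, ERROR_500));
    }
}
```

The boolean algebra is still there, but it lives behind a name that says what the test means, so the call sites read as plain true/false questions.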

Clearly, the answer is Yes for this person.

So I spent some time on this today, and I'm sure I'll need to finish it up in the morning. The big motivation for this level of control over the logging is my belief that errors in the log of any application should be a call to action and not "noise". Unfortunately, this isn't universally agreed in the group, and we have people that want to see errors that they have no intention of putting a halt to, whereas I want that particular noise eliminated.

But being asked to remove boolean algebra from my code... that's a first. Amazing.

Debating the Value of Writing Good Code – Amazing

Wednesday, April 14th, 2010

I just got done talking to my two teammates about the value of defensive coding. Specifically, that there is value in Java in checking the value returned from every method call - including the JVM's new operator. I'm stunned that these guys believe that it's "silly" to check for things like that. I say, that's the bloody point of checking things in code. Have they not read Code Complete? Have they never really built large, critical systems that simply didn't have the option of failing?

My guess is, the answer is No.

In my code, you'll see a lot of code that looks like this:

  /**
   * This method returns the reference to the BKIRCProtocol that is
   * going to be monitoring the traffic from the IRC Server while we are
   * connected to it. If there is no listener defined at the time this is
   * first called, then create one because we really need to have it.
   */
  public synchronized BKIRCProtocolListener getListener() throws BKException {
    if (_listener == null) {
      _listener = new BKIRCProtocolListener(this);
      if (_listener == null) {
        throw new BKDebugException("BKIRCProtocol.getListener() - no protocol "
          + "listener was present and one could not be created. This is a serious "
          + "Java allocation error and needs to be looked into.");
      }
    }
 
    return _listener;
  }

Where I follow a new with a test for the resultant against null. It's standard coding style for me. When I was at First Chicago, I had the great fortune to work with an incredible group of guys. They were pivotal in having me look at coding as something a group can do as opposed to a series of individuals. In that group, I learned the value of having the code look like one mind wrote it. Debugging and fixing was as easy for one as it was for all. The style wasn't exactly mine, but we agreed on one for the group, and it stuck. It was great.

I have been trying to re-create that kind of experience and to some extent, I've been able to succeed in very limited cases. But I have seen it again. When you look at coding as not your work, but the team's work, you start to see that there's more to it than what you think. You care about the next guy getting the call to fix a problem you might have created. But when you write as a team, it's impossible to tell who wrote it, and that makes debugging vastly easier.

I guess I'm just surprised that I'm talking with developers making six figures and they are talking about "how often" in the error checking. It's just stunning. You only have to write it once, and then it's always going to be there. That's the "how often" you need to focus on.

Expect more. Build it better than it has to be.