Archive for February, 2012

Google Chrome dev 19.0.1049.3 is Out

Friday, February 24th, 2012

This morning I saw that Google Chrome dev 19.0.1049.3 was out, and with it, a few fixes, a new V8 javascript engine, and a few Mac GPU fixes that I'm not sure affect me, but they certainly can't hurt. I'm always glad to see the progress.

Numbers is an Amazing Tool

Thursday, February 23rd, 2012

Today I've been tracking the disk space used by my feed recorders with the mis-configured drive arrays. It's important that I don't blow out the drives before I move the software into production because this data is very important to the high-frequency trading operations, and we have to have at least one reliable copy of the data.

So I tracked the disk usage for the three boxes: prod, dev, and backup prod. The first, prod, was running the old code that had no issues, the other two were running the new code that had issues with the disk space. But, after the reconfiguration, it appears to have been solved. I then set up crontabs on each box to send me the output of df -k, and then I started tracking it.

I put it all in Numbers, and here you have it:

ArcSvr Disk Usage

It's just an amazing tool. Sure, you can do all this in Excel, but Excel doesn't have the Mac style, and that's just as big a part of this as the data. It's just plain fun!

Somedays It’s Hard to be a Nice Guy

Thursday, February 23rd, 2012

cubeLifeView.gif

Today I ran into a problem that bothers me to no end because it's not something I've done, but it's all been perceived as my fault by the recipient. Imagine the case of the stereotyped person - pick any one you'd like. If, in their mind, what you are doing is because they are part of that group, then you are the "bad person" - no matter what you say or do. It's something I've seen over and over, talking to my friends in these "groups". Many of my black friends say the most racist people they know are black. It's oddly bizarro logic. I appear to be a snob, if they "see" everything through their own filters.

So today I've had this happen to me, and I've tried to explain that it's not the case. It didn't take. So I think I'm going to take a different tack - let it go. If they are really hell-bent on portraying me as a racist, or an intellectual snob, or whatever, then that's up to them. I'm sure they can take any words or gestures I've done and twist them into supporting that decision - no matter how insane that "mapping logic" is.

So if there's nothing I can do, then the best I can do is to just stay away from them. I have no desire to work around these people. They aren't my friends - they are my co-workers. Period. If it is nice to come to work - great, but I've worked at places that aren't and survived, even thrived, and I can do it again.

I just get tired of trying to be nice and having it used against me time and again. They will think what they want, and there's nothing I can do about it. So be it.

Fantastic Side-Effect of Auto-Flipping Feeds

Thursday, February 23rd, 2012

High-Tech Greek Engine

Today I was running my Greek Engine, and with the fixes I'd made yesterday for the auto-flipping, I was totally surprised to see the feeds oscillating! Today, we have both A and B sides, so for the first time since the auto-flipping code worked, there was a situation where it shouldn't flip, but it was. It took me about a half second to realize what was happening, and to start laughing. I was being too efficient on the flipping, and when I'd drain a side, I'd quickly see that the other side had something to process, and I'd flip right over.

Then the other side would get ahead, because the non-preferred side is doing less work, and we'd drain the preferred side and flip again. It was an oscillation that was just no good. So I started tuning the code. The initial flipping thresholds were just too easy to hit, so I started by increasing the time a side must be empty. I did several tests, and finally settled on a decent value: the side must be quiet for about 15 seconds. If a side hasn't received a message in 15 sec, and there are other messages to process, then flip. That seemed reasonable.

I also realized that comparing the combined datagram queue and pending decoded message queues of the non-preferred side to the decoded messages of the preferred wasn't really fair. So in order to do an apples-to-apples comparison, I changed the size comparison to be the size of the pending decoded messages and I think that's going to help a little, but not much.

In the end, I had something that no longer oscillated - but it did do something quite unexpected: It "found" the faster side! It took me a minute to realize what I was seeing, but then it was obvious: if one side can really deliver messages faster than the other, and we're getting consistently empty on the preferred side, but messages on the non-preferred, then we'd be a lot better off flipping, and using the faster side.

Now this speed can come from a lot of things, but most likely it's just the electrical path, or the bandwidth of routes and switches in the path, but one thing is certain, there are very few times that the two sides are identically equal. What this change has done is to make sure that when there's a significant difference, we'll find it, and use it.
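
The decision described above boils down to a simple predicate. Here's a minimal sketch, assuming a per-side structure with a last-receive timestamp and a pending decoded-message count - all the names here are hypothetical, not the actual engine code:

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>

// Hypothetical sketch of the auto-flip decision: flip to the other side
// when the preferred side has been quiet for ~15 sec and the other side
// has decoded messages pending. All names here are illustrative.
struct FeedSide {
  std::time_t  lastMessageAt;   // when this side last received a message
  std::size_t  pendingDecoded;  // decoded messages waiting to be delivered
};

bool shouldFlip(const FeedSide &preferred,
                const FeedSide &other,
                std::time_t now,
                std::time_t quietSecs = 15)
{
  // Compare decoded-to-decoded so the test is apples-to-apples.
  return (now - preferred.lastMessageAt) >= quietSecs &&
         preferred.pendingDecoded == 0 &&
         other.pendingDecoded > 0;
}
```

The real engine's queues are obviously more involved, but the shape of the test is the same: a quiet preferred side plus a non-empty alternate side means flip.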

It's stuff like this that makes me glad I'm doing this kind of work. Finding unexpected benefits of a code change is really great. Too often it's the other way, but today I came out a winner. Excellent!

More Drive Array Problems

Wednesday, February 22nd, 2012

bug.gif

Today my UDP feed recorders blew out a 3TB+ disk array, and when the admins unmounted and remounted it, I had 2.4TB free. Something was going on. I had changed the file writing from a buffer-and-write-once style to an append-incremental-updates scheme, and all of a sudden things blew up. So of course, I think it's me. So I decided to check.

The first thing was to get a simple test app, and then run it on the drive array and not, and compare the results. Thankfully, there were drives on my troubled box that weren't the drive array - my home directory, for one. So I just needed a test app, run it in my home directory, then on the drive array and compare the results.

My little test app was simple:

  #include <iostream>
  #include <fstream>
  #include <stdio.h>
  #include <stdint.h>
 
  int main() {
    std::string  name("local.bin");
    std::string  buffer("Now is the time for all good men to "
                        "come to the aid of their party\n");
 
    for (uint16_t i = 0; i < 10000; ++i) {
      std::ofstream  file(name.c_str(), (std::ios::out |
                                         std::ios::binary |
                                         std::ios::app));
      file << buffer;
      file.close();
    }
 
    return 0;
  }

and then I compiled it and ran it. In my home directory I got:

  $ ls -lsa
  656 -rw-r--r--  1 rbeaty UnixUsers 670000 Feb 22 16:44 local.bin

and when I ran it on my suspect drive array I got:

  $ ls -lsa
  262144 -rw-r--r--  1 rbeaty UnixUsers 670000 Feb 22 16:44 local.bin

So it's clear that the byte counts are right - 670000 in both cases, but the blocks used are reasonable on my home directory drive, but the drive array is totally wigged out. This explains the problem I've been seeing - when I append to a file the drive array gets confused and adds all kinds of blocks to the file, but doesn't corrupt the byte count. Very odd.

So I sent this to the admins and let them use this as a test case for trying to fix this. I'm sure hoping they can do something to fix this guy. I need to have this running as soon as possible.

UPDATE: that's 256k blocks - exactly. This is interesting. That means it's not accidental. There's something about the driver that's putting 256k blocks for the binary append, and doing this over and over again. Interesting, but it's just all the more evidence that this is a drive array bug.

[2/23] UPDATE: turns out to be an XFS option: allocsize=262144k, and that was easily fixed by the admins. I'm guessing the home directory filesystem wasn't XFS, or had a better default allocation size. But it's fixed. Good.

Wonderful Unix Number Counting Command

Wednesday, February 22nd, 2012

Ubuntu Tux

Today I was having a really bad day with my UDP feed recorders. They filled up a 3TB+ drive array and I could not figure out why. As I dug into this, I started seeing a pattern that was really bad: the files in a directory didn't add up to the output of du. So I wanted to test the theory.

The trick was, I wanted to just run a simple command - not write a script or a program. Just a command, but it wasn't clear what to do. So I hit google, and there was the simple result:

  ls -lsa | awk '{ sum += $6 }END{ print sum }'

The ls is obvious - the 6th column is the file size in bytes - but the awk part is the real beauty here. I've used awk a lot before, but I didn't know variables could just spring into existence, pre-initialized, like this. And then there's the END block to run the print after all the input is consumed. Simply brilliant.

This is why I love linux/unix. You don't have to write groovy scripts if you understand the system. Love it!

Adding NBBO Exclusion Rules to Ticker Plants

Wednesday, February 22nd, 2012

High-Tech Greek Engine

Today I spent most of the day working into my NBBOEngine the concept that the exclusion rules for exchanges weren't limited to the global exclusion of a single exchange - they might be targeted at a single stock, or its options, or the entire family. These requests came in from the operations group, and they said they needed these rules before they could go live with the Greek Engine. As you might recall, the engine uses the embedded ticker plants, so that's the connection.

So I needed to come up with a way to easily allow the global defaults as well as instrument-level overrides to those defaults, and scope them to include just the instrument, its options, or both. In general, it wasn't horribly hard - a boost::unordered_map with a std::string key of the SecurityKey, and then a simple object that would hold the scope (a uint16_t bit-masked word) and the array of bool values for the individual exchanges.
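
The shape of that lookup can be sketched like this - using std::unordered_map instead of boost's to keep the sketch self-contained, and with hypothetical names throughout:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative sketch of per-instrument exclusion overrides layered on
// global defaults. Scope is a bit-masked word; each entry carries a
// per-exchange flag array. All names here are made up for illustration.
enum : uint16_t {
  eScopeInstrument = 0x01,                      // the underlying itself
  eScopeOptions    = 0x02,                      // its options
  eScopeFamily     = eScopeInstrument | eScopeOptions
};

static const std::size_t kExchanges = 16;

struct Exclusion {
  uint16_t  scope;                // bit-masked scope word
  bool      exch[kExchanges];     // per-exchange exclusion flags
};

class ExclusionTable {
public:
  bool globals[kExchanges] = {};  // global per-exchange defaults
  std::unordered_map<std::string, Exclusion> overrides;

  // Is this exchange excluded for this SecurityKey in this scope?
  bool excluded(const std::string &key, uint16_t scope,
                std::size_t exch) const {
    auto it = overrides.find(key);
    if (it != overrides.end() && (it->second.scope & scope)) {
      return it->second.exch[exch];   // instrument-level override wins
    }
    return globals[exch];             // fall back to the global default
  }
};
```

The point of the map lookup is that instruments without overrides pay only a failed find() and fall straight through to the global defaults.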

Then I needed to replicate the method calls - adding the SecurityKey and scope - and then work that into the framework and the external API. Nothing terribly complex, but it's a lot of little pieces, and more than a little typing. In the end, it's all working pretty nicely, and the additional load on the NBBOEngine is zero. Actually, I improved a few things, and that offset the additional lookup of the map.

I then did a little fixing up of the code for visualizing these exclusions, so that it's clear what's being excluded - at the global or instrument levels, and this makes the update really complete. Lots of little things, but in the end a far better system for managing bad data from the exchanges.

Factored Out Some Magic Numbers

Wednesday, February 22nd, 2012

GeneralDev.jpg

One of the things I really hate about "sloppy" code is the use of Magic Numbers in the code. Things that look like this:

  if (mAutoFlipSize < 400000) {
    if (((++emptyTrips > 75) && (sz > 50)) ||
        (sz > mAutoFlipSize)) {
      drainAllPendingMessages();
      flipSide();
      continue;
    }
  }

Unfortunately, this is all my code. I have no one to blame but myself. I started tuning this code, and needed to play with the buffer sizes and limits, and this is what I was left with.

But having it, and leaving it are two entirely different things. I spent a few minutes today to remove all these magic numbers and use a simple grouped enum to make them far more manageable:

  namespace msg {
  namespace kit {
  namespace udp {
  enum tConst {
    eAutoFlipEmptyTripSize = 50,
    eAutoFlipEmptyTrips = 75,
    eAutoFlipDefault = 50000,
    eAutoFlipManual = 400001,
  };
  }     // end of namespace udp
  }     // end of namespace kit
  }     // end of namespace msg
 
 
  if (mAutoFlipSize < udp::eAutoFlipManual) {
    if (((++emptyTrips > udp::eAutoFlipEmptyTrips) &&
         (sz > udp::eAutoFlipEmptyTripSize)) ||
        (sz > mAutoFlipSize)) {
      drainAllPendingMessages();
      flipSide();
      continue;
    }
  }

Much better! I now know that I can keep the constants in sync with the code. No more Magic Numbers.

Fixed Weekend and Holiday STALE Flag

Tuesday, February 21st, 2012

bug.gif

I got a note from one of the guys in another group about trying to hit one of my servers on the weekend, and not getting what he thought he should get. Instead of the hundreds of thousands of instruments, he was getting just a few hundred - clearly not right. But where was the problem?

The problem with finding this guy was that the bug report was sketchy at best. Made worse by the fact that the guy that reproduced it in my group failed to tell me any of the details about what he had found. Consequently, I spent quite a while trying to track down possible changes in the commit logs as opposed to looking at the code - where the problem lay.

Finally, I was able to extract this information from him, and was able to see that the STALE flag - a flag we use in the system to indicate that there have been no quotes or trades on an instrument today - was showing as 'true' on the weekends. While this makes perfect sense (nothing trades on a Saturday), it's got the unintended consequence of filtering out all the STALE instruments from the output to the client.

What I needed was to change the logic for the STALE flag to allow for the fact that if it's not a trading day, then any update (quote, print, summary) counts, and we're not stale. On a trading day, we still use midnight of the same day as the cutoff. It's pretty simple logic, but it's going to make a huge difference in how this code acts on the weekends and holidays.
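
That logic is simple enough to sketch; the function and parameter names here are made up for illustration, not the production code:

```cpp
#include <cassert>
#include <ctime>

// Hypothetical sketch of the weekend/holiday-aware STALE test: on a
// trading day an instrument is stale if its last update (quote, print,
// or summary) precedes today's midnight; on weekends and holidays any
// recorded update at all keeps it non-stale.
bool isStale(std::time_t lastUpdate, std::time_t todayMidnight,
             bool tradingDay)
{
  if (lastUpdate == 0) {
    return true;                       // never updated at all
  }
  if (!tradingDay) {
    return false;                      // weekend/holiday: any update is fine
  }
  return lastUpdate < todayMidnight;   // trading day: must be from today
}
```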

Glad I was able to get the information about the problem. It was pretty easy after that.

Fixed Auto-Flipping on Exchange Feeds

Tuesday, February 21st, 2012

bug.gif

This morning we had an unusual situation with the exchange feeds due to some downed lines from one of our providers. Let's forget for a minute that I used to do this job with the telcos, so I know exactly how they respond, and that the idea of a line being down for a few hours - let alone a day - is something I can only laugh at. OK… I really did laugh at this.

But like I said, let's forget about these facts…

No, today was a unique opportunity for me to test my auto-flipping logic on the exchange feeds because for some feeds we lost the A side, and others we lost the B side. So I should expect to see groups of feeds on A and others on B. What I saw was that nothing flipped, and so I dug into why.

Well… it turns out there were a few mistakes on my part. I had originally been using:

  bool UDPExchangeFeed::drainAllPendingMessages()
  {
    bool                 error = false;
    msg::DecodedMessage  *pkg = NULL;
    while (mPackages.peek(pkg)) {
      if (pkg == NULL) {
        mPackages.pop(pkg);
        continue;
      }
      deliverMessages(*pkg);
    }
    return !error;
  }

the idea being that if I ran into a NULL in the queue, I'd skip it. Otherwise, I'd deliver the messages in the package and continue. Hold on a sec… there's my first mistake. I'm never popping off the messages!

Yes, friends, I had an infinite loop, and that was what was stopping my flipping from happening. I needed to have something like this:

  bool UDPExchangeFeed::drainAllPendingMessages()
  {
    bool                 error = false;
    msg::DecodedMessage  *pkg = NULL;
    while (mPackages.peek(pkg)) {
      if (pkg != NULL) {
        deliverMessages(*pkg);
      }
      mPackages.pop(pkg);
    }
    return !error;
  }

where it's clear that only in the case of non-NULL peek, did I do something, but I always popped off that top element to continue.

The next problem I found wasn't so much a logic issue as a use-case issue. The trigger I was using for knowing when to flip sides was the size of the incoming datagram queue. The problem with this is that if the decoders are keeping up, that queue is almost always going to be very small. The decoded packages queue was also in play. So let's add them together and use that sum as the trigger. Looking much better now.
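
In other words, the trigger becomes the sum of the two queue depths - a trivial sketch with hypothetical queue types standing in for the real ones:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Illustrative sketch of the flip trigger described above: the incoming
// datagram queue alone stays small while the decoders keep up, so the
// trigger sums the raw datagrams and the decoded-but-undelivered
// packages. std::deque stands in for the real queue types.
template <typename Datagram, typename Package>
std::size_t flipTriggerSize(const std::deque<Datagram> &datagrams,
                            const std::deque<Package>  &packages)
{
  return datagrams.size() + packages.size();
}
```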

The final issue was really one of size. What happens when I have a trip level of 50,000 messages, and I have a feed that doesn't produce that in 5 mins? I get stale data. That's no good. What I need to do is to detect when there's a long period of inactivity on the preferred side, and there's something on the other side to use. In order to figure this out, I put a little counter on the loop to count up how many "preferred side is empty - wait" passes I'd had. If it was enough, say 75, then if there's something on the other side - even if it's not 50,000 messages - flip over, because this side isn't producing anything now.

With this, I get the behavior I was originally looking for. We flip when we have data and it doesn't take a long time to do it. I don't miss a lot, and we have a nicely self-adjusting system. Good news that this came up today.