Archive for the ‘Coding’ Category

Sampling Intervals and the Dangers of Common Sense

Friday, January 20th, 2012

This afternoon I've spent quite a bit of time working on a few issues that popped up in today's testing of the greek engine. One was a bad copy constructor that was leading to bad calculated values, and another was the calculation of the high and low for a composite instrument. The guy doing QA looked at the formula for the composite:

  Value = (Comp1 * 0.700) + (Comp2 * 0.400) + 0.55

and thought that the high/low on the day should be the simple application of this formula to the individual high/low values of the components. But that is not the case. The composite's value is a function of time, and unless the components are perfectly correlated, the high of one will not coincide with the high of the other. The computed high is therefore the largest value the composite actually reaches, and it will most likely be less than the "expected" high value.
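
To see it concretely, here's a minimal sketch - the tick values are made up for illustration, and I'm using the weights from the formula above - where the component highs land on different ticks:

  #include <algorithm>
  #include <cstdio>

  int main()
  {
    // hypothetical tick-by-tick values for the two components
    double comp1[] = { 10.0, 10.5, 10.2 };
    double comp2[] = {  5.3,  5.0,  5.6 };

    double trueHigh = -1e30, c1High = -1e30, c2High = -1e30;
    for (int i = 0; i < 3; ++i) {
      double v = comp1[i]*0.700 + comp2[i]*0.400 + 0.55;   // the composite formula
      trueHigh = std::max(trueHigh, v);
      c1High   = std::max(c1High, comp1[i]);
      c2High   = std::max(c2High, comp2[i]);
    }
    // the "common sense" high applies the formula to the component highs
    double naiveHigh = c1High*0.700 + c2High*0.400 + 0.55;
    printf("true high: %.2f   naive high: %.2f\n", trueHigh, naiveHigh);
    // true high: 9.93 (the third tick)   naive high: 10.14
  }

The naive high combines component values from two different ticks, so it describes a composite price that never actually existed.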

But it gets more interesting…

Because of the speed of the ticks, we accurately track every trade for the determination of the high and low. Downstream, however, we conflate the messages, so that the calculation may actually be performed only once every 100 trades or so (depending on volume). This means the formula is not evaluated at every tick - only at the ones it needs - and that, too, will affect the high/low as described.

All this is just too confusing for the traders to cope with. So I used the equation:

  Value(high) = (Comp1(high) * 0.700) + (Comp2(high) * 0.400) + 0.55

and decided it was better to have a clear, explainable value than to risk the confusion and effort of having to explain the subtlety to all the traders several times.

Pricing Systems are Details, Details, Details (cont.)

Friday, January 20th, 2012

OK, I just finished the latest fixes for today, for testing Monday. I know it's the way it goes, but it's still tough. I'd really like to know it's fixed, but the best I can do is run it through in my head and be sure that, at least there, it's right. I then have to hope that my mental picture of the system is accurate. Sometimes yes, sometimes no.

Today's issue was really pretty significant. It appeared as though the previous close and adjusted previous close weren't being loaded. But I'd tested that code, and I knew in my tests it was working. Something was amiss.

So I resorted to logging, and with that I realized that the problem was systemic. I was indeed loading the values properly, but then the first message was clearing out the summary data that contained them. So the values appeared unset even though they had been set. What was the problem? I was resetting all parts of the summary data when all I wanted to clear were the open/close/high/low. Ah! Simple fix.

But what about reloading an instrument, where we replay the messages from the previous day and the last trade from last night then sets our new high/low/close? Hmmm… this is just as obvious a problem, but it's a lot less clear how to fix it. We can't block the messages - they're needed. We also can't stop the resetting process - that's crucial.

What I came up with was a very surgical change: if the trade was from yesterday (i.e. not today), then do not allow it to update the summary values of open/high/low/close. Simple and surgical, and it covers the complete context of the problem. I'm hoping there are no unintended consequences on Monday, but we'll have to wait until then to see. I've run it through my head so many times I can't see a hole in the logic.
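
In rough pseudocode, the guard looks something like this (the names here are illustrative - not the actual engine code):

  // a sketch of the guard - names are hypothetical, not the real engine code
  void Instrument::applyTrade( const Trade & aTrade )
  {
    // always apply the trade itself - downstream consumers need it
    updateLastTrade(aTrade);
    // ...but only let *today's* trades touch the summary values
    if (aTrade.getDate() == today()) {
      updateSummary(aTrade);     // open/high/low/close
    }
  }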

I have my fingers crossed.

Updated the MFDS Codec for Ticker Plants

Friday, January 20th, 2012

This morning I needed to finish up just two new messages in the NASDAQ MFDS data feed - the two Corporate Action messages. I had built the MFDS codec for the Ticker Plants more than a year ago, but then sometime in the middle of 2011 they changed the feed and added six new messages and dropped four, I think. In any case, it was a big change, and I spent several hours with the docs and my code to put in the new messages, and chop out the old.

In the end, it was nice to know that it took less than a day to get the new messages in, and decoding happily. Getting mutual funds will be nice, too. Just don't have a need for them right this second.

Pricing Systems are Details, Details, Details

Thursday, January 19th, 2012

I've been here before, and yet it doesn't make this process any easier. When you get a new pricing/calculation system into users' hands, the thing you spend a ton of time on is the testing of the prices. Open prices. Closing prices. Volumes. All are exceptionally important to traders, and when they have an existing system to compare against, it's even harder.

So it comes as no surprise to me that now that we're in that phase with my greek engine, that's what I'm doing. The really annoying part is that I can only test most things once a day. Opening prices? That's one shot a day. Closing prices? Again, once. It's hard to say "OK, I'm on this - I have a change and you'll see it tomorrow". It can't be avoided, but it's annoying to me nonetheless.

Clearly, I'm an impatient person - I write code for a living. If I had patience, I'd be back in VLSI design. There, you test something a lot in simulation, and then build it once a month. No, that won't do at all.

So I'm slugging it out with the open/close/high/low right now, and it's a big puzzle. Most of the code is working perfectly, but some edge cases are causing problems. You don't want to change too much, but you have to make a nice surgical cut and put in the code that will have the effect you need. It's a lot of thinking and going over possibilities in your head, and then writing two lines of code.

Slow going, but if it's progress, at least it's one less thing to hassle with.

Fixing Tricky JSON Decoding Issues

Thursday, January 19th, 2012

This morning I've been fighting a problem that one of my web clients has been seeing. They are hitting the greek engine and trying to calculate implied vols for given option quotes (bid/ask). When they send me a value like 25.21, it works well, but 26.00 fails. Makes no sense to me, but then again, maybe I'm missing something.

So I make a little C++ test case and it checks out nicely. So far so good. I can put in just about any value and get reasonable values out. Nice. So what's the problem?

I ask the developer to send me exactly what he's sending me, just to make sure I know what it is, and that it's formatted properly. Here's what I got:

  { "values" : { "O:AAPL:20120121:400.00:C" : { "bid.price" : 26 }},
    "instruments" : ["O:AAPL:20120121:400.00:C"] }

and the second I saw the request, I knew what it was: JSON doesn't carry unnecessary data in its encoding. The number 26 was sent as 26 and not 26.00. The Broker then read this from the JSON as an integer and sent it to me as such. I was expecting a double, and so I passed on the value.

Obvious!

To clear up the problem I changed my code from:

  if (key == "bid.price") {
    freeze();
    mQuote.bid.price = dbl2int64((double)val);
  }

where val is the variant, and we're using the casting operator of the variant, to:

  if (key == "bid.price") {
    freeze();
    mQuote.bid.price = dbl2int64(toDouble(val));
  }

where:

  static inline double toDouble( const msg::ng::variant & aValue )
  {
    double    retval = NAN;
    if (aValue.isDouble()) {
      retval = (double)aValue;
    } else if (aValue.isInteger()) {
      retval = (double)(int64_t)aValue;
    } else if (aValue.isString()) {
      retval = atof(((const std::string &)aValue).c_str());
    }
    return retval;
  }
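
As a quick illustration of what this buys us - assuming the variant can be constructed from each of these types (that part is hypothetical):

  // illustrative only - assumes msg::ng::variant can hold each of these
  msg::ng::variant  asInt((int64_t)26);           // what the Broker sent for 26
  msg::ng::variant  asDbl(26.00);                 // what I had been expecting
  msg::ng::variant  asStr(std::string("26.00")); // even this works now
  // toDouble() yields 26.0 for all three, so dbl2int64() gets what it needs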

Now I am sure the casting is being done properly, and the user can send me integers, doubles, and even strings that format into integers or doubles, and I don't have to make them deal with it. This is the kind of bug I like to find. The solution makes it even better than before. Nice.

Got My Archive Server Working

Wednesday, January 18th, 2012

Today I finally had time to devote to my archive server to get the final query form working the way I wanted. The server is really the reader half of a reader/writer pair, where the Feed Recorders are the writers of the data. The recorders are very simple little apps - they use the basic framework I've built for the exchange feed processing, but instead of processing the datagrams, they simply buffer them, and every 30 mins or 10MB they are written to disk in a directory structure that includes the feed name, the side, and the date. The point is that when we go to read the data, we don't want to have to look at thousands of files to get the few we want, so using directories is a very good plan.

Once these files are written, it's a simple matter of reading them and parsing the datagrams into messages and then serving them up. Storing the data in smallish files makes it easy to cache these files-turned-messages in the archive server. The only key left is to make the server smart about the requests it gets.

The current format of the requests is pretty simple: feed name, side, starting and ending times, list of instruments to return, types of messages to return, and an optional sequence number. The bulk of the requests won't use the sequence number, and that's OK - it's for the special requests I just finished, but more on that later. The vast majority is really about a time range and a message type: "Give me all the Quotes for IBM from 10:30 to 10:45" - that kind of stuff. For this, the service was working pretty well.
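
Concretely, a request boils down to something like this (a sketch - the field names and types here are simplified for illustration):

  #include <stdint.h>
  #include <string>
  #include <vector>

  // a sketch of the request - field names simplified for illustration
  struct ArchiveRequest {
    std::string               feedName;        // which exchange feed
    std::string               side;            // which side of the feed
    uint64_t                  startTime;       // usec since epoch
    uint64_t                  endTime;         // usec since epoch
    std::vector<std::string>  instruments;     // e.g. {"IBM"}
    std::vector<uint16_t>     messageTypes;    // e.g. just the Quotes
    uint64_t                  sequenceNumber;  // optional - 0 when unused
  };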

But there was a slight hitch - what if the requested data wasn't on the filesystem? What if it was sitting in a recorder, buffered up and waiting to be written out? Well… then I had to put in a scheme where the recorders were actually services themselves. The recorders would then answer a simple request: give me your data. The archive server can then see if the request is fulfilled by the filesystem data, and if not, it'll go to the appropriate recorder service, ask it, and augment the response as necessary.

It was pretty neat.

It was also pretty fast, which is nice.

The final thing I needed was a time/sequence number request for restarting the feeds and greek engine. Basically, if the server goes down, then even with a saved state there is a window between the last save and the time it's back up and processing messages where data has been lost, with no way to get it back.

Enter the time/sequence number request.

When the server gets back on its feet, it can look at the last data it has in the saved state, and then issue a request to the archive server saying "Hey, send me everything you have after this sequence number, which is about this time". Processing the returned messages means the server can catch up on the lost messages, and if they aren't needed - no big deal, we'll throw them away. But if they are needed, then we have them.
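
The restart logic then looks something like this (again a sketch, using the ArchiveRequest outline from above - these names are made up for illustration):

  // sketch of the restart catch-up - the names are hypothetical
  void Engine::recoverFromSavedState()
  {
    loadSavedState();
    uint64_t        lastSeq = lastSeenSequenceNumber();  // from the saved state
    ArchiveRequest  req;
    req.feedName = mFeedName;
    req.sequenceNumber = lastSeq;
    req.startTime = lastSeenTimestamp();    // "...which is about this time"
    std::vector<Message>  replay = queryArchiveServer(req);
    for (size_t i = 0; i < replay.size(); ++i) {
      if (replay[i].getSequenceNumber() > lastSeq) {
        process(replay[i]);   // catch up on the lost messages
      }                       // ...anything older we simply throw away
    }
  }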

Well… today I finished the archive server part. I haven't yet wired up the feeds and engine to request the data, but that shouldn't be too hard. I'm in the middle of trying to get a lot of little things fixed up for the greek engine in testing, so I'm liable to hold off a bit before pushing ahead with that feature. But it feels really good to get this part done and in the can.

Google Chrome dev 18.0.1010.0 is Out

Wednesday, January 18th, 2012

This morning I saw that Google Chrome dev 18.0.1010.0 was out, and the release notes have a few nice things like PDF rotation and a few Mac-specific fixes as well. Good to see these user-experience additions and fixes, but I'm still waiting to see if Google puts in something that once again opens up wild development in browser technology. Can't imagine what it'd be, but that's part of the fun!

Forgetting the Sign on the Change

Tuesday, January 17th, 2012

A few days ago, I wrote a few static methods on a class that could be used in the testing of changed values. They aren't anything special, but it's something that I know from experience you need to do right, or it's going to be painful, and in order to do it right, you really need to handle all the edge conditions. Normally, this would be in-line code for another developer, but I've learned that it really clutters your code to do it right. Hence, the static methods.

My first cut was:

  double Instrument::pctChange( double oldValue, double newValue )
  {
    double     chg = NAN;
    if (!::isnan(oldValue) && !::isnan(newValue)) {
      if (fabs(oldValue) < 1.0e-6) {
        chg = 1.0e6;
      } else {
        chg = (newValue - oldValue)*100.0/oldValue;
      }
    }
    return chg;
  }
 
 
  double Instrument::pctChange( int64_t oldValue, int64_t newValue )
  {
    double     chg = NAN;
    if (!::isnan(oldValue) && !::isnan(newValue)) {
      if (oldValue == 0) {
        chg = 1.0e6;
      } else {
        chg = (newValue - oldValue)*100.0/oldValue;
      }
    }
    return chg;
  }

Having one that takes doubles and another that takes int64_t is nice in that we store the floating point numbers as integers internally to make math faster and transmission far faster.

I then proceeded to use this in the test of whether or not an instrument needed to be recalculated based on the changes in some of its values. The code was pretty simple, but it compounded the problems in the above code even more:

  double   spotChg = pctChange(res.spot, getSpotAsInt());
  if ((maxChg > 0.0) && !::isnan(spotChg) && (spotChg < maxChg)) {
    // change deemed too small - skip the recalculation
  }

It was pretty easy to see, when I got word that this code wasn't working, that the problem was the loss of the sign in the test. I needed the absolute value of the percent change in the test, not the signed value itself. All negative changes were being discarded because they passed this test. Not good.
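
To make it concrete: say the spot dropped 10% and the threshold is 0.5%:

  double  maxChg  = 0.5;                       // recalc on moves bigger than 0.5%
  double  spotChg = pctChange(100.0, 90.0);    // a 10% drop... comes back as -10.0
  // (maxChg > 0.0) && !::isnan(spotChg) && (spotChg < maxChg) is TRUE,
  // so the 10% move is skipped as "too small" - the sign killed the test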

So I decided to incorporate the absolute value into the method, and at the same time, fix up a few problems I saw with the methods themselves:

  double Instrument::pctChange( double oldValue, double newValue )
  {
    double     chg = NAN;
    if (!::isnan(oldValue) && !::isnan(newValue)) {
      if (fabs(newValue - oldValue) < 1.0e-8) {
        chg = 0.0;
      } else if (fabs(oldValue) < 1.0e-6) {
        chg = (newValue > 0.0 ? 1.0e6 : -1.0e6);
      } else {
        chg = (newValue - oldValue)*100.0/oldValue;
      }
    }
    return chg;
  }
 
 
  double Instrument::pctChange( int64_t oldValue, int64_t newValue )
  {
    double     chg = NAN;
    if (!::isnan(oldValue) && !::isnan(newValue)) {
      if (newValue == oldValue) {
        chg = 0.0;
      } else if (oldValue == 0) {
        chg = (newValue > 0 ? 1.0e6 : -1.0e6);
      } else {
        chg = (newValue - oldValue)*100.0/oldValue;
      }
    }
    return chg;
  }
 
 
  double Instrument::absPctChange( double oldValue, double newValue )
  {
    return fabs(pctChange(oldValue, newValue));
  }


  double Instrument::absPctChange( int64_t oldValue, int64_t newValue )
  {
    return fabs(pctChange(oldValue, newValue));
  }

and then in my code I simply used the correct form of the method:

  double   spotChg = absPctChange(res.spot, getSpotAsInt());
  if ((maxChg > 0.0) && !::isnan(spotChg) && (spotChg < maxChg)) {
    // change too small - skip the recalculation
  }

and we should be good to go.

The method now properly detects no change - regardless of whether the old value is zero - and the sign of the very-large-change sentinel follows the sign of the new value. Also, having the absolute value baked into the companion methods makes it very hard to forget this going forward.

Fantastic Bug Find: Inclusion of Timezones in Timestamps

Friday, January 13th, 2012

For the last few days I've been struggling with a really nasty bug. Something that seemingly defies explanation based on the code and lack of error messages. It's been a real pain in the rump. But today I finally cracked it, and it was really fun to discover the subtle bug that had me baffled for a few days.

It all starts with timestamps. The basic idea is that I wanted a simple 64-bit unsigned integer to be a timestamp - and it seemed reasonable to pick microseconds since epoch. It all fits in the uint64_t, and it's all the resolution I'm going to need for the time being. The code to get it quickly from the system is pretty clean:

  uint64_t usecSinceEpoch()
  {
    struct timespec tv;
    clock_gettime(CLOCK_REALTIME, &tv);
    // promote to 64-bit before scaling so the multiply can't overflow
    return ((uint64_t)tv.tv_sec * 1000000ULL + tv.tv_nsec/1000);
  }

With this, I can get a nice, simple timestamp for comparisons. I can format it nicely with a little code like this:

  std::string formatUSecSinceMidnight( uint64_t aTime )
  {
    char   buf[64];
    // see if it's w.r.t. epoch or today - we'll format accordingly
    if (aTime > 86400000000L) {
      // this is since epoch
      time_t  sec = aTime/1000000L;
      struct tm   when;
      localtime_r(&sec, &when);
      // now make the sec since epoch for the broken out time
      sec = mktime(&when);
      // now let's make a pretty representation of those parts
      snprintf(buf, 63, "%04d-%02d-%02d %02d:%02d:%02d.%06u",
               when.tm_year+1900, when.tm_mon+1, when.tm_mday,
               when.tm_hour, when.tm_min, when.tm_sec,
               (uint32_t)(aTime - sec*1000000L));
    } else {
      // this is since midnight - let's break it down...
      uint64_t   t = aTime;
      uint8_t    hrs = t/3600000000L;
      t -= hrs*3600000000L;
      uint8_t    min = t/60000000L;
      t -= min*60000000L;
      uint8_t    sec = t/1000000L;
      t -= sec*1000000L;
      // now let's make a pretty representation of those parts
      snprintf(buf, 63, "%02d:%02d:%02d.%06u", hrs, min, sec, (uint32_t)t);
    }
    // ...and return a nice std::string of it
    return std::string(buf);
  }

And all is well and good. The timestamps look right, they're fast, and all is good.

Almost.

There are times in the code where I want a fast way to get the date from a timestamp. I really don't care that it's with respect to epoch, I just want to know what the date is so that I can group things together based on the date. What seemed logical (to me) was to simply divide this timestamp by the number of microseconds in a day:

  uint32_t   day = aTime/86400000000;

and for the most part, it worked just fine. I could tell when I "crossed" into a new day, and everything was fine. Until 5:00pm came. Then it wasn't so nice.

Things stopped working. Certain caches were getting cleared, messages were generated but not delivered. Very odd behavior, given that just 10 min before everything was working just fine. To jump ahead a bit, the problem code looked a little like this:

  bool        skip = false;
  uint64_t    day = emsg.getCreationTimestamp()/86400000000L;
  uint64_t    eday = emsg.getTimestamp()/86400000000L;
  if ((day != eday) || (day < mCreationDay)) {
    // this is old data - skip it
    skip = true;
  } else {
    ...
  }

where the code is essentially saying "Hey, get the date the message arrived from the exchange and compare it to the date they sent it, and if they aren't the same, then drop it." Seems to be just fine. Until you take a little closer look at the value of those microseconds. See... the value from clock_gettime() is UTC - it's offset from our wall clock by the timezone. So if I'm looking at a message that comes in at 5:00 pm (CST), that's 6:00 pm as we see it, and add in the CST offset (6 hrs), and we're in a new day!

All of a sudden, at 5:00 pm, all my new messages were being decoded and delivered, but then skipped as "bad data" simply because their "days" were different - one had crossed into the next day. This hadn't popped up before because the localtime_r() call knows to take out that same offset when it converts a value. My mistake was in assuming I could do naive math on the raw timestamp, as opposed to treating it as an opaque value and using the same time functions to get the date.
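
The rollover is easy to see in numbers (a back-of-the-envelope check, using the 6-hour CST offset):

  // 2012-01-13 18:00:00 CST == 2012-01-14 00:00:00 UTC
  uint64_t  sixPmCst = 1326499200000000ULL;         // usec since epoch (UTC)
  uint64_t  utcDay   = sixPmCst / 86400000000ULL;   // 15353 - already "tomorrow"
  uint64_t  cstDay   = (sixPmCst - 6ULL*3600*1000000)
                       / 86400000000ULL;            // 15352 - still "today" locally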

The solution isn't too bad, but it is hard-coded:

  static uint64_t __cstOffset = 6*60*60*1000000L;
  bool            skip = false;
  uint64_t        day = (emsg.getCreationTimestamp() - __cstOffset)/86400000000L;
  uint64_t        eday = (emsg.getTimestamp() - __cstOffset)/86400000000L;
  if ((day != eday) || (day < mCreationDay)) {
    // this is old data - skip it
    skip = true;
  } else {
    ...
  }

I was so happy to find this bug, as it was causing all kinds of problems in the code. Now that I've figured out the offset, it's not bad to use as-is - it's just something I need to be mindful of when doing math on the timestamp value itself.

Sweet Little Data File Retention Script

Thursday, January 12th, 2012

One of the things I wanted to get done this afternoon was to tidy up the bash script that runs every night out of cron to clean up (delete) the data files I can't afford to keep. These are the data files for my feed recorders, and I can afford to keep a few days' worth, but not a lot. The old code I had was really pretty lame - it scanned the directory for subdirectories, looked for the highest sorted name, and picked that one as the "latest".

I did this several times, each pass getting the next one, and the next one, and in the end, I had a list of directories to "keep". Very sloppy. Here's what I came up with this afternoon after just a little bit of fiddling:

  i=0
  for d in `ls | sort -r`
  do
    # skip anything that's not a directory
    if [ ! -d "${d}" ]; then
      continue
    fi
    # track count of dirs to keep - and keep 4
    i=$(($i + 1))
    if [ $i -le 4 ]; then
      continue
    fi
    # whatever is left - delete… it's too old
    echo -n '.'
    rm -rf "${d}"
  done

The beauty of this script is that I can easily change the number of days to keep - just change the '4' to a '5', and we have a whole (business) week. Very nice. Also, it's a lot more compact than the original approach.