Archive for June, 2011

Adding Self-Discovery to Client Code

Wednesday, June 8th, 2011

High-Tech Greek Engine

As I'm getting ready to move into the "full-up" mode of my Greeks Service, I realize that my client library really needs to auto-discover the servers it can talk to, and certainly, the non-C++ clients need to have an easy way to find out what's hosted where. Given that I really don't want the clients to have to hit something like a MongoDB database and parse the document to get the range, it made a lot more sense to build the discovery into the services themselves. After all, they are going to have to read their configuration from the MongoDB-backed configuration service in The Broker, so it makes sense that if they are already up and going, it should be very easy to have them respond to simple requests for what they cover.

So the first thing to do was to add a "coverage" request/response to the application. This would respond with a list of two strings - the beginning of the covered range, and the end of the covered range. These are obtained from the OPRA channels that we're getting data from, and it's also the filtering criteria on what options to load in from the data master database of instrument data.

This was really simple because there's no need for any security on this call, and it's the only 'call' (aka 'one shot') that the service handles. Pretty simple. With this, the C++ client can then ask The Broker what services it knows about, filtering out based on a very simple pattern match to find the ones that are for my Greek Engine, and then ask each for their coverage. It then builds up the map and it's then easy for the client to know who to dispatch the request to. Nice.
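A minimal sketch of the client-side map built from those coverage replies might look like the following. The class name, the service names, and the assumption that ranges are inclusive and compared as plain strings are all hypothetical; the real client may match symbols differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical directory built from each engine's "coverage" reply, which
// is two strings: the start and the end of the covered range (assumed
// inclusive, compared as plain strings).
class CoverageMap {
public:
  // Record that 'service' covers [first, last].
  void add(const std::string& service,
           const std::string& first, const std::string& last) {
    assert(first <= last);
    mRanges[first] = std::make_pair(last, service);
  }

  // Return the name of the service covering 'symbol', or "" if none does.
  std::string locate(const std::string& symbol) const {
    // find the first range that starts strictly after the symbol...
    auto it = mRanges.upper_bound(symbol);
    if (it == mRanges.begin()) {
      return "";
    }
    // ...so the one before it is the only candidate
    --it;
    return (symbol <= it->second.first) ? it->second.second : "";
  }

private:
  // start of range -> (end of range, service name)
  std::map<std::string, std::pair<std::string, std::string> > mRanges;
};
```

With ranges "A" to "K" and "L" to "ZZZZ" loaded, locate("IBM") returns the first service and locate("MSFT") the second. Note that with plain string comparison an end of "K" does not cover "KO", so the range ends have to be chosen with that in mind.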

But that's not the end of the story. I also wanted to make it possible to have a client ask a service where to go. This is a very simple service that does basically the same thing, but if we have one per Greek Engine service, then The Broker can load-balance between them, and each will know its own engine's coverage quickly and easily, and it can repeat the process the client code uses to find out all the others. Once it's got this data cached, it's fast.

The upside is that we have a simple "piggy back" service on each Greek Engine that can locate any symbol or family very quickly. This means that if a web sockets client needs to know who to talk to, they can hit this locator service, ask it, get the service name, and then talk to it. Pretty nice.

With this, I've got a pretty nice self-discovery system in place. I could add it to my Ticker Plants too, but I'm not so sure they will benefit as much from it. They broadcast on ZeroMQ reliable multicast channels, and the client is the only way to get to that data. So it's not as big a deal. But it's a nice solution, and I can think about it for later.

Making the Greek Engine More Interactive

Tuesday, June 7th, 2011

High-Tech Greek Engine

The next big thing I wanted to put in my greek service is the ability for users to not only request certain calculations, but actually provide values that are to be used in those calculations. This is taking the form of sending in a map-of-maps of the values to use, and then simply running through the maps, pulling out the values as needed.

Thankfully, I had a good part of these "maps of values" already worked out for the IRC interface. There, I needed to have some way of telling the server that I wanted it to override some values, so I had the basic parsing and handling for each instrument type in the code. I just needed to add in a few values that are traditionally output values, but in some cases, they will be input values for the 'inverse' calculation. Still... not too hard at all.

Next, I'm going to need to add the concept of a frozen state for the instruments. Basically, if I change a value, I need to be sure that it's going to stay that way while I get around to calculating the results based on that value. So I have to "freeze" the instrument, set its value, and then calculate what I need. I also need to implement unfreeze(), to start accepting updates again. It's pretty easy, and only slightly complicated for an underlying where you might want to freeze() the underlying and all its options, or a stock, its options, and all its futures, etc.
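To make the freeze/override idea concrete, here is a small sketch. Everything in it is an assumption for illustration: the Instrument class, the value names, and the shape of the map-of-maps (instrument name to value name to value) are made up, and the real engine has separate handling per instrument type.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical instrument holding named values such as "price".
class Instrument {
public:
  Instrument() : mFrozen(false) {}

  void freeze()   { mFrozen = true; }
  void unfreeze() { mFrozen = false; }
  bool isFrozen() const { return mFrozen; }

  // Market-data update: ignored while frozen. Returns true if applied.
  bool update(const std::string& key, double value) {
    if (mFrozen) {
      return false;
    }
    mValues[key] = value;
    return true;
  }

  // User-supplied value: only allowed once the instrument is frozen, so
  // that a tick can't replace it before the calculation runs.
  void setOverride(const std::string& key, double value) {
    assert(mFrozen);
    mValues[key] = value;
  }

  // Returns 0.0 for a value that was never set.
  double get(const std::string& key) const {
    std::map<std::string, double>::const_iterator it = mValues.find(key);
    return (it == mValues.end()) ? 0.0 : it->second;
  }

private:
  bool                           mFrozen;
  std::map<std::string, double>  mValues;
};

// instrument name -> (value name -> value to use)
typedef std::map<std::string, std::map<std::string, double> > Overrides;

// Freeze every instrument named in the request and apply its overrides.
// Unknown instrument names are skipped.
void applyOverrides(std::map<std::string, Instrument>& book,
                    const Overrides& overrides) {
  for (const auto& inst : overrides) {
    auto it = book.find(inst.first);
    if (it == book.end()) {
      continue;
    }
    it->second.freeze();
    for (const auto& kv : inst.second) {
      it->second.setOverride(kv.first, kv.second);
    }
  }
}
```

After the calculation is done, the caller would unfreeze() each instrument so that ticks flow again. Freezing an underlying together with all its options would be a loop over the family calling freeze() on each member.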

I got all that in this afternoon, and it's ready for the next step - adding the last two special calculations that the client wants. But that's for another day.

Polish, Polish, and More Polish – Finishing Touches

Tuesday, June 7th, 2011

This morning was all about polishing my app. There are a few things that I need to see get done, but they are being done by another member of the team, and I can't really jump in there without making waves, so I'm doing more of the fluff stuff in order to get things to a production quality level. It's getting close, but it's not there - yet.

One of the things I patched up was the defaulting of the calculated expiration date. One would think that the model would look at the option's expiration, figure out when it really stops trading, and then properly age the instrument. After all, US Equity options that expire on Saturday don't trade on Saturday, so why have the caller tell the model that it's really halted trading on the Friday before the expiration? No matter what I might do, this is what we needed.

So I wanted to make sure we had a decent default for the option - even though they should all be set by a database pull. I've been around long enough to know that it's only a matter of time before one option expiration isn't in the database and then that guy has a date of 0000-00-00, and the code blows up. After all, who really checks those dates before converting them? Yeah... OK... I do, but that's why I wanted a default in the first place. So I created a little method to take a date in YYYYMMDD format, break it up into a struct tm structure, call mktime() and then look at tm_wday to see if it was a weekend. If so, I backed off to the previous Friday and used that as the default.
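A sketch of that kind of method, under a few assumptions: the function name is made up, a result of 0 is used here to mean "no usable default", and the time is pinned to noon so that daylight-saving changes can't push the date across midnight.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <ctime>

// Given an expiration as YYYYMMDD, return the same date if it falls on a
// weekday, or the preceding Friday if it falls on a Saturday or Sunday.
// Returns 0 when the input can't be used (e.g. 0000-00-00).
uint32_t defaultLastTradingDate(uint32_t aDate) {
  const int year  = static_cast<int>(aDate / 10000);
  const int month = static_cast<int>((aDate / 100) % 100);
  const int day   = static_cast<int>(aDate % 100);
  // reject obviously unusable dates before handing them to mktime()
  if (year < 1970 || month < 1 || month > 12 || day < 1 || day > 31) {
    return 0;
  }

  std::tm when;
  std::memset(&when, 0, sizeof(when));
  when.tm_year  = year - 1900;
  when.tm_mon   = month - 1;
  when.tm_mday  = day;
  when.tm_hour  = 12;   // noon, to stay clear of DST edges
  when.tm_isdst = -1;   // let mktime() decide about DST
  if (std::mktime(&when) == static_cast<std::time_t>(-1)) {
    return 0;
  }

  // mktime() has filled in tm_wday: 0 = Sunday ... 6 = Saturday
  assert(when.tm_wday >= 0 && when.tm_wday <= 6);
  const int back = (when.tm_wday == 6) ? 1 : ((when.tm_wday == 0) ? 2 : 0);
  if (back > 0) {
    when.tm_mday -= back;   // may drop to zero or below...
    when.tm_isdst = -1;
    // ...but mktime() normalizes that into the previous month
    if (std::mktime(&when) == static_cast<std::time_t>(-1)) {
      return 0;
    }
  }
  return static_cast<uint32_t>((when.tm_year + 1900) * 10000
                               + (when.tm_mon + 1) * 100
                               + when.tm_mday);
}
```

For example, a Saturday expiration of 20110618 comes back as 20110617, and 20111001 (also a Saturday) comes back as 20110930. Exchange holidays are not handled here, which is one reason this is only a fallback for a missing database value.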

It's not perfect, but it's a lot better than doing nothing, and even better than just assuming the expiration is the calculation date. It's little stuff like this that junior programmers don't see the value in. They often think that there has to be a database row, and if not, then it's not their fault. It's not about fault - it's about being able to predict problems and write code to at least mitigate a bit of the problem if not correct it entirely. Yeah, it takes some time, and getting yelled at, to figure out that this is the stage to go back through the code and put in all that polish that you do on furniture in the finishing phase.

It's time well-spent.

Tracking Down Nasty Boost ASIO Core Dump

Monday, June 6th, 2011

This afternoon I noticed that I was having a nasty, intermittent core dump on the client code for my request/response greek service. It didn't happen all the time, and it was always something in the boost ASIO code, and that required that I start digging into that code once again to see what I didn't shut down properly. It has to be one of the threads that are started in the client, and I'm just not shutting it down before bailing on the app, and then there's a thread with nothing to act on. Bam!

So I started digging and while I could require that users of the client call close() before quitting, I don't really like that limitation. I want to make code that's as robust as possible so if the client is a stack variable, and it goes out of scope for any reason, it shuts things down cleanly, and there's no reason to hassle with any shut-down code.

Unfortunately, I hadn't done a lot of that in the base client code. It worked, and my tests hadn't shown an issue, but this particular subclass was just fast enough, or just something enough to cause a timing problem, and about once every ten times, it'd core dump on the exit. Annoying.

So I put in some more shutdown code in the base class. This didn't affect the behavior of the class, it was just a lot more explicit about shutting down the connection and the channel when it needed to. It probably didn't amount to more than 15 lines of code in the base class. But an important 15 lines.

Looks better now.

[6/7 5:30pm] UPDATE: Added one more thing - it seems that I still get the problem after hours. Why? No idea, but it dawned on me that in my destructor for the client class, I'm stopping the boost ASIO io_service thread. And if I'm faster than the stoppage, I'll get into trouble. So I added in a simple join() on that thread after telling it to die, and that made things much better after-hours. Good.
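The stop-then-join pattern from the update can be sketched without Boost. In the real client the thread would be running the boost ASIO io_service and "telling it to die" would be a call to stop it; here a plain std::thread and an atomic flag stand in for that, and the class and member names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for a client that owns a background I/O thread.
class Client {
public:
  Client() : mStop(false), mRunning(false), mThread(&Client::run, this) {}

  // Going out of scope shuts things down cleanly; no explicit close()
  // is required of the caller.
  ~Client() { close(); }

  // Safe to call more than once.
  void close() {
    mStop = true;             // tell the thread to die...
    if (mThread.joinable()) {
      mThread.join();         // ...and wait until it really has
    }
    // only now is it safe to tear down what the thread was using
    assert(!mRunning);
  }

  bool running() const { return mRunning; }

private:
  // Placeholder for the I/O loop; it polls the stop flag.
  void run() {
    mRunning = true;
    while (!mStop) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    mRunning = false;
  }

  // The flags are declared before the thread so that they are initialized
  // before the thread starts running.
  std::atomic<bool>  mStop;
  std::atomic<bool>  mRunning;
  std::thread        mThread;
};
```

Without the join(), the destructor can return, and the members can be destroyed, while run() is still executing, which is the kind of race that only shows up some of the time.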

Fleshed Out Features for Request/Response Greek Service

Monday, June 6th, 2011

Today I spent quite a bit of the day working on adding a lot of necessary features to my request/response greek engine that I've been working on for the last several weeks. The initial cut of the code had limited features - just to get it out there and working for the clients. Now it was time to really expand the feature set to make it a lot more useful for the clients. This included allowing multiple stock families in the subscription to the service, and properly implementing the filtering in the calculation messages. These features weren't critical to getting something going, but they are going to be critical to efficiently handling a lot of requests.

The other thing I did was to monitor the load on the box while I jacked up the slice of the universe on it in hopes of getting the number of boxes needed to cover the universe to as small a set as possible. I'm hoping to be able to get it all done in two boxes - but that's going to mean monitoring for a few days to make sure there's sufficient head room on the box during the open and close to not lose anything.

Nice and productive day. It's a lot less stress than the last few weeks have been.

Google Chrome dev 13.0.782.10 is Out

Monday, June 6th, 2011

Google Chrome

This morning I saw that Google Chrome dev 13.0.782.10 was out with an updated version of Flash included. Since it's the only Flash player I have on my system, it's fine that it's at least contained there. I just don't like Flash all that much. I've also read that there was a Flash Player update from Adobe recently due to some security problems, and I'm not exactly sure if this guy is that patched version, or if this is one version behind. In any case, it's the update and I've got it now.

[6/8] UPDATE: that didn't last long... 13.0.782.11 is out already with a very minimal set of release notes. From the comments, it's not clear that this was really about UI tweaks, as much as there was a real problem in one of the builds and this was a quick fix for that guy. Who knows.

[6/9] UPDATE: neither did that... 13.0.782.13 is out today and it's got the same release notes about UI tweaks and stability fixes. I guess it's nice that Chrome remembers my open tabs so it's painless to restart.

The Problem with Partially Implemented Features

Friday, June 3rd, 2011

This afternoon, while the Developer Days was chugging along nicely, I decided to try and fix a bug I was seeing in the request/response service. I'd connect and subscribe to a family, and then get the first set of calculation results, but then after that, nothing. Initially, I wasn't getting the complete family the first time due to a problem with the isDirty() status not being set properly on cloned families. I fixed that guy pretty easily, but this was a little odd. During the day when we should have plenty of ticks, we weren't seeing updates. Made no sense. So I started digging into the code.

Turns out, it took me about 10 mins to realize that the isDirty() was not being set on any updates. It was there, but it wasn't really working. Not even close. This is one of the really nasty problems I see with some developers: they think that putting in the skeleton of a feature is enough to stop on, and they go on to something else without really effectively bookmarking the work, so they are sure to get back to it.
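For readers who haven't seen the pattern, a dirty flag of the sort isDirty() implies can be sketched as below. The Quote class and its members are hypothetical; the point is only that every update path has to set the flag, and that publishing clears it.

```cpp
#include <cassert>

// Hypothetical value holder with a dirty flag.
class Quote {
public:
  Quote() : mPrice(0.0), mDirty(false) {}

  // Every update path must mark the object dirty, or nothing downstream
  // will ever see the change.
  void setPrice(double aPrice) {
    if (aPrice != mPrice) {
      mPrice = aPrice;
      mDirty = true;
    }
  }

  bool isDirty() const { return mDirty; }

  // Hand the value to the sender and clear the flag.
  double publish() {
    assert(mDirty);
    mDirty = false;
    return mPrice;
  }

private:
  double  mPrice;
  bool    mDirty;
};
```

A sender loop would then publish only the objects whose isDirty() is true. If the setters never set the flag, the first snapshot goes out and nothing after it does.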

This was a pretty easy feature to do. I'm guessing it was no more than 15 mins to complete. But it was a critical 15 mins, and to have skipped this for several weeks is really pretty bad. You have to be able to bookmark your work - either notes in the code, like "TODO:" marks, or maybe a notepad, or something, so that you can remember what's been done, what hasn't, and what needs to be done. It's kinda frustrating, but at the same time, I have to remember who wrote the code, and cut them some slack.

Just got to get it all out of their hands to get the quality up.

Threading Issues with Boost ASIO async_write()

Friday, June 3rd, 2011

Boost C++ Libraries

This morning it's Developer Day, and I'm fighting a problem that really had me stumped for a while. When I have two independent requests hit my service, if they are close enough in time, they can cause a real problem in the service. I seem to get the requests just fine. I'm able to generate the results for each request just fine, but when I try to send back the generated responses to the client, it gets the service and the client into an unstable state.

Not good.

When I try to reproduce the test using my C++ client, I can't seem to get the requests close enough in time to trigger the problem. Or maybe I have a better client-side socket driver. Hard to say. But someone generated a test case in Python, and that does the trick. The client doesn't appear to ever get the response, but I know I sent it. Not a fun place to be when people are supposed to be hitting your stuff for fun and enjoyment.

I got the idea that it was in the boost ASIO sending because the generation was right, the C++ client was right, but the Python wasn't. The original code looked like this:

  void MMDServiceHandler::asyncSendMessage( const std::string & aString )
  {
    bool        error = false;
 
    // first, make sure we have something to do
    if (!error && ((mSocket == NULL) || !mSocket->is_open())) {
      error = true;
      cLog.warn("[asyncSendMessage] no socket to send on! Bad news");
    }
 
    /**
     * Now let's put the preamble and message into a buffer so that the
     * MMD knows what to do with it...
     */
    if (!error) {
      // let's make a few pointers for the payload
      boost::shared_ptr<Preamble>     hdr(new Preamble(aString));
      boost::shared_ptr<std::string>  body(new std::string(aString));
      // now let's make a package of the header and buffer
      std::vector<boost::asio::const_buffer>  pkg;
      pkg.push_back(boost::asio::buffer(hdr->body(), hdr->size()));
      pkg.push_back(boost::asio::buffer(*body));
      // finally, send it out asynchronously
      boost::asio::async_write(*mSocket, pkg,
                  boost::bind(&MMDServiceHandler::asyncMessageSent, this,
                          hdr, body,
                          boost::asio::placeholders::error,
                          boost::asio::placeholders::bytes_transferred));
    }
  }

My theory this morning was that by placing two items in the boost package, I was allowing boost to think that they could be sent separately. With two such writes done in quick succession, it seemed possible that the four components got interleaved, which messed up the receiver, and then the sender, since it was waiting for something back from the client that was never going to come.

What I decided to go with was a change to a single buffer:

  void MMDServiceHandler::asyncSendMessage( const std::string & aString )
  {
    bool        error = false;
 
    // first, make sure we have something to do
    if (!error && ((mSocket == NULL) || !mSocket->is_open())) {
      error = true;
      cLog.warn("[asyncSendMessage] no socket to send on! Bad news");
    }
 
    /**
     * Now let's put the preamble and message into a buffer so that the
     * MMD knows what to do with it...
     */
    if (!error) {
      // let's make a header and a pointer for the body
      Preamble    hdr(aString);
      boost::shared_ptr<std::vector<char> >  body(
                  new std::vector<char>(hdr.size() + aString.size()));
      // now let's put everything into one contiguous piece...
      memcpy(&((*body)[0]), hdr.body(), hdr.size());
      memcpy(&((*body)[0]) + hdr.size(), aString.data(), aString.size());
      // finally, send it out asynchronously
      boost::asio::async_write(*mSocket, boost::asio::buffer(*body),
                  boost::bind(&MMDServiceHandler::asyncMessageSent, this,
                          body,
                          boost::asio::placeholders::error,
                          boost::asio::placeholders::bytes_transferred));
    }
  }

By placing it all into one std::vector, using just two memcpy() calls, I was thinking I was even making it a little faster. The data was still in a stable shared pointer that was going to have a reference until the asyncMessageSent() method was called, and then it'd be dropped. So I was thinking I had this bad boy licked.

Yeah... well... not so much.

So the next thing I went to was switching from using async_write() to just using write(). Thankfully, I already had a syncSendMessage() method for just such an occasion:

  bool MMDServiceHandler::syncSendMessage( const std::string & aString )
  {
    bool        error = false;
 
    // first, make sure we have something to do
    if (!error && ((mSocket == NULL) || !mSocket->is_open())) {
      error = true;
      cLog.warn("[syncSendMessage] no socket to send on! Bad news");
    }
 
    /**
     * Now let's put the preamble and message into a buffer so that the
     * MMD knows what to do with it...
     */
    if (!error) {
      // let's make a header and a pointer for the body
      Preamble    hdr(aString);
      std::vector<char>  body(hdr.size() + aString.size());
      // now let's put everything into one contiguous piece...
      memcpy(&body[0], hdr.body(), hdr.size());
      memcpy(&body[0] + hdr.size(), aString.data(), aString.size());
      // finally, send it out synchronously
      boost::system::error_code  err;
      boost::asio::write(*mSocket, boost::asio::buffer(body),
              boost::asio::transfer_all(), err);
      if (err) {
        error = true;
        cLog.warn("[syncSendMessage] unable to send the message!");
      }
    }
 
    return !error;
  }

When I switched to the syncSendMessage(), everything worked out. I didn't like that I had to synchronously write out the data, so I put the entire send method in a simple thread that does the send and then dies. Sure, I could have put it in a pool, but this is good enough for today, and we'll wait and see what the real limitations are. But for now, it's been a really hard, but good, morning and I'm ready to just sit back and help the other guys with their Developer Days projects.
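For what it's worth, the Boost.Asio documentation for async_write() says the program must not start another write on the same stream until the first one completes, so two overlapping sends are a plausible culprit no matter how many buffers each one carries. A common alternative to a thread per send is to queue outgoing messages and keep only one write in flight. The sketch below is not what this post did: it leaves Boost out entirely, the WriteQueue name and the "starter" callback are hypothetical, and it assumes send() and onWriteDone() are always called from the same thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Serializes outgoing messages so that only one write is ever in flight.
class WriteQueue {
public:
  // The starter is whatever actually begins an asynchronous write; it is
  // handed a reference to the message at the front of the queue.
  typedef std::function<void(const std::string&)> Starter;

  explicit WriteQueue(Starter aStarter) : mStart(std::move(aStarter)) {}

  // Queue a message; start a write only if nothing is in flight.
  void send(const std::string& aMessage) {
    const bool idle = mPending.empty();
    mPending.push_back(aMessage);
    if (idle) {
      mStart(mPending.front());
    }
  }

  // To be called from the write-completion handler.
  void onWriteDone() {
    assert(!mPending.empty());
    mPending.pop_front();
    if (!mPending.empty()) {
      mStart(mPending.front());
    }
  }

  std::size_t pending() const { return mPending.size(); }

private:
  Starter                  mStart;
  // A deque keeps references to existing elements valid across
  // push_back(), so the in-flight message stays put while others queue up.
  std::deque<std::string>  mPending;
};
```

In a real handler the starter would wrap the async_write() call and the completion handler would call onWriteDone(). The second message then doesn't touch the socket until the first has been fully written.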

Cleaning Up More Code – I Need a Bigger Bucket

Thursday, June 2nd, 2011

Code Clean Up

Today I'm slugging through the clean-up of the last bit of work done by others on this project of mine, and while it's not as bad as some of the other code I've had to clean up, there's still a considerable bunch of problems in the way a lot of things were done. For example, our internal representation of a date is a single unsigned integer value of the form YYYYMMDD. It's simple, easy to compare, and very easy to read by a human. However, for the calculation model we need to use, they have a simple structure with three parts: year, month, day. Each is a simple integer, and while I can see why they might want to do that, they don't have any of the supporting code to make it easy to do comparisons, etc. None of this really matters because that's just what we have to do. So be it.

But the problem I saw in the code was that there was a method to convert from our representation to theirs, and it was written as:

  void Calculator::getDate( uint32_t aDate, uint32_t & aYear,
                            uint32_t & aMonth, uint32_t & aDay)
  {
    aYear = aDate / 10000;
    aMonth = (aDate - aYear * 10000)/100;
    aDay = (aDate - aYear * 10000 - aMonth * 100);
  }

and the code using this method called it like this:

  getDate(expir, model.year, model.month, model.day);

and there was at least one real problem with this, and one really bad design choice. First, let's look at the problem. The date structure is defined as:

  struct FEDate {
    int day;
    int month;
    int year;
  };

and from clever inspection, we're mapping these signed integers to the unsigned integers of the method. Well... the code really looks like:

  getDate(expir, (uint32_t &)model.year, (uint32_t &)model.month,
          (uint32_t &)model.day);

when it's not a really great idea to say the references are to unsigned values when they are going to be used as signed ones. The horrible design issue is that there's never a case we need to have this as the separate components - we're much better off dealing with the conversion to the struct directly:

  void Calculator::fillDate( uint32_t aDate, FEDate & aTarget )
  {
    aTarget.year = aDate / 10000;
    aTarget.month = (aDate - aTarget.year * 10000)/100;
    aTarget.day = (aDate - aTarget.year * 10000 - aTarget.month * 100);
  }

Now we don't have to worry about the conversion issue at all. Plus, it's clear that we're converting from our representation to theirs. Simple. Clean. Direct. Much better.
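As a self-contained variant, the same conversion can be written with division and modulo, with explicit casts so that the unsigned-to-signed step is visible. The packDate() inverse is a hypothetical addition, not something in the code base; it is one cheap way to get back the easy comparisons the struct lacks.

```cpp
#include <cassert>
#include <cstdint>

// Same layout as the model's date structure shown above.
struct FEDate {
  int day;
  int month;
  int year;
};

// Convert a YYYYMMDD integer into the model's three-part date.
void fillDate(uint32_t aDate, FEDate& aTarget) {
  aTarget.year  = static_cast<int>(aDate / 10000);
  aTarget.month = static_cast<int>((aDate / 100) % 100);
  aTarget.day   = static_cast<int>(aDate % 100);
}

// Hypothetical inverse: pack the struct back into YYYYMMDD so that two
// dates can be compared with plain integer operators.
uint32_t packDate(const FEDate& aDate) {
  assert(aDate.year >= 0 && aDate.month >= 0 && aDate.day >= 0);
  return static_cast<uint32_t>(aDate.year) * 10000
       + static_cast<uint32_t>(aDate.month) * 100
       + static_cast<uint32_t>(aDate.day);
}
```

With this, fillDate(20110618, d) gives year 2011, month 6, day 18, and packDate(d) returns 20110618 again.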

I'm going to have to go through a lot of the code like this, finding little things that are still causing memory leaks - and there are at least a few here. I need to get them all cleaned out as quickly as possible as we have the Developer Day tomorrow, and I'd really like to have this all done today.

SubEthaEdit 3.5.4 is Out

Thursday, June 2nd, 2011

This morning I decided to check and see if SubEthaEdit had been updated and sure enough, 3.5.4 was out with what appears to be just a single change: Mac OS X Lion fixes. Shucks. There were a few things that I had hoped they'd put into a new release, but I have to face facts: it's exactly what they want it to be. No big changes planned. I'm going to have to accept that this is "it".

I'm not really shocked. It is a great editor. It's just not going to beat BBEdit and MacVim. It has "jump scrolling", and that's too abrupt for me. Its method locator is nice, but it's not as nice as BBEdit's. Its find uses the nice Safari-style "jump balloon", but BBEdit is close and Vim isn't bad for a cross-platform tool.

I read an article recently where someone went through all this on the Mac and said (basically): "No free lunch. BBEdit, MacVim, Emacs - learn one." I laughed a lot at the article, and myself, because he's right on the money. If you want a really powerful editor then these are your choices. Live with it. I guess I was hoping for the SubEthaEdit upstart, but that's not going to happen. It is what it is, and they are happy with it. Great. Back to BBEdit and MacVim.