Archive for the ‘Coding’ Category

Finally Polished Off the Query/Subscription System

Friday, October 1st, 2010

This morning I finally finished off the query/subscription system for my ticker plant and its client. The finishing touches are all in the details, of course, and some of the Broker changes from yesterday were biting me in ways I hadn't expected - seemingly working, then not working. It helps to just get a fresh pull of the code and do a make clean. No doubt.

The nice thing is that it's working just like I had hoped - it's pretty darn fast, and while I need a lot of data to support this configuration, it's going to be very nice to get it all loaded up and rolling in a solid, complete dev environment. But there's a lot to do before that's ready.

I'm pretty pleased that the queries are fast. They don't have to be especially speedy, but it's nice that they use the lockless data structures, and we're using the boost asio library to get the data from the Broker and its service (the ticker plant). Not bad, I have to say.
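
Just to give a flavor of the asio side of that, here's a minimal sketch of a blocking query over a socket. The host, port, and wire format here are all made up - this isn't the actual Broker protocol - but the connect/write/read_until shape is the same:

  #include <boost/asio.hpp>
  #include <iostream>
  #include <string>

  using boost::asio::ip::tcp;

  int main() {
    try {
      // set up the io_service and resolve the (hypothetical) Broker
      boost::asio::io_service io;
      tcp::resolver resolver(io);
      tcp::resolver::query query("broker.example.com", "8080");
      tcp::resolver::iterator endpoint = resolver.resolve(query);
      // connect and send the query
      tcp::socket socket(io);
      socket.connect(*endpoint);
      std::string request = "query:last-ticks\n";
      boost::asio::write(socket, boost::asio::buffer(request));
      // read back a newline-terminated response
      boost::asio::streambuf response;
      boost::asio::read_until(socket, response, "\n");
      std::istream is(&response);
      std::string line;
      std::getline(is, line);
      std::cout << "response: " << line << std::endl;
    } catch (std::exception & e) {
      std::cerr << "error: " << e.what() << std::endl;
    }
    return 0;
  }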

Google Chrome dev 7.0.536.2 is Out

Friday, October 1st, 2010

This morning I noticed that Google Chrome dev 7.0.536.2 was out, and the release notes look a little better than the last few updates:

The Dev channel has been updated to 7.0.536.2 for Windows, Mac, Linux and Chrome Frame

All

  • Fixed saving passwords containing non-ASCII characters (Issue 54065).
  • Accelerated compositing and support for 3D CSS transforms enabled by default (Issue 54469)
  • WebGL support enabled by default (Issue 54469)
  • Regression fix: keep the download shelf visible when multiple sites are saved. (Issue 54149)
  • Add a lab for the Page Info Bubble for Windows and Linux; Mac coming shortly.

Mac

  • More keyboard shortcuts for Tab Overview. (Issue 52834)
  • Add sqlite and javascript memory columns to task manager

So it looks like they are back to getting new features into the code - as opposed to just security fixes, etc. Nice to see. On a related note, I also saw that they took the 'beta' channel to 7.0.517.24, which is the first time 'beta' has seen a 7.x release. Nice confidence in the code, guys. Keep it up.

Shifting Target Once Again

Thursday, September 30th, 2010

Today I did a git pull on the Broker, which another guy had done a significant re-write on, and I ended up spending the bulk of the day changing my code so that the broker client and service parts are working again. It's not ideal, but I can understand all the reasons for the changes, and it's nice to see these kinds of things happening. Every system gets better with a re-write by talented people, and so the system, its protocol, and its capabilities are just getting better.

The consequence is that I had to spend a day doing a pretty significant re-write of my own. Thankfully, my code is pretty clean, but there were still a lot of changes due to the way the protocol changed. In the end, I found a few little issues with the Broker, but those should be easy for him to fix.

Tomorrow is back to testing the ticker plant query system.

C++ Cast Operator Overloading – Watch Out for const

Thursday, September 30th, 2010

I spent far too much time on this problem today, but I've finally gotten it solved, and it's worth writing up. No question about it. The problem is in my variant class and the casting operators - or more properly, conversion operators - that I put in place for the class.

To start out, it's a simple variant with a union as the ivar, and as the value type of the variant changes, the different components of the union are set. Pretty standard. What I wanted was to be able to use simple cast operators to get the values out of the variant:

  // set the value to an int
  variant   v = 10;
 
  // use the value
  count += (int) v;

and all was going pretty well when I had the operators for this variant class defined as:

  operator varmap &() const;
  operator varmap *() const;
  operator varlist &() const;
  operator varlist *() const;
  operator int() const;
  operator int64_t() const;
  operator int64_t &();
  operator float() const;
  operator double() const;
  operator double &();
  operator std::string &() const;
  operator std::string *() const;
  operator uuid_t &() const;
  operator uuid_t *() const;
  operator bytes_t &() const;
  operator bytes_t *() const;
  operator error_t &() const;
  operator error_t *() const;

but when I added:

  operator uint8_t() const;
  operator uint8_t &();

things really started to fall apart. Specifically, with the additional casting for (uint8_t), the compiler started complaining about:

  // use the value
  value = (int) v;

saying the compiler could not figure out which one of the casting operators to use. There were several candidates - both references and not. Very confusing. If I tried this:

  // use the value
  value = (int64_t) v;

everything worked fine. Very odd. But I thought: hey, it's clearer to have the size in there anyway, so let's just get rid of the problem operator. But that's never the real end of the problem, is it?

I next had the problem with:

  // use the value
  value = (float) v;

Same thing. So I really had to solve this. Bummer.

I spent about 90 mins trying all kinds of things, only to be blown away by the real solution: it's the const-ness of the casting operators. Change that, and it's all OK:

  1. operator varmap &() const;
  2. operator varmap *() const;
  3. operator varlist &() const;
  4. operator varlist *() const;
  5. operator int();
  6. operator int64_t() const;
  7. operator int64_t &();
  8. operator uint8_t() const;
  9. operator uint8_t &();
  10. operator float();
  11. operator double() const;
  12. operator double &();
  13. operator std::string &() const;
  14. operator std::string *() const;
  15. operator uuid_t &() const;
  16. operator uuid_t *() const;
  17. operator bytes_t &() const;
  18. operator bytes_t *() const;
  19. operator error_t &() const;
  20. operator error_t *() const;

Note lines 5 and 10 - no const. This allowed the compiler to figure out what it needed, and I could put the (int) cast operator back in. What I'm guessing is that the (int) and (float) operators are really methods that create values, as opposed to simply returning references or pointers to the members of the union, and marking them const made them a worse match on the variant itself than the non-const reference operators - which then tied with each other and left the compiler stuck. Making them non-const puts everything on equal footing, and the exact-match return type wins.

At least that's how I'm figuring it.
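
To make that concrete, here's a minimal sketch of the pattern - a trimmed-down variant with made-up members, not my actual class - showing a mix that resolves cleanly:

  #include <stdint.h>

  class variant {
  public:
    variant( int aValue ) : mType(eInt) { mValue.ival = aValue; }
    variant( double aValue ) : mType(eDouble) { mValue.dval = aValue; }

    // value-returning conversions - left non-const (the fix); with
    // 'operator int() const' here instead, the non-const reference
    // operators below tie with each other and the (int) cast is
    // ambiguous
    operator int() { return (int) mValue.ival; }
    operator float() { return (float) mValue.dval; }
    // reference-returning conversions to the union's members
    operator int64_t &() { return mValue.ival; }
    operator double &() { return mValue.dval; }

  private:
    enum Type { eInt, eDouble };
    Type type() const { return mType; }
    union {
      int64_t   ival;
      double    dval;
    } mValue;
    Type   mType;
  };

  int main() {
    variant   v(10);
    int       count = 0;
    // unambiguous with the non-const operator int()
    count += (int) v;
    return (count == 10 ? 0 : 1);
  }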

Building Query into Subscription of Ticker Plant

Wednesday, September 29th, 2010

Today was spent integrating the query capabilities from the Broker and ticker plant into the ticker plant client. The simple subscription is working fine, but when a client really subscribes to the ticker feed, they need to be given any of the messages that are cached on the ticker plants, so that if there's a message out there to be seen, the subscriber will see it. Great for off-hours. Anyway, I needed to build this into the system by first putting in the plumbing for the second source of messages - the 'query' - and then putting the capability into the ticker plant to service these queries.

I got a good 90% of the code written today and tomorrow I'll be able to finish it and start testing. Lots of hard coding today.

Google Chrome dev 7.0.517.24 is Out

Wednesday, September 29th, 2010

I noticed this morning that Google Chrome dev is now at 7.0.517.24, and while the sum total of the release notes is:

This release focused on resolving minor bug fixes or crashes. More details about additional changes are available in the svn log of all revisions.

it still makes sense to upgrade. I'm just curious when the Mac client is going to get the hardware acceleration they're putting into the Windows version. It'd be nice, but it's not bad now... I'd just love to see more widespread use of the GPU in software... it's a great untapped resource.

Merging Live and Historical Message Streams

Tuesday, September 28th, 2010

Today I've been working on the problem of merging the current tick data stream with the last known ticks for a given message type and instrument. The reason for the merge is that when a user of my ticker plant client (TPClient) subscribes to a certain set of messages, we need to be able to provide him with the last known versions of those messages as well. If the market is closed, this represents his only data. If the market is ticking, then chances are he's going to get a more recent message, which is why we have to properly filter out the stale ones.

I've got the filtering worked out because it's pretty easy - look at the type of the message, and then keep a simple std::map of conflation key to timestamp. If a message comes in and we have filtering turned 'on', check the last timestamp we saw for a message like this. If the new one is newer, send it. If not, don't. If there's no filtering, then send it regardless.

This allows the client to decide if they want to see all the messages and filter them on their end, or allow the TPClient to handle it for them. It's all pretty simple - something like the sketch below.
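
Here's roughly the shape of that filter. The type and method names are made up for illustration - this isn't the actual TPClient code:

  #include <stdint.h>
  #include <map>

  typedef uint64_t ConflationKey;
  typedef uint64_t Timestamp;

  class StaleFilter {
  public:
    StaleFilter( bool aFiltering ) : mFiltering(aFiltering), mLastSeen() { }

    // returns true if the message should be passed to the client
    bool shouldSend( ConflationKey aKey, Timestamp aWhen ) {
      if (!mFiltering) {
        // filtering is 'off' - send everything regardless
        return true;
      }
      std::map<ConflationKey, Timestamp>::iterator  it = mLastSeen.find(aKey);
      if ((it == mLastSeen.end()) || (aWhen > it->second)) {
        // first sighting, or newer than the last one we sent - send it
        mLastSeen[aKey] = aWhen;
        return true;
      }
      // stale - a newer message for this key has already gone out
      return false;
    }

  private:
    bool                                  mFiltering;
    std::map<ConflationKey, Timestamp>    mLastSeen;
  };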

What's proving to be a touch more difficult is providing the messages to be filtered. It makes the most sense to have the messages pulled from the cache when a subscription is done. Also, it makes sense to have the criteria be the same. It's also been established that this request will be a 'call' to an MMD Service that will be the QuickCache within the ticker plant's exchange feeder. What's not so clear to me right now is how I'm going to map the request to the MMD service, and how I'm going to get that data back into the TPClient filter's onMessage() method. But I know I have to do just that.

I'll tackle that tomorrow.

Limitations on the STL std::map and Finding Keys

Tuesday, September 28th, 2010

I've been working with the STL std::map for quite a while, but recently I used it as the core of a data structure where I wasn't finding an exact match - I was looking for a "lower bound" on the key to see where I needed to start looking for a match to the data. It's probably easier to start from the beginning. The data I'm dealing with is a std::map where the key is a uint32_t and the value is a boost::tuple of a uint32_t, another uint32_t, and a std::string. Like this:

  typedef boost::tuple<uint32_t, uint32_t, std::string> Channel;
  typedef std::map<uint32_t, Channel> ChannelMap;

Where the data looks something like this:

  Key          Value
  0x00000000   0x00000000, 0x01ffffff, "first"
  0x02000000   0x02000000, 0x02ffffff, "second"
  0x03000000   0x03000000, 0x03ffffff, "third"
  0x04000000   0x04000000, 0x04ffffff, "fourth"

where the 'key' is the first value of the tuple, and the two numeric values in the tuple form an arithmetic range for a uint32_t. What I need to do is to take an arbitrary uint32_t value and find the string it fits with. If it's less than the lowest range, there's no match, and if it's greater than the highest, there's no match either.

The problem was that I was using STL's lower_bound() and assuming that it was going to give me what I thought was the "lower bound" of the value in the keyspace. But it doesn't. What lower_bound() returns is:

Finds the first element whose key is not less than the argument

Simply put, it finds the key whose value is greater than or equal to the argument. So this is almost the "upper bound" in my book. But there's more.

The upper_bound() method returns:

Finds the first element whose key is greater than the argument

which is no better. What I wanted was something like: the largest key less than or equal to the argument. Put that way, I'm not really surprised that I didn't find it. So how do I make it out of the methods I have?

What I needed to do was to look at the values of both of these functions and try to make some sense of it. So I made the following test code:

  #include <iostream>
  #include <string>
  #include <map>
  #include <stdint.h>
 
  int main(int argc, char *argv[]) {
    std::map<uint32_t, uint32_t>   m;
    for (uint32_t i = 10; i <= 100; i += 10) {
      m[i] = i + 1;
    }
    std::map<uint32_t, uint32_t>::iterator    it;
    // print out the entire map
    for (it = m.begin(); it != m.end(); ++it) {
      std::cout << "m[" << it->first << "] = " << it->second << std::endl;
    }
    // now check the lower_bound and upper_bound methods
    it = m.lower_bound(45);
    std::cout << "lower_bound(45): " << it->first << " == " << it->second << std::endl;
    it = m.upper_bound(45);
    std::cout << "upper_bound(45): " << it->first << " == " << it->second << std::endl;
 
    it = m.lower_bound(50);
    std::cout << "lower_bound(50): " << it->first << " == " << it->second << std::endl;
    it = m.upper_bound(50);
    std::cout << "upper_bound(50): " << it->first << " == " << it->second << std::endl;
 
    return 0;
  }

which returns:

  m[10] = 11
  m[20] = 21
  m[30] = 31
  m[40] = 41
  m[50] = 51
  m[60] = 61
  m[70] = 71
  m[80] = 81
  m[90] = 91
  m[100] = 101
  lower_bound(45): 50 == 51
  upper_bound(45): 50 == 51
  lower_bound(50): 50 == 51
  upper_bound(50): 60 == 61

From this, it seems like the two really aren't all that different - and they aren't. What's important to see, though, is that the greatest key less than or equal to the argument is just the element before the first key strictly greater than it. With that, I tried using upper_bound() and then backing off one:

  #include <iostream>
  #include <string>
  #include <map>
  #include <stdint.h>
 
  int main(int argc, char *argv[]) {
    std::map<uint32_t, uint32_t>   m;
    for (uint32_t i = 10; i <= 100; i += 10) {
      m[i] = i + 1;
    }
    std::map<uint32_t, uint32_t>::iterator    it;
    // print out the entire map
    for (it = m.begin(); it != m.end(); ++it) {
      std::cout << "m[" << it->first << "] = " << it->second << std::endl;
    }
    // now check the lower_bound and upper_bound methods
    it = m.lower_bound(45);
    std::cout << "lower_bound(45): " << it->first << " == " << it->second << std::endl;
    it = m.upper_bound(45);
    std::cout << "upper_bound(45): " << it->first << " == " << it->second << std::endl;
 
    it = m.lower_bound(50);
    std::cout << "lower_bound(50): " << it->first << " == " << it->second << std::endl;
    it = m.upper_bound(50);
    std::cout << "upper_bound(50): " << it->first << " == " << it->second << std::endl;
 
    it = m.upper_bound(45);
    --it;
    std::cout << "--upper_bound(45): " << it->first << " == " << it->second << std::endl;
    it = m.upper_bound(50);
    --it;
    std::cout << "--upper_bound(50): " << it->first << " == " << it->second << std::endl;
 
    return 0;
  }

which returns:

  m[10] = 11
  m[20] = 21
  m[30] = 31
  m[40] = 41
  m[50] = 51
  m[60] = 61
  m[70] = 71
  m[80] = 81
  m[90] = 91
  m[100] = 101
  lower_bound(45): 50 == 51
  upper_bound(45): 50 == 51
  lower_bound(50): 50 == 51
  upper_bound(50): 60 == 61
  --upper_bound(45): 40 == 41
  --upper_bound(50): 50 == 51

If I were careful and checked the limits, I'd have something. So that's exactly what I did.

My final code for finding the string in the tuple looks something like this:

  const std::string ZMQChannelMapper::getURL( const MessageMapCode aCode )
  {
    std::string     url;
 
    if (!mChannelMap.empty()) {
      ChannelMap::iterator  itr;
      if (mChannelMap.size() == 1) {
        // if there's only one... try it - we might get lucky
        itr = mChannelMap.begin();
      } else {
        // find the first key strictly greater than the code...
        itr = mChannelMap.upper_bound(aCode);
        // ...and back off one to the largest key <= the code; only skip
        // the decrement at begin(), where every key is too big and the
        // range check below will reject the first entry anyway. (Backing
        // off from end() is needed, too, or codes falling in the last
        // range would never match.)
        if (itr != mChannelMap.begin()) {
          --itr;
        }
      }
      // from this starting point, check the range for a match
      if (itr != mChannelMap.end()) {
        Channel & tupleInfo = (*itr).second;
        if ((tupleInfo.get<0>() <= aCode) &&
            (aCode <= tupleInfo.get<1>())) {
          url.append(getBaseURL());
          url.append(tupleInfo.get<2>());
        }
      }
    }
 
    // all done - return what we have
    return url;
  }

It works, and it's OK, but it's clear why they didn't make something like this in STL - far too specialized. But I have it now.

Finished up the Cache Data Service for Ticker Plant

Monday, September 27th, 2010

Today was a great day for making progress on publishing the cache through the data service. In fact, I got it all done. It's pretty slick and should fit in with the other data services of the broker nicely.

I created a subclass of the MMDServiceHandler that is spawned off the MMDService for each call to bind() that the MMDService receives. It's a classic controller/worker breakdown where I've put a lot of the smarts of the workers in the abstract base class - MMDServiceHandler - and then created subclasses for the 'basic' data handler and the 'cache' handler. I'd tackled the first one earlier, so today it was time to hit the latter.

The cache is a lockless, single-producer, multiple-consumer cache of the last tick message the feed has produced for each message type and conflation key - so we keep the "unique" values, but not duplicates of those "unique" values. It's standard stuff for a ticker plant; the point is that we need to scan the cache - no two ways about it.
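
The general shape is simple enough to sketch. This is just an illustration of the single-producer idea - a fixed, simplified keyspace, GCC's __sync builtins (this is pre-C++11 code), and none of the real message retirement plumbing - not the actual QuickCache:

  #include <stdint.h>
  #include <cstddef>

  struct Message;   // opaque tick message

  class QuickCache {
  public:
    enum { eSlots = 65536 };
    QuickCache() {
      for (size_t i = 0; i < eSlots; ++i) {
        mSlots[i] = NULL;
      }
    }

    // producer (the feed) - swap in the latest message for a key and
    // hand back the old one so the feed can retire it when it's safe
    Message *publish( uint16_t aKey, Message *aMsg ) {
      return __sync_lock_test_and_set(&mSlots[aKey], aMsg);
    }

    // consumers - grab the last message for a key (may be NULL); no
    // locks, just a read of the slot's pointer
    Message *peek( uint16_t aKey ) const {
      return mSlots[aKey];
    }

  private:
    Message * volatile   mSlots[eSlots];
  };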

So I created the different query schemes and then implemented them - realizing that we don't have to worry as much about performance here, as this is going through the Broker and is therefore meant for the slower data consumers. Given that, our big concern is not to slow down the feed by locking anything. Good enough.

I found out that I needed to return the data in two ways: objects, and a map of ivar names/values for the cross-platform crowd. To accomplish this, I had to add a getMapData() method to all the messages and build it up from the bottom. It just took a little time, but in the end it's a solid way of allowing easier access for the Java and Python clients, as they all have maps and I don't have to worry about making real objects of the messages. (Note: I am making them in Java, but it's nice not to have to mess with the Python client.)
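
The getMapData() idea looks something like this - the message type and ivars here are invented for illustration, but the 'build it up from the bottom' pattern is the point:

  #include <map>
  #include <string>
  #include <sstream>

  class Message {
  public:
    virtual ~Message() { }
    virtual std::map<std::string, std::string> getMapData() const {
      std::map<std::string, std::string>   data;
      data["type"] = "message";
      return data;
    }
  };

  class Quote : public Message {
  public:
    Quote( const std::string & aSymbol, double aBid, double aAsk ) :
      mSymbol(aSymbol), mBid(aBid), mAsk(aAsk) { }

    virtual std::map<std::string, std::string> getMapData() const {
      // start with the superclass' map, then layer in our own ivars
      std::map<std::string, std::string>   data = Message::getMapData();
      data["type"] = "quote";
      data["symbol"] = mSymbol;
      std::ostringstream   bid, ask;
      bid << mBid;
      ask << mAsk;
      data["bid"] = bid.str();
      data["ask"] = ask.str();
      return data;
    }

  private:
    std::string   mSymbol;
    double        mBid;
    double        mAsk;
  };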

The last thing I had to do was to glue this service into the Ticker Plant such that the feed exposed its cache so the service could bind() it to the Broker. Not bad at all, and very slick. Really nice day today.

Running Some Initial Tests on Ticker Plant

Monday, September 27th, 2010

Today I needed to run some tests on my ticker plant code to get a decent scale for ordering hardware and network taps. The new machines are going to need dual 10Gb ethernet cards - one for the incoming feeds from the exchanges and the other for the outgoing packets. Right now I don't split it up like that, but I know I'll need to in the real UAT testing. But today I just wanted to get as close to "real" as possible - given that I don't have a lot of the supporting data sources I'm going to need before UAT.

The big missing data source is the mapping of the exchange symbol to "security ID" - an internal unsigned integer that is generated in the database and used for all references to an instrument. I'm expecting it'll be a simple data service where I'll open up a subscription channel and issue calls to map the symbols to security IDs. I have the code to map these (in both directions), so it's only necessary to get this data once, but I need a source for it.

Well... not really. For these tests, all I need is a unique ID for these guys. So let's make them up. Easy. I'll implement the "lookup" method on the class to generate a uuid_t, which is a random 128-bit number, and use the first 64 bits as the "security ID". I'll pass this back to the mapping method and it'll think it's real. For the sake of these tests, it's good enough, as we just need to have these in order to check on the conflation of the data stream.
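
The stand-in "lookup" is just a few lines. Assuming libuuid's uuid_generate_random() here (the uuid_t in my code may be its own type), it's something like:

  #include <stdint.h>
  #include <string.h>
  #include <uuid/uuid.h>   // libuuid - link with -luuid

  // generate a random 128-bit UUID and use its first 64 bits as the
  // fake "security ID" for testing
  static uint64_t fakeSecurityID()
  {
    uuid_t     raw;
    uuid_generate_random(raw);
    uint64_t   id = 0;
    memcpy(&id, raw, sizeof(id));
    return id;
  }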

When I fired it up on 1/24th of the OPRA feed (one channel), it used under 10% of one CPU. A full-tilt channel using less than 10%! You gotta be kidding! I watched it for a while, and if you toss out the CPU usage of the terminal that is streaming the log data, it's well under 10%. From time to time it spikes to 40% - not quite sure what that's all about, but it happens less than 1% of the time.

If we assume we can get 4 channels on one CPU, and factor up the memory, it looks like we can get the complete OPRA feed on one 8 CPU box with 32 GB RAM - 24 channels at 4 per CPU is 6 CPUs, with a couple to spare. If we add another box for all the remaining feeds, which is a good estimate, we're fitting the complete ticker plant into two small boxes. Pretty wild.

Past wild... that's better than I'd have ever believed. Sweet.