Archive for the ‘Coding’ Category

Converting Doubles and Floats in C++ for Java

Tuesday, November 9th, 2010


My C++ ticker plant needs to communicate with Java, and unfortunately the host-order versus network-order issue comes up in these situations as well. Thankfully, for virtually all our integers (signed and unsigned) we're using Google's VarInt encoding, with ZigZag encoding for the signed integers. It's very nice as it completely removes any byte-ordering issues from the cross-language API.
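For context, the ZigZag step folds the signed range into the unsigned one so that small magnitudes - positive or negative - become small unsigned values, which VarInt then packs into very few bytes. A minimal sketch (the function names are mine, not from the actual codebase):

```cpp
#include <cstdint>

// ZigZag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// so small-magnitude signed values VarInt-encode in few bytes
inline uint32_t zigzagEncode(int32_t n)
{
  // arithmetic right shift smears the sign bit across all 32 bits
  return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}

inline int32_t zigzagDecode(uint32_t z)
{
  // undo the fold: low bit selects sign, rest is the magnitude
  return static_cast<int32_t>(z >> 1) ^ -static_cast<int32_t>(z & 1);
}
```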

But doubles and floats are another issue.

In order to make them fast on the Java decoding side (the majority of my clients will be Java-based) we really need to make the encoding in the messages Java-focused. This means that when the data arrives in the ByteBuffer, a simple getFloat() gets the value out correctly.

What this means is that I need to run htonl() on the float before it's packed into the data stream, and for a double, I need to do that and then swap the order of the two uint32_t halves to properly reverse the byte ordering. I decided to start with the simplest method: brute force. Assuming the double to convert is in the variable gold, this bit of code reverses the bytes and puts them in the std::string buffer for sending to the clients:

    // now flip its bytes
    uint8_t    in[8];
    memcpy(in, &gold, 8);
    uint8_t    out[8];
    for (uint8_t i = 0; i < 8; ++i) {
      out[i] = in[7 - i];
    }
    conduit.append((const char *)out, 8);

And getting it back out of the std::string as it's read in from the upstream system is simply:

    uint8_t    in[8];
    uint8_t    out[8];
    memcpy(in, conduit.data(), 8);
    for (uint8_t i = 0; i < 8; ++i) {
      out[i] = in[7 - i];
    }
    memcpy(&gold, out, 8);

Very deliberate, but not very efficient. But it worked. Once I had that in the code, I started working on better methods of doing this conversion, and timing them to see what the real difference was. To keep the playing field level, this is the test I wrote for the first version of the conversion - out and back, 100,000 times:

  // start the timer
  uint64_t    start = getTime();
  // do the looping a bunch of times
  for (uint32_t l = 0; l < 100000; ++l) {
    // clear out the conduit
    conduit.clear();
 
    // now flip its bytes
    uint8_t    in[8];
    memcpy(in, &gold, 8);
    uint8_t    out[8];
    for (uint8_t i = 0; i < 8; ++i) {
      out[i] = in[7 - i];
    }
    conduit.append((const char *)out, 8);
 
    // now back again
    memcpy(in, conduit.data(), 8);
    for (uint8_t i = 0; i < 8; ++i) {
      out[i] = in[7 - i];
    }
    memcpy(&gold, out, 8);
  }
  uint64_t    loopTime = getTime() - start;
  std::cout << "100000 'loop' passes took " << loopTime
            << " usec or " << 100000.0/loopTime << " trips/usec"
            << std::endl;
  std::cout << "gold: " << gold << std::endl;

And the result of this running on my MacBook Pro was:

  peabody{drbob}465: a.out
  100000 'loop' passes took 11048 usec or 9.05141 trips/usec
  gold: 254.76

I then started fiddling with the code. The first thing I wanted to try was using htonl() to see if it was any faster than simply laying down the bytes in the correct order. Turns out, it was. Nice. My next test was faster:

  // let's see if we can use the ntohl/htonl for better speed
  start = getTime();
  // do the looping a bunch of times
  for (uint32_t l = 0; l < 100000; ++l) {
    // clear out the conduit
    conduit.clear();
 
    uint32_t	in[3];
    memcpy(&in[1], &gold, 8);
    in[0] = htonl(in[2]);
    in[1] = htonl(in[1]);
    conduit.append((const char *)in, 8);
 
    uint32_t	out[3];
    memcpy(&out[1], conduit.data(), 8);
    out[0] = ntohl(out[2]);
    out[1] = ntohl(out[1]);
    memcpy(&gold, &out[0], 8);
  }
  loopTime = getTime() - start;
  std::cout << "100000 'ntoh/hton' passes took " << loopTime
            << " usec or " << 100000.0/loopTime << " trips/usec"
            << std::endl;
  std::cout << "gold: " << gold << std::endl;

and while I was still using memcpy(), I was now working with larger chunks of data, and the speed proved out:

  100000 'ntoh/hton' passes took 5992 usec or 16.6889 trips/usec
  gold: 254.76

But I wanted to get away from the memcpy() calls altogether. If I was a little careful with the way I did things, it worked out just fine:

  // let's see if we can use the ntohl/htonl for better speed
  start = getTime();
  // do the looping a bunch of times
  for (uint32_t l = 0; l < 100000; ++l) {
    // clear out the conduit
    conduit.clear();
 
    uint32_t  *in = (uint32_t *)(void *)(&gold);
    uint32_t  buff[] = { htonl(in[1]), htonl(in[0]) };
    conduit.append((const char *)buff, 8);
 
    uint32_t  *out = (uint32_t *)conduit.data();
    uint32_t  *target = (uint32_t *)&gold;
    target[0] = ntohl(out[1]);
    target[1] = ntohl(out[0]);
  }
  loopTime = getTime() - start;
  std::cout << "100000 'string mix II' passes took " << loopTime
            << " usec or " << 100000.0/loopTime << " trips/usec"
            << std::endl;
  std::cout << "gold: " << gold << std::endl;

This guy clocked in nicely:

  100000 'string mix II' passes took 5291 usec or 18.9 trips/usec
  gold: 254.76

Lesson learned: Get it right, then get it fast. Always pays off. I've more than doubled the speed and it's the same interface to the outside world. Even if we look at the code, there's very little I can really trim out - even if I didn't do the byte re-ordering. It's just about down to the extra time in the htonl() calls. And that's really nice.
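As a side note, on compilers that offer a 64-bit byte-swap intrinsic, the whole double can be reversed in a single call - a sketch assuming GCC/Clang's __builtin_bswap64, which is not something the code above used:

```cpp
#include <cstdint>
#include <cstring>

// reverse all eight bytes of a double with one intrinsic call;
// the memcpy()s keep the compiler happy about aliasing and
// typically optimize away entirely
double byteSwapDouble(double gold)
{
  uint64_t bits;
  memcpy(&bits, &gold, 8);
  bits = __builtin_bswap64(bits);   // GCC/Clang byte-swap intrinsic
  memcpy(&gold, &bits, 8);
  return gold;
}
```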

Google Chrome dev 9.0.572.1 is Out

Tuesday, November 9th, 2010

This morning Google Chrome dev was updated to 9.0.572.1 with a new version of Flash. OK... seems fair, and as long as I don't have to have any other Flash in my system, I'm OK with that.

Putting it All Together for an NBBO Ticker Plant

Friday, November 5th, 2010

Well... it's been a heck of a week, but I'm very happy that I have an NBBO Ticker Plant in the can - ready for testing on Monday. It took a long three days, and one of those days was devoted almost entirely to the persistence of the exchange data and message cache. The testing for that proved very helpful as I caught a few nasty copy-paste bugs in the data elements.

What I'm hoping is that with the NBBO Plant being fed from more than 25 exchange feeds, it'll still be able to keep up with the flow - because none of these feeds are option feeds. We'll have to wait for Monday to see, but I'm optimistic.

If not, I'll have to figure out how to speed things up, or possibly put in conflation queues between the exchange feeds and the NBBO Engine.

Proper Unit Testing is Valuable – But Painful

Friday, November 5th, 2010


One of the things I really think is important in large systems work is proper unit testing. Now I'm not talking about the "Test First" idea; I'm talking about building some code - a class in C++ or Java - that contains something other than mindless data containers, and whose methods need testing. If it's a cache, then throwing things in, counting them, and getting them out again. That kind of testing. Stuff where you need to be certain that the code is working before you go on.

The problem with a lot of this testing is the same thing you see in testing ICs - test vectors often require additional instrumentation on the class in order to test things properly. For example, I was testing parts of my NBBO engine where I needed to save it to a persistence system, and read it back out, and I needed to make sure it was correct. This means that I have to put operator==() on everything - even the data structures, because I can't be sure I'm not missing a component in the larger picture.
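For the mindless data holders, that instrumentation is just a field-by-field comparison - a hypothetical sketch, where the struct and its fields are made up for illustration:

```cpp
#include <cstdint>

// hypothetical data element - the names are illustrative only
struct QuoteData {
  double    bidPrice;
  uint32_t  bidSize;

  // field-by-field equality so a persistence round-trip test can
  // prove that no component was dropped or corrupted
  bool operator==(const QuoteData & anOther) const
  {
    return (bidPrice == anOther.bidPrice) &&
           (bidSize == anOther.bidSize);
  }
};
```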

Then in order to test the larger components, I have to really add a clone() method to all the objects in the data structure so that I can be assured that once these elemental objects are equal, placing them in the larger data structure is also equivalent.

Now the code is simple:

  LeafNode * LeafNode::clone()
  {
    return new LeafNode(*this);
  }

because I always have the copy constructor on all my objects (to keep the compiler from creating one for me with behaviors I did not intend). So the code is really just a few lines - but it's this added instrumentation, necessary for efficient testing, that makes the process all the more tiresome.

However, it finds bugs. It benchmarks performance. It's useful. It's just not very fun. Painful, in fact, at times.

But it needs to be done.

Stepping into the Flash-Free World

Friday, November 5th, 2010

I was reading this piece by John G on Daring Fireball, and it got me to thinking - it probably was a little more honest to disable Flash completely. I mean, it's still on my box, just not in my Safari or Firefox plugins. And with Google Chrome right on my desktop, I can see Flash if I really need to.

I think I'm going to like it a lot more... I've certainly given it a lot of thought.

Google Chrome dev 9.0.572.0 is Out

Friday, November 5th, 2010

This morning I saw that Google Chrome dev 9.0.572.0 was out - quite a step for the Googlers, but its only release note is that it includes a new version of Flash. Hmmm... how nice. It's not that I'm against Flash as a concept - it's a language (a bad one) with a bad build environment - it's that it's just not really all that useful compared to what can be done today in HTML, JavaScript, CSS and the like. It's just a crutch. Anyway... that's the update: Flash. Yippee.

Creating a National Best Bid/Offer Engine (cont.)

Thursday, November 4th, 2010


Well... it took me a little bit, and I had to re-arrange the code slightly, but in the end, I have the logic I need for the NBBO engine that works when you remove the one NBBO component. It's a little more complex, but it's still very clean.

The previous code was slick, and this is nearly as slick, but it's a little more complex because of the different cases I had to deal with in the change of the 'best values'. Still... it's straight-through code with only the one lock to control its updating, and I think it's going to be plenty fast enough. But we'll have to see how long it takes under real loading conditions.

   1. if ((bidExch != '\0') && (bidPrice > 0.0) && (bidSize > 0)) {
   2.   // get the exchange as an index into the arrays
   3.   uint8_t idx = bidExch - 'A';
   4.   if (bidPrice > myExchData->nbboBidPrice) {
   5.     // with a new leading bid, we are all there is to consider
   6.     myExchData->nbboBidPrice = bidPrice;
   7.     myExchData->nbboBidSize = bidSize;
   8.     newNBBO = true;
   9.   } else if (bidPrice == myExchData->nbboBidPrice) {
  10.     // if we had something in the old NBBO, remove it now
  11.     if (myExchData->bidPrice[idx] == myExchData->nbboBidPrice) {
  12.       myExchData->nbboBidSize -= myExchData->bidSize[idx];
  13.     }
  14.     // ...and we know we need to add in our contribution now
  15.     myExchData->nbboBidSize += bidSize;
  16.     newNBBO = ((myExchData->bidPrice[idx] != bidPrice) ||
  17.                (myExchData->bidSize[idx] != bidSize));
  18.   } else if (myExchData->bidPrice[idx] == myExchData->nbboBidPrice) {
  19.     // we had something in the old NBBO, remove it now
  20.     myExchData->nbboBidSize -= myExchData->bidSize[idx];
  21.     newNBBO = (myExchData->bidSize[idx] > 0);
  22.     // see if this was the only one in the NBBO
  23.     if (myExchData->nbboBidSize == 0) {
  24.       // reset the bid part of the NBBO
  25.       newBidPrice = 0.0;
  26.       newBidSize = 0;
  27.       // save what we have now for the scan
  28.       myExchData->bidPrice[idx] = bidPrice;
  29.       myExchData->bidSize[idx] = bidSize;
  30.       // ...and scan all the data (including this guy)
  31.       for (uint8_t e = 0; e < 26; ++e) {
  32.         if (myExchData->bidPrice[e] > newBidPrice) {
  33.           newBidPrice = myExchData->bidPrice[e];
  34.           newBidSize = myExchData->bidSize[e];
  35.         } else if (myExchData->bidPrice[e] == newBidPrice) {
  36.           newBidSize += myExchData->bidSize[e];
  37.         }
  38.       }
  39.       // make sure we have something
  40.       if (newBidSize > 0) {
  41.         myExchData->nbboBidPrice = newBidPrice;
  42.         myExchData->nbboBidSize = newBidSize;
  43.         // we KNOW this is a new one
  44.         newNBBO = true;
  45.       }
  46.     }
  47.   }
  48.   // no matter what, save the bid for the next time
  49.   myExchData->bidPrice[idx] = bidPrice;
  50.   myExchData->bidSize[idx] = bidSize;
  51. }

There are a few neat things in this guy. First off, we look for the obvious winners - if we have a brand new best bid, or an addition to the size on the existing best bid. We needed to be careful when looking at adding the value in (line 11), and especially careful about sending out a new NBBO message (line 16).

But the really new part is the scanning of the existing exchange data (line 31). This is done only when there was a single component to the best bid, and he's just taken himself out of the mix. For those cases, we have no choice but to start all over and see what we can come up with. The wrinkle was that we need to add back in our current data so it too was included in the scan. It's possible, after all, that it's still part of the best bid, just at a different price. And maybe with some friends.

All told, I really liked this code. It's fun, and the test frame for exercising it was a blast to write. I have the ability to throw any number of sequences of ticks to the engine and test each subsequent value. It's nice. Good, fun work.

Hammering Away at the NBBO Persistence

Wednesday, November 3rd, 2010


It's been a hard day. I've been working on my NBBO Engine all day, and the vast majority of it on the persistence of the engine to the Broker's configuration service. It's really a very nice way of doing things - given that I already had the Broker connectivity down. You can simply send variant values to the config service, and it'll store them under a key. Simple stuff - if you can get everything organized properly.

The problem today was the trie and its structure. I needed to be able to pack up all the individual components of the trie, but I also needed to be able to pack up the connectivity structure. This means that I needed to be able to put a little more metadata into the packing than I had originally planned, but it all worked out - I think.

I need to get this code to compile - I'm sure I forgot a dozen namespaces here or there, and then I need to build a test frame that will take messages and record the output NBBO messages. It's not very hard, it's just going to take time.

It was a hard day... but I'm glad I pushed through it.

iTerm2 Updated to Alpha 12

Wednesday, November 3rd, 2010


I checked this morning and the iTerm fork called iTerm2 released a new update and it was chock full of a lot of updates and bug fixes:

Alpha 12

Featurepalooza! And bug fixes, too.

New features:

  • Instant Replay: This feature makes iTerm2 behave like TiVo. You can press cmd-opt-b to step back in time and cmd-opt-f to step forward in time. If something is erased from the screen, it's no longer lost forever. This feature can be turned off in Preferences as it does take a bit of extra memory and has a small performance impact (less than 1%).
  • Paste History: When you copy or paste text in iTerm2, it is now saved in the paste history. You can pop up a window at the cursor with cmd-shift-h which shows the last 20 strings in the clipboard. The list is searchable Quicksilver-style.
  • Mouseless Selection: Do you often copy-paste text within the same window? Now you can do it without using the mouse. Open the findbar with cmd-f and search for the text you wish to copy. When part of the text is matched, use the tab key to expand the selection to the right by a full word and shift-tab to expand it left to the previous word boundary. It's automatically copied to the clipboard, or you can use opt-Enter to paste the selection immediately.

Enhancements:

  • Findbar appearance improved.
  • Add undo/redo to edit menu.
  • Add zoom shortcut for cmd-opt-=.
  • Read larger chunks from the network for better performance.
  • Idle CPU usage reduced.
  • Refresh rate adjustment removed, dynamic refresh rate added.
  • Input Method Editor wraps at end of line, scrolls window if necessary.
  • IPV6 support for Bonjour.
Bug fixes:

    • Memory usage growth when Bonjour enabled (hopefully, 43).
    • Toggling Bonjour takes effect immediately (89)
    • Make resize look better (98)
    • Restore colors on exit of Vim (99)
    • Make bookmark window resize properly (114)
    • Select name field when copying/creating/editing bookmark (127, 128, 181)
    • Scroll by selection in fullscreen window (133)
    • IME improvements (163)
    • URLs with i18n chars can be selected now (174)
    • Tab labels updated when toggling compact tabs (196)
    • Better multi-monitor support (201)
    • Set localization vars better (204)
    • Fix return to default size (223)
    • Improve i-bar cursor appearance (235)
    • Numeric shortcuts displayed properly (241)
    • Prevent 100% transparency because it doesn't focus (250)
    • Fix bugs with AquaSKK (263)
    • Fix bug where page scrolling sometimes broke.
    • Fix creeping window size when toggling fullscreen.

Quite an impressive list. I'm excited to see how it plays out.

Google Chrome dev 9.0.570.0 is Out

Wednesday, November 3rd, 2010


Seems this morning that the developers at Google released Google Chrome dev 9.0.570.0 and with it a significant version number change. Maybe this bodes well for the browser. Looking at the release notes, however, leaves me wondering:

Mac

  • Make sure the dock icon is updated after closing an incognito window with an in-progress download (Issue 48391)

and then there are several "Known issues" with the build:

Known Issues

  • REGRESSION: Windows media player for Firefox doesn't load - Issue 61603
  • Regression:accelerated compositing slows down the whole machine - Issue 61520
  • google.com/wave : "Page Unresponsive" dialog box appears - Issue 61533
  • myspace.com : Cannot enter a character in Comments field - Issue 61513

which, to me, sounds like the major version number is really just a sham, and the third number - 570 - is the one to look at. They aren't using the major numbers to indicate anything other than what they consider to be dev versus beta versus release.

Kind of disappointing...