Archive for the ‘Coding’ Category

Patching ZeroMQ – Pretty Neat (cont.)

Tuesday, December 7th, 2010

ZeroMQ

This morning I finished patching ZeroMQ for the Recovery Interval in Milliseconds option that I was looking to have in order to control the memory usage. I was able to build the code into the shared library, and by carefully placing it in the project's lib directory, I got the loader to pick it up instead of the older RPM-installed version. With this, I was able to verify that the option worked and the savings in memory were, once again, quite substantial. Most excellent!
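With the patched library in place, exercising the new option is just one more setsockopt() call. Here's a minimal sketch - it uses the ZMQ_RECOVERY_IVL_MSEC name from the patch and the same top->socket handle as in my transmitter code below:

  // a non-zero value here overrides ZMQ_RECOVERY_IVL (which is in secs)
  // and is interpreted in milliseconds
  static int64_t     __recovery_msec = 500;
  top->socket->setsockopt(ZMQ_RECOVERY_IVL_MSEC, &__recovery_msec,
                          sizeof(__recovery_msec));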

I then followed the submission guidelines and sent a nice patch to the mailing list. One thing I noted was the Signed-off-by tag and its usage in the project. I can see why they have it in git - having grown out of the kernel community - but it makes sense for a project like this as well. I really like this kind of stuff - having the SCM tool understand the needs of its users and anticipate them.

After I got the one patch submitted, I got a message on IRC from Martin saying I needed to subscribe to the mailing list. Oddly enough, I mentioned to him that I was subscribed. He then asked: with which email address? I responded with my general incoming drop box, and he pointed out that my email was coming from my Comcast account. Ah! Got it.

Rather than use my Comcast account for anything more, I decided to subscribe with my GMail account, as it's not going to change, and to send mail to the list from that same account. It's far easier to use that and not worry about it than to worry about Comcast going away because some better ISP shows up in Naperville. I spent a little time trying to get outgoing email working at EasyDNS, my DNS service, but that turned out to be a bit dodgy, and I realized it's probably better to just stick with GMail and let that be it.

OK... one patch down, one to go - the Java client needs the same ability to set the Recovery Interval in Milliseconds, and while it uses JNI to get to the C library under the covers, it needed a few things added to really make it work. Not nearly as difficult as the C library patch, but still, I followed the same guidelines and submitted a patch for that - this time from my GMail address.

I saw a few minutes later that my patch was in the mailing list digest, which confirmed that I'd gotten everything right with the GMail switch-over... nice. We're running with the changes on our boxes, and if they happen to make changes in the near future, we'll just have to deal with it - that's the cost of being a submitter.

Neat stuff. Fun.

[3:30pm] UPDATE: I have to say... just when you think you have it all under control, life pops up and shows you you aren't as put-together as you think. Case in point: today I thought I had a good set of patches for ZeroMQ and its Java client, but then I tried to receive the data and it was all a mess. I had to send an "I'm a dufus" message to the mailing list, back out all the changes, and verify that my code worked.

It didn't, and so I spent a good 30 mins finding out why not - only to see that one of the options on ZMQ is really not all that well documented: the multicast loopback option, ZMQ_MCAST_LOOP. It's not about the loopback interface - it's about any other client on the same box. So I had to drop that option, and then things started working.

Well, at that point, it made sense to see if I could get the fixes to ZMQ back in and working. It didn't take long, and sure enough, it's working. So I sent in another patch with a few additional changes, and that should be good to go.

Nothing like being shown to be a dufus in front of people you'd like to impress. Yeah... class act all the way.

Google Chrome dev 9.0.597.10 is Out

Tuesday, December 7th, 2010

This morning I noticed that Google Chrome dev 9.0.597.10 was out, and the release notes indicate it's just stability fixes and a few UI tweaks. Fair enough... not every release can be amazing, and the browser is settling into its final form. There hasn't been a big change in the UI in quite a while, and the improvements in the V8 engine and other infrastructure components are going to be ongoing.

Still... it's nice to see progress.

Patching ZeroMQ – Pretty Neat

Monday, December 6th, 2010

ZeroMQ

This morning I was chatting with Martin S. on the ZeroMQ IRC channel, and a suggestion came up for how to handle the "socket recovery interval in msec" option in the code. He pointed out that I'd need to change the ZeroMQ code itself, and suggested that I make the change and send the patch to the mailing list, and he'd incorporate it.

Sweet! A request for a (simple) patch to the codebase by the primary maintainer. I like this stuff. It's not hard, but there are a few wrinkles, and coding standards at least exist, which is a huge help to the project. I just need to get a few things figured out, write the code, compile it all up, and then make the diff for the mailing list.

I'm sure there's going to be a lot of little details I learn as I do this, but it's nice to get a chance to contribute to another nice open source project.

UPDATE: I've pretty much got it all done, but the hint I received from the guy in the group who really knows OpenPGM is a little sketchy. He gave me an equation that includes the size of the transport packet (TPDU):

Easy workaround is to calculate the buffer size in sequence numbers in 0MQ and pass that onto OpenPGM. Then you can export socket options for 0MQ to set the buffer size in seconds, milliseconds, etc.

int sqns = (secs * max_rte) / tpdu_size;
pgm_setsockopt (sock, IPPROTO_PGM, PGM_TXW_SQNS, &sqns, sizeof (sqns));
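Just to make the units concrete, here's my own rough worked example of that equation - the values are my assumptions (max_rte taken as bytes/sec and a typical ~1,500-byte Ethernet TPDU), not something from the hint:

  int secs      = 1;                               // one second of buffering
  int max_rte   = 6250000;                         // 50 Mbps / 8 bits-per-byte
  int tpdu_size = 1500;                            // typical Ethernet TPDU
  int sqns      = (secs * max_rte) / tpdu_size;    // ~4,166 sequence numbers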

I think I found what should go in that spot, but I wasn't 100% sure. So I replied to him on the mailing list, and now I'm waiting for confirmation or correction. It shouldn't take too much longer to finish this up, and then I'll have a way to set ZMQ_RECOVERY_IVL_MSEC - which, when non-zero, overrides the ZMQ_RECOVERY_IVL value and is interpreted in milliseconds. Should be pretty easy to finish.

Tracking Down Nasty Memory Issue – Patience is a Virtue (cont.)

Friday, December 3rd, 2010

Detective.jpg

This morning has been very enlightening on ZeroMQ. Very exciting stuff. As I was leaving yesterday I had made a test app for the ZeroMQ guys to check and then posted the following test results as I varied the value of ZMQ_RATE:

  Data Rate   ZMQ_RATE   Initial Memory   Final Memory
  10 Mbps        10000     7 MB             18 MB
  50 Mbps        50000     7 MB             73 MB
  200 Mbps      200000     7 MB            280 MB

The data was pretty compelling. The effect ZMQ_RATE had on the memory footprint of the same data source was staggering. Thankfully, I put it all together in a nice email to the mailing list, and I got a great hint from Martin S.:

Isn't it just the TX buffer? The size of PGM's TX buffer can be computed as ZMQ_RATE * ZMQ_RECOVERY_IVL. The messages are held in memory even after they are sent to allow retransmission (repair) for the period of ZMQ_RECOVERY_IVL seconds.
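That formula lines up with the numbers above almost perfectly. ZMQ_RATE is in kilobits/sec, so at 200,000 kbit/s the socket can push roughly 25 MB/s, and assuming the 2.x default ZMQ_RECOVERY_IVL of 10 seconds, that works out to about 250 MB of TX buffer - right in line with the ~273 MB of growth in the table. The 50 Mbps row checks out the same way: roughly 6.25 MB/s × 10 sec ≈ 62 MB against the ~66 MB I measured.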

So I added the following to the ZMQ transmitter's code:

  static int64_t     __rate = 50000;       // ZMQ_RATE is in kbits/sec
  static int64_t     __recovery = 1;       // ZMQ_RECOVERY_IVL is in secs
  static int64_t     __loopback = 0;       // 0 = no multicast loopback
 
  // set the rate, recovery interval, and loopback options on the socket
  top->socket->setsockopt(ZMQ_RATE, &__rate, sizeof(__rate));
  top->socket->setsockopt(ZMQ_RECOVERY_IVL, &__recovery, sizeof(__recovery));
  top->socket->setsockopt(ZMQ_MCAST_LOOP, &__loopback, sizeof(__loopback));

And then started running the tests again.

The results were amazing:

  Data Rate   ZMQ_RATE   Initial Memory   Final Memory
  50 Mbps        50000     7 MB             11 MB
  200 Mbps      200000     7 MB             32 MB

This was exactly what I was looking for! ZMQ_RECOVERY_IVL can't go below 1 sec, but for me even that's too much. If a client isn't up and ready to receive ticks, a one-second recovery window is likely to mean several hundred, if not several thousand, messages. I'd have been fine making it 0.5 sec - but Martin says one second is the underlying resolution of OpenPGM.

Not bad. I'll take it. What a great morning!

[12/7] UPDATE: the option:

  static int64_t     __loopback = 0;
 
  top->socket->setsockopt(ZMQ_MCAST_LOOP, &__loopback, sizeof(__loopback));

is a massive red herring. It's not about the loopback interface - my reliable multicast URLs are all targeted at specific NICs - it's about being able to receive on the same box as the sender. I was trying to figure out why things "broke", and it was when I took this option out that things worked again. Dangerously worded docs on this one... leave it out.

Tracking Down Nasty Memory Issue – Patience is a Virtue

Thursday, December 2nd, 2010

Detective.jpg

I've been trying to track down what I believed to be a nasty memory leak in my code today. The short-cut to the answer is that it wasn't a leak, and it wasn't in my code. But I'm getting ahead of myself.

The problem was manifesting itself as steadily growing memory on some of my ticker plants. In truth, it was probably all of them, but it wasn't affecting all of them equally. I have spent a lot of time on this over the past weeks, and today I was going to get to the bottom of it for sure.

So I started digging into the problem by shutting things off. What I found was that if I was listening to anything on the UDP socket and doing anything with it, I was getting about an 8-byte increase every two seconds. Very odd. I had turned off ZeroMQ at the time, so the messages were just getting dropped in the trash, but they were being processed completely up to that point.

I was trying everything, and then I had to run to a meeting. I left the test running because I needed to hurry. It wasn't going to consume the box in half an hour, anyway.

When I came back I noticed that the memory had stabilized!

Now it was getting interesting. Very interesting. I started tracking things down, and it turned out that the ZMQ_RATE parameter was a major factor in the final, steady-state memory value. I then wrote up a simple test - something I knew the ZeroMQ guys would appreciate - and started running it.
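The test was nothing fancy - something like this sketch, which is not the exact code I sent: the interface, multicast address, message size, and pacing are all placeholders:

  // minimal PGM publisher: set ZMQ_RATE, then blast fixed-size messages
  // and watch the process RSS grow - the knob to vary is __rate
  #include <zmq.hpp>
  #include <cstring>
  #include <unistd.h>
 
  int main()
  {
    zmq::context_t     ctx(1);
    zmq::socket_t      pub(ctx, ZMQ_PUB);
 
    static int64_t     __rate = 50000;     // ZMQ_RATE is in kbits/sec
    pub.setsockopt(ZMQ_RATE, &__rate, sizeof(__rate));
    pub.connect("epgm://eth0;239.192.1.1:5555");
 
    while (true) {
      zmq::message_t   msg(256);
      memset(msg.data(), 'x', msg.size());
      pub.send(msg);
      usleep(100);
    }
    return 0;
  }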

Again - major dependency on the value of ZMQ_RATE. I'll have to do more work on this tomorrow.

Google Chrome dev 9.0.597.0 is Out

Thursday, December 2nd, 2010

GoogleChrome.jpg

After quite a silence, Google Chrome dev 9.0.597.0 is out and there are some really nice fixes in this release:

All

  • Ongoing work on IndexedDB and GPU
  • Tweaks/Fixes to Google Chrome Instant
  • Extensions/Apps work
  • Autofill related fixes

Known Issues

  • Page becomes unresponsive when trying to play video - Issue 65772
  • Certain HTML5 sites fail to load due to a compositor issue - Issue 64722

I like the GPU updates and the video updates, but I can pass on the "instant"... icky addition in my book, but their app, their choice.

Fantastic Speed Boost on My uint128_t

Wednesday, December 1st, 2010

Professor.jpg

Late yesterday I realized that I had some lingering code using the uint128_t I had created to uniquely map instrument names into some kind of number space for use as keys in the likes of std::map. The code I had originally written worked, but it wasn't nearly fast enough, so I stopped using it (or so I thought) and switched to the trie.

But it wasn't really gone. I had a lingering use for it in my client code, and I decided this morning to fix up the implementation so that I had something that was a lot faster - hopefully in the same ballpark as a uint64_t for map usage.

The first thing I did was to add a timed test section to my testing code for the conflation key - that's what I called the 128-bit value generated from the name of the instrument. It was pretty simple:

  bool        error = false;
  int         cnt = 100000;
  log.info("starting the uint64_t tests...");
  boost::unordered_map<uint64_t, int> little;
  uint64_t    startTime = msg::TransferStats::usecSinceEpoch();
  // ...first the puts
  for (int i = 0; i < cnt; ++i) {
    little[i] = i;
  }
  // ...now the gets
  for (int i = 0; i < cnt; ++i) {
    if (little[i] != i) {
      error = true;
      log.error("uint64_t test failed for i=%ld", i);
    }
  }
  uint64_t    totalTime = msg::TransferStats::usecSinceEpoch() - startTime;
  log.info("%d uint64_t tests completed in %ld usec", cnt, totalTime);
 
  log.info("starting the uint128_t tests...");
  boost::unordered_map<uint128_t, int> big;
  startTime = msg::TransferStats::usecSinceEpoch();
  // ...first the puts
  for (int i = 0; i < cnt; ++i) {
    big[i] = i;
  }
  // ...now the gets
  for (int i = 0; i < cnt; ++i) {
    if (big[i] != i) {
      error = true;
      log.error("uint128_t test failed for i=%ld", i);
    }
  }
  totalTime = msg::TransferStats::usecSinceEpoch() - startTime;
  log.info("%d uint128_t tests completed in %ld usec", cnt, totalTime);  

What I saw in my initial tests was horrible. It wasn't even close - more than a factor of 300 between the two. When I looked at the way I'd implemented the uint128_t, it made a lot of sense:

  private:
    uint8_t   mBytes[16];

I had 16 individual bytes as the data ivar for the object. That makes a lot of sense in that it never suffers from host/network byte-ordering issues, and things looked fast in the code - but there were loops and a lot of calls to memcpy(). So I needed to take a new approach, and I decided to go to the other extreme - two uint64_t values as opposed to sixteen uint8_t values.
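One wrinkle with the new layout: for boost::unordered_map to accept the class as a key - as in the timing test above - it needs an operator== and a hash_value() overload. Here's a minimal sketch of the latter, with mHi/mLo as my stand-in names for the two 64-bit words:

  #include <boost/functional/hash.hpp>
 
  // assumed to be declared a friend of uint128_t so it can see the words
  std::size_t hash_value( const uint128_t & aValue )
  {
    std::size_t      seed = 0;
    boost::hash_combine(seed, aValue.mHi);    // assumed member names
    boost::hash_combine(seed, aValue.mLo);
    return seed;
  }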

This changed a lot of the code. For one, it made a lot of sense to write my own hton() and ntoh() functions for the classes so they'd look like the system calls ntohl() and the like. It really wasn't all that hard, either:

  uint128_t hton( const uint128_t & aValue )
  {
    uint128_t     retval;
 
    // get the byte pointers to the source and destination
    uint8_t        *dest = (uint8_t *)&retval;
    const uint8_t  *src  = (const uint8_t *)&aValue;
    // now map the bytes one-by-one from source to destination
    dest[0]  = src[7];
    dest[1]  = src[6];
    dest[2]  = src[5];
    dest[3]  = src[4];
    dest[4]  = src[3];
    dest[5]  = src[2];
    dest[6]  = src[1];
    dest[7]  = src[0];
    dest[8]  = src[15];
    dest[9]  = src[14];
    dest[10] = src[13];
    dest[11] = src[12];
    dest[12] = src[11];
    dest[13] = src[10];
    dest[14] = src[9];
    dest[15] = src[8];
 
    return retval;
  }
 
 
  uint128_t ntoh( const uint128_t & aValue )
  {
    uint128_t     retval;
 
    // get the byte pointers to the source and destination
    uint8_t        *dest = (uint8_t *)&retval;
    const uint8_t  *src  = (const uint8_t *)&aValue;
    // now map the bytes one-by-one from source to destination
    dest[7]  = src[0];
    dest[6]  = src[1];
    dest[5]  = src[2];
    dest[4]  = src[3];
    dest[3]  = src[4];
    dest[2]  = src[5];
    dest[1]  = src[6];
    dest[0]  = src[7];
    dest[15] = src[8];
    dest[14] = src[9];
    dest[13] = src[10];
    dest[12] = src[11];
    dest[11] = src[12];
    dest[10] = src[13];
    dest[9]  = src[14];
    dest[8]  = src[15];
 
    return retval;
  }

The old scheme allowed me to use memcpy() to put the data into a data stream - and to take it out. But now, with a real "host byte order", I needed to add methods on the uint128_t class to pack and unpack its data from the data streams. Not bad, and it made the code look a lot cleaner, but I'd had that crud scattered in the code in a ton of places.

Bad form on my part - really.
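For what it's worth, the pack/unpack pair itself is simple. This sketch uses assumed method names and a raw byte buffer standing in for my real data stream classes:

  // pack the value into a buffer in network byte order...
  void uint128_t::pack( uint8_t *aBuffer ) const
  {
    uint128_t      net = hton(*this);
    memcpy(aBuffer, &net, 16);
  }
 
  // ...and unpack it back out, converting to host byte order
  void uint128_t::unpack( const uint8_t *aBuffer )
  {
    memcpy(this, aBuffer, 16);
    *this = ntoh(*this);
  }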

I even had to create the prefix/postfix increment and decrement operators to make sure it could function in any loops I might have - I really wanted this to be complete. Thankfully, the code to do this didn't turn out to be that hard. In fact, I was able to do it all in far fewer lines of code because I could let the compiler do a lot of the up-casting work that I'd had to do with memcpy() before. Nice benefit.
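The only subtle part of those operators is the carry out of the low word. A minimal sketch of the increment pair, again with the assumed mHi/mLo names:

  // prefix: bump the low word, carrying into the high word on wrap-around
  uint128_t & uint128_t::operator++()
  {
    if (++mLo == 0) {
      ++mHi;
    }
    return *this;
  }
 
  // postfix: save a copy, bump with the prefix form, hand back the copy
  uint128_t uint128_t::operator++(int)
  {
    uint128_t      old(*this);
    ++(*this);
    return old;
  }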

The upshot of all these changes is that the new uint128_t was only 30% slower than the uint64_t! That's amazing compared to where it started. It's not going to set any speed records, but given that it's not a built-in CPU data type, it's pretty good. Certainly good enough for all the things I need it to do.

Fantastic work!

Finding Subtle Bugs Takes Time

Tuesday, November 30th, 2010

bug.gif

I've been lucky today - I finished a good chunk of code this morning, and now I'm able to watch my ticker plants run. It's a funny thing: finding the subtle bugs takes time. You have to watch the code run, ask questions, monitor log files - all these things take time, and it's that polishing phase of an app that's very rewarding.

If you have a distributed app you need to monitor the load and rebalance things as needed. That takes more time.

I've been lucky this morning to be able to take this time and really study what's happening. I've found a few bugs, and they would not have been easy to find if I were in a hurry, as they weren't serious crashing issues. Still, they needed to be fixed.

It's nice to have a little time to watch and monitor the work you've done. It's really very rewarding.

ZeroMQ is Nearing Release of 2.1

Tuesday, November 30th, 2010

ZeroMQ

I've found a singular problem with ZeroMQ, and noted in the IRC chat conversations that this should be fixed in the soon-to-be-released 2.1. It's a simple memory leak with the sending of messages. My code is pretty simple: I get the payload for the message, I get the ZMQ socket it needs to be sent on, and then I simply make a ZMQ message and send it. That's about as simple as you can get:

  1.  if (aTopic.first() != NULL) {
  2.    try {
  3.      // lock up this socket while we send the data out...
  4.      boost::detail::spinlock::scoped_lock lock(aTopic.second());
  5.      // make a ZMQ message of the right size
  6.      zmq::message_t msg(aPayload.size());
  7.      // ...copy in the data we need from the payload
  8.      memcpy(msg.data(), aPayload.data(), aPayload.size());
  9.      // ...and WHOOSH! out it goes
  10.     aTopic.first()->send(msg);
  11.   } catch (std::exception & e) {
  12.     error = true;
  13.     cLog.error("[sendToZMQ] trying to send the data got an "
  14.                "exception: %s", e.what());
  15.   }
  16. }

If I comment out line 10 - the send() - the memory doesn't grow any faster than I'd expect based on the cached messages. But leave it in, and the memory grows and grows. More interestingly, the growth is different for the different kinds of payloads I send. Very odd.

Anyway... the ZeroMQ guys said they planned on having a release last week, but it seems things happened, and that's OK with me - this is important. I need to keep the memory under control.

CKit’s IRC Protocol Implemented in Boost ASIO

Monday, November 29th, 2010

Boost C++ Libraries

For the last day and a half I've been working on re-writing my C++ IRC Client code to use boost's asio, and it's been pretty interesting. I will say there are a lot of plusses to using the boost socket functionality - even over my own socket library (imagine that!).

First, it's got the complete asynchronous mechanism for sending and receiving - you just have to love that. Also, they have done a wonderful job in making it all very rational and sane. Asynchronous methods perform as you'd expect, and have remarkably similar signatures to their synchronous counterparts. It makes writing with the classes very simple. Clearly, a lot of thought has gone into this stuff, and that's really nice to see.

Secondly, there's no need for all the threads I had in my old code, primarily because boost asio can run all async operations through a single io_service thread. This really is a great timesaver. You can easily have multiple threads sending out chats to the IRC server with the async writer. Very slick.
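To give a flavor of that async writer, here's a minimal sketch - the class and member names are illustrative, not my actual CKit code, and the real version queues outgoing lines rather than holding just one:

  #include <boost/asio.hpp>
  #include <boost/bind.hpp>
  #include <string>
 
  using boost::asio::ip::tcp;
 
  class IRCWriter {
    public:
      IRCWriter( boost::asio::io_service & anIOService,
                 const std::string & aHost, const std::string & aPort ) :
        mSocket(anIOService)
      {
        // simple synchronous connect - the async form is more of the same
        tcp::resolver          resolver(anIOService);
        tcp::resolver::query   query(aHost, aPort);
        mSocket.connect(*resolver.resolve(query));
      }
 
      // hand off a single chat line for async delivery to the IRC server
      void sendLine( const std::string & aLine )
      {
        // the buffer given to async_write has to stay alive until the
        // completion handler fires, so hold it in a member
        mOutgoing = aLine + "\r\n";
        boost::asio::async_write(mSocket, boost::asio::buffer(mOutgoing),
          boost::bind(&IRCWriter::handleWrite, this,
                      boost::asio::placeholders::error));
      }
 
    private:
      // called on the io_service thread when the write completes
      void handleWrite( const boost::system::error_code & anError )
      {
        if (anError) {
          // reconnect/logging elided in this sketch
        }
      }
 
      tcp::socket   mSocket;
      std::string   mOutgoing;
  };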

Finally, the resulting code size is much smaller. That's more a consequence of the other two, but the payoff can't be overstated. Less code means less maintenance, less cost, and less time. You just can't beat that.

So I have it all done, but I haven't been able to test it yet as we don't have an IRC server up and running - yet. It's been discussed, and maybe today they'll see that I've gotten code ready, and they'll put one up. It's not hard - just takes a little time, that's all. But the benefits will be enormous. I look forward to getting the server going, testing my code, and integrating this into my TickerPlants and other libraries. It's just an amazingly powerful tool for support and problem solving.