Archive for July, 2010

Boost Asio Buffering and STL Containers – Very Odd to Pre-Fill

Friday, July 16th, 2010


I was working today with boost asio and the buffering of async socket I/O, and was stunned to realize that size() - not capacity() - is what determines how much data gets buffered into a std::string or a std::vector<char>. But let me explain.

The standard call to async_receive_from() - or, in fact, any boost async read method - takes a boost::asio::buffer. This is a standard buffer class for boost that wraps an underlying data container - like a char[], a std::string, or a std::vector<char> - and then populates that underlying container with the data coming from the source - be that UDP or TCP - and it works very nicely.

But what I expected was to do something like the following:

  std::vector<char>   buff;
  buff.reserve(16384);
  ...
  mSocket.async_receive_from(boost::asio::buffer(buff),
      boost::bind(...));

such that I had created (allocated) 16kB of space in the std::vector<char> and then when I got the callback, I'd look at size() and see what was really in there. But that's not how it works.

If you do that, then nothing will be read into the buffer, because boost uses the size() to know how big the buffer is. This means that you really have to "pre-fill" the container with junk, and then let the boost asio methods overwrite the junk with good data. It's then mandatory that you get the number of bytes read from boost, as there's no other way for you to know what's in the container from the I/O and what was put there by you prior to the call.

The code looks deceptively simple:

  std::vector<char>   buff(16384);
  ...
  mSocket.async_receive_from(boost::asio::buffer(buff),
      boost::bind(...));

and in many cases, the value in the constructor might appear to be a capacity. But it's really a size of junk data. Very odd.

Given that boost chose to use size() versus capacity(), I have to wonder why. It makes no sense to me at all. But hey... it's not the first, and it sure won't be the last.

Working on UDP Multicast in Boost

Thursday, July 15th, 2010


Today I started with the UDP multicast receiver in my application framework. It's new ground for me as I've never done raw multicast at the socket level, and I have to agree that boost asio makes it a lot easier. Understanding the io_service and its role in the workings of the async reading is really helpful. The things I learned getting the tcp connection working have really paid off.

I'm glad that the Team has an existing UDP multicast sending test frame. I was able to get this guy working in less than a day and show that it's handling everything the "re-player" test harness can send. Very nice.

Tomorrow - the UDP Transmitter.

Getting Pretty Clever with Serialization and Google Varints

Wednesday, July 14th, 2010


Today I started with a working, but somewhat inefficient, serialization scheme and a desire to do a lot better. Late yesterday, I talked with my sensei at The Shop about the improvements I'd made, and we both agreed that the best thing to do was to look at how Google protobufs packs its data into a byte stream. It's certainly very fast and efficient - Google uses it to move massive amounts of data around its enterprise - and that's what I was looking for. Thankfully, I knew that all I needed to do was write a few helper methods, and the in-house serialization scheme I'd been using would work just fine.

So what is Google protobufs doing, really? Well, they are 'packing' 32-bit and 64-bit unsigned integers into as small a space as they can fit - and still retain a very fast deserialization speed. How do they do it? They encode the integer into 7-bit sections and use the high-order bit of each byte to indicate whether more bytes follow. So as long as a serialized byte has the high-order bit set, it's not the last one in the number. Google calls these varints - variable integers.

This doesn't sound very fast, but when you look at the implementation, it's really pretty slick. This is the 32-bit unsigned integer serialization scheme:

  bool Message::packVarint( std::string & aBuffer, uint32_t aValue )
  {
    char      buff[5];
    uint16_t  size;
 
    buff[0] = static_cast<uint8_t>(aValue | 0x80);
    if (aValue >= (1 << 7)) {
      buff[1] = static_cast<uint8_t>((aValue >>  7) | 0x80);
      if (aValue >= (1 << 14)) {
        buff[2] = static_cast<uint8_t>((aValue >> 14) | 0x80);
        if (aValue >= (1 << 21)) {
          buff[3] = static_cast<uint8_t>((aValue >> 21) | 0x80);
          if (aValue >= (1 << 28)) {
            buff[4] = static_cast<uint8_t>(aValue >> 28);
            size = 5;
          } else {
            buff[3] &= 0x7F;
            size = 4;
          }
        } else {
          buff[2] &= 0x7F;
          size = 3;
        }
      } else {
        buff[1] &= 0x7F;
        size = 2;
      }
    } else {
      buff[0] &= 0x7F;
      size = 1;
    }
 
    // now just append this data to the buffer
    aBuffer.append(&buff[0], size);
    return true;
  }

The beauty of this scheme is that it only does the work it needs to. Sure, it's a mess of nested if statements, but so what? It's not like you need this to be exceptionally readable, and it's not that bad. It's there when you need it, and it's going to run through the data like a hot knife through butter. Amazingly simple and clean.

And deserializing it is just as fast:

  bool Message::unpackVarint( const std::string & aBuffer,
                              uint16_t & aPos,
                              uint32_t & aValue ) const
  {
    /*
     * I'm going to go with Google's code here, for the most part, as they
     * have gone through the difficult part of optimization. I'm not a huge
     * fan of the 'goto's, but again, it's about speed here, and that's
     * more important than the use (or not) of a particular language
     * construct. Note the uint8_t pointer - with a plain char, a byte
     * with the high bit set would sign-extend in the final shift.
     */
    const uint8_t  *ptr = reinterpret_cast<const uint8_t *>(aBuffer.data()) + aPos;
    uint32_t    b;
    uint32_t    result;
 
    b = *(ptr++); result  = (b & 0x7F)      ; if (!(b & 0x80)) { aPos += 1; goto done; }
    b = *(ptr++); result |= (b & 0x7F) <<  7; if (!(b & 0x80)) { aPos += 2; goto done; }
    b = *(ptr++); result |= (b & 0x7F) << 14; if (!(b & 0x80)) { aPos += 3; goto done; }
    b = *(ptr++); result |= (b & 0x7F) << 21; if (!(b & 0x80)) { aPos += 4; goto done; }
    b = *(ptr++); result |=  b         << 28; if (!(b & 0x80)) { aPos += 5; goto done; }
 
    /*
     * If we are here, then that means that the data is too big to fit into
     * a 32-bit int. Sadly, we need to keep reading until we find the end of
     * the varint, and update aPos based on the end of the actual data. The
     * value returned to the caller will simply be truncated.
     */
    for (int i = 0; i < 5; ++i) {
      b = *(ptr++); if (!(b & 0x80)) { aPos += (6 + i); goto done; }
    }
    // if we're here, then we overflowed the 64-bit varint, and it's bad.
    return false;
 
    // at this point, we're all done, save the value and leave
    done:
    aValue = result;
    return true;
  }

In the code, the main section of that method is really one line per byte, and it's pretty easy to read.

Again, the code only does what's needed, and it's pretty fast. This is some really good stuff from Google. I'm impressed. I'm also a little surprised that they made such heavy use of gotos in the code. Normally, that's a coding no-no, but clearly, they have used the old-style 'jump' quite effectively here.

They also created 64-bit versions of these methods, but here they didn't make the choice I would have made. They cleverly realized that on a 32-bit machine, it was faster to break up the 64-bit unsigned int into three blocks and process those with 32-bit arithmetic (remember, it's seven bits of payload per byte). However, most modern linux systems are 64-bit, so they could have put a simple #ifdef on the word size: use the "chunking" code when it's 32-bit, and expand the above methods to work on 64-bit numbers when it's not.

There's a significant trade-off here because of this chunking. The code above does only what's necessary, and does it fast; expanded to 64 bits, it's no slower until you get to a number that can't be stored in 32 bits, at which point the "extra" code kicks in and it slows down a bit. Google's "chunking", on the other hand, requires additional compares to see how large a number is, so they used a simple but effective hardcoded binary tree to get the general size of the number. While this yields the best performance on a 32-bit machine, it's going to be far from the best on a 64-bit system.

Therefore, in my code, I put in the appropriate #ifdefs so that the right version is used for the wordsize of the generated code, and so we'll always be getting the best performance we can get. I'm very excited about this. And a little surprised that they didn't include this in their code, but no matter, I've got it and that's great.

When it comes to signed integers, they created this "ZigZag" scheme where the signed integers are coded into unsigned integers by ranking them by their absolute value - negative values winning ties - and then using the same varint packing on them as on the unsigned ones. It's pretty slick - and very fast. One line of code does it. Pretty amazing stuff.

With a few helper methods, I was able to change the serialization scheme pretty easily and then run my tests on the same data. What I saw was that a 29-byte packet of two strings went to 26 bytes because the strings were encoded with these varints for the length - about a 10% savings from just two string lengths! Not bad. When I start to add in real data, it's going to get even more impressive.

All in all, I have to say that once again, today was a good day. Pretty neat stuff.

Big Day With Serialization – Boost and My Own

Tuesday, July 13th, 2010


Today was a Big Day for my work with boost serialization, and I'm very glad to have stuck it out. Early this morning I was able to verify that if I serialized a std::list<msg::Message *> I was able to correctly get it across the wire and reconstitute it into the proper subclass on the other end. That was a huge relief. But I didn't stay happy for long. The size was pretty dismal.

I was sending 19 bytes of string data in two strings in the object, and the successfully serialized data was 109 bytes. That's a lot of overhead. I'm sure it's not really horrible as you scale up, and if I wasn't in an environment where size was everything, I'd have been happy with the results and gone on to the next problem. But size is critical here, and I had to do something about it.

So what to do?

I liked the way boost allowed for versioning of the serialization. I also liked that we have a budding new serialization scheme from another group. So I decided it was time to write my own - and this time, without the need for a container class in order to properly serialize the messages.

So I gutted the boost serialization code and decided that I'd have each class implement a serialize() and deserialize() method and then also have a constructor that takes the raw data and "creates" an instance from it. If I then created enough helper functions for the different ivar types, I'd be able to make the code look pretty clean and we'd have a solid scheme in place.

The one additional component was that I'd have to create a MessageFactory that had a singleton that created the correct Message instance based on the type that was provided in a transmission header. OK, there's another thing I needed to add - a more complete message transmission header.

So here's what I ended up doing today:

First, I created a fixed-length message header to act as a preamble for all the message packets:

  namespace msg
  {
    struct Preamble
    {
      /*
       * This is meant to be a simple read/write struct - with methods on
       * those occasions that you'd like to have a little more "smarts"
       * than the simple struct.
       */
      uint16_t    messageSize;
      uint8_t     messageType;
      uint8_t     messageVersion;
 
      ...
    };
  }  // end of namespace msg

At this point, I created convenience constructors that took a Message and its serialized data stream and filled in the values. Pretty simple, but then I could replace my simple 'size' header in the boost asio send and receive methods with one of these guys, at its known fixed size.

Now I had a header that could grow with me and hold all the metadata I was going to need for my serialization scheme, while still staying small. The key here is that the 'type' of a message is a single 8-bit unsigned integer, and that's more than enough for the work we're doing. I'll end up using this, along with the single-byte version number and the binary data, to reconstitute a new instance on the other side.

I next needed to create the serialize() method on the objects to put the data in the binary stream. Here's the base class' version of that method that starts the ball rolling:

  std::string Message::serialize() const
  {
    std::string	code;
    code.reserve(16384);
    return code;
  }

This makes sure that the empty std::string that's returned is big enough to hold almost every message I can generate. That's going to cut down on the allocations during generation - which is really good for processing speed. Still... if we ever have to go past the 16kB starting size, it's nice to know that the std::string will automatically grow as we append() data.

Now a derived class:

  std::string GoodBye::serialize() const
  {
    // get the super's serialization contents first
    std::string	code = Message::serialize();
    // ...and then add in mine
    pack(code, mReason);
    pack(code, mExplanation);
    return code;
  }

where the ivars are simple types (uint16_t, std::string, etc.) and I've created a series of pack() methods that append the binary format of the data to the std::string. These pack() methods aren't too hard - they basically cast the data to a (const char *) and let the std::string append() method do its thing and add it to the end of its data. So... things are looking pretty good so far. All I need now is a deserialize(), a constructor, and a factory.

  bool GoodBye::deserialize( const uint8_t aVersion,
                             const std::string & aCode,
                             uint16_t & aPos )
  {
    bool     error = false;
    /*
     * First, deserialize the super class, and then unpack all the
     * ivars I packed up in serialize() in the same order. If any
     * one fails, then the deserialization fails. All have to
     * succeed.
     */
    if (!Message::deserialize(aVersion, aCode, aPos) ||
        !unpack(aCode, aPos, mReason) ||
        !unpack(aCode, aPos, mExplanation)) {
      error = true;
    }
    return !error;
  }

What's happening here is that we're deserializing the super class and then getting to my ivars in the same order as the serialize() method. With the inclusion of the aVersion variable, I'm able to include or exclude anything I need, and additions to the data buffer aren't going to mess me up either.

This gives me everything I wanted from boost serialization as far as versioning. I could then make a very simple constructor:

  GoodBye::GoodBye( const uint8_t aVersion,
                    const std::string & aCode ) :
    msg::Message(),
    mReason(),
    mExplanation()
  {
    uint16_t   pos = 0;
    if (!deserialize(aVersion, aCode, pos)) {
      cLog.warn("<constructor:deserialize> I was unable to deserialize "
                "the version %d code of %d bytes!", aVersion, aCode.size());
    }
  }

and with this, I can create them based on their type code - contained in the preamble. All that remained was to tackle the factory.

Thankfully, I'd expected that I'd need something like this, and all I needed to do was to create the methods that took a type code and returned a Message name, as well as a method that took a preamble and a data stream and created a message. The code for the latter was a lot like the former and looks like this:

  msg::Message *MessageFactory::deserialize( const msg::Preamble & aHeader,
                                             const std::string & aCode )
  {
    msg::Message     *msg = NULL;
    /*
     * This is the unfortunate part of the Factory - we need to just
     * run through all the message types, and use them to create the
     * appropriate Messages. It's possible to make a registration-based
     * system, but that can wait for a later version.
     */
    switch (aHeader.messageType) {
      case 0:  msg = new msg::Message(aHeader.messageVersion, aCode);  break;
      case 1:  msg = new Hello(aHeader.messageVersion, aCode);         break;
      case 2:  msg = new GoodBye(aHeader.messageVersion, aCode);       break;
      default:
        cLog.warn("[deserialize] the message type %d is unknown - trouble!",
                  aHeader.messageType);
        break;
    }
    return msg;
  }

Once I had all this together, I was able to build it and run it with my test tcpServer and tcpClient that were sending Hello messages to one another. I had one little problem in the decoding - a typo, and other than that, it all worked! Amazing.

The reduction in payload sizes was also dramatic - from 109 bytes to 29 bytes - again, on 19 bytes in two strings. The big obstacle to reducing this further was the serialization scheme I was using:

  Data Type     Format
  -----------   --------------------------------------------
  any integer   'L' + 8-byte long value
  any float     'D' + 8-byte double value
  String        'S' + 4-byte length code + n-byte char data

so it didn't matter that I was being efficient in my classes with properly sized integers and std::string - I was getting a built-in expansion in the encoding. On 19 bytes in two strings, the two 'S' markers and two 4-byte length codes alone account for 10 of the 29 bytes.

Still, this was huge for me. I had finally mastered the boost serialization issues and then gone ahead and written my own that was just as effective, but faster and much smaller. What a great day!

Finally Making Headway with Boost Serialization and Asio

Monday, July 12th, 2010


For the first time today, I'm starting to feel like I'm getting a handle on boost Asio and Serialization. I know I need to write all this up in a few posts, and I will, but I ran out of time today and had to run and catch the train. But progress was made.

I was able to properly get the async I/O working. You have to create a thread to call io_service::run() - there's just no two ways about it. While I thought boost would be doing this for me, it isn't. OK, lesson learned. But I was able to take the TCPClient and TCPTransceiver classes and add in the boost threading without a lot of trouble. On the TCPTransceiver side, it's easy to start, and on the TCPClient side, you only need to start the thread if you want to do async I/O. It's very easy to leave off, and it starts automatically if you register for async I/O messages.

Then I tackled the disconnect of either end of the socket. That wasn't too hard, and that made things a lot more professional. Looking better.

Finally, I tried to get the serialization working, but I'm convinced that the only way to do it is to put the single message into a std::list<msg::Message *>, because that's what's used over and over in the examples as the way to use polymorphic serialization and de-serialization. So I've got to try that tomorrow morning.

But in the end, I'm looking at the boost serialization and thinking that I need to rip it out anyway as it's not cross-platform. If I do that, then I can use the MessageFactory concept and create a MessagePreamble like:

  struct MessagePreamble {
    uint16_t      messageBodySize;
    uint8_t       messageType;
    uint8_t       messageVersion;
  };

and that 32-bit (4-byte) preamble could be sent in front of the serialized message, and then the number of subsequent bytes read, and the messageType used to call the proper constructor with the signature:

  Message *newGuy = new Hello(preamble, data);

and each message handles the "take values from serialization stream" with a version and such just like the boost serialization. It's got all the features I need, and it's cross-platform. I think it's what I have to do once I get the boost serialization figured out.

Hope tomorrow is a good day!

Google Chrome dev 6.0.458.1 is Out

Monday, July 12th, 2010

This morning I noticed that Google Chrome dev was updated to 6.0.458.1, and the release notes are, once again, a little sketchy. It's nice that the pace seems to be slowing down - maybe that means the 6.x release line is nearing completion. Then again, it could just mean work is slowing down.

In any case, it's good to keep up to date on something that is changing so quickly.

Growl 1.2.1 is Out

Monday, July 12th, 2010


It's not updated often, and I suppose that's a good thing, but this morning I noticed that Growl 1.2.1 got updated this past weekend with a rather impressive list of changes. The Growl team seems to be a little miffed that several applications (DropBox, Adobe Creative Suite) are installing Growl as a part of their installations - it's a little sneaky and underhanded in their book. I can understand their concerns - to a point. People see something on their systems that they didn't install, and it's not clear to them that it's there because of something they did install. That can mean annoying support emails. Not good.

But if someone wants to include that functionality, and it's explicitly open source, then it's really a shame that you can't "tag" it as part of a larger install package - something like "Growl for DropBox" in the name. Then, at least, users would have a good idea of what's going on and why it might be installed on their systems.

Anyway, I like it, and it's nice to see the improvements.

Struggling with Boost Serialization and Asio Libraries

Friday, July 9th, 2010


I know I've been at this for days, and I'm making some progress, but it's torturously slow, and several times today I wondered if this was even the right path. I could pull up my old CKit stuff and be productive in minutes. The problem is, I need async I/O. There's no two ways about it. If I cheat here and skip async I/O on these TCP sockets, I'll regret it forever.

So I have to slug through this no matter how bad it seems.

I just hope it gets better soon... I'm getting discouraged.

Pushing Ahead with Boost

Thursday, July 8th, 2010


Today has been another very hard day with boost. It's not an easy set of libraries to get a handle on, because it wasn't made for ease of use. It was made, it seems, primarily for STL expansion, so most things are just header files; but the stuff I need - Serialization and ASIO - are compiled libraries. I can see that it's capable, but getting something beyond the examples is non-trivial.

Certainly, a few examples where they don't put everything into one file would be nice. I've had problems understanding what should go in which classes and headers, simply because they made the examples extremely compact. That's something to be proud of, but it needs explanation to go along with it.

Yup, someone could make a fortune with a really good book on boost.

Slugging Through Boost Serialization and Asio Libraries

Wednesday, July 7th, 2010

Today was spent trying to get a handle on the boost serialization and asio libraries, and on moving messages over the tcp sockets this project is going to use. If I had seen the "Serialization" example on the Asio Examples page, I'd have saved myself a ton of time, but I didn't find it until much later.

Ugh. What a horrible afternoon. I was trying to understand what they were doing with very limited examples and the docs on the classes aren't all that good. It's a mess.

Well... I think I have a lot better idea now.