Archive for the ‘Coding’ Category

Getting Pretty Clever with Serialization and Google Varints

Wednesday, July 14th, 2010


Today I started with a working, but somewhat inefficient, serialization scheme and a desire to do a lot better. Late yesterday, I talked with my sensei at The Shop about the improvements I'd made, and we both agreed that the best thing to do was to look at how Google protobufs packs its data into a byte stream. It's certainly very fast and efficient, as Google uses it to move massive amounts of data around its enterprise, and that's what I was looking for. Thankfully, I knew that all I needed to do was to write a few helper methods, and the in-house serialization scheme I'd been using would work just fine.

So what is Google protobufs doing, really? Well, it's 'packing' 32-bit and 64-bit unsigned integers into as small a space as they can fit - and still retaining a very fast deserialization speed. How do they do it? They encode the integer in 7-bit sections and use the high-order bit to indicate whether this is the last byte in the number. So as long as a serialized byte has the high-order bit set, it's not the last. Google calls these varints - variable-length integers.

This doesn't sound very fast, but when you look at the implementation, it's really pretty slick. This is the 32-bit unsigned integer serialization scheme:

  bool Message::packVarint( std::string & aBuffer, uint32_t aValue )
  {
    char      buff[5];
    uint16_t  size;
 
    buff[0] = static_cast<uint8_t>(aValue | 0x80);
    if (aValue >= (1 << 7)) {
      buff[1] = static_cast<uint8_t>((aValue >>  7) | 0x80);
      if (aValue >= (1 << 14)) {
        buff[2] = static_cast<uint8_t>((aValue >> 14) | 0x80);
        if (aValue >= (1 << 21)) {
          buff[3] = static_cast<uint8_t>((aValue >> 21) | 0x80);
          if (aValue >= (1 << 28)) {
            buff[4] = static_cast<uint8_t>(aValue >> 28);
            size = 5;
          } else {
            buff[3] &= 0x7F;
            size = 4;
          }
        } else {
          buff[2] &= 0x7F;
          size = 3;
        }
      } else {
        buff[1] &= 0x7F;
        size = 2;
      }
    } else {
      buff[0] &= 0x7F;
      size = 1;
    }
 
    // now just append this data to the buffer
    aBuffer.append(&buff[0], size);
    return true;
  }

The beauty of this scheme is that it only does the work it needs to. Sure, it's a mess of nested if statements, but so what? It's not like you need this to be exceptionally readable, and it's not that bad. It's there when you need it, and it's going to run through the data like a hot knife through butter. Amazingly simple and clean.
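Boiled down to a loop, the same packing logic fits in a few lines. This is just my own self-contained sketch of the idea - not Google's code, and not the nested-if version above - but it produces the same bytes:

```cpp
#include <cstdint>
#include <cstddef>

// Emit 7 bits at a time, low bits first, setting the high bit on
// every byte but the last. Returns the number of bytes written.
size_t packVarint32( uint8_t *buff, uint32_t aValue )
{
  size_t size = 0;
  while (aValue >= 0x80) {
    buff[size++] = static_cast<uint8_t>(aValue | 0x80);
    aValue >>= 7;
  }
  buff[size++] = static_cast<uint8_t>(aValue);
  return size;
}
```

Packing 300, for instance, yields the two bytes 0xAC 0x02 - the first has the continuation bit set, the second doesn't.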

And deserializing is just as fast:

  bool Message::unpackVarint( const std::string & aBuffer,
                              uint16_t & aPos,
                              uint32_t & aValue ) const
  {
    /*
     * I'm going to go with Google's code here, for the most part, as they
     * have gone through the difficult part of optimization. I'm not a huge
     * fan of the 'goto's, but again, it's about speed here, and that's
     * more important than the use (or not) of a particular language struct.
     */
    const uint8_t  *ptr = reinterpret_cast<const uint8_t *>(aBuffer.data()) + aPos;
    uint32_t    b;
    uint32_t    result;
 
    b = *(ptr++); result  = (b & 0x7F)      ; if (!(b & 0x80)) { aPos += 1; goto done; }
    b = *(ptr++); result |= (b & 0x7F) <<  7; if (!(b & 0x80)) { aPos += 2; goto done; }
    b = *(ptr++); result |= (b & 0x7F) << 14; if (!(b & 0x80)) { aPos += 3; goto done; }
    b = *(ptr++); result |= (b & 0x7F) << 21; if (!(b & 0x80)) { aPos += 4; goto done; }
    b = *(ptr++); result |=  b         << 28; if (!(b & 0x80)) { aPos += 5; goto done; }
 
    /*
     * If we are here, then that means that the data is too big to fit into
     * a 32-bit int. Sadly, we need to keep reading until we find the end of
     * the varint, and update aPos based on the end of the actual data. The
     * value returned to the caller will simply be truncated.
     */
    for (int i = 0; i < 5; ++i) {
      b = *(ptr++); if (!(b & 0x80)) { aPos += (6 + i); goto done; }
    }
    // if we're here, then we overflowed the 64-bit varint, and it's bad.
    return false;
 
    // at this point, we're all done, save the value and leave
    done:
    aValue = result;
    return true;
  }

In the code, the main section of that method is really one line per byte, and it's pretty easy to read.

Again, the code only does what's needed, and it's pretty fast. This is some really good stuff from Google. I'm impressed. I'm also a little surprised that they made such heavy use of gotos in the code. Normally, that's a "coding no-no", but clearly they have put the old-style 'jump' to very effective use here.
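For a self-contained sketch of the same decoding idea - again, my own helper, not Google's code - a plain loop does the job, at the cost of the early-exit cleverness above:

```cpp
#include <cstdint>
#include <cstddef>

// Shift in 7 bits per byte until a byte without the continuation bit
// appears. Returns the number of bytes consumed, or 0 if the varint
// is too big for 32 bits. Assumes the buffer holds a whole varint.
size_t unpackVarint32( const uint8_t *buff, uint32_t & aValue )
{
  aValue = 0;
  for (size_t i = 0; i < 5; ++i) {
    aValue |= static_cast<uint32_t>(buff[i] & 0x7F) << (7 * i);
    if (!(buff[i] & 0x80)) {
      return i + 1;
    }
  }
  return 0;   // more than 5 bytes - overflowed a 32-bit value
}
```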

They also created 64-bit versions of these methods, but here they didn't make the choice I would have made. They cleverly realized that on a 32-bit machine it was faster to break the 64-bit unsigned int into three 32-bit blocks (remember the single continuation bit per byte), and then process those. However, most modern linux systems are 64-bit, so they could have made a simple #ifdef on the word size: use the "chunking" code when it's 32-bit, and expand the above methods to work on 64-bit numbers when it's not.

There's a significant advantage in avoiding this chunking: the code above does only what's necessary, and does it fast. Expanded to 64 bits, it's no slower until you get to a number that can't be stored in 32 bits - at that point the "extra" code kicks in, and it slows down a bit. With the "chunking", Google realized they'd need additional compares to see how large a number is, so they used a simple, but effective, hardcoded binary tree to get the general size of the number. While this yields the best performance on a 32-bit machine, it's going to be far from the best on a 64-bit system.

Therefore, in my code, I put in the appropriate #ifdefs so that the right version is used for the word size of the build, and we'll always get the best performance we can. I'm very excited about this, and a little surprised that they didn't include it in their code - but no matter, I've got it and that's great.
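To make that concrete, here's a hedged sketch of the 64-bit packer my #ifdef would select on a 64-bit build - it's just the 7-bits-at-a-time loop widened to a uint64_t (the helper name is mine, not from either codebase):

```cpp
#include <cstdint>
#include <cstddef>

// The 64-bit expansion of the packing loop - on a 64-bit word size
// this is a single straight loop over the whole value, with no
// 32-bit "chunking" needed. Returns the number of bytes written.
size_t packVarint64( uint8_t *buff, uint64_t aValue )
{
  size_t size = 0;
  while (aValue >= 0x80) {
    buff[size++] = static_cast<uint8_t>(aValue | 0x80);
    aValue >>= 7;
  }
  buff[size++] = static_cast<uint8_t>(aValue);
  return size;
}
```

A value needing 36 bits, like 1ULL << 35, packs into six bytes - five continuation bytes and one final byte.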

When it comes to signed integers, they created this "ZigZag Scheme" where the signed integers are coded into unsigned integers by ranking them by their absolute value - negative values winning ties, and then using the same varint packing scheme on them as on the unsigned ones. It's pretty slick - and very fast. One line of code does it. Pretty amazing stuff.
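The mapping itself is just a shift and an xor in each direction. This is my own sketch of the transform as Google describes it - 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ... so small magnitudes stay small when varint-packed - not their actual code:

```cpp
#include <cstdint>

// ZigZag encode: interleave negatives and non-negatives so the
// result is small for small |n|. Assumes an arithmetic right shift
// on signed values, which every mainstream compiler provides.
uint32_t zigzagEncode( int32_t n )
{
  return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}

// ZigZag decode: undo the interleaving.
int32_t zigzagDecode( uint32_t z )
{
  return static_cast<int32_t>(z >> 1) ^ -static_cast<int32_t>(z & 1);
}
```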

With a few helper methods, I was able to change the serialization scheme pretty easily and then run my tests on the same data. What I saw was that a 29-byte packet of two strings went to 26 bytes, because the strings were encoded with these varints for the length - a 10% savings for just two strings! Not bad. When I start to add in real data, it's going to get even more impressive.

All in all, I have to say that once again, today was a good day. Pretty neat stuff.

Big Day With Serialization – Boost and My Own

Tuesday, July 13th, 2010

Boost C++ Libraries

Today was a Big Day for my work with boost serialization, and I'm very glad to have stuck it out. Early this morning I verified that if I serialized a std::list<msg::Message *>, I could correctly get it across the wire and reconstitute it into the proper subclass on the other end. That was a huge relief. But I didn't stay happy for long. The size was pretty dismal.

I was sending 19 bytes of string data in two strings in the object, and the successfully serialized data was 109 bytes. That's a lot of overhead. I'm sure it's not really horrible as you scale up, and if I wasn't in an environment where size was everything, I'd have been happy with the results and gone on to the next problem. But size is critical here, and I had to do something about it.

So what to do?

I liked the way boost allowed for the versioning of the serialization. I liked the fact that we have a budding new serialization scheme from another group, and so I decided it was time to write my own, and this time, not include the need for a container class in order to properly serialize the messages.

So I gutted the boost serialization code and decided that I'd have each class implement a serialize() and deserialize() method and then also have a constructor that takes the raw data and "creates" an instance from it. If I then created enough helper functions for the different ivar types, I'd be able to make the code look pretty clean and we'd have a solid scheme in place.

The one additional component was that I'd have to create a MessageFactory that had a singleton that created the correct Message instance based on the type that was provided in a transmission header. OK, there's another thing I needed to add - a more complete message transmission header.

So here's what I ended up doing today:

First, I created a fixed-length message header to act as a preamble for all the message packets:

  namespace msg
  {
    struct Preamble
    {
      /*
       * This is meant to be a simple read/write struct - with methods on
       * those occasions that you'd like to have a little more "smarts"
       * than the simple struct.
       */
      uint16_t    messageSize;
      uint8_t     messageType;
      uint8_t     messageVersion;
 
      ...
    };
  }  // end of namespace msg

At this point, I created convenience constructors that took a Message and its serialized data stream and filled in the values. Pretty simple, but then I could replace the simple 'size' header in my boost asio send and receive methods with one of these guys, at its fixed size.

Now I had a header that could grow with me and hold all the metadata I was going to need for my serialization scheme, while still staying small. The key here is that the 'type' of a message is a single 8-bit unsigned integer, and that's more than enough for the work we're doing. I'll end up using this, along with the single-byte version number and the binary data, to reconstitute a new instance on the other side.
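To make the framing concrete, here's a hedged sketch of writing the 4-byte preamble ahead of the payload - the struct matches the one above, but the frame() helper and the (absent) endianness handling are my own shortcuts:

```cpp
#include <cstdint>
#include <string>

// The fixed-length preamble - two bytes of size, one of type, one
// of version. No padding, so it's exactly 4 bytes on the wire.
struct Preamble
{
  uint16_t    messageSize;
  uint8_t     messageType;
  uint8_t     messageVersion;
};

// Build a wire packet: preamble first, then the serialized payload.
// Host byte order is assumed here - a real version would pick one.
std::string frame( uint8_t aType, uint8_t aVersion,
                   const std::string & aPayload )
{
  Preamble  p;
  p.messageSize    = static_cast<uint16_t>(aPayload.size());
  p.messageType    = aType;
  p.messageVersion = aVersion;
  std::string out(reinterpret_cast<const char *>(&p), sizeof(Preamble));
  out.append(aPayload);
  return out;
}
```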

I next needed to create the serialize() method on the objects to put the data in the binary stream. Here's the base class' version of that method that starts the ball rolling:

  std::string Message::serialize() const
  {
    std::string	code;
    code.reserve(16384);
    return code;
  }

This makes sure that the empty std::string that's returned is big enough to hold almost every message I can generate. This is going to cut down on the allocations during generation - which is really good for processing speed. Still... if we have to go past the 16kB starting size, it's nice to know that the std::string will automatically grow as we append() data.

Now a derived class:

  std::string GoodBye::serialize() const
  {
    // get the super's serialization contents first
    std::string	code = Message::serialize();
    // ...and then add in mine
    pack(code, mReason);
    pack(code, mExplanation);
    return code;
  }

where the ivars are simple types (uint16_t, std::string, etc.) and I've created a series of pack() methods that append the binary format of the data to the std::string. These pack() methods aren't too hard - they basically take the data, cast it to a (const char *), and let the std::string append() method do its thing and add it to the end of its data. So... things are looking pretty good so far. All I need now is a deserialize(), a constructor, and a factory.
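A couple of the pack() helpers might look something like this - a sketch under my own assumptions (the 4-byte string length matches the scheme's table; the exact signatures are guesses):

```cpp
#include <cstdint>
#include <string>

// Append the raw bytes of a fixed-size integer to the buffer.
void pack( std::string & aBuffer, uint16_t aValue )
{
  aBuffer.append(reinterpret_cast<const char *>(&aValue), sizeof(aValue));
}

void pack( std::string & aBuffer, uint32_t aValue )
{
  aBuffer.append(reinterpret_cast<const char *>(&aValue), sizeof(aValue));
}

// Strings get a 4-byte length code followed by the character data.
void pack( std::string & aBuffer, const std::string & aValue )
{
  pack(aBuffer, static_cast<uint32_t>(aValue.size()));
  aBuffer.append(aValue);
}
```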

  bool GoodBye::deserialize( const uint8_t aVersion,
                             const std::string & aCode,
                             uint16_t & aPos )
  {
    bool     error = false;
    /*
     * First, deserialize the super class, and then unpack all the
     * ivars I packed up in serialize() in the same order. If any
     * one fails, then the deserialization fails. All have to
     * succeed.
     */
    if (!Message::deserialize(aVersion, aCode, aPos) ||
        !unpack(aCode, aPos, mReason) ||
        !unpack(aCode, aPos, mExplanation)) {
      error = true;
    }
    return !error;
  }

What's happening here is that we deserialize the super class and then get to my ivars in the same order as the serialize() method. With the aVersion variable included, I'm able to include or exclude anything I need, and additions to the data buffer aren't going to mess me up either.
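The matching unpack() helpers are the mirror image - read the bytes at aPos back into the value and advance aPos, failing if the buffer runs short. Again, a hedged sketch with my own signatures:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Pull a fixed-size integer back out of the buffer at aPos.
bool unpack( const std::string & aCode, uint16_t & aPos, uint16_t & aValue )
{
  if (aCode.size() < aPos + sizeof(aValue)) return false;
  std::memcpy(&aValue, aCode.data() + aPos, sizeof(aValue));
  aPos += sizeof(aValue);
  return true;
}

// Strings: read the 4-byte length code, then that many characters.
bool unpack( const std::string & aCode, uint16_t & aPos, std::string & aValue )
{
  uint32_t  len = 0;
  if (aCode.size() < aPos + sizeof(len)) return false;
  std::memcpy(&len, aCode.data() + aPos, sizeof(len));
  aPos += sizeof(len);
  if (aCode.size() < aPos + len) return false;
  aValue.assign(aCode.data() + aPos, len);
  aPos += static_cast<uint16_t>(len);
  return true;
}
```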

This gives me everything I wanted from boost serialization as far as versioning. I could then make a very simple constructor:

  GoodBye::GoodBye( const uint8_t aVersion,
                    const std::string & aCode ) :
    msg::Message(),
    mReason(),
    mExplanation()
  {
    uint16_t   pos = 0;
    if (!deserialize(aVersion, aCode, pos)) {
      cLog.warn("<constructor:deserialize> I was unable to deserialize "
                "the version %d code of %d bytes!", aVersion,
                (int)aCode.size());
    }
  }

and with this, I can create them based on their type code - contained in the preamble. All that remained was to tackle the factory.

Thankfully, I'd expected that I'd need something like this, and all I needed to do was to create the methods that took a type code and returned a Message name, as well as a method that took a preamble and a data stream and created a message. The code for the latter was a lot like the former and looks like this:

  msg::Message *MessageFactory::deserialize( const msg::Preamble & aHeader,
                                             const std::string & aCode )
  {
    msg::Message     *msg = NULL;
    /*
     * This is the unfortunate part of the Factory - we need to just
     * run through all the message types, and use them to create the
     * appropriate Messages. It's possible to make a registration-based
     * system, but that can wait for a later version.
     */
    switch (aHeader.messageType) {
      case 0:  msg = new msg::Message(aHeader.messageVersion, aCode);  break;
      case 1:  msg = new Hello(aHeader.messageVersion, aCode);         break;
      case 2:  msg = new GoodBye(aHeader.messageVersion, aCode);       break;
      default:
        cLog.warn("[deserialize] the message type %d is unknown - trouble!",
                  aHeader.messageType);
        break;
    }
    return msg;
  }
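For what it's worth, the registration-based system that comment defers wouldn't be much code. Here's a hedged sketch - every name in it is mine, not from the project - where each message type registers a creator function against its type code, and the switch goes away:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-ins for the real message hierarchy, just for the sketch.
struct Msg
{
  virtual ~Msg() {}
};

struct HelloMsg : public Msg
{
  HelloMsg( uint8_t /*aVersion*/, const std::string & /*aCode*/ ) {}
};

// A creator takes the version and serialized data, returns a Msg.
typedef Msg * (*Creator)( uint8_t aVersion, const std::string & aCode );

class Registry
{
public:
  // each message type registers its creator against its type code
  static void add( uint8_t aType, Creator aCreator )
  {
    table()[aType] = aCreator;
  }

  // look up the type code and call its creator - no switch needed
  static Msg *create( uint8_t aType, uint8_t aVersion,
                      const std::string & aCode )
  {
    std::map<uint8_t, Creator>::const_iterator it = table().find(aType);
    return (it == table().end() ? 0 : it->second(aVersion, aCode));
  }

private:
  static std::map<uint8_t, Creator> & table()
  {
    static std::map<uint8_t, Creator>  sTable;
    return sTable;
  }
};

// the creator a message class would register for itself
static Msg *makeHello( uint8_t aVersion, const std::string & aCode )
{
  return new HelloMsg(aVersion, aCode);
}
```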

Once I had all this together, I was able to build it and run it with my test tcpServer and tcpClient that were sending Hello messages to one another. I had one little problem in the decoding - a typo, and other than that, it all worked! Amazing.

The reduction in payload sizes was also dramatic - from 109 bytes to 29 bytes, again on 19 bytes in two strings. The big obstacle to reducing this further was the serialization scheme I was using:

  Data Type     Format
  any integer   'L' + 8-byte long value
  any float     'D' + 8-byte double value
  String        'S' + 4-byte length code + n-byte char data

so it didn't matter that I was being efficient in my classes with properly sized integers and std::string - I was getting a built-in expansion in the encoding.

Still, this was huge for me. I had finally mastered the boost serialization issues and then gone ahead and written my own that was just as effective, but faster and much smaller. What a great day!

Finally Making Headway with Boost Serialization and Asio

Monday, July 12th, 2010

Boost C++ Libraries

For the first time today, I'm starting to feel like I am getting a handle on the boost Asio and Serialization. I know I need to write all this up in a few posts, and I will, but I ran out of time today and had to run and catch the train. But progress was made.

I was able to properly get the async I/O working. You have to create a thread to call boost::asio::io_service::run() - there's just no two ways about it. While I thought boost would be doing this for me, it isn't. OK, lesson learned. But I was able to take the TCPClient and TCPTransceiver classes and add in the boost threading without a lot of trouble. On the TCPTransceiver side, it's easy to start, and on the TCPClient side, you only need to start the thread if you want to do async I/O. Very easy to leave off, and it starts automatically if you register for async I/O messages.

Then I tackled the disconnect of either end of the socket. That wasn't too hard, and that made things a lot more professional. Looking better.

Finally, I tried to get the serialization working, but I'm convinced that the only way to do it is to put the single message into a std::list<msg::Message *>, because that's what's used over and over in the examples as the way to do polymorphic serialization and de-serialization. So I've got to try that tomorrow morning.

But in the end, I'm looking at the boost serialization and thinking that I need to rip it out anyway as it's not cross-platform. If I do that, then I can use the MessageFactory concept and create a MessagePreamble like:

  struct MessagePreamble {
    uint16_t      messageBodySize;
    uint8_t       messageType;
    uint8_t       messageVersion;
  };

and that 4-byte preamble could be sent in front of the serialized message; then the number of subsequent bytes is read, and the messageType used to call the proper constructor with the signature:

  Message *newGuy = new Hello(preamble, data);

and each message handles the "take values from serialization stream" with a version and such just like the boost serialization. It's got all the features I need, and it's cross-platform. I think it's what I have to do once I get the boost serialization figured out.

Hope tomorrow is a good day!

Google Chrome dev 6.0.458.1 is Out

Monday, July 12th, 2010

This morning I noticed that Google Chrome dev was updated to 6.0.458.1, and the release notes are, once again, a little sketchy. It's nice that the pace of changes seems to be slowing down - maybe that means the 6.x release line is nearing completion. Then again, it could just mean work is slowing down.

In any case, it's good to keep up to date on something that is changing so quickly.

Struggling with Boost Serialization and Asio Libraries

Friday, July 9th, 2010

Boost C++ Libraries

I know I've been at this for days, and I'm making some progress, but it's tortuously slow, and several times today I wondered if this was even the right path. I could pull up my old CKit stuff and be productive in minutes. The problem is, I need async I/O. There's no two ways about it. If I cheat here and skip async I/O on these TCP sockets, I'll regret it forever.

So I have to slug through this no matter how bad it seems.

I just hope it gets better soon... I'm getting discouraged.

Pushing Ahead with Boost

Thursday, July 8th, 2010

Boost C++ Libraries

Today has been another very hard day with boost. It's not an easy set of libraries to get a handle on, because it wasn't made for ease of use. It was made, it seems, primarily as an STL expansion, so most things are just header files - but the stuff I need, Serialization and Asio, are compiled libraries. I can see that it appears capable, but getting beyond the examples is non-trivial.

Certainly, a few examples where they don't put everything into one file would be nice. I've had problems understanding what should go in what classes and headers, simply because they made extremely compact examples. Certainly something to be proud of, but then there's the explanation that needs to go along with it.

Yup, someone could make a fortune with a really good book on boost.

Slugging Through Boost Serialization and Asio Libraries

Wednesday, July 7th, 2010

Today was spent trying to get a handle on the boost serialization and asio libraries, to understand how to move messages over the tcp sockets that this project is going to use. If I had seen the "Serialization" example on the Asio Examples page, I'd have saved myself a ton of time, but I didn't see it until much later.

Ugh. What a horrible afternoon. I was trying to understand what they were doing with very limited examples and the docs on the classes aren't all that good. It's a mess.

Well... I think I have a lot better idea now.

Transmit 4.0.6 is Out

Tuesday, July 6th, 2010

Once again the happy boys at Panic have updated Transmit, this time to 4.0.6, and in this release several more significant issues are fixed, though the overall count of changes isn't nearly as large as in the last release. Things seem to be slowing down a bit at Panic HQ. Still, it's great to see the improvements.

Lots of Heads-Down Coding, and Trying to Understand Boost

Tuesday, July 6th, 2010

Today has been a very hard day where I've been trying to get on top of boost and its asio package as well as threads and serialization. It's a lot, I know, but the point of this project I'm on right now is to stop using all these home-grown classes and start to work in something like boost, which will be a part of the standard moving forward. It'll make bringing up the next guy a lot easier if it's all using boost - then they only need to focus on the usage of boost, not the low-level threading, socket, and serialization libraries. It's an admirable goal, but it's a ton of work.

So today has been about trying to figure out threading. I can appreciate the model boost has for threading, but I'd also like something that looks a little more like the Java Thread object - where you subclass it, implement a method, and then start the instance and away it goes. Boost is just targeted at a different market - you could do the same thing, in theory, by making a method in a class the launch point for the thread, and then putting everything in that one method.

Possible, but there will be times that a simple class like this will be handy - so I wrote it.
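Something like this sketch is what I mean - subclass, implement run(), call start(). I'm using std::thread here just to keep the sketch self-contained (the real class used boost threads), and the caller is expected to join() before the object goes away:

```cpp
#include <thread>

// A Java-style thread wrapper: subclass, implement run(), start().
class Thread
{
public:
  Thread() {}
  virtual ~Thread() { join(); }

  // spin up a thread running this instance's run() method
  void start() { mThread = std::thread(&Thread::run, this); }

  // wait for run() to finish, if it's still going
  void join() { if (mThread.joinable()) mThread.join(); }

protected:
  // subclasses put the body of the thread here
  virtual void run() = 0;

private:
  std::thread   mThread;
};
```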

Next, I needed to deal with the atomic integers. Boost has the 16-bit versions, which is nice, but there were already uses of an atomic integer in the code (in the transfer statistics class) where I needed an unsigned 32-bit integer. So boost was not going to get me where I needed to be. Additionally, I really haven't liked the very C-like usage of a lot of these atomic operations on non-atomic data types. Why not just make a class that is an atomic 16-bit (and 32-bit) unsigned integer? So I did that as well.

What I ended up using were the GCC primitives which did everything I really needed - but it took some thinking to figure out. For example, with only:

  T __sync_fetch_and_add (T *ptr, T value, ...);
  T __sync_fetch_and_sub (T *ptr, T value, ...);
  T __sync_fetch_and_or (T *ptr, T value, ...);
  T __sync_fetch_and_and (T *ptr, T value, ...);
  T __sync_fetch_and_xor (T *ptr, T value, ...);
  T __sync_fetch_and_nand (T *ptr, T value, ...);

how do you do the "atomic set"? I struggled with this for a while until I also saw:

  bool __sync_bool_compare_and_swap (T *ptr, T oldValue, T newValue, ...);

and then it hit me: I just want to make sure that it always compares as "equal". Easy:

  auint32_t & auint32_t::operator=( const uint32_t aValue )
  {
    // keep trying until the swap takes - mValue can change between
    // the read of the 'old' value and the compare-and-swap itself
    while (!__sync_bool_compare_and_swap(&mValue, mValue, aValue));
    return *this;
  }

So I'm using the idea that the existing ivar is almost always going to equal itself, and when it does, the swap puts the new value into place; if another thread sneaks in between the read and the swap, we just try again. Hey, it's not rocket science, but it wasn't clearly laid out, either.

In the end, I had two really nice atomic unsigned integer classes with all the nice operators overloaded so you can do the prefix "++" and the postfix as well, and the sum and difference. It's all there. They look and act like their non-atomic counterparts, but they are thread-safe without using any locks.
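A stripped-down sketch of such a class, built on the same __sync primitives - the class name matches what I used, but the exact operator set here is illustrative:

```cpp
#include <cstdint>

// A lock-free atomic unsigned 32-bit integer that looks and acts
// like its non-atomic counterpart.
class auint32_t
{
public:
  auint32_t( uint32_t aValue = 0 ) : mValue(aValue) {}

  // prefix ++ : add one and return the new value
  uint32_t operator++() { return __sync_add_and_fetch(&mValue, 1); }
  // postfix ++ : add one and return the old value
  uint32_t operator++(int) { return __sync_fetch_and_add(&mValue, 1); }
  // prefix and postfix --
  uint32_t operator--() { return __sync_sub_and_fetch(&mValue, 1); }
  uint32_t operator--(int) { return __sync_fetch_and_sub(&mValue, 1); }

  // atomic set - retry the compare-and-swap until it takes
  auint32_t & operator=( uint32_t aValue )
  {
    while (!__sync_bool_compare_and_swap(&mValue, mValue, aValue));
    return *this;
  }

  // read out the current value
  operator uint32_t() const { return mValue; }

private:
  volatile uint32_t   mValue;
};
```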

Google Chrome dev 6.0.453.1 is Out

Friday, July 2nd, 2010


I just noticed this afternoon that Google Chrome dev 6.0.453.1 is out, and so I updated right away. I noticed that page refresh was a little snappier, but the release notes don't really give me a great lead:

  • Continued feature parity work

So we'll see... I'm all for snappier, and I'm all for more features, I just wish I knew what they really changed.