Big Day With Serialization – Boost and My Own
Today was a Big Day for my work with boost serialization, and I'm very glad to have stuck it out. Early this morning I verified that if I serialized a std::list<msg::Message *>, I could get it across the wire and reconstitute each entry into the proper subclass on the other end. That was a huge relief. But I didn't stay happy for long. The size was pretty dismal.
I was sending 19 bytes of string data in two strings in the object, and the successfully serialized data was 109 bytes. That's 90 bytes of overhead. I'm sure it's not really horrible as you scale up, and if I weren't in an environment where size was everything, I'd have been happy with the results and gone on to the next problem. But size is critical here, and I had to do something about it.
So what to do?
I liked the way boost allowed for the versioning of the serialization. I liked the fact that we have a budding new serialization scheme from another group, and so I decided it was time to write my own, and this time, not include the need for a container class in order to properly serialize the messages.
So I gutted the boost serialization code and decided that I'd have each class implement a serialize() and deserialize() method and then also have a constructor that takes the raw data and "creates" an instance from it. If I then created enough helper functions for the different ivar types, I'd be able to make the code look pretty clean and we'd have a solid scheme in place.
The one additional component was that I'd have to create a MessageFactory that had a singleton that created the correct Message instance based on the type that was provided in a transmission header. OK, there's another thing I needed to add - a more complete message transmission header.
So here's what I ended up doing today:
First, I created a fixed-length message header to act as a preamble for all the message packets:
    namespace msg {
    struct Preamble {
        /*
         * This is meant to be a simple read/write struct - with methods on
         * those occasions that you'd like to have a little more "smarts"
         * than the simple struct.
         */
        uint16_t    messageSize;
        uint8_t     messageType;
        uint8_t     messageVersion;
        ...
    };
    }   // end of namespace msg
At this point, I created convenience constructors that took a Message and its serialized data stream and filled in the values. Pretty simple, but it meant I could replace my simple 'size' header in the boost asio send and receive methods with one of these guys, and read or write exactly its fixed size.
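As a rough sketch of how a fixed-length preamble like this could be written to and read from the wire: the version below assumes the header holds only the three fields shown (the original struct has more that is elided), takes a type and version instead of a Message, and sends the size in network byte order. The names kPreambleWireSize, toBytes() and fromBytes() are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace msg {

// Assumed wire size: 2 + 1 + 1 bytes for the three fields shown in the post.
const std::size_t kPreambleWireSize = 4;

struct Preamble {
    uint16_t messageSize;
    uint8_t  messageType;
    uint8_t  messageVersion;

    Preamble() : messageSize(0), messageType(0), messageVersion(0) {}

    // Hypothetical convenience constructor: describe an already-serialized
    // payload. Payloads over 65535 bytes would be truncated by the cast.
    Preamble(uint8_t aType, uint8_t aVersion, const std::string & aCode)
        : messageSize(static_cast<uint16_t>(aCode.size())),
          messageType(aType),
          messageVersion(aVersion) {}

    // Write the header as exactly kPreambleWireSize bytes, high byte first.
    std::string toBytes() const {
        std::string out;
        out.push_back(static_cast<char>((messageSize >> 8) & 0xFF));
        out.push_back(static_cast<char>(messageSize & 0xFF));
        out.push_back(static_cast<char>(messageType));
        out.push_back(static_cast<char>(messageVersion));
        return out;
    }

    // Read the header back; fails if fewer than kPreambleWireSize bytes arrive.
    bool fromBytes(const std::string & aBytes) {
        if (aBytes.size() < kPreambleWireSize) return false;
        const unsigned char *p =
            reinterpret_cast<const unsigned char *>(aBytes.data());
        messageSize    = static_cast<uint16_t>((p[0] << 8) | p[1]);
        messageType    = p[2];
        messageVersion = p[3];
        return true;
    }
};

} // namespace msg
```

Writing the fields byte by byte sidesteps struct padding and endianness differences between the two ends. If both ends are the same architecture, copying the raw struct would also work.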
Now I had a header that could grow with me and hold all the metadata I was going to need for my serialization scheme, but I still wanted to keep it small. The key here is that the 'type' of a message is a single 8-bit unsigned integer, and 256 message types is more than enough for the work we're doing. I'll end up using this, along with the single-byte version number and the binary data, to reconstitute a new instance on the other side.
I next needed to create the serialize() method on the objects to put the data in the binary stream. Here's the base class' version of that method that starts the ball rolling:
    std::string Message::serialize() const
    {
        std::string     code;
        code.reserve(16384);
        return code;
    }
This makes sure that the empty std::string that's returned has enough capacity to hold almost every message I can generate. That cuts down on the allocations during generation, which is really good for the processing speed. Still... if we have to go past the 16kB starting size, it's nice to know that the std::string will automatically grow as we append() data.
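To make the reserve() point concrete, here is a small check (not part of the original code, and the function name is made up) that appending up to the reserved capacity leaves the string's buffer where it was, which is what saves the allocations:

```cpp
#include <cstddef>
#include <string>

// Returns true if appending aCount bytes to a string that reserved aReserve
// bytes leaves the underlying buffer in place (no reallocation happened).
inline bool appendsWithoutRealloc(std::size_t aReserve, std::size_t aCount) {
    std::string code;
    code.reserve(aReserve);
    const char *before = code.data();
    code.append(aCount, 'x');
    return code.data() == before;
}
```

Anything appended past the reserved capacity still works. The string just reallocates and copies, which is the cost the 16kB starting size is meant to avoid for typical messages.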
Now a derived class:
    std::string GoodBye::serialize() const
    {
        // get the super's serialization contents first
        std::string     code = Message::serialize();
        // ...and then add in mine
        pack(code, mReason);
        pack(code, mExplanation);
        return code;
    }
where the ivars are simple types (uint16_t, std::string, etc.) and I've created a series of pack() methods that append the binary format of the data to the std::string. These pack() methods aren't too hard - they basically take the data, cast it to a (const char *), and let the std::string append() method do its thing and add it to the end of its data. So... things are looking pretty good so far. All I need now is a deserialize(), a constructor, and a factory.
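A minimal sketch of what pack() helpers along these lines might look like. It assumes the tagged format listed in the table later in this post ('L' plus an 8-byte value for integers, 'S' plus a 4-byte length for strings) and host byte order, so both ends would need the same architecture. The packInteger() name is made up for the illustration.

```cpp
#include <cstdint>
#include <string>

// Append any integer as 'L' followed by 8 raw bytes in host byte order.
inline void packInteger(std::string & aCode, int64_t aValue) {
    aCode.push_back('L');
    aCode.append(reinterpret_cast<const char *>(&aValue), sizeof(aValue));
}

// A uint16_t ivar is widened to 8 bytes before it is appended.
inline void pack(std::string & aCode, uint16_t aValue) {
    packInteger(aCode, static_cast<int64_t>(aValue));
}

// Append a string as 'S', a 4-byte length, then the character data.
inline void pack(std::string & aCode, const std::string & aValue) {
    uint32_t len = static_cast<uint32_t>(aValue.size());
    aCode.push_back('S');
    aCode.append(reinterpret_cast<const char *>(&len), sizeof(len));
    aCode.append(aValue);
}
```

With these, a uint16_t costs 9 bytes on the wire and a 3-character string costs 8.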
    bool GoodBye::deserialize( const uint8_t aVersion,
                               const std::string & aCode,
                               uint16_t & aPos )
    {
        bool        error = false;

        /*
         * First, deserialize the super class, and then unpack all the
         * ivars I packed up in serialize() in the same order. If any
         * one fails, then the deserialization fails. All have to
         * succeed.
         */
        if (!Message::deserialize(aVersion, aCode, aPos) ||
            !unpack(aCode, aPos, mReason) ||
            !unpack(aCode, aPos, mExplanation)) {
            error = true;
        }

        return !error;
    }
What's happening here is that we're deserializing the super class and then getting to my ivars in the same order as the serialize() method. With the aVersion argument passed in, I'm able to include or exclude any field I need based on the version, and additions to the end of the data buffer aren't going to mess me up either.
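Here is a sketch of how matching unpack() helpers and a version check could fit together. It assumes the same tagged format as the table later in this post and host byte order. The Sample struct and its "explanation was added in version 2" rule are invented purely to show the versioning idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Read 'L' + 8 bytes at aPos into a uint16_t; advance aPos only on success.
inline bool unpack(const std::string & aCode, uint16_t & aPos, uint16_t & aValue) {
    const std::size_t need = 1 + sizeof(int64_t);
    if (aPos + need > aCode.size() || aCode[aPos] != 'L') return false;
    int64_t raw = 0;
    std::memcpy(&raw, aCode.data() + aPos + 1, sizeof(raw));
    if (raw < 0 || raw > 0xFFFF) return false;   // does not fit the ivar
    aValue = static_cast<uint16_t>(raw);
    aPos = static_cast<uint16_t>(aPos + need);
    return true;
}

// Read 'S' + 4-byte length + data at aPos; advance aPos only on success.
inline bool unpack(const std::string & aCode, uint16_t & aPos, std::string & aValue) {
    const std::size_t header = 1 + sizeof(uint32_t);
    if (aPos + header > aCode.size() || aCode[aPos] != 'S') return false;
    uint32_t len = 0;
    std::memcpy(&len, aCode.data() + aPos + 1, sizeof(len));
    if (aPos + header + len > aCode.size()) return false;   // truncated data
    aValue.assign(aCode, aPos + header, len);
    aPos = static_cast<uint16_t>(aPos + header + len);
    return true;
}

// Hypothetical message whose second field only exists from version 2 on.
struct Sample {
    uint16_t    reason;
    std::string explanation;

    Sample() : reason(0) {}

    bool deserialize(uint8_t aVersion, const std::string & aCode, uint16_t & aPos) {
        if (!unpack(aCode, aPos, reason)) return false;
        if (aVersion >= 2 && !unpack(aCode, aPos, explanation)) return false;
        return true;
    }
};
```

A version 1 reader stops after the first field and ignores whatever follows, while a version 2 reader fails cleanly if the extra field is missing or truncated.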
This gives me everything I wanted from boost serialization as far as versioning. I could then make a very simple constructor:
    GoodBye::GoodBye( const uint8_t aVersion, const std::string & aCode ) :
        msg::Message(),
        mReason(),
        mExplanation()
    {
        uint16_t    pos = 0;
        if (!deserialize(aVersion, aCode, pos)) {
            // size() returns a size_t, so cast it to match the %d format
            cLog.warn("<constructor:deserialize> I was unable to deserialize "
                      "the version %d code of %d bytes!", aVersion,
                      (int)aCode.size());
        }
    }
and with this, I can create them based on their type code - contained in the preamble. All that remained was to tackle the factory.
Thankfully, I'd expected that I'd need something like this, and all I needed to do was to create the methods that took a type code and returned a Message name, as well as a method that took a preamble and a data stream and created a message. The code for the latter was a lot like the former and looks like this:
    msg::Message *MessageFactory::deserialize( const msg::Preamble & aHeader,
                                               const std::string & aCode )
    {
        msg::Message    *msg = NULL;

        /*
         * This is the unfortunate part of the Factory - we need to just
         * run through all the message types, and use them to create the
         * appropriate Messages. It's possible to make a registration-based
         * system, but that can wait for a later version.
         */
        switch (aHeader.messageType) {
            case 0:
                msg = new msg::Message(aHeader.messageVersion, aCode);
                break;
            case 1:
                msg = new Hello(aHeader.messageVersion, aCode);
                break;
            case 2:
                msg = new GoodBye(aHeader.messageVersion, aCode);
                break;
            default:
                cLog.warn("[deserialize] the message type %d is unknown - trouble!",
                          aHeader.messageType);
                break;
        }

        return msg;
    }
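The registration-based system mentioned in that comment could look something like the sketch below. This is one possible shape rather than what the factory does today: MessageRegistry, createMessage() and the stripped-down Message, Hello and GoodBye classes are stand-ins invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Minimal stand-ins for the post's classes, just enough to show the registry.
struct Message {
    virtual ~Message() {}
    virtual const char *name() const { return "Message"; }
};
struct Hello : Message {
    Hello(uint8_t, const std::string &) {}
    const char *name() const { return "Hello"; }
};
struct GoodBye : Message {
    GoodBye(uint8_t, const std::string &) {}
    const char *name() const { return "GoodBye"; }
};

// Generic creator: builds a T from the version and the serialized data.
template <typename T>
Message *createMessage(uint8_t aVersion, const std::string & aCode) {
    return new T(aVersion, aCode);
}

class MessageRegistry {
public:
    typedef Message *(*Creator)(uint8_t aVersion, const std::string & aCode);

    // Singleton access, like the MessageFactory in the post.
    static MessageRegistry & instance() {
        static MessageRegistry sInstance;
        return sInstance;
    }

    // Associate a type code with the function that builds that message.
    void add(uint8_t aType, Creator aCreator) { mCreators[aType] = aCreator; }

    // Returns a new Message owned by the caller, or NULL for an unknown type.
    Message *create(uint8_t aType, uint8_t aVersion, const std::string & aCode) const {
        std::map<uint8_t, Creator>::const_iterator it = mCreators.find(aType);
        return (it == mCreators.end()) ? NULL : it->second(aVersion, aCode);
    }

private:
    MessageRegistry() {}
    std::map<uint8_t, Creator> mCreators;
};
```

Each message type would then be registered once at startup, for example MessageRegistry::instance().add(2, &createMessage<GoodBye>), and the switch statement goes away. The trade-off is that a forgotten registration shows up at run time as an unknown type instead of being visible in one place in the source.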
Once I had all this together, I was able to build it and run it with my test tcpServer and tcpClient that were sending Hello messages to one another. I had one little problem in the decoding - a typo, and other than that, it all worked! Amazing.
The reduction in payload sizes was also dramatic - from 109 bytes to 29 bytes - again, on 19 bytes in two strings. The big obstacle to reducing this further was the serialization scheme that I was using:
Data Type   | Format                                      |
any integer | 'L' + 8-byte long value                     |
any float   | 'D' + 8-byte double value                   |
String      | 'S' + 4-byte length code + n-byte char data |
so it didn't matter that I was being efficient in my classes with the proper sized integers and std::string, I was getting a built-in expansion in the encoding.
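The 29-byte figure is consistent with that table if the message carried just the two strings and the count excludes the preamble: each string pays 1 tag byte plus 4 length bytes, so 19 + 2 x 5 = 29. The helper names below are made up to spell out the arithmetic, and the 9/10 split of the 19 characters is only an example since any split gives the same total.

```cpp
#include <cstddef>

const std::size_t kTagBytes    = 1;   // 'L', 'D' or 'S'
const std::size_t kLengthBytes = 4;   // length code that precedes string data

// Encoded size of a string with aChars characters: 'S' + length + data.
inline std::size_t encodedStringSize(std::size_t aChars) {
    return kTagBytes + kLengthBytes + aChars;
}

// Encoded size of any integer or float: tag + 8-byte value.
inline std::size_t encodedNumberSize() {
    return kTagBytes + 8;
}
```

So encodedStringSize(9) + encodedStringSize(10) comes to 29, and a 2-byte uint16_t ivar costs encodedNumberSize(), or 9 bytes, on the wire.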
Still, this was huge for me. I had finally mastered the boost serialization issues and then gone ahead and written my own that was just as effective, but faster and much smaller. What a great day!