Reading a GZip File in C++ – Boost Wins

Today I needed to be able to read compressed files for a service I was writing. Sure, I could have shelled out, run gunzip on the file, and then re-gzipped it after reading it, but I wanted something that would let me read the gzipped file in place and decompress it into a simple std::string for processing.

Boost to the rescue.

This is one of the more difficult things to get right in Boost… OK, I take that back: it's pretty easy compared to Serialization and Asio, but it's not obvious from the docs. Also, some of the more obvious attempts to use the Boost filtering iostreams gave pretty bad results. Still, as I stuck with it, success emerged.

Here's what finally worked for me:

  #include <fstream>
  #include <sstream>
  #include <string>
  #include <zlib.h>                    // for MAX_WBITS
  #include <boost/iostreams/filtering_stream.hpp>
  #include <boost/iostreams/filter/zlib.hpp>
  #include <boost/iostreams/filter/gzip.hpp>
  #include <boost/iostreams/copy.hpp>

  // aFilename, error and cLog come from the enclosing service code
  std::string     contents;
  std::ifstream   file(aFilename.c_str(),
                       std::ios_base::in | std::ios_base::binary);
  if (!file.is_open()) {
    error = true;
    cLog.error("can't open the file %s", aFilename.c_str());
  } else {
    using namespace boost::iostreams;
    // make the filter for the gzip with the right args: 16 + MAX_WBITS
    // tells zlib to expect a gzip header and trailer, not a raw zlib stream
    filtering_streambuf<input> gin;
    zlib_params   p;
    p.window_bits = 16 + MAX_WBITS;
    gin.push(zlib_decompressor(p));
    gin.push(file);
    // now let's get a string stream for a destination
    std::stringstream  ss;
    // copy the source to the dest and trap errors
    try {
      boost::iostreams::copy(gin, ss);
      contents = ss.str();
      cLog.info("read in %lu bytes from %s",
                static_cast<unsigned long>(contents.size()), aFilename.c_str());
    } catch (zlib_error & err) {
      error = true;
      cLog.error("decompression error on %s: %s (%d)",
                 aFilename.c_str(), err.what(), err.error());
    }
  }
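The `16 + MAX_WBITS` value is what makes zlib accept the gzip wrapper: per the zlib.h documentation for `inflateInit2`, adding 16 to the window bits selects gzip-only decoding, while adding 32 auto-detects zlib or gzip. To get a feel for what that wrapper is, here is a rough, standard-library-only sketch based on the RFC 1952 layout. `looks_like_gzip` and `read_gzip_isize` are hypothetical helpers, not part of Boost or zlib: the first checks the magic bytes (0x1f 0x8b) and the deflate method byte (8), and the second reads ISIZE, the last four bytes of the file, which hold the uncompressed length modulo 2^32 in little-endian order.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true if `data` starts with a gzip member header
// (RFC 1952): ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate).
bool looks_like_gzip(const std::string& data) {
  return data.size() >= 10 &&
         static_cast<unsigned char>(data[0]) == 0x1f &&
         static_cast<unsigned char>(data[1]) == 0x8b &&
         static_cast<unsigned char>(data[2]) == 8;
}

// Hypothetical helper: reads ISIZE from the last 4 bytes (little-endian).
// It is the uncompressed size modulo 2^32, and for a multi-member file it
// only describes the last member. Returns false if `data` is too short or
// doesn't look like gzip (10-byte header + 8-byte trailer minimum).
bool read_gzip_isize(const std::string& data, std::uint32_t& isize) {
  if (!looks_like_gzip(data) || data.size() < 18) {
    return false;
  }
  isize = 0;
  for (int i = 0; i < 4; ++i) {
    unsigned char byte =
        static_cast<unsigned char>(data[data.size() - 4 + i]);
    isize |= static_cast<std::uint32_t>(byte) << (8 * i);
  }
  return true;
}
```

One possible use, assuming a single-member file, is calling `contents.reserve(isize)` before decompressing. Since ISIZE wraps at 4 GiB and ignores earlier members, treat it as a hint, never a guarantee.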

What's the point of all this? Well, it turns out that Boost isn't really about general-purpose decompressing file streams; it's about pipelines of filters, and one of those filters is a gzip compressor/decompressor. That's more flexible, yes, but it's a little harder to set up, and it ends up with an intermediate std::stringstream that we don't really need. But in the end, this is only about 100 ms slower than reading the uncompressed file. That's a reasonable performance hit for not having to mess with the zlib library directly.

Yeah, Boost!