Nasty Java Feature on byte Primitives

Today, while doing a little work on a web page, I ran into a bug in the Sun Java plug-in for Firefox on Linux that I've known about for quite a while. Basically, if there is a very long PARAM tag on the applet, the plug-in gets stuck in the read loop parsing the data. The Firefox developers have confirmed this and say it's not in their hands. It only shows up on Linux, not on Windows. Sad, but true.

Anyway, I was thinking that maybe it was just the length of the tag, and that one way to make it shorter would be to compress the value and then decompress it in the applet. Since I already had a BKCompressedString from previous XML work (large XML files compress very nicely), I decided to add it to the web system to see if it would buy me enough headroom to get the job done.

The tricky part was that I needed something that could pass through HTML, so after compressing I had to Base64 encode the data, and then in the applet decode it and decompress it. Not a problem. In keeping with the other serialization methods I've put in BKit and CKit, it was easy to add the serialization scheme to BKCompressedString, and it should work.
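For reference, the round trip described above can be sketched with nothing but JDK classes. This is only an illustration under stated assumptions: BKCompressedString's real API isn't shown here, so `java.util.zip.Deflater`/`Inflater` and `java.util.Base64` stand in for it, and the `ParamCodec` class with its `pack` and `unpack` methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in for the compress-then-Base64 scheme; not the BKit code.
class ParamCodec {
    // Compress the text, then Base64 encode it so it can sit in a PARAM value.
    public static String pack(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Base64 decode, then decompress back to the original text.
    public static String unpack(String encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getDecoder().decode(encoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no more input means the data was cut short.
                if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<rows><row id=\"1\"/><row id=\"2\"/><row id=\"3\"/></rows>";
        String packed = pack(xml);
        System.out.println(packed);
        System.out.println(unpack(packed).equals(xml)); // prints "true"
    }
}
```

Whether the packed form is actually shorter depends on the input; Base64 adds roughly a third back on top of whatever the compression saves.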

Almost.

OK, not even close.

The compression and decompression were lossless. The Base64 encoding and decoding, however, lost data for some inputs and not others. I spent a lot of time working through the bits trying to figure out what the problem was, only to find I'd been bitten by this lovely little Java byte primitive feature.

If I have the code:

    // mask these into the four 6-bit chunks
    dest1 = (byte) (src1 >>> 2);
    dest2 = (byte) (((src1 & 0x3) << 4) | (src2 >>> 4));
    dest3 = (byte) (((src2 & 0xf) << 2) | (src3 >>> 6));
    dest4 = (byte) (src3 & 0x3f);

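and the variables dest1, src1, etc. are byte values, then one would think that the right shifts would obey the logic that an 8-bit value (a byte) would. What I found out was that based on the data I was converting, the byte values either weren't actually only 8 bits, or the shifting was adding in a little something special - because there were ones getting put in the shifted bits where, logically, nothing should be.

It turns out to be a bit of both. Java never shifts a byte directly: the operand is first promoted to a 32-bit int, and because byte is signed, any value with its high bit set gets sign-extended with ones. The >>> then zero-fills from bit 31 of that int, not from bit 7 of the byte, so those extra ones slide down into the low 8 bits that survive the cast back to byte.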
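A minimal sketch of the effect on a single byte may help. The class and method names here (`ByteShiftDemo`, `shiftUnmasked`, `shiftMasked`) are made up for illustration; the only thing being shown is what the shift does to a byte whose high bit is set.

```java
// Hypothetical demo class; shows the shift behavior on one byte.
class ByteShiftDemo {
    // The byte is promoted to a sign-extended int before >>> is applied.
    public static byte shiftUnmasked(byte src) {
        return (byte) (src >>> 2);
    }

    // Masking with 0xff clears the sign-extension bits before the shift.
    public static byte shiftMasked(byte src) {
        return (byte) ((src & 0xff) >>> 2);
    }

    public static void main(String[] args) {
        byte src = (byte) 0xfc; // bit pattern 1111 1100, value -4

        // As an int, -4 is 0xfffffffc: thirty ones followed by two zeros.
        System.out.println(Integer.toBinaryString(src));

        System.out.println(shiftUnmasked(src)); // prints -1 (low 8 bits are 1111 1111)
        System.out.println(shiftMasked(src));   // prints 63 (low 8 bits are 0011 1111)
    }
}
```

Bytes with the high bit clear come out the same either way, which is why the encoding only broke for some data.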

When I changed the code to look like:

    // mask each source byte before shifting, then pack into the four 6-bit chunks
    dest1 = (byte) ((src1 & 0xfc) >>> 2);
    dest2 = (byte) (((src1 & 0x3) << 4) | ((src2 & 0xf0) >>> 4));
    dest3 = (byte) (((src2 & 0xf) << 2) | ((src3 & 0xc0) >>> 6));
    dest4 = (byte) (src3 & 0x3f);

then everything worked fine: the encoding was lossless, and so was the full chain of compression, encoding, decoding, and decompression.
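To check that the masked version really is lossless, here is a small round-trip sketch. The `split` method is the masked code from above wrapped in a method; the `join` method and the `SixBitChunks` class name are my own additions for the test, not part of the original encoder, and the lookup into the Base64 alphabet is left out.

```java
// Hypothetical test harness around the masked 6-bit split.
class SixBitChunks {
    // Split three bytes into four 6-bit values (0..63), masking before each right shift.
    public static byte[] split(byte src1, byte src2, byte src3) {
        byte dest1 = (byte) ((src1 & 0xfc) >>> 2);
        byte dest2 = (byte) (((src1 & 0x3) << 4) | ((src2 & 0xf0) >>> 4));
        byte dest3 = (byte) (((src2 & 0xf) << 2) | ((src3 & 0xc0) >>> 6));
        byte dest4 = (byte) (src3 & 0x3f);
        return new byte[] { dest1, dest2, dest3, dest4 };
    }

    // Rebuild the three bytes. The inputs are 0..63, so they are never
    // negative and no masking is needed; the cast drops any overflow bits.
    public static byte[] join(byte[] d) {
        byte src1 = (byte) ((d[0] << 2) | (d[1] >>> 4));
        byte src2 = (byte) ((d[1] << 4) | (d[2] >>> 2));
        byte src3 = (byte) ((d[2] << 6) | d[3]);
        return new byte[] { src1, src2, src3 };
    }

    public static void main(String[] args) {
        byte[] d = split((byte) 'M', (byte) 'a', (byte) 'n');
        // prints "19 22 5 46", the Base64 indices of "TWFu"
        System.out.println(d[0] + " " + d[1] + " " + d[2] + " " + d[3]);

        // Bytes with the high bit set are the ones that used to break.
        byte[] back = join(split((byte) 0xff, (byte) 0x80, (byte) 0xc3));
        System.out.println(back[0] + " " + back[1] + " " + back[2]); // prints "-1 -128 -61"
    }
}
```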

It took me the better part of 3 hours to figure this out. I was stunned when it finally presented itself, as I had assumed that bit-wise operations on byte quantities would behave like plain 8-bit operations. I know better now.