Interesting Bug in H2 Database 1.2.132

H2_logo.png

Today I've been fighting a bug in the H2 database that has been quite a puzzler. I'm not being overly critical of H2 as I think it's a great open source project, but when you see how non-obvious this bug is, I think you'll agree that there's a lot that still needs to be done with H2 in this regard.

To start, I am using H2 as an in-memory database for a web app that is collecting a lot of data - working on 9 million rows in memory. Not small. But the problem isn't about it's size, it's about a certain series of database inserts that cause, of all things, a Java StackOverflowException. Let's start with the table.

It's one of many tables in my in-memory schema, and it's pretty darn simple:

  CREATE TABLE volatility (
    acquired        datetime    NOT NULL,
    product         VARCHAR(64) NOT NULL,
    expiration      DATE        NOT NULL,
    strike          FLOAT       NOT NULL,
    volatility      DOUBLE,
    generationTime  datetime NOT NULL,
    CONSTRAINT volatility_PK PRIMARY KEY (acquired, product, expiration, strike)
  );

This guy is getting hit with all the volatility data we have for a given portfolio - inserting hundreds of rows every few seconds. The primary key of acquired, product (or underlying), expiration, and strike make it very easy to look up this volatility data for just about any where clause I might write.

When I started putting together the SQL string for inserting the data, I started running into problems. Specifically, I don't use PreparedStatements because I can't be sure how many inserts I'll be doing and there's also the problem of being able to optionally do the same updates to the back-end database if the configuration is set-up that way.

I have my UAT box writing it's data to the back-end database, and all the others just read that when restarting. This flexibility requires that I have something that travels as an object, and can be used to update one or two databases without having to re-create anything. This leaves out the PreparedStatements as they are built off the connection, and that's database-specific. So I had to stick with straight SQL.

In general, that's not too bad, and I haven't had any problems with this scheme until today, but today I think I met my match.

What I found was that when I started feeding the system the data for the volatilities, I could have it all run smoothly if I limited the number of strikes to five or less. But if I limited it to 50 or less, or no limit at all, I'd get a SQLException with the message being that there was a 'general' exception: StackOverflowException. I had no idea why I had created something with excessive recursion, so I went through the code several times and couldn't find anything.

I saw that H2 had been updated to 1.2.132 (from 1.2.127) and thought that maybe it might be a bug in H2 that had been fixed. Don't get me wrong, this is a bug in H2, but pin-pointing it to some specific code snippet is not going to be trivial. If I have the time in the future, I might do it, but it'd be a long-shot and take the better part of a day to get the test data set up into a simple test app to send to the H2 developers.

Anyway, I got H2 1.2.132 and tried it - no difference. Crud.

So then I tried setting the auto-commit on the JDBC Connection objects to "ON". Initially, my code had it set to "OFF" with an explicit commit() called on the connection once the SQL statement is executed. But even with the auto-commit "ON", we had the same problems.

Next I tried breaking up the SQL itself into smaller blocks based on the number of volatility points per expiration. If there are more than five volatility points in an expiration, then I break the SQL into separate statements in the execution part of the code. Since these are somewhat stand-alone, it shouldn't cause any problems. But I had my doubts.

I was really surprised to see that this break-up based on the number of strikes for an expiration solved the problem completely. What a break!

What I don't understand is why.

I can see that 4 of the 6 columns are in the primary key - but that in and of itself shouldn't be the issue. Not for a recursion problem. I can see that I'm inserting several hundred of these at a time, but the problem appears somewhere below 50, so that doesn't make a lot of sense.

Nope, this is a real puzzle to me. If I get the time, I'll try to write a simple test app and ship it to the H2 developers to see if they have any idea as to why it crashed. Very odd indeed.