While I was doing some network testing/optimization recently, one of the network guys suggested I look at the jTDS packetSize parameter. He thought it might be worth trying if all else failed.
Since I had pretty much gotten to that point, I decided this morning to do those tests, and at the same time take a look at what H2 might say about performance tuning - as that was the destination of the data, after all.
The first step was to change the datatype in the database. According to the H2 docs:
Each data type has different storage and performance characteristics:
- The DECIMAL/NUMERIC type is slower and requires more storage than the REAL and DOUBLE types.
- Text types are slower to read, write, and compare than numeric types and generally require more storage.
- See Large Objects for information on BINARY vs. BLOB and VARCHAR vs. CLOB performance.
- Parsing and formatting takes longer for the TIME, DATE, and TIMESTAMP types than the numeric types.
- SMALLINT/TINYINT/BOOLEAN are not significantly smaller or faster to work with than INTEGER in most modes.
The DBA I'd worked with to set up the back-end database I read from didn't like the double datatype, primarily because of rounding. I argued it was fine, but relented when he pressed. I then used the same DECIMAL(19,6) in the H2 in-memory database as existed in the MS SQL Server database. That seemed reasonable, but it flies in the face of the suggestion from the H2 docs.
Since it's all Java, and a Java Double is fine with me, I changed all the DECIMAL(19,6) columns in the in-memory database to DOUBLE. The results were amazing: this simple change bought more than a 50% increase in the rows/sec processed, and it also significantly reduced the memory used by the web app.
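For reference, here's a minimal sketch of what that schema change looks like against an H2 in-memory database. The table and column names are made up for illustration; the point is simply swapping DECIMAL(19,6) for DOUBLE in the DDL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class H2DoubleColumns {
    public static void main(String[] args) throws Exception {
        // H2 in-memory database; DB_CLOSE_DELAY=-1 keeps it alive for the life of the JVM
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:cache;DB_CLOSE_DELAY=-1");
             Statement stmt = conn.createStatement()) {

            // Before: mirroring the SQL Server schema exactly
            // stmt.execute("CREATE TABLE prices (instrument_id INT, px DECIMAL(19,6))");

            // After: plain DOUBLE columns, per the H2 performance notes
            stmt.execute("CREATE TABLE prices (instrument_id INT, px DOUBLE)");
        }
    }
}
```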
All told, a wonderful suggestion.
Then I ran tests with different values of packetSize. I got:
| packetSize | rows/sec | rows/sec |
|-----------:|---------:|---------:|
| 512        | 11,496   | 11,904   |
| 1024       | 11,636   | 13,650   |
| 2048       | 11,902   | 13,941   |
| 4096       | 11,571   | 12,703   |
| 8192       | 12,560   | 14,774   |
| 16384      | 12,447   | 14,744   |
| 32768      | 12,753   | 14,017   |
| 65536      | 12,680   | 15,038   |
where the values are the rows/sec processed from the back-end database into the in-memory database. Faster is clearly better.
What I found was that 8192 was the smallest packetSize that gave good performance, so that's what I went with. With these changes, my 7-minute restart is down to about 2:20 - an impressive improvement.
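For completeness, here's roughly how the final setting gets applied - a sketch assuming the standard jTDS URL syntax, with placeholder host, database, and credentials:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class JtdsPacketSize {
    public static void main(String[] args) throws Exception {
        // Hypothetical server/database/credentials; packetSize is the jTDS
        // connection property tested above, set here to the chosen 8192.
        String url = "jdbc:jtds:sqlserver://dbhost:1433/marketdata;packetSize=8192";

        try (Connection conn = DriverManager.getConnection(url, "reader", "secret")) {
            // ... run the extract queries that feed the H2 in-memory database ...
            System.out.println("Connected with packetSize=8192");
        }
    }
}
```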