The Black Magic of Optimizing SQL Queries
OK, so maybe it's not really Black Magic but putting yourself in the place of the query planner/optimizer is a really big help. For instance, today I was working on trying to get a little more speed out of the following H2 (JDBC) SQL query:
SELECT acquired, portfolio, leadFuturePrice, pnlSumNet FROM RAT_FirePortfolio WHERE acquired >= '2009-10-20 00:00:00' AND acquired <= '2009-10-20 23:59:59' AND portfolio NOT IN ('ED_NRML')
and the requests were returning in excess of 120,000 rows of data in about 600 - 775 msec. Not bad for an in-memory database, but I was thinking that the use of the "not in ()" was going to slow down the execution of the query because it would have to check each of the row's portfolio fields against each of the elements of the list. Not ideal.
So I got the idea to change it to a more direct version:
SELECT acquired, portfolio, leadFuturePrice, pnlSumNet FROM RAT_FirePortfolio WHERE acquired >= '2009-10-20 00:00:00' AND acquired <= '2009-10-20 23:59:59' AND portfolio <> 'ED_NRML'
So I coded that up. At the same time, I realized that the exclusion of 'ED_NRML' was really a transitional artifact and was no longer needed. There are times when we drop data sources, and this keeps the new version from falsely showing the existing, but old, data. Now that we have completely transitioned off that, we didn't need it and the query simplified to:
SELECT acquired, portfolio, leadFuturePrice, pnlSumNet FROM RAT_FirePortfolio WHERE acquired >= '2009-10-20 00:00:00' AND acquired <= '2009-10-20 23:59:59'
The times for this guy are in the 50 - 57 msec. Amazing!
We're looking at a factor of ten by removing the test. Now I had suspected that the "not in ()" clause was a little bit of an issue, but I had never guessed it was 90% of the time! That's amazing.
I'll never doubt again. I need to really hit these queries and make sure they are as clean and simple as possible. Simply amazing.