Calculating Percentages with SQL
I have been running some tests all morning, and they'll continue throughout the day, but as I'm watching these tests and gathering data, it occurred to me that I wanted to know which of my data injectors was giving me the most data, and by how much. It's not hard to get the counts with SQL:
SELECT portfolio, COUNT(*) AS hits FROM PortfolioData WHERE acquired > '2009-12-15' GROUP BY portfolio ORDER BY hits DESC
and I get a table that looks a lot like this:
portfolio | hits |
Gas_NGUNG | 114150 |
NG MM | 111106 |
OIL_CLUSO | 95886 |
Oil MM | 91320 |
S&P | 28696 |
L_NRML | 22820 |
ED MM | 21957 |
Oil Indexes | 20268 |
Rho Hedge | 18579 |
ER_NRML | 17974 |
...and I've truncated the table because the effect I was looking for is clear - there are a few of the portfolios that are contributing the vast majority of the rows to the table. The problem is, I can't see the real percentage each contributes to the total. And while I've shown the top 10, there are really more than 30, so it's not easy to do the percentage calculation in my head.
I googled a bit and came up with a really simple solution: include the subquery as the divisor in the select statement:
SELECT portfolio, COUNT(*) AS hits, COUNT(*)*100.0/(SELECT COUNT(*) FROM PortfolioData WHERE acquired > '2009-12-15') AS percentage FROM PortfolioData WHERE acquired > '2009-12-15' GROUP BY portfolio ORDER BY hits DESC
and to this I get what I was looking for:
portfolio | hits | percentage |
Gas_NGUNG | 114150 | 15.07 |
NG MM | 111106 | 14.67 |
OIL_CLUSO | 95886 | 12.66 |
Oil MM | 91320 | 12.06 |
S&P | 28696 | 3.79 |
L_NRML | 22820 | 3.01 |
ED MM | 21957 | 2.89 |
Oil Indexes | 20268 | 2.67 |
Rho Hedge | 18579 | 2.45 |
ER_NRML | 17974 | 2.37 |
With this, I can now see that my top 4 (out of more than 30) contribute more than 54% of the rows in the table. That's significant, and it's nice to know. If I throttle back these four, I have a great deal of control over the total number of rows inserted in a day.
Cool.