Testing New Data Feeds for a Topology
Today has been an easy-going day of performance testing and topology tuning with the new data source I'm trying to integrate: email sends. The data comes from an existing batch process that delivers batches of emails to different locations all around the globe. The trick is that testing it is kinda tough, because if they aren't running a batch, there's no data to test with. Combine that with the fact that a lot of these Storm tests need to run for a while to reach a steady-state condition, and it makes for a lot of staring at graphs.
I like to make a table of all experiments and their results, and for today's experiments this is what I ended up with:
| RM Decoders     | RM Mappers | Decorate | Topology Workers | rms-decode | rms-map | decorate | Time    |
| 375             | 150        | 350      | 25               | 0.023      | 0.025   | 0.740    | 2:51:18 |
|                 |            | 250      |                  | 0.001      | 0.002   | 0.650    | 1:28:42 |
| rewrite lookups |            | 250      |                  | 0.033      | 0.004   | 0.527    | 31:41   |
| 100             | 50         | 250      |                  | 0.000      | 0.000   | 0.594    | 20:26   |
|                 |            | 300      | 30               | 0.000      | 0.000   | 0.438    | 15:16   |
While the tuning is still a work in progress, it's clear to me that we started out with more resources on the email send bolts than we needed, and that rewriting the lookups was an important step.
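To put rough numbers on how much each change helped, here is a minimal Python sketch that converts the Time column into seconds and computes each run's speedup relative to the first. It assumes the Time values are wall-clock durations in h:mm:ss or mm:ss form; the names `RUN_TIMES`, `to_seconds`, and `speedups` are made up for this illustration and aren't part of the topology.

```python
# Elapsed times copied from the Time column of the table, in run order.
# Assumption: each value is a wall-clock duration (h:mm:ss or mm:ss).
RUN_TIMES = ["2:51:18", "1:28:42", "31:41", "20:26", "15:16"]


def to_seconds(stamp: str) -> int:
    """Convert 'h:mm:ss' or 'mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        # Each field to the right is worth 1/60th of the one before it.
        seconds = seconds * 60 + int(part)
    return seconds


def speedups(times: list[str]) -> list[float]:
    """Return each run's speedup relative to the first run."""
    baseline = to_seconds(times[0])
    return [baseline / to_seconds(t) for t in times]


if __name__ == "__main__":
    for stamp, factor in zip(RUN_TIMES, speedups(RUN_TIMES)):
        print(f"{stamp:>8}  {to_seconds(stamp):>6} s  {factor:5.1f}x")
```

Under that assumption, the last run (15:16) comes out a bit more than 11 times faster than the first (2:51:18).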
UPDATE: with the worker count increased to 30, we finally have something that handles the load at least as well as production. That's good enough for today.