Testing New Data Feeds for a Topology

Today has been an easy-going day of performance testing and topology tuning with the new data source I'm trying to integrate: email sends. The data is coming in from a current batch process that's delivering batches of emails to different locations all around the globe. The trick is that testing it is kinda tough, because if they aren't running a batch, you have no data to test with. Combine that with the fact that a lot of these Storm tests need to run for a while to reach a steady-state condition, and it makes for a lot of staring at graphs.

I like to make a table of all experiments and their results, and for today's experiments this is what I ended up with:

RM Decoders  RM Mappers  Decorate  Topology Workers  rms-decode  rms-map  decorate  Time
375          150         350       25                0.023       0.025    0.740     2:51:18
-            -           250       -                 0.001       0.002    0.650     1:28:42
rewrite lookups          250       -                 0.033       0.004    0.527     31:41
100          50          250       -                 0.000       0.000    0.594     20:26
-            -           300       30                0.000       0.000    0.438     15:16

While the picture is still developing, it's clear to me that we started out with more resources on the email send bolts than we needed, and that rewriting the lookups was an important step.
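One way to read the table, as a rough sketch only: it assumes the rms-decode, rms-map and decorate columns are the per-bolt capacity figures from the Storm UI, which is not stated explicitly here. Storm defines capacity as the number of tuples executed times the average execute latency, divided by the measurement window, so a value near 1.0 means the bolt is close to saturated. The function names and the 0.05 threshold below are illustrative, not anything taken from the topology itself.

```python
# Sketch only: assumes the rms-decode, rms-map and decorate columns are
# Storm UI "capacity" values. All names and the threshold are illustrative.


def capacity(executed, execute_latency_ms, window_ms):
    """Fraction of the window an executor spent executing tuples.

    Follows the Storm UI definition: executed count times average
    execute latency, divided by the measurement window.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return executed * execute_latency_ms / window_ms


# Readings copied from the table, one dict per experiment, in order.
RUNS = [
    {"rms-decode": 0.023, "rms-map": 0.025, "decorate": 0.740},
    {"rms-decode": 0.001, "rms-map": 0.002, "decorate": 0.650},
    {"rms-decode": 0.033, "rms-map": 0.004, "decorate": 0.527},
    {"rms-decode": 0.000, "rms-map": 0.000, "decorate": 0.594},
    {"rms-decode": 0.000, "rms-map": 0.000, "decorate": 0.438},
]


def busiest_bolt(run):
    """Return the (name, value) pair with the highest reading in a run."""
    return max(run.items(), key=lambda item: item[1])


def overprovisioned(run, threshold=0.05):
    """Names of bolts whose reading is below the hypothetical threshold."""
    return sorted(name for name, value in run.items() if value < threshold)


if __name__ == "__main__":
    for number, run in enumerate(RUNS, start=1):
        name, value = busiest_bolt(run)
        idle = ", ".join(overprovisioned(run))
        print(f"run {number}: busiest={name} ({value:.3f}); mostly idle: {idle}")
```

Under that assumption, decorate is the busiest bolt in every run, while rms-decode and rms-map never get above 0.05, which fits the impression that those two had far more executors than they needed.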

UPDATE: with the increase of the workers to 30, we finally have something that handles the load at least as well as production. That's good enough for today.