More Tuning of Storm Topology

Unified Click

I spent a good chunk trying to get my data stream topology to work smoothly, but it's proving to be very elusive. I've tried all manner of things - including expanding to take nearly all of 20 machines up for the one topology. Crazy.

When I got the first version of this up, I wrote an email to my VP saying this is possible, but it's going to be expensive. Very. The hardware isn't cheap, and we're going to need so much of it - due to the inefficiencies in the JVM, in the messaging, in clojure... it's just going to take a ton of metal. I'm not sure they understood the magnitude of this cost.

We're bearing down on the Holiday Season, and we have no excess capacity. None. Twenty worker nodes all jammed up. Now if we need even 100% excess (a 2x spike) - which is nothing in the Cyber Monday sense, then we need an additional 20 machines. Amazing. For a capacity of about 100k msgs/sec.

At my last Finance job we did 3 mil msgs/sec on three boxes. Add in greeks and what-if analysis and it's 5 boxes. The idea that we need 40 boxes for 100k msgs/sec is just crazy. We are building on a tech stack that is super inefficient.

But we are... and it's not my decision - as hard as I've lobbied for a more efficient solution.