Revisiting Capacity for Topology
Now that I'd found the bug in the caching, I wanted to go back and look at the topology capacity numbers, because they looked a lot different with the caching fixed than they did with the useless caching. Looking at the numbers, it was clear that we were backing up at the Kafka transmitter bolt, which was applying back-pressure to the JSON encoding bolt, which in turn was putting back-pressure on the decorate bolt, where we had been putting in all the work to fix up the caching for speed.
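For reference, the capacity figure in the Storm UI is, as I understand it, the fraction of the measurement window (10 minutes by default) that an executor spent executing tuples: executed count times average execute latency, divided by the window length. A value near 1.0 means the bolt is running flat out. Here is a minimal sketch of that calculation. The bolt names follow the ones in this post, but the counts and latencies are made-up illustrative values, not measurements from this topology.

```python
WINDOW_MS = 10 * 60 * 1000  # the Storm UI's default 10-minute window


def bolt_capacity(executed, avg_execute_ms, window_ms=WINDOW_MS):
    """Fraction of the window an executor spent executing tuples."""
    return (executed * avg_execute_ms) / window_ms


# Hypothetical per-executor stats: (tuples executed, avg execute latency in ms).
# These are illustrative only, not numbers from the real topology.
stats = {
    "decorate": (1_200_000, 0.15),
    "json-encode": (1_200_000, 0.05),
    "kafka-transmit": (1_200_000, 0.48),
}

for name, (executed, latency_ms) in stats.items():
    print(f"{name}: capacity {bolt_capacity(executed, latency_ms):.2f}")
```

With numbers shaped like these, the transmitter would sit close to 1.0 while the upstream bolts have plenty of headroom, which is the kind of pattern that points at the last bolt as the choke point.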
Given this, and given that we're not going to expand the Kafka cluster at this time, it made sense to back off on the number of workers and the bolt parallelism hints, and see if it made any difference. After all, we can't push out messages faster than Kafka is capable of taking them.
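The reasoning here is just that a chain of bolts can only sustain the rate of its slowest stage, so extra parallelism upstream of that stage buys nothing. A rough sketch of the idea follows. The stage names mirror the bolts in this post, and the rates (messages per second) are hypothetical placeholders, not measured values.

```python
def bottleneck(stage_rates):
    """Return the name of the stage with the lowest maximum rate."""
    return min(stage_rates, key=stage_rates.get)


def sustained_rate(stage_rates):
    """A linear pipeline can only sustain the rate of its slowest stage."""
    return min(stage_rates.values())


# Hypothetical maximum rates in messages/sec, for illustration only.
with_50_workers = {"decorate": 90_000, "json-encode": 150_000, "kafka-transmit": 40_000}
with_40_workers = {"decorate": 72_000, "json-encode": 120_000, "kafka-transmit": 40_000}

for label, rates in (("50 workers", with_50_workers), ("40 workers", with_40_workers)):
    print(f"{label}: limited by {bottleneck(rates)} at {sustained_rate(rates)} msgs/sec")
```

As long as the upstream bolts stay faster than the Kafka stage, trimming workers should leave the end-to-end rate unchanged, which is what backing off is meant to test.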
I went from 50 to 40 workers, and saw similar numbers:
and the capacity:
Then, when I went to 30, I lost the email sends for the day, so I'll have to let it sit and try again tomorrow. Still... it's very promising, and I'm very glad that the caching is fixed.