Looking at Cassandra for Fast SQL Storage

Cassandra

I've got a lot of streaming data in Storm, and I'm doing quite a bit of analytical processing on that data, but I can't store much of it because the cost of storage is so high - latency. So I end up using a lot of redis boxes and having boxes read out of that into more conventional storage - like Postgres. But I'm starting to hear good things about Cassandra and Storm working well together.

My concern is real speed. I have an average message rate of 40k to 50k msgs/sec, with peaks as high as four times that. I need to be able to handle those peaks - which are by no means the once a month peak levels of several times that, but it's something I see on a regular basis, and we need to be able to take all this data from the topology and not slow it down.

If I can really do this on, say, eight machines, then I'll be able to have the kind of deep dive we've needed for looking into the analytics we're calculating. This would be a very big win.

I've reached out to a few groups that are doing something very similar to this, and their preliminary results say we can do it. That's really good news. We'll have to see what happens with the real hardware and software.