Apache Pulsar is an Interesting Project


I read a tweet this morning about Apache Pulsar achieving top-level project status in the Apache organization. I hadn't heard of it, so I spent a little time reading about it and getting familiar with the differences between it and Storm, Flink, etc. What I found was something that I might actually use in the future.

The important part is that Pulsar is a message broker with reliable delivery, persistent storage, and fail-over. That alone makes it way better than many other systems, but it goes a lot further than that. Like a lot of Apache projects, it uses ZooKeeper for coordination, but it also uses BookKeeper for the message storage - which is an interesting twist.

It's scalable up to a million topics, is meant to replicate between servers by geo-location, and has all the goodies you'd expect from something like Cassandra. Lots of resilience. Turns out, it was the message broker for Yahoo, and has since been open sourced, so it's battle-tested. It's low-latency, and meant to be server-to-server in the datacenter, even if the datacenters are separated by continents. All this makes it a useful project, because there are lots of times I can imagine needing a better broker than SQS.

But what makes it even nicer is that it also has function capabilities along the lines of AWS Lambda. So now you can have the streaming capabilities of Storm inside the messaging system - as opposed to having a streaming system talk to Kafka as a source and sink. This is a different take on the problem, but one of the problems I had with Storm was the scaling of the nodes - it was hard to keep them from backing up. Since the processing lives in the message broker, we have a lot more flexibility on the processing time, and every hand-off is to a queue/topic. That's good news.
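
To make that a little more concrete, here is a minimal sketch of what one of these functions (the feature is called Pulsar Functions) can look like in Python. As I understand it, the simplest form is just a module with a `process` function that gets called once per message, and whatever it returns is published to the output topic. The trim-and-upper-case logic is something I made up for illustration, and I'm assuming the messages arrive as plain strings:

```python
# Sketch of a Pulsar Function in its plain "native" Python form.
# Assumption: each incoming message is handed to process() as a str,
# and the return value is what gets published to the output topic.

def process(input):
    """Normalize one message: trim surrounding whitespace, upper-case it."""
    return input.strip().upper()
```

The input and output topics aren't in the code at all - they are given when the function is deployed with `pulsar-admin functions create`. So each step in a pipeline is just a function sitting between two topics.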

I'm not sure if/when I'll need something like this, but I'm very happy to see that it exists, and that it's gotten to the point where it looks pretty stable. Good news!