Today has been one of those days I knew was coming - I didn't know when it'd arrive, but I knew without a shadow of a doubt that it was going to be a defining moment for me at The Shop. Today I butted heads with the manager of the centralized Kafka cluster about how to publish to his cluster - in fact, how to properly publish to any Kafka cluster, and what the issues are with Kafka 0.7 when adding new boxes.
First, the set-up. I've been asked by my new manager to publish to the shared Kafka cluster because he sees nothing but upside to using shared services. He believes that if the shared services aren't up to the task, he can then apply pressure to get them up to the task. I am not so optimistic. I will gladly ask for help from anyone - as long as the product benefits from it. But if the product suffers, I don't care who it is - they have to shape up or ship out.
So - against my wishes - I started publishing to the shared Kafka cluster. We started having a lot of problems, but everyone was happy - save me. They added machines to the cluster, and because the topics I publish to already existed in the cluster, a known bug in Kafka 0.7 kept those topics from automatically rebalancing onto the new boxes. You have to publish a message - no matter how small - to each new box under the same topic name, and then it will start picking up traffic automatically.
I know this because I ran into this problem, had to figure it out, and finally did - after writing a little function to send an empty JSON message to a specific partition on a specific box in the cluster. It worked like a champ, so I knew how this worked in Kafka 0.7.
Today, I had a disagreement with the manager of the shared cluster because he wanted people to write to specific machines, and then use the load balancer to assign different machines to different publishing clients. Sadly, this is not how Kafka is meant to be used. It's meant to be used with a single, automatic configuration based on the cluster configuration in ZooKeeper, which distributes the load to all the boxes in the cluster in equal share.
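For the record, this is what the "right way" looks like - a minimal sketch, assuming the same `kp/producer` wrapper I use in my direct-injection function below, and with placeholder ZooKeeper hostnames:

```clojure
;; Kafka 0.7's intended publishing setup: point the producer at the
;; ZooKeeper ensemble (zk.connect) rather than at any one broker, and
;; the producer discovers every box in the cluster and spreads the
;; load across them evenly. Hostnames here are placeholders.
(defn balanced-producer
  []
  (kp/producer {"zk.connect" "zk-host1:2181,zk-host2:2181,zk-host3:2181"}))
```

No load balancer, no per-client broker assignments - ZooKeeper is the single source of truth for the cluster topology.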
The manager's idea allows the load balancer to direct the traffic - but allows things to be very unbalanced, and therefore complicating all the topologies based on these topics. It's just bad design to use Kafka in this way. But it does get around the problem of adding boxes to the cluster and activating the topics on the new boxes.
But that's trivial with the short Clojure function I wrote:
(defn direct-injection
  "Function to send a single message to a specified topic on the specified
  kafka broker - bypassing all the zookeeper stuff to make sure that this
  one message gets to this one broker. This is essential for bootstrapping
  a new broker box to an existing topic."
  [broker topic msg]
  ;; broker.list pins the producer to one box, e.g. "2:kafka-broker3:9092"
  ;; in Kafka 0.7's brokerid:host:port form
  (let [p (kp/producer {"broker.list" broker})]
    (kp/send-messages p topic [(kp/message (.getBytes msg))])))
and it only needs to be run once for each existing topic on each new box. It's trivial.
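Calling it is just as simple. The broker string below is in Kafka 0.7's brokerid:host:port form, and the topic name is a placeholder for whatever topic needs bootstrapping:

```clojure
;; Bootstrap an existing topic onto new broker #2 with one empty
;; JSON message - the topic name "events" is just an example.
(direct-injection "2:kafka-broker3:9092" "events" "{}")
```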
Finally, I got them to see this, and do this, and my publishing automatically picked up the new boxes and started pushing messages to all of them evenly - as you should with Kafka.
The moral of today's story is that you can use shared tools - and it can appear to save you time and money - but appearances are deceptive. You can shoot yourself in the foot so fast that you'll find careful consideration of all deployment issues is the real time - and money - saver.