What's it all about, Alfie?

Getting Bolt Metrics from Nimbus


This morning I wanted to add the Storm topology bolt capacity values to the in-house monitoring and graphing tools that The Shop uses. I'm constantly checking the Storm UI to see the capacity values for the bolts on my critical topology, and it'd be so much nicer to see them in a simple graph on the display that I'm already watching for disk space, CPU usage, and the higher-level metrics like the messages per second emitted from each bolt.

The latter is something I figured out by digging into the Nimbus JavaDocs, and that was still useful in this bit of detective work. But the biggie was the code that the Storm UI uses to generate its response. That was a little harder to find, but when I found it, I knew I had what I needed to get the job done.

The resulting code wasn't too bad:

    (ns gym.storm.nimbus
      "Namespace for exercising Nimbus to find out facts about the specific storm
      cluster. This data can be used to monitor the cluster and look at stats in a
      way that's got very low load on the overall system."
      (:require [backtype.storm.clojure :refer :all]
                [backtype.storm.config :refer :all]
                [backtype.storm.ui.core :refer :all]
                [clj-endpoints :as ep]
                [clj-endpoints.persistence.redis :refer [wcar]]
                [clj-endpoints.util :as util]
                [clojure.string :as cs]
                [clojure.tools.logging :refer [error infof warnf]]
                [taoensso.carmine :as car])
      (:import [org.apache.thrift7.transport TSocket TFramedTransport]
               [org.apache.thrift7.protocol TBinaryProtocol]
               [backtype.storm.generated Nimbus$Client SupervisorSummary]))

    (defn get-emitted-totals
      "Function to take a config map and a topology name, and query Nimbus to see
      what the `emitted` totals are for each of the bolts in the topology. This is
      a very lightweight way to keep track of what's going on in the topology
      without monitoring all the messages coming out of the kafka cluster."
      [cluster topo]
      (let [cfg (ep/config cluster)
            tft (TFramedTransport. (TSocket. (:nimbus cfg) 6627))
            nc  (Nimbus$Client. (TBinaryProtocol. tft))]
        (try
          (.open tft)
          (let [ci   (.getClusterInfo nc)
                ts   (first (filter #(= topo (.get_name %)) (.get_topologies ci)))
                ti   (.getTopologyInfo nc (.get_id ts))
                exes (.get_executors ti)
                cnts (into {}
                           (for [[k v] (group-by #(.get_component_id %) exes)]
                             [(keyword k) (util/safe-sum (map get-emitted v))]))]
            (.close tft)
            {:nimbus-host (:nimbus cfg)
             :topology    topo
             :executors   (count exes)
             :counts      cnts})
          (catch Exception e
            (warnf "Exception thrown: %s" (.getMessage e))))))

    (defn get-capacity
      "Function to take a config map and a topology name, and query Nimbus to see
      what the capacity is for each of the bolts in the topology. This is a very
      lightweight way to keep track of what's going on in the topology without
      monitoring all the messages coming out of the kafka cluster."
      [cluster topo]
      (let [cfg (ep/config cluster)
            tft (TFramedTransport. (TSocket. (:nimbus cfg) 6627))
            nc  (Nimbus$Client. (TBinaryProtocol. tft))]
        (try
          (.open tft)
          (let [ci    (.getClusterInfo nc)
                ts    (first (filter #(= topo (.get_name %)) (.get_topologies ci)))
                tid   (.get_id ts)
                ti    (.getTopologyInfo nc tid)
                st    (.getTopology nc tid)
                exes  (.get_executors ti)
                bolts (group-by-comp (filter (partial bolt-summary? st) exes))
                caps  (into {}
                            (for [[id bc] bolts]
                              [(keyword id) (util/to-4dp (compute-bolt-capacity bc))]))]
            (.close tft)
            {:nimbus-host (:nimbus cfg)
             :topology    topo
             :executors   (count exes)
             :capacity    caps})
          (catch Exception e
            (warnf "Exception thrown: %s" (.getMessage e))))))
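The aggregation step in get-emitted-totals is easier to see without Thrift in the way. Here is a minimal sketch of the same group-and-sum shape over plain maps. The `:component-id` and `:emitted` keys and the `emitted-by-component` name are made up for illustration, and treating a missing count as zero is only my assumption about what a helper like `util/safe-sum` does.

```clojure
;; Hypothetical stand-in for the executor summaries: plain maps rather than
;; the Thrift objects that Nimbus actually returns.
(defn emitted-by-component
  "Group executor maps by component id and sum their emitted counts,
  treating a missing (nil) count as zero."
  [executors]
  (into {}
        (for [[id group] (group-by :component-id executors)]
          [(keyword id) (reduce + (map #(or (:emitted %) 0) group))])))

(emitted-by-component
  [{:component-id "parse" :emitted 10}
   {:component-id "parse" :emitted 5}
   {:component-id "store" :emitted nil}])
;; => {:parse 15, :store 0}
```

The result is one entry per component, keyed by keyword, which is the same shape as the :counts map the real function returns.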

What really surprised me was that even the Storm UI was written exactly as I would have done it: all in Clojure, with Compojure for the RESTful API. It was pretty sparsely documented, but in the end, I can understand an Apache project with sparse documentation - it's kinda to be expected.

I fiddled around with the return values and the difference between a StormTopology and a TopologyInfo - and how each is used in both the emitted counts and the calculation of the capacity for the bolts. But in the end, by looking carefully at the code as an example, I was able to get what I needed out of the library. Very nice.
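For anyone wondering what the capacity number means: as far as I can tell from reading the Storm UI source, an executor's capacity is the fraction of the last 10 minutes it spent executing tuples (executed count times average execute latency, divided by the window length), and a bolt's capacity is the largest value among its executors. The sketch below restates that arithmetic over plain maps. It is my reading of the calculation, not the library's code, and the function names and map keys are made up for illustration.

```clojure
;; The 10-minute stats window, in seconds (an assumption from my reading
;; of the UI code).
(def window-secs 600)

(defn executor-capacity
  "Fraction of the window one executor spent executing tuples.
  The window shrinks to the uptime for executors younger than 10 minutes."
  [{:keys [executed latency-ms uptime-secs]}]
  (let [window (min (or uptime-secs 0) window-secs)]
    (if (pos? window)
      (/ (* (or executed 0) (or latency-ms 0)) (* 1000.0 window))
      0.0)))

(defn bolt-capacity
  "Capacity of a bolt: the busiest of its executors."
  [executors]
  (reduce max 0.0 (map executor-capacity executors)))

(bolt-capacity
  [{:executed 300000 :latency-ms 1.0 :uptime-secs 3600}  ; 0.5
   {:executed 60000  :latency-ms 2.0 :uptime-secs 300}]) ; 0.4
;; => 0.5
```

A value near 1.0 means the busiest executor is executing tuples for nearly the whole window, which is why this is the number worth graphing.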


This entry was posted on Monday, September 8th, 2014 at 9:08 am and is filed under Clojure Coding, Cube Life, Open Source Software.
