Datadog Gauges in Clojure
Tuesday, August 18th, 2015
The Shop is big into Datadog, and it's not a bad metrics collection tool, one I've used very successfully from Clojure. Because it relies on the local Datadog Agent (freely available from Datadog), which is easy to place on Linux hosts, AWS or otherwise, it's a clear win for collecting metrics from your code and shipping them to a nice graphing/alerting platform like Datadog.
The code I've set up for this is pretty simple, and is based on the com.codahale.metrics Java libraries. With a simple addition to the dependencies in your project.clj file:
    [io.dropwizard.metrics/metrics-core "3.1.0"]
    [org.coursera/dropwizard-metrics-datadog "1.0.2"]
you can then write a very nice metrics namespace:
    (ns ns-toolkit.metrics
      "This is the code that handles the metrics and events through the
      Dropwizard Metrics core library, which, in turn, will ship it over
      UDP to the DataDog Agent running on localhost."
      (:require [clojure.tools.logging :refer [infof debugf warnf errorf]])
      (:import [com.codahale.metrics MetricRegistry]
               [org.coursera.metrics.datadog DatadogReporter]
               [org.coursera.metrics.datadog.transport UdpTransportFactory
                                                       UdpTransport]
               [java.util.concurrent TimeUnit]))

    ;; Create a simple MetricRegistry - but make it only when it's needed
    (defonce def-registry
      (delay
        (let [reg (MetricRegistry.)
              udp (.build (UdpTransportFactory.))
              rpt (-> (DatadogReporter/forRegistry reg)
                      (.withTransport udp)
                      (.withHost "localhost")
                      (.convertDurationsTo TimeUnit/MILLISECONDS)
                      (.convertRatesTo TimeUnit/SECONDS)
                      (.build))]
          (.start rpt 5 TimeUnit/SECONDS)
          reg)))

    ;; Somewhat faking java.jdbc's original *connection* behavior so that
    ;; we don't have to pass one around.
    (def ^:dynamic *registry* nil)

    (defn registry
      "Function to return either the externally provided MetricRegistry, or
      the default one that's constructed when it's needed, above. This allows
      the user the flexibility to live with the default - or make one just
      for their needs."
      []
      (or *registry* @def-registry))
And then we can define the simple instrumentation types from this:
    ;;
    ;; Functions to create/locate the different Metrics instruments available
    ;;

    (defn meter
      "Function to return a Meter for the registry with the provided tag
      (a String)."
      [tag]
      (if (string? tag)
        (.meter (registry) tag)))

    (defn counter
      "Function to return a Counter for the registry with the provided tag
      (a String)."
      [tag]
      (if (string? tag)
        (.counter (registry) tag)))

    (defn histogram
      "Function to return a Histogram for the registry with the provided tag
      (a String)."
      [tag]
      (if (string? tag)
        (.histogram (registry) tag)))

    (defn timer
      "Function to return a Timer for the registry with the provided tag
      (a String)."
      [tag]
      (if (string? tag)
        (.timer (registry) tag)))
These can then be held in maps or used for any reason at all. Their data is shipped automatically to the local Datadog Agent over UDP, so the calling code isn't delayed, and since the agent is on the same box, the likelihood that something will be dropped is very small. It's a wonderful scheme.
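As a rough illustration of what using these instruments looks like, here is a minimal sketch. It assumes the metrics-core dependency listed earlier is on the classpath, and it builds a throwaway MetricRegistry directly so that no Datadog reporter is started. The metric names and `demo-registry` are made up for the example; in the namespace above you would get the same kinds of objects from `meter`, `counter`, and `timer`.

```clojure
(import '[com.codahale.metrics MetricRegistry])

;; A throwaway registry for the demo (no Datadog reporter attached).
(def demo-registry (MetricRegistry.))

;; The metric names below are hypothetical.
(def requests (.counter demo-registry "demo.requests"))
(def errors   (.meter   demo-registry "demo.errors"))
(def latency  (.timer   demo-registry "demo.latency"))

(.inc requests)              ; count one request
(.mark errors)               ; record one error event

(let [ctx (.time latency)]   ; start timing
  (Thread/sleep 5)           ; stand-in for real work
  (.stop ctx))               ; stop the clock and record the elapsed time
```

Asking the registry for the same name again returns the same instrument, so looking instruments up by name on each use is a workable alternative to holding them in a map.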
But one of the things that's not covered in these metrics is the Gauge. And there's a really good reason for that: a Gauge is not something you push values into. It is read on demand when metrics are reported to the Datadog Agent, so the code has to hold onto it so that subsequent calls can be made against it for its value.
In its simplest form, the Gauge is just a value that's read on some interval and sent to the Datadog service. In Java, this callback functionality is done with a simple anonymous inner class, but that's hard to do in Clojure - or is it?
Clojure (we're on 1.6, though reify has been around since 1.2) has something that makes this quite easy - reify. We simply add an import:
(:import [com.codahale.metrics Gauge])
and then we can write code that creates an instance of Gauge with a custom getValue() method, where we can put any Clojure code we want. Like:
    ;;
    ;; Java functions for the Metrics library (DataDog) so that we can
    ;; constantly monitor the breakdown of the active docs in the system
    ;; by these functions.
    ;;

    (defn cnt-status
      "Function that takes a status value and finds the count of loans in
      the `laggy-counts` response that has that status. This is used in all
      the metrics findings - as it's the exact same code - just different
      status values."
      [s]
      (reify Gauge
        (getValue [this]
          (let [sm (first (filter #(= s (:status %)) (laggy-counts)))]
            (parse-int (:count sm))))))

    (defn register-breakdown
      "Function to register all the breakdowns of the loan status counts
      with the local Datadog agent to be sent to Datadog for plotting. This
      is a little interesting because Datadog will call *these* functions as
      needed to get the data to send, and we will control the load by using
      memoized functions."
      []
      (.register (met/registry) "trident.loan_breakdown.unset"
                 (cnt-status nil))
      (.register (met/registry) "trident.loan_breakdown.submit_to_agent"
                 (cnt-status "Submit to Agent"))
      (.register (met/registry) "trident.loan_breakdown.submit_to_lender"
                 (cnt-status "Submit to Lender"))
      (.register (met/registry) "trident.loan_breakdown.submit_to_lender_approved"
                 (cnt-status "Submit to Lender - Agent Approved"))
      (.register (met/registry) "trident.loan_breakdown.lender_approved"
                 (cnt-status "Lender Approved")))
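To see the reify mechanics on their own, without the metrics library, here is a small sketch that uses the JDK's `java.util.concurrent.Callable` as a stand-in for Gauge. The names `value-callback`, `hits`, and `cb` are invented for the illustration. The point it tries to make is that the method body runs again on every read, just as getValue would for a Gauge.

```clojure
(import '[java.util.concurrent Callable])

(defn value-callback
  "Returns a Callable whose call method runs f each time it is invoked."
  [f]
  (reify Callable
    (call [_] (f))))

;; Hypothetical state, so that we can observe how often the body runs.
(def hits (atom 0))
(def cb (value-callback #(swap! hits inc)))

;; Each read re-runs the closure; nothing is cached by reify itself.
(.call ^Callable cb)
(.call ^Callable cb)
```

After the two reads, `@hits` is 2. Anything expensive inside the method body is paid for on every read, which is why the load on the underlying data source is worth thinking about.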
What I like about this is that I can allow the Datadog Agent to hit this code as often as it wants, and I don't have to worry about the freshness of the data, or about an excessive load on the server resources from being hit too much. I can simply memoize the functions I'm using and then control the load on my end. It's very clean, and very nice.
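The post doesn't show how the memoization is done. One thing to keep in mind is that plain `clojure.core/memoize` caches a result forever, so keeping the data fresh needs some kind of expiry. Here is a minimal sketch of a time-based version for zero-argument functions. `memoize-ttl` and the names in the usage lines are hypothetical, not part of the original code, and the clojure.core.memoize library offers a ready-made TTL cache if you'd rather not roll your own.

```clojure
(defn memoize-ttl
  "Wraps the zero-argument function f so that its result is reused for
  ttl-ms milliseconds before f is called again."
  [f ttl-ms]
  (let [cache (atom nil)]               ; holds {:at <millis> :value <result>}
    (fn []
      (let [now (System/currentTimeMillis)
            hit @cache]
        (if (and hit (< (- now (:at hit)) ttl-ms))
          (:value hit)                  ; still fresh: reuse the cached value
          (let [v (f)]                  ; missing or stale: recompute
            (reset! cache {:at now :value v})
            v))))))

;; Usage: the counter stands in for an expensive query.
(def calls (atom 0))
(def cached-count (memoize-ttl #(swap! calls inc) 60000))

(cached-count)   ; runs the wrapped function
(cached-count)   ; served from the cache, within the 60 second window
```

This sketch doesn't stop two threads from recomputing at the same moment when the value has just expired. For a gauge that is read every few seconds that's probably acceptable, but it is a trade-off to be aware of.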