Archive for August, 2015

Datadog Gauges in Clojure

Tuesday, August 18th, 2015

Datadog

The Shop is big into Datadog, and it's not a bad metrics collection tool - one I've used very successfully from clojure. Because it's based on the local Datadog Agent (freely available from Datadog), which is easily placed on linux hosts - AWS or otherwise - it's a clear win for collecting metrics from your code and shipping them to a nice graphing/alerting platform like Datadog.

The code I've set up for this is pretty simple, and based on the com.codahale.metrics java libraries. With a simple inclusion into your project.clj file:

  [io.dropwizard.metrics/metrics-core "3.1.0"]
  [org.coursera/dropwizard-metrics-datadog "1.0.2"]

you can then write a very nice metrics namespace:

  (ns ns-toolkit.metrics
    "This is the code that handles the metrics and events through the Dropwizard
    Metrics core library, which, in turn, will ship it over UDP to the DataDog
    Agent running on localhost."
    (:require [clojure.tools.logging :refer [infof debugf warnf errorf]])
    (:import [com.codahale.metrics MetricRegistry]
             [org.coursera.metrics.datadog DatadogReporter]
             [org.coursera.metrics.datadog.transport UdpTransportFactory
                                                     UdpTransport]
             [java.util.concurrent TimeUnit]))
 
  ;; Create a simple MetricRegistry - but make it only when it's needed
  (defonce def-registry
    (delay
      (let [reg (MetricRegistry.)
            udp (.build (UdpTransportFactory.))
            rpt (-> (DatadogReporter/forRegistry reg)
                  (.withTransport udp)
                  (.withHost "localhost")
                  (.convertDurationsTo TimeUnit/MILLISECONDS)
                  (.convertRatesTo TimeUnit/SECONDS)
                  (.build))]
        (.start rpt 5 TimeUnit/SECONDS)
        reg)))
 
  ;; Somewhat faking java.jdbc's original *connection* behavior so that
  ;; we don't have to pass one around.
  (def ^:dynamic *registry* nil)
 
  (defn registry
    "Function to return either the externally provided MetricRegistry, or the
    default one that's constructed when it's needed, above. This allows the user
    the flexibility to live with the default - or make one just for their needs."
    []
    (or *registry* @def-registry))
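As a quick sketch of how that dynamic var can be used - this is hypothetical usage, not from the namespace above - you can bind your own registry around a block of code, and everything that calls `registry` inside it will pick yours up:

```clojure
;; a hypothetical sketch: supplying our own MetricRegistry via the
;; dynamic var so a test (or a second subsystem) gets its own registry
;; instead of the lazily-built default
(import '[com.codahale.metrics MetricRegistry])

(binding [*registry* (MetricRegistry.)]
  ;; every call to (registry) in this scope returns the bound registry
  (.counter (registry) "my.custom.counter"))
```

Outside the `binding`, `(registry)` falls back to dereferencing `def-registry`, so the default reporter setup is untouched.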

And then we can define the simple instrumentation types from this:

  ;;
  ;; Functions to create/locate the different Metrics instruments available
  ;;
 
  (defn meter
    "Function to return a Meter for the registry with the provided tag
    (a String)."
    [tag]
    (when (string? tag)
      (.meter (registry) tag)))
 
  (defn counter
    "Function to return a Counter for the registry with the provided tag
    (a String)."
    [tag]
    (when (string? tag)
      (.counter (registry) tag)))
 
  (defn histogram
    "Function to return a Histogram for the registry with the provided tag
    (a String)."
    [tag]
    (when (string? tag)
      (.histogram (registry) tag)))
 
  (defn timer
    "Function to return a Timer for the registry with the provided tag
    (a String)."
    [tag]
    (when (string? tag)
      (.timer (registry) tag)))

These can then be held in maps or used for any reason at all. They automatically send their data to the local Datadog Agent over UDP so there's no delay to the logger, and since it's on the same box, the likelihood that something will be dropped is very small. It's a wonderful scheme.
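For instance - and this is just a hypothetical sketch, with a made-up metric name and a stand-in `process-request` function - you might wrap a request handler in a Timer so every call reports its duration:

```clojure
;; a hypothetical sketch: timing a function call with a Timer from the
;; registry. Timer.time() returns a Timer.Context, and calling .stop on
;; it records the elapsed time - which the reporter then ships to the
;; local Datadog Agent over UDP.
(defn timed-handler
  [req]
  (let [ctx (.time (timer "myapp.handler.time"))]
    (try
      (process-request req)    ; stand-in for the real work
      (finally
        (.stop ctx)))))
```

The `try`/`finally` makes sure the timing is recorded even if the handler throws.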

But one of the things that's not covered in these metrics is the Gauge. And there's a really good reason for that - the Gauge for Datadog is something that's read by the Datadog Agent, and so has to be held onto by the code so that subsequent calls can be made against it for its value.

In its simplest form, the Gauge is just a value that's read by the agent on some interval and sent to the Datadog service. In Java, this callback functionality is done with a simple anonymous inner class, but that's hard to do in clojure - or is it?

With Clojure, we have something that makes this quite easy - reify. If we simply add an import:

  (:import [com.codahale.metrics Gauge])

then we can write the code to create an instance of Gauge with a custom getValue() method containing any clojure code we want. Like:

  ;;
  ;; Java functions for the Metrics library (DataDog) so that we can
  ;; constantly monitor the breakdown of the active docs in the system
  ;; by these functions.
  ;;
  (defn cnt-status
    "Function that takes a status value and finds the count of loans
    in the `laggy-counts` response that has that status. This is used
    in all the metrics findings - as it's the exact same code - just
    different status values."
    [s]
    (reify
      Gauge
      (getValue [this]
        (let [sm (first (filter #(= s (:status %)) (laggy-counts)))]
          (parse-int (:count sm))))))
 
  (defn register-breakdown
    "Function to register all the breakdowns of the loan status counts
    with the local Datadog agent to be sent to Datadog for plotting. This
    is a little interesting because Datadog will call *these* functions
    as needed to get the data to send, and we will control the load by
    using memoized functions."
    []
    (.register (met/registry)
      "trident.loan_breakdown.unset"
      (cnt-status nil))
    (.register (met/registry)
      "trident.loan_breakdown.submit_to_agent"
      (cnt-status "Submit to Agent"))
    (.register (met/registry)
      "trident.loan_breakdown.submit_to_lender"
      (cnt-status "Submit to Lender"))
    (.register (met/registry)
      "trident.loan_breakdown.submit_to_lender_approved"
      (cnt-status "Submit to Lender - Agent Approved"))
    (.register (met/registry)
      "trident.loan_breakdown.lender_approved"
      (cnt-status "Lender Approved")))

What I like about this is that I can allow the Datadog Agent to hit this code as often as it wants, and I don't have to worry about the freshness of the data - or an excessive load on the server resources from being hit too much. I can simply memoize the functions I'm using and then control the load on my end. It's very clean, and very nice.
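The memoization itself can be as simple as wrapping the expensive call - here's a sketch where `raw-laggy-counts` is a stand-in for the real database query behind `laggy-counts`; for time-based freshness you'd reach for the TTL cache in clojure.core.memoize rather than plain memoize:

```clojure
;; a hypothetical sketch: `raw-laggy-counts` stands in for the real,
;; expensive query that produces the status/count maps.
(defn raw-laggy-counts
  []
  [{:status nil :count "3"}
   {:status "Lender Approved" :count "7"}])

;; cache the result so the Gauges can be polled as often as the agent
;; likes without re-running the query every time
(def laggy-counts (memoize raw-laggy-counts))
```

With `clojure.core.memoize/ttl` instead, the cache would expire on a schedule and the data would stay reasonably fresh with no extra bookkeeping.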

Code Coverage Tools and Clojure

Tuesday, August 18th, 2015


I was asked recently by my manager to look into code coverage tools for clojure because some of the other Senior Devs at The Shop wanted to have code (test) coverage numbers automatically generated for all projects as a part of continuous integration (TeamCity), and then available for all to see and track. I can certainly understand what they are trying to achieve - testable code that allows them to feel comfortable changing the code after the original author is long gone.

With complete test coverage, the theory goes, you can make a change, and then run these tests and prove to yourself that you haven't broken something because you didn't take the time to really understand the codebase.

It's an understandable goal from a management perspective... but even there, I have to think that this has never worked in my experience - and there's no way to have really foolproof tests. Yet today I read a nice post, which contained a quote from Rich H. about testing:

A bad design with a complete test suite is still a bad design. -- Rich H.

And the author, Alex, also verbalizes a lot of the concerns I've had over the years about tests. I can remember adding a feature in 15 mins and then spending several hours updating the unit tests because there were so many that had double-coverage, and yet none could (politically) be removed. Tests, it seems, are a lot like gold bricks - once people get some, they are very reluctant to get rid of any.

Yet tests are code, and they cost to maintain just like any other code. You can't think that tests are "free" once they are written... at least not if you're honest with yourself. And to Rich's point, it's better to think first and then attack the problem than it is to believe that tests are everything you need.

It was a very illuminating article. I'm glad I have a link to it now. I'll be sending it to a lot of folks who talk about tests in the future.