It’s Hard to Beat Clojure for Complex Systems

I'm doing a lot of performance work on the Deal Performance Service this morning - trying to handle these 6 million row files and imports into Postgres, and I am constantly struck by the fantastic smile I get on my face when working with Clojure in this environment. It's simple, compact, expressive, and actually very readable to me. And I'm a huge fan of extensive comments.

Being able to take a serial process like:

  (doseq [l (take limit (line-seq rdr))
          :let [raw (line-to-ddo l stamp deals taxy)]
          :when raw
          :let [old (get eod (:ddo_key raw))
                ddl (if-not eod raw (merge-daily cfg stamp raw old))]
          :when ddl
          :let [row (gen-csv-row ddl all-fields)]]
    (spit csv-name row :append true))

and with almost no effort, turn it into a parallel one:

  (let [do-it (fn [l] (if-let [raw (line-to-ddo l stamp deals taxy)]
                        (let [old (get eod (:ddo_key raw))]
                          (if-let [ddo (if-not eod
                                         raw
                                         (merge-daily cfg stamp raw old))]
                            (gen-csv-row ddo all-fields)))))]
    (doseq [row (pmap do-it (take limit (line-seq rdr)))]
      (spit csv-name row :append true)))

I simply have to make the body of the doseq into a simple function and then use pmap as the source for the new doseq, and I'm in business. This has made the refactoring of the code so much simpler. It's easy to re-work the code over and over to get the optimal data flow.
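To try the same refactoring without the service's data, here is a minimal, self-contained sketch. The names parse-line, to-row, and lines are made up for illustration and stand in for line-to-ddo, gen-csv-row, and the file reader; the rows are collected into vectors instead of being written to a file so the two versions can be compared.

```clojure
(require '[clojure.string :as string])

;; Hypothetical stand-in for line-to-ddo: returns nil for lines it can't parse.
(defn parse-line [l]
  (let [[k v] (string/split l #",")]
    (when (and k v)
      {:key k :value (Long/parseLong v)})))

;; Hypothetical stand-in for gen-csv-row.
(defn to-row [m]
  (str (:key m) "," (* 2 (:value m)) "\n"))

;; Made-up input; "bad" has no comma, so parse-line rejects it.
(def lines ["a,1" "bad" "b,2" "c,3"])

;; Serial version: the :let/:when chain skips lines that fail to parse.
(def serial-rows
  (vec (for [l lines
             :let [m (parse-line l)]
             :when m]
         (to-row m))))

;; Parallel version: the body becomes a function and pmap feeds the consumer.
;; pmap returns results in input order, so only the nils need dropping.
(def parallel-rows
  (let [do-it (fn [l] (when-let [m (parse-line l)] (to-row m)))]
    (vec (remove nil? (pmap do-it lines)))))
```

Both vectors should hold the same rows in the same order. With work this small the serial version will likely be faster, since pmap only pays off when each item is expensive enough to outweigh the coordination overhead.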

And then there's the memoization... Sure, you can cache in any language, and it's not all that hard. But again, the ease with which it's added after the fact to a Clojure function is really why it's so powerful. You can start out with nothing cached and see what needs to be cached after you start doing performance tests. This makes the refactoring so much easier than predicting whether caching is going to be needed in any case, and then trying to make a system of functions or classes ready to move that way, should it be necessary.
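As a rough illustration of what "after the fact" looks like, here is a small sketch using clojure.core's memoize. The function lookup-deal, its sleep, and the call-count atom are all hypothetical, made up here to stand in for a slow lookup and to make the cache hits visible.

```clojure
;; Counts how many times the real lookup actually runs.
(def call-count (atom 0))

;; Hypothetical slow lookup; the sleep stands in for a database or network call.
(defn lookup-deal [deal-id]
  (swap! call-count inc)
  (Thread/sleep 50)
  {:id deal-id :name (str "deal-" deal-id)})

;; The cache is added later by wrapping the existing function; lookup-deal
;; itself is unchanged.
(def lookup-deal-cached (memoize lookup-deal))

(lookup-deal-cached 42)  ; runs lookup-deal
(lookup-deal-cached 42)  ; same argument, served from the cache
(lookup-deal-cached 7)   ; new argument, runs lookup-deal again
```

After those three calls, call-count should be 2 rather than 3. One thing to keep in mind is that memoize keeps every result for the life of the function, so it fits best when the set of distinct arguments is bounded.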

I've done it in C++ - and it's not horrible, but it means that you have classes for looking everything up, and then within each class, the implementation is either with - or without - caching. It can all be done, but it complicates everything because now there's a class for loading everything.

I'm sure there are tons of other languages that folks like. Heck, I like Obj-C and C++, but I have to look at what Clojure is capable of creating and facilitating, and have to marvel at its design. Really quite impressive.