SNAP! Got the Speed in Clojure

Clojure.jpg

Well... I just pushed the commits that added the web server to the CryptoQuip repo with my friend. I have been able to get the speed down to that of the Obj-C code, and it was really interesting to me where the hold-up was. I needed to add some logging with timings so that I could get performance numbers for changes as I ran them over and over. But what I found was that the real bottleneck was in the possible? predicate function that was being used to filter the possible words to those that matched the pattern of each cypherword.

What I needed to do was to switch to pmap to make it efficient on getting the matches:

  (defn possible?
    "Function to see if the cyphertext and plaintext have the same pattern of
    characters such that they could possibly match - given the right legend."
    [ct pt]
    (if (and (string? ct) (string? pt) (= (count ct) (count pt)))
      (let [ctc (count (distinct ct))
            ptc (count (distinct pt))]
        (if (= ctc ptc)
          (= (count (distinct (map str ct pt))) ctc)
          false))
      false))

where I now look at the (distinct) length of the cypher and plaintext before making the map of pairs and counting them. This saves a good bit, but the biggie was in breaking out the creation of the possibles sequence:

  (defn match-up
    "Function to create a sequence of all the cypherwords and the possible
    matches to it in a way that we can easily time this for performance
    tuning."
    [quip]
    (let [qw (vec (distinct (.split quip " ")))
          go (fn [cw] (let [poss (filter #(possible? cw %) (get words (count cw)))]
                        { :cyphertext cw
                          :possibles poss
                          :hits (count poss) }))]
      (sort-by :hits < (pmap go qw))))
 
  (log-execution-time! match-up
    {:msg-fn (fn [ret q] (format "%s words filtered" (count ret)))})

Here, the biggie is splitting up the matching using pmap to make sure that we work on this matching problem as fast as possible. It's all still very functional, as the source lists are immutable, and it's a simple filtering problem.

With this, we're in the 65 msec range for the solution, and that's about where the ARC-based Obj-C code was, and that's pretty nice.