After the significant refactoring of the experiment redis storage, I was monitoring the redis servers on the target box, and noticed that one server was running considerably hotter than the others. Clearly, there was a key in that server that was getting a lot more activity than the others. It didn't take me long to figure out which one it was: it was the redis SET of all the active experiment names.
We are using a SET in redis to hold all the unique experiment names, so that when we get an experiment message we can look up its name in this set and see whether it is, in fact, active. This is necessary because we have some mobile devices that might be 12 or 18 months out of date, and they may still think experiments are running when they aren't. So we have to filter those messages out when we get them.
There's a simple way to do this in redis:
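(defn is-active?
  "Returns true if en is the name of an active experiment."
  [en]
  (if-not (empty? en)
    ;; SISMEMBER returns 1 if en is in the set, 0 otherwise
    (pos? (wcar (car/sismember "finch|all-experiments" en)))))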
where we're using carmine for our clojure access to redis. This is a little simplified, but it's stock carmine, and that's the only thing that's important for this illustration.
The problem is that for every experiment message we have to check to see if it's an active experiment. That means 30k to 40k msgs/sec, each one a SISMEMBER call to redis through the function above. That's wasteful. Why not cache it?
We want to accurately reflect the state of the experiment list, so let's not cache it for that long, but we can hold onto it for 30 sec:
(defn experiment-list*
  []
  (set (wcar (car/smembers "finch|all-experiments"))))

(def experiment-list
  (memo-ttl experiment-list* 30000))
Now we have a 30 sec cache on the contents of the list, and we've converted it to a set in clojure so that it's easy to now write:
(defn is-active?
  [en]
  (if-not (empty? en)
    (contains? (experiment-list) en)))
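To see what the time-based cache is doing without redis or a memoization library in the picture, here is a minimal sketch using only clojure.core. Everything in it is hypothetical and for illustration only: ttl-memo is a hand-rolled stand-in for memo-ttl (not its actual implementation), fetch-experiments* stands in for the redis call, and the experiment names are made up.

```clojure
(defn ttl-memo
  "Wraps a zero-argument function f so that its result is reused for
  ttl-ms milliseconds before f is called again. Hypothetical helper,
  not part of carmine or any memoization library."
  [f ttl-ms]
  (let [cache (atom nil)]               ; nil, or {:value v :at timestamp-ms}
    (fn []
      (let [now (System/currentTimeMillis)
            c   @cache]
        (if (and c (< (- now (:at c)) ttl-ms))
          (:value c)                    ; still fresh: no call to f
          (let [v (f)]                  ; missing or expired: refetch
            ;; Two threads may both refetch at expiry; harmless here,
            ;; since both would store the same kind of value.
            (reset! cache {:value v :at now})
            v))))))

;; Stand-in for the redis SMEMBERS call; counts how often it is invoked.
(def fetch-count (atom 0))

(defn fetch-experiments*
  []
  (swap! fetch-count inc)
  #{"exp-a" "exp-b"})

(def cached-experiments
  (ttl-memo fetch-experiments* 30000))  ; 30000 msec = 30 sec

(defn active?
  [en]
  (if-not (empty? en)
    (contains? (cached-experiments) en)))
```

Calling (active? "exp-a") and then (active? "exp-z") within the 30 sec window hits fetch-experiments* only once; the second lookup is answered from the cached set.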
The change was significant - 15% of the CPU usage for that one redis server was gone. Very nice little improvement - and a change to the experiment list takes at most 30 sec to be picked up. Good trade-off.
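To put the trade-off in numbers, here is a rough back-of-the-envelope calculation. The message rates and the 30 sec window come from the figures above; treating all of that traffic as going through a single process is my simplifying assumption, and the var names are made up for the illustration.

```clojure
;; Message rates from the text (low and high end), in msgs/sec.
(def msgs-per-sec [30000 40000])

;; Cache lifetime in seconds.
(def ttl-sec 30)

;; SISMEMBER calls that one 30 sec window would have cost without the cache.
(def calls-per-window
  (mapv #(* % ttl-sec) msgs-per-sec))
;; => [900000 1200000]
```

With the cache in place, each of those windows costs roughly one SMEMBERS call per process instead.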