Really Pleased with Sharded Redis Solution

Redis Database

I've been faced with a significant problem - cache data for 110,000,000 users in a way that's both fast, and efficient, so that we can pull data out of it at a rate of 50k to 100k times a second. Redis, being single-threaded, is great at fast hits for reasonable data sets, but storing more than 100 million of anything, and accessing it by hundreds of threads is going to blow out any one redis server - so you have to shard.

But how to shard efficiently?

Turns out, Java's MD5 is amazingly efficient. I wrote the following hash function that takes any string and hashes it into one of 8 buckets - I'm planning on having 8 redis servers on one physical box:

  (defn hash8
    "Function to generate a 3-bit hash on the provided string by using the MD5
    as an intermediate vechile, and then taking the last byte and 'mod 8' it.
    This is using the Java MessageDigest class to create the hash, so it's
    only as good as that class/function - but from our tests, it's very efficient."
    [s]
    (if (string? s)
      (let [ba (.digest (doto (java.security.MessageDigest/getInstance "MD5")
                              (.reset)
                              (.update (.getBytes s))))]
        (mod (aget ba (dec (alength ba))) 8))))

and I compared it to what I thought was going to be the much faster way: Simply adding up all the ASCII byte values and taking the last 3 bits:

  (defn smash
    "Function to generate a 3-bit hash on the provided string by summing
    the bytes of the string, and then taking the 'mod 8' of it."
    [s]
    (if (string? s)
      (let [ba (.getBytes s)]
        (mod (areduce ba i ret (long 0)
               (+ ret (aget ba i)))
             8))))

And to my surprise, when I ran a sequence of strings through both, the MD5 version far outperformed the byte array version, and I'm now convinced that it's because of the getInstance() call - Java is holding onto a generator, and serving it up to the caller as needed. Plus, they have to have really optimized that code to beat a simple adder.

In the end, I put this on the front-end of the redis calls with:

  (defn user-shard
    "Function to return the right function that will be used with 'wcar' to get
    the provided key from redis. This is specific just to the user data."
    [k]
    (case (hash8 k)
      0 :user-ids-1
      1 :user-ids-2
      2 :user-ids-3
      3 :user-ids-4
      4 :user-ids-5
      5 :user-ids-6
      6 :user-ids-7
      7 :user-ids-8))

and then it's used, with Carmine, as:

  (wcar (user-shard k) (car/get k))

When I look at the CPU and memory usage on the box, I'm seeing wonderfully balanced CPU usage - meaning that the sharing is very nicely distributed, and the memory usage for redis is very reasonable for the data set.

Great win!