Archive for September, 2014

There’s So Much to Learn… I’m Just Stunned

Friday, September 12th, 2014

Storm Logo

Sure, I'd like to learn to be a really good designer, but I know there might not be enough time in my life to get good enough at it to justify the investment. But there are also a lot of things I'd like to get up to speed on - Clojure's core.async, Swift, and a lot of things like that. I would love to be able to learn, and then apply, these tools. It sounds like a lot of fun.

But there's a different class of skill - not a skill like design, not another tool, but the understanding of the interdependent parts of a complex system. For example, I'm doing more topology tests this morning and I'm seeing behavior in the relationships between the bolts that I simply would never have guessed. It's stunning, and it makes me smile.

There's no class for this. There's no tutorial for this. This is you, the machine, and time. This is testing how well you understand this deterministic machine you've built: looking at the input, can you explain why the behavior is what it is?

It's almost detective work.

What a blast.

More Topology Balancing

Thursday, September 11th, 2014

Unified Click

I've spent most of the day trying to get the topology working under load from the batch email send process. In truth, I may never get it really perfectly balanced because it's a batch process and not a real-time, on-demand kind of thing. I'm trying to count shotgun pellets after the gun goes off. It's kinda tough.

But still I try. And I'm learning a lot about how this topology, and Storm itself, respond to the load. For instance, if you don't want to buffer tuples in the system - and for the most part, I don't - then use the :local-or-shuffle grouping, and make sure that your data flow is balanced on all bolts before that step. This saves a lot of lag in the throughput because a tuple can be handed directly to the next bolt without going through any buffering.
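
In the Clojure DSL, that wiring looks something like this - just a sketch, with made-up spout and bolt names standing in for the real topology, but it shows both the :local-or-shuffle groupings and the parallelism hints (:p):

  (use 'backtype.storm.clojure)

  ;; a sketch of the wiring - email-spout, decorate-bolt, encode-bolt and
  ;; xmit-bolt are placeholders for the real components. Each bolt pulls from
  ;; the previous one with :local-or-shuffle so tuples are handed straight to
  ;; a task in the same worker when one is available.
  (def send-topology
    (topology
      {"emails"   (spout-spec email-spout :p 10)}
      {"decorate" (bolt-spec {"emails"   :local-or-shuffle} decorate-bolt :p 200)
       "encode"   (bolt-spec {"decorate" :local-or-shuffle} encode-bolt   :p 200)
       "xmit"     (bolt-spec {"encode"   :local-or-shuffle} xmit-bolt     :p 100)}))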

What I've been playing with lately is significantly increasing the parallelization hint on the decorator bolt, as well as on the encoder and transmitter bolts, to see if this makes a difference, or if it's just going to shorten the time we're at capacity by moving more messages through the system - while still always being at capacity.

So I've had good luck, actually, and this is the message rate for a bulk email send (purple) and the corresponding output decorator (golden) and output messages (cyan):

Message Rate for 500 PH

There's a lot to like about this graph over the older ones - first, the decorate and xmit are virtually identical - i.e. no buffering. Excellent. Also, the drop-off on the output is nearly as good as the ramp-up, so that means that we're really doing a pretty decent job of moving the data. I'm not unhappy with this graph at all. But the capacity graph is a different story:

Message Rate for 500 PH

Here we see that we peaked after the email send block was done, which is a bit odd, but on the plus side, the encode and emit bolts also rose nicely, indicating that the decorate stage is starting to share the load more, and that's a good thing.

My concern is still the capacity number. I suppose I'll run a few more tests with higher numbers still and see if that makes any difference, but I have a feeling it's not going to make any change to the height of the capacity surge - but it will likely lessen the duration.

Testing New Data Feeds for a Topology

Wednesday, September 10th, 2014

Storm Logo

Today has been an easy-going day of performance testing and topology tuning with the new data source I'm trying to integrate - email sends. This is coming in from a current batch process that's delivering batches of emails to different locations all around the globe. The trick is that testing it is kinda tough, because if they aren't running a batch, you have no data to test with. Combine that with the fact that a lot of these tests with Storm need to run for a while to reach a steady-state condition, and it makes for a lot of staring at graphs.

I like to make a table of all experiments and their results, and for today's experiments this is what I ended up with:

  RM Decoders       RM Mappers   Decorate   Topology Workers   rms-decode   rms-map   decorate   Time
  375               150          350        25                 0.023        0.025     0.740      2:51:18
                                 250                           0.001        0.002     0.650      1:28:42
  rewrite lookups                250                           0.033        0.004     0.527      31:41
  100               50           250                           0.000        0.000     0.594      20:26
                                 300        30                 0.000        0.000     0.438      15:16

While it's still developing, it's clear to me that we started out with more resources on the email send bolts than we needed, and re-writing the lookups was an important step.

UPDATE: with the increase of the workers to 30, we finally have something that handles the load at least as well as production. That's good enough for today.

Really Pleased with Sharded Redis Solution

Wednesday, September 10th, 2014

Redis Database

I've been faced with a significant problem - cache data for 110,000,000 users in a way that's both fast and efficient, so that we can pull data out of it at a rate of 50k to 100k times a second. Redis, being single-threaded, is great at fast hits for reasonable data sets, but storing more than 100 million of anything, and accessing it from hundreds of threads, is going to blow out any one redis server - so you have to shard.

But how to shard efficiently?

Turns out, Java's MD5 is amazingly efficient. I wrote the following hash function that takes any string and hashes it into one of 8 buckets - I'm planning on having 8 redis servers on one physical box:

  (defn hash8
    "Function to generate a 3-bit hash on the provided string by using the MD5
    as an intermediate vehicle, and then taking the last byte 'mod 8'.
    This is using the Java MessageDigest class to create the hash, so it's
    only as good as that class/function - but from our tests, it's very efficient."
    [s]
    (if (string? s)
      (let [ba (.digest (doto (java.security.MessageDigest/getInstance "MD5")
                              (.reset)
                              (.update (.getBytes s))))]
        (mod (aget ba (dec (alength ba))) 8))))

and I compared it to what I thought was going to be the much faster way: Simply adding up all the ASCII byte values and taking the last 3 bits:

  (defn smash
    "Function to generate a 3-bit hash on the provided string by summing
    the bytes of the string, and then taking the 'mod 8' of it."
    [s]
    (if (string? s)
      (let [ba (.getBytes s)]
        (mod (areduce ba i ret (long 0)
               (+ ret (aget ba i)))
             8))))

And to my surprise, when I ran a sequence of strings through both, the MD5 version far outperformed the byte array version, and I'm now convinced that it's because of the getInstance() call - Java is holding onto a generator, and serving it up to the caller as needed. Plus, they have to have really optimized that code to beat a simple adder.
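
Seeing the difference doesn't take anything fancy - just time a big batch of strings through each function. Something like this (the keys here are made-up test data, and time is crude, but it's enough to show the gap):

  ;; a quick-and-dirty comparison - not a rigorous benchmark, just timing a
  ;; big batch of made-up keys through each hash function
  (let [ids (mapv #(str "user-" %) (range 100000))]
    (time (dorun (map hash8 ids)))
    (time (dorun (map smash ids))))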

In the end, I put this on the front-end of the redis calls with:

  (defn user-shard
    "Function to return the right function that will be used with 'wcar' to get
    the provided key from redis. This is specific just to the user data."
    [k]
    (case (hash8 k)
      0 :user-ids-1
      1 :user-ids-2
      2 :user-ids-3
      3 :user-ids-4
      4 :user-ids-5
      5 :user-ids-6
      6 :user-ids-7
      7 :user-ids-8))

and then it's used, with Carmine, as:

  (wcar (user-shard k) (car/get k))

When I look at the CPU and memory usage on the box, I'm seeing wonderfully balanced CPU usage - meaning that the sharding is very nicely distributed - and the memory usage for redis is very reasonable for the data set.

Great win!

Trying Very Hard to be Supportive to Teammates

Tuesday, September 9th, 2014

cubeLifeView.gif

I want to be supportive of my teammates, I really do. But today it's been exceptionally hard. I guess it started off yesterday with a multi-hour meeting where the manager wanted to ditch the old apps the group had built - even though one was just finished - and make one, better, unified tool. I think it's a good idea. But it's undermined by the fact that if the new tool is built by the same people who just finished the one we're throwing away, it's likely not to be any better, and will need to be thrown away as soon as it's finished as well.

I like the people in my group - I do. They are decent people. But some of them have no business doing this job - no more than I'd have any business being a doctor. I don't have the skills, and no amount of support and encouragement from managers and team members would make it so. They don't have the skills.

Case in point - this multi-hour meeting. Our manager suggested that we start fresh with a very simple web app - Rails, in his suggestion, was fine. I wouldn't have chosen Rails, only because I'm not that familiar with its weaknesses, and I prefer to have a more separable data store and codebase, but that's me. I have no doubt this can be done in Rails, and done well.

But not by the guys we have.

They argued that if we're going to be using Rails, then we should stick with the old app that the manager wants to throw away. He's too nice to say exactly why he wants it thrown away - and only part of the reason is that it's over-designed and a horrible mess. It's also because getting anything done in it takes a week, and the speed of improvements makes a glacier look like an Olympic sprinter. It's a joke.

But it's the devil they know, and rather than admit that, they want to move to Node.js, as that's something they know better. In truth, I don't know whether Node.js is bad or good - or how it interacts with databases, etc. But that's not the point I'm trying to make - the point is accepting the lack of skill, and the effort required to gain that skill. They don't see it as a problem.

So I tried to help guide a few of their decisions today, and it was soon clear that they really had no business being involved at this level, and they were totally unaware of this fact. So I had to just shut up. That's the kindest I can be to these guys. Let them make their own mistakes, and maybe they will learn. If not, then the manager will learn, and if not him, then his manager - until someone has the good sense to start moving people around and clearing out the dead weight.

I miss that from Finance.

Struggling with Dead Workers in Carmine

Tuesday, September 9th, 2014

Redis Database

This morning I'm once again trying to figure out a problem I've been having with the workers in the Carmine message queue implementation. Basically, the thread that starts the workers is doing just fine, but the workers themselves are just stopping. I had no idea what to do about it - so I wrote to the author asking him about this.

He pointed me to the :monitor option on the worker function. I hadn't seen it when reading the code, but yes, there's a function that gets called every time the queue is cycled. So I added a simple function there to reset an atom, and then in the thread that starts these workers, I increment that atom once a second - if it exceeds 50 without being reset, I know it's taken more than 50 sec for the queue to cycle, and I try to stop/start the worker.

The basic monitor and its worker look something like this:

  save-mon (fn [{:keys [mid-circle-size ndry-runs poll-reply]}]
             (debug "persistence worker heartbeat (iteration)...")
             (reset! _save_loops 0))
  saver (mq/worker (epr/connection :queue) *dump*
          {:handler (fn [{:keys [message attempt]}]
                      (save-it! cfg message)
                      {:status :success})
           :monitor save-mon
           :nthreads 1})

and then in the main body of the function we have something that checks to see if the _save_loops atom has been reset recently enough:

  (let [sc (swap! _save_loops inc)]
    (when (< 50 sc)
      (warnf "Persistence worker hasn't cycled for %s sec -- Restarting!" sc)
      (reset! _save_loops 0)
      (infof "Stopping the persistence worker... [%s]"
             (if (mq/stop saver) "ok" "FAIL"))
      (infof "Starting the persistence worker... [%s]"
             (if (mq/start saver) "ok" "FAIL"))))

This all took a while to figure out, but after a time it appeared to be working. But the stopping and starting just weren't doing the right things. Add to this the fact that in the other data center I have multiple installations of this where the workers aren't in trouble at all.

I'm starting to think it's the redis server. That will likely be the next thing I do - restart the redis server and hope that clears any odd state that might be there.

I do wish this would settle down.

UPDATE: I emailed Peter, the author, and he asked me to check the logs - in this case, the timbre logs from his logging package. These go to standard out, and I had forgotten about them. Sure enough, there was useful data there, and all the stops and starts were logged as well. At this point, I believe it's something in redis, and it has been successfully cleared out. But if it happens again, I'll dump the redis database and start fresh - it's just the queue data.

There’s Nothing Worse than Process Without Benefit

Monday, September 8th, 2014

PHB.gif

Today I had my first real run-in with Process Over Progress at The Shop, and I have to say, this is really the reason that I end up leaving jobs. Nothing is a clearer sign of the decay of an organization than process seen as an end unto itself, where things like the form of a document are more important than its content - where the reviewers of a document can't be bothered to read it, but still feel perfectly justified sitting in judgement of its author.

I know this is a people-problem, which is exactly why I'm not in management at an organization that I don't own. The decay sets in when the people problem rots the management structure to the point that otherwise good managers can't be bothered to police the process monitors, and the lunatics start running the asylum.

There is no solution that I've found - short of leaving the organization, so I guess that's the next step. There is, of course, blatant disrespect of authority, and while that works short-term, it's not something that really is sustainable in the long-run. The trouble-maker can point out the problem, but then has to get back in line or he's a real danger to the authority of those in power.

I'll look back on this post in the future and point to this being a significant turning point in my career at The Shop.

Getting Bolt Metrics from Nimbus

Monday, September 8th, 2014

Storm Logo

This morning I wanted to be able to add the Storm topology bolt capacity values to the in-house monitoring and graphing tools that The Shop uses. The reason for this is that I'm constantly checking the Storm UI to see what the capacity values are for the bolts on my critical topology, and it'd be so much nicer to be able to see them in a simple graph on the display that I'm already looking at for disk space, CPU usage, and also the higher-level metrics like the messages per second emitted from each bolt.

The latter is something I figured out by digging into the Nimbus JavaDocs, and it was still useful in this bit of detective work. But the biggie was the code that the Storm UI uses to generate its response. That was a little harder to find, but when I found it, I knew I had what I needed to get the job done.

The resulting code wasn't too bad:
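
In rough shape, it looks something like this - a minimal sketch rather than the exact code, assuming Storm 0.9.x's NimbusClient and its Thrift-generated accessors (get_executors, get_component_id, get_stats, get_emitted):

  ;; a sketch of pulling per-bolt emitted counts out of Nimbus over Thrift -
  ;; NimbusClient and the underscore accessors are from Storm 0.9.x
  (import '[backtype.storm.utils NimbusClient Utils])

  (defn bolt-emitted
    "Function to return a map of bolt/component id to the total emitted count
    (all streams, ':all-time' window) for the given topology id, as reported
    by Nimbus."
    [host port topology-id]
    (let [nimbus (.getClient (NimbusClient. (Utils/readStormConfig) host port))
          info (.getTopologyInfo nimbus topology-id)]
      (reduce (fn [m ex]
                (let [cid (.get_component_id ex)
                      emitted (some-> ex .get_stats .get_emitted (get ":all-time"))
                      total (if emitted (reduce + 0 (.values emitted)) 0)]
                  (merge-with + m {cid total})))
              {}
              (.get_executors info))))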

What really surprised me was that even the Storm UI was written exactly as I would have done it - all in Clojure with Compojure for the RESTful API. It was pretty sparsely documented, but in the end, I can understand an Apache project with sparse documentation - it's kinda to be expected.

I fiddled around with the return values and the difference between a StormTopology and a TopologyInfo - and how it's used in both the emitted counts and the calculation of the capacity for the bolts. But in the end, by looking carefully at the code as an example, I was able to get what I needed out of the library. Very nice.
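
The capacity calculation itself turns out to be simple - as far as I can tell from the UI code, it's just how much of the stats window the bolt spent executing, which is easy to reproduce once you have the executed counts and latencies out of the TopologyInfo:

  ;; the 'capacity' the Storm UI reports for a bolt is the fraction of the
  ;; sample window the bolt spent executing tuples:
  ;;   (executed tuples x avg execute latency in ms) / (window in ms)
  (defn bolt-capacity
    "Function to compute the capacity of a bolt from its executed count, its
    average execute latency (ms), and the stats window length (sec)."
    [executed execute-latency-ms window-secs]
    (/ (* executed execute-latency-ms)
       (* window-secs 1000.0)))

  ;; e.g. 120k tuples at 4.2 ms over a 600 sec window:
  ;; (bolt-capacity 120000 4.2 600) => 0.84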

Topology Capacity

Superb Chalk Holders

Saturday, September 6th, 2014

Great tools of the trade.

Many years ago I had a pair of chalk holders that were really amazing. They were like mechanical pencils for chalk - fantastic! But that was many years ago, and I lost track of them. Well, when I got the chalkboard for the house, I knew I wanted to get something like those again... but had a tough time finding them. I got something close - aluminum, and they seemed to be like the ones I had, but they had a real problem - the chalk slipped in the holder some of the time.

I lived with this for a while, but today I had had enough of that, and I wanted to try something else. So I picked up a pair of these guys, and what an improvement! These guys have a significant heft to them, and most importantly, the chalk doesn't slip - ever! What a treat.

Getting closer to where I want to be... one tiny step at a time.

Adding Email Send Messages

Friday, September 5th, 2014

Unified Click

Today I got news that the email send log messages were now including the deal UUID so that I could generate the daily_send messages in the unified message stream. This is something we've been waiting a long time for, and as I dug into this, I realized very quickly that there are a number of really significant issues in adding these messages:

  • There's a whole lot of them - like 10k msgs/sec just for the sends, and each send has anywhere from 6 to 12 deals - which translates to upwards of 100k msgs/sec once they're broken out per deal. That's a lot.
  • There's no user identifier - there's a UserId, but it's not really what I need because its coverage is not complete - it was missing in 3.7 million messages over 4 hours today - and it's only useful for the US and Canada. We need to move to the user UUID, which is good for international locations as well.

So I looked into ways to solve this, and it's possible, but it's not easy. I could map all the email addresses to UUIDs for the users, but then that's a map of 110 million - plus - emails. That's a lot. And keeping it fresh is going to be a challenge due to the size and the frequency of additions/changes.

And this is only for the sends. The email opens are a different stream, and that was going to require caching the data from the sends to know what was sent for a given open. Thankfully, there was a ticket in the system already to add the deal UUIDs to the email open message, and I created a ticket to either get better coverage on the UserId and/or add the user UUID to the sends and opens, so that we can use the data in the message(s) and not have to do any lookups.

I have no idea how long this will take, but until I get some support from them, I really can't go a lot further. Nearly a million malformed messages (missing the UserId) every hour is not a good way to establish a dataset. So things have to get better upstream.

Hope so.