This morning I was getting ready for a roll-out of some new code, and I didn't want to have to re-populate the production cache with a lot of data that I already had in my UAT cache - which has been running for more than a week, populating data as it goes. The script isn't all that hard - it just runs through all the servers that need to be cloned, stops them, copies over the data file, and then restarts the redis server.
#!/bin/bash
for b in {1..8}; do
  echo "working on redis_${b}..."
  echo "  stopping redis${b} server..."
  pid=`top -U nobody -c -b | grep redis_${b}.conf | awk '{ print $1 }'`
  sudo kill -9 $pid
  echo "  copying data from caliban over..."
  scp opt-analytics-redis4-uat.snc1:/var/groupon/redis_${b}/dump.rdb .
  sudo cp dump.rdb /var/groupon/redis_${b}/
  sudo chown nobody:nobody /var/groupon/redis_${b}/dump.rdb
  rm dump.rdb
  echo "  restarting redis_${b} server..."
  sudo /usr/local/etc/init.d/redis_${b} start
done
Yeah, it's nothing special, but I'm trying to be a little more diligent on the posts, and this was kinda fun to write, and it works perfectly.
This morning I didn't have anything special to do, so I made two new endpoints for an existing service that I've got for one of my projects. They were really specced out as independent services, but as I looked at them, I knew each was a dozen lines of clojure - tops - and with the compojure library, I could easily add the routes to the existing service and get everything I needed with a minimal level of effort.
The endpoint was all about adding a historical retrieval to the analytics project that I've been working on. Because I've already been storing these experiment reports in a database, it was easy to write a simple function:
(defn load-nearest-report
"Function to load the _nearest_ copy of the Panopticon-formatted
report from the historical storage for the provided experiment name.
This is for those times that you want the report for an experiment
at a specific time."[expr-name gen](let[row (dis/query
["select id
from finch_experiments
where experiment=?
and generated_at < ?
order by generated_at desc
limit 1" expr-name (to-timestamp gen)]
:result-set-fnfirst)
exp-id (:id row)](if exp-id (load-report exp-id))))
Now I've used HoneySQL, and it's nice, but I've found that it's often just plain faster for me to write SQL - that's what I think in. For others, there are a lot of tools, but for me, this is about as easy as it gets: write a SQL statement, add in the arguments, set a result-set function, and then load the report (code we already had).
Again, we had a lot of the functions written as part of the underlying support library, or as part of the analytics library - which is the beauty of the clojure style: make it all functions, make them simple and composable, and then it's easy to use them over and over again.
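Hooking the new endpoint into the existing service was just another compojure route. Something like this sketch - the path and parameter names here are placeholders, not the real endpoint, and it assumes compojure.core's defroutes and GET are required in:

(defroutes history-routes
  (GET "/nearest-report" [experiment at]
    ;; 'at' arrives as a string off the query string; it's handed to
    ;; load-nearest-report, which coerces it with to-timestamp
    {:status 200
     :body (pr-str (load-nearest-report experiment at))}))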
The second service is really a server-side mimic of the internals of a client library that sits at the core of The Shop's A/B testing suite. The idea is to be able to reliably, and quickly, decide if a user should be exposed to the control or one of the experiment variants - and then make sure that the same answer follows them time after time, so the experience is consistent.
The code for this is a little bigger, but that's because we're parsing the config data and mapping the persistent UUID into a bucket, and so on.
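The actual code isn't worth pasting in full, but the heart of it - turning a persistent UUID into a stable bucket against the experiment's variant weights - goes something like this sketch. The config shape and the names here are mine, not the real format:

(defn pick-variant
  "Map a persistent UUID into one of the experiment's variants so the
   same user always lands in the same bucket."
  [uuid {:keys [name variants]}]
  (let [total  (reduce + (map :weight variants))
        ;; hash the uuid together with the experiment name so the same
        ;; user can land in different buckets in different experiments
        ;; (a real implementation would want a stable hash, not clojure's)
        bucket (mod (hash (str name ":" uuid)) total)]
    ;; walk the variants, accumulating weights, until we find the one
    ;; whose slice of the range contains the bucket
    (loop [[v & more] variants
           acc        0]
      (if (< bucket (+ acc (:weight v)))
        (:name v)
        (recur more (+ acc (:weight v)))))))

Calling it with the same UUID and the same config always gives the same answer - which is the whole point.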
This is probably more convoluted than it needs to be - the structure of the experiment configuration data isn't really great, but it's not bad. Still... this works and gives us a beautiful endpoint to use.
It's amazing what you can build in a morning with clojure and some good tools.
I like problems - no doubt about it. I like solving lots of problems, and I like delivering a really good product. It's just every now and then that it goes way too far. Like today... I got a packed field in a message, and rather than fix the upstream data source to not pack two data elements into one field with an uncertain delimiter, management decided that it was my job to fix it.
Not because it's right... but because I can.
It's happened so many times before, and in the end, everyone is happy because I don't make the upstream providers fix their broken stuff, and I don't make the downstream folks live with crappy data. I guess I'm the universal filter - and only good things come from me.
In a way, I guess it's a compliment, but it's really just hard to deal with some days. It's like it's not enough that I do a great job - I have to be perfect - for everyone - or I'm falling down on the job.
This morning I had an interesting challenge - a You Viewed history service I'd built was not working as they'd hoped because it wasn't filtering out bad information. Now to be fair, it never has, and wasn't designed to... but they wanted it, and it wasn't all that hard - at least I didn't think so.
The task at hand was pretty simple: fix the insertion topology to not insert the bad data, and then, on the API service, fix the extraction so that the historical data already stored would be filtered out on the pull. Still... sounds pretty reasonable.
But it's always in the details, isn't it?
Turns out that the insertion topology was using carmine 2.6.2 and the pulling API was using carmine 2.3.1 - and they weren't compatible. Figuring this out was "Step 1", and then fixing it in short order was the trick. The problem wasn't just a simple update - the connection pooling changed from 2.3.1 to 2.6.2, and I had to re-write the connection pooling for the API. It wasn't hard, but doing it while the production systems were hurting was not good.
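For reference, the newer carmine style is basically the stock pattern from its README - a connection map handed to wcar - which is roughly what the API had to move to. The host, port, and key here are placeholders:

(require '[taoensso.carmine :as car])

;; placeholder host/port and pool options
(def history-conn {:pool {} :spec {:host "127.0.0.1" :port 6379}})

(defmacro wcar* [& body]
  `(car/wcar history-conn ~@body))

;; e.g. pulling a user's viewed history (the key name is made up)
(wcar* (car/lrange "viewed|12345" 0 -1))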
In the end, I got it all up and it's working fine. Whew!
After the significant refactoring of the experiment redis storage, I was monitoring the redis servers on the target box, and noticed that one server was running considerably hotter than the others. Clearly, there was a key in that server that was getting a lot more activity than the others. It didn't take me long to figure out which one it was: it was the redis SET of all the active experiment names.
We are using a SET in redis to list all the unique experiment names so that we can be sure that if we get an experiment message, we can look up its name in this list and see if, indeed, it's active. This is necessary because we have some mobile devices that might be 12 or 18 months out of date, and they may still think experiments are running when they aren't. So we have to filter them when we get them.
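The check itself looks something like this - the connection and key names here are placeholders:

(require '[taoensso.carmine :as car :refer [wcar]])

;; placeholder connection - the real spec points at the sharded servers
(def exp-conn {:pool {} :spec {:host "127.0.0.1" :port 6379}})

(defn active-experiment?
  "Is this experiment name in the redis SET of active experiments?"
  [exp-name]
  (pos? (wcar exp-conn
          (car/sismember "finch|active-experiments" exp-name))))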
where we're using carmine for our clojure access to redis. This is a little simplified, but it's stock carmine, and that's the only thing that's important for this illustration.
The problem is that for every experiment message we have to check to see if it's an active experiment. That means 30k to 40k msgs/sec - all going through the function, above. That's wasteful. Why not cache it?
We want to accurately reflect the state of the experiment list, so we don't want to cache it for too long - but we can certainly hold onto it for 30 sec.
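A sketch of that - an atom holding the last answer and when it was fetched; the names are placeholders rather than the production code:

;; cache of the active experiment names, refreshed at most every 30 sec
(def ^:private exp-cache (atom {:names nil :at 0}))

(defn active-experiments
  "Return the set of active experiment names, hitting redis at most
   once every 30 sec."
  []
  (let [{:keys [names at]} @exp-cache
        now (System/currentTimeMillis)]
    (if (and names (< (- now at) 30000))
      names
      (let [fresh (set (wcar exp-conn (car/smembers "finch|active-experiments")))]
        (reset! exp-cache {:names fresh :at now})
        fresh))))

(defn active-experiment?
  "Cached membership check - at most one redis round-trip per 30 sec."
  [exp-name]
  (contains? (active-experiments) exp-name))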
The change was significant - 15% of the CPU usage for that one redis server was gone. Very nice little improvement - and we only have to wait 30 sec to update the experiment list. Good trade-off.
Today has been a very interesting day due to a speedy refactoring of a topology and analysis library to shard redis across four instances - up from one. This was brought about by a change we tried to deploy to the cluster and then rolled back, only to see that we were so close to capacity on the single redis server that it was never going to work.
Redis is great - it's really quite good, but it's single-threaded, and that means that you need to have some scheme of sharding the data across instances if you want to keep pushing the throughput of the system up. I've found myself doing this a lot in the last month - we're adding more messages, and the single redis instances just aren't enough. So we have sharded several of the systems. It's turned out to be exceptionally efficient as well.
So now it was the experiment analysis that had to be sharded. This was different than the others in that it wasn't a single place that redis was accessed - it was all over the one namespace that did the data collecting. It meant that operations that used to be a single function had to be broken up because the keys might not shard to the same instance, and it was important to make sure that everything ended up in the redis instance that it was expected to be in.
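To give a flavor of what that breaking-up looks like - a sketch with made-up key names, where shard-conn is a stand-in for the function that maps a key to one of the four carmine connection maps:

(require '[taoensso.carmine :as car :refer [wcar]])

(defn record-view!
  "What used to be one wcar call is now two, since the two keys may
   live on different redis instances."
  [exp-name user-id]
  ;; the experiment's user set may hash to one shard...
  (wcar (shard-conn (str "finch|" exp-name "|users"))
    (car/sadd (str "finch|" exp-name "|users") user-id))
  ;; ...and its view counter to another
  (wcar (shard-conn (str "finch|" exp-name "|views"))
    (car/incr (str "finch|" exp-name "|views"))))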
Interestingly enough, the refactoring was a complete success - first try. No need for edits or changes. It Just Worked. Very nice experience.
Now that I'd found the bug in the caching, I wanted to go back and look at the topology capacity numbers, because they looked a lot different with the caching fixed than they did with the useless caching. So I started looking at the numbers, and it was clear that we were backing up at the Kafka transmitter bolt, which was applying back-pressure to the JSON encoding bolt, which in turn was putting back-pressure on the decorate bolt - the one we had been putting all the work into to fix up the caching for speed.
Given this, and that we're not going to expand the kafka cluster at this time, it made sense to back off on the number of workers and the bolt parallelization hints, and see if it made any difference. After all, we can't push out messages faster than kafka is capable of taking them.
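For context, the knobs being turned here are just the storm worker count and the per-bolt parallelism hints. A rough sketch of what that looks like with storm's clojure DSL - the spout and bolt names and the counts are made up, not the real topology:

(use 'backtype.storm.clojure)
(import 'backtype.storm.StormSubmitter)

(def email-topology
  (topology
   {"messages" (spout-spec message-spout :p 10)}
   {"decorate" (bolt-spec {"messages" :shuffle} decorate-bolt :p 500)
    "encode"   (bolt-spec {"decorate" :shuffle} json-encode-bolt :p 100)
    "transmit" (bolt-spec {"encode" :shuffle} kafka-transmit-bolt :p 100)}))

;; workers come down from 50 to 40 in the submit config
(StormSubmitter/submitTopology "email-sends"
                               {"topology.workers" 40}
                               email-topology)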
I went from 50 to 40 workers, and saw similar numbers:
and the capacity:
Then when I went to 30, I lost the email sends for the day and I'll have to let it sit for a day to try again tomorrow. Still... it's very promising, and I'm very glad that the caching is fixed.
This afternoon I was working on adding more sharding to the caching I was doing with the redis servers - this time a 4-way shard based on the same sharding solution I had done just a few days ago. The modification was simple:
(defn deal-shard
"Function to return the right function that will be
used with 'wcar' to get the provided key from redis.
This is specific just to the deal data"
  [k]
  (case (hash8 k)
    (0 1) :deal-ids-1
    (2 3) :deal-ids-2
    (4 5) :deal-ids-3
    (6 7) :deal-ids-4))
and this was going to make it faster and easier to handle the load of the email sends we were getting - I was a little worried that we were overloading the single redis instance for the deal data.
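Those keywords map to connection specs, so using the shard is just a lookup before the call - something like this, where the connection map is a placeholder:

;; assumes taoensso.carmine is required as car, with wcar referred in
;; placeholder connection map - four redis instances, one per shard
(def deal-conns
  {:deal-ids-1 {:pool {} :spec {:host "redis-1" :port 6379}}
   :deal-ids-2 {:pool {} :spec {:host "redis-2" :port 6379}}
   :deal-ids-3 {:pool {} :spec {:host "redis-3" :port 6379}}
   :deal-ids-4 {:pool {} :spec {:host "redis-4" :port 6379}}})

(defn get-deal
  "Pull a cached deal, hitting only the shard its key hashes to."
  [k]
  (wcar (deal-conns (deal-shard k))
    (car/get k)))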
When I was putting this in the code I realized that the user caching I had set up was a complete sham! I had changed the keys to be much smaller, but only on the look-ups, and not on the writes! This means that every single hit was a cache miss, and the back-end database was required for every augmentation.
What a mess. I needed to fix the storage keys, let the old ones expire, and then populate it all again. With this in place, we are seeing a lot better response from the decorate bolt. Much better performance from the cache when it's actually used.
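The root cause was trivial once seen: the reads and the writes were building their keys differently. The fix, in spirit, is just one key-building function shared by both paths - a sketch with made-up names, not the real key scheme:

(defn user-key
  "One place to build the (shortened) cache key for a user."
  [user-id]
  (str "u|" user-id))

(defn cache-user!
  "Write the user into redis under the same key the reads will use,
   with a day-long expiration."
  [conn user]
  (wcar conn (car/setex (user-key (:id user)) 86400 (pr-str user))))

(defn cached-user
  "Read the user back with the very same key function."
  [conn user-id]
  (when-let [s (wcar conn (car/get (user-key user-id)))]
    (read-string s)))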
I think I'm getting to the limits of this cluster with the topology I'm trying to make - and the source data I have to work with. I've moved the parallelization hint on the choke bolt up to the point where I believe we're just sloshing load from one part of the topology to another and back to the first. There just aren't enough resources to do all this, and at the number of workers I'm using, that's not at all surprising.
Let's look at the message rates again:
We are moving a lot of messages, and the email sends are running, then stopping, and then starting back up again. It's a good test of a significant load.
Now let's look at the capacity of the bolts:
Here we see the transmit jump up after the decoration, and then fall off while the decoration stays high. Then it's back and the decoration goes higher... then off, then on... When we compare this to the last set - from yesterday, we see a more "rate limited" view of the system. We appear to just be robbing Peter to pay Paul - not a great thing.
This just gives us an enormous depth of data about the interactions of these components in a large-scale system. I think it's wonderful and fascinating, and it makes me smile.
UPDATE: I went back to the parallelization hint of 500 for the decorate bolt, and for the corresponding transmitter bolts and the graphs look very nice:
Sure, I'd like to learn to be a really good designer, but I know there might not be enough time in my life to get good enough at it to justify the investment. But there are also a lot of things I'd like to get up to speed on - clojure's core.async, Swift, and the like. I would love to be able to learn, and then apply, these tools. It sounds like a lot of fun.
But there's a different class of skill - not a skill like design, not another tool, but the understanding of the interdependent parts of a complex system. For example, I'm doing more topology tests this morning and I'm seeing behavior in the relationships between the bolts that I simply would never have guessed. It's stunning, and it makes me smile.
There's no class for this. There's no tutorial for this. This is you, the machine, and time. This is testing how well you understand this deterministic machine you've built, and whether, looking at the input, you can understand why the behavior is what it is.