Archive for the ‘Coding’ Category

Optimizing Redis Storage

Sunday, January 19th, 2014

Redis Database

[Note: I created this post for The Shop, and while I'm not sure if it'll ever see the light of day, it's useful, and I wanted to see it posted. So here it is.]

The Optimize Group

One of the tasks of the Optimize Team where I work is to build a real-time analytics engine for our A/B testing framework, which involves analyzing the consumers experiencing each variant of each experiment. Along with this, we need to look at each deal sold in the system and properly attribute each sale to the experiments the consumer visited on the way to that sale that might have influenced their buying decision. Based on this visiting and buying data, the different product teams can then determine which of the experiment variants they want to keep, and which they don't.

In order to improve any consumer-facing Groupon product, experiments are done where a random sample of consumers is placed into a testing group and shown one or more variants of the original (control) experience, and then their responses are tallied. This A/B testing data comes to our cluster in the form of several separate messages. Some indicate the consumer, browser, and device when an experiment variant is encountered; others indicate when a consumer purchased a deal. It is then the job of this cluster to correlate the actions taken by each consumer to see if the variant is better than the control. Did the larger image lead to more purchases? Did the location of the button cause more people to click on it? All these experiments need to be classified and the consumer actions attributed.

Recently, several production systems here started using Clojure, and given that Storm is written primarily in Clojure, it seemed like a very good fit for the problem of real-time processing of messages. There are several topologies in our cluster - one that unifies the format of the incoming data, another that enriches it with quasi-static data, and then a simple topology that counts these events based on the contents of the messages. Currently, we're processing more than 50,000 messages a second, but with Storm we have the ability to easily scale that up as the load increases. What proved to be a challenge was maintaining the shared state: it could not be stored in any one of the bolts, as there are 30 instances of them spread out across five machines in the cluster. So we had to have an external shared state.

All of our boxes are located in our datacenter, and because we're processing real-time data streams, we're running on bare metal boxes - not VMs. Our tests showed that if we used the traditional Redis persistence option of snapshotting on time/update-count limits, a Redis box in our datacenter with 24 cores and 96 GB of RAM was more than capable of handling the load from these 30 bolts. In fact, the CPU usage was hovering around a consistent 15% - of one of the 24 cores. Plenty of headroom.

Redis is primarily a key/value store, with the addition of primitive data types including HASH, LIST, and SET to allow slightly nested structures and operations in the cache. And while its ability to recover after a crash with its data intact is a valuable step up over Memcached, it really makes you think about how to store data in a useful and efficient layout. The initial structure we chose for Redis was pretty simple. We needed to have a Redis SET of all the experiment names that were active. It turns out that there can be many experiments in the codebase, but only some are active. Others may have completed and just haven't been removed from the code. To support this active list, we had a single key:

	finch|all-experiments => SET (names)

and then for each active experiment name, we had a series of counts: How many consumer interactions have there been with this experiment? How many errors were there on the page when dealing with an experiment? And even a count for the basic errors encountered in the stream itself - each updated with Redis' atomic INCR command:

	finch|<name>|counts|experiment => INT
	finch|<name>|counts|errors => INT
	finch|<name>|counts|null-b-cookies => INT

The next step was to keep track of all the experiments seen by all the consumers. As mentioned previously, this includes the browser they were using (Chrome 29.0, IE 9.0, etc.), the channel (a.k.a. line of business) the deal is from (Goods, Getaways, etc.), and the name of the variant they experienced. The consumer is represented by their browser ID:

	finch|<name>|tuples => SET of [<browser>|<channel>|<variant>]
	finch|<name>|variant|<browser>|<channel>|<variant> => SET of browserId

The Redis SET of tuples containing the browser name and version, the channel, and the name of the variant they saw was important so that we didn't have to scan the key space looking for the SETs of browser IDs. This matters because Redis is very efficient at fetching a value for a known key, but horribly inefficient if it has to scan all the keys. While the KEYS command exists in the Redis command set, it's also very clearly documented as not to be used in a production system because of the performance implications.
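As a concrete illustration - a sketch only, with a made-up experiment name, browser ID, and variant - recording a variant impression updates the index SET and the per-tuple SET together, and reads then never need to scan:

	SADD finch|checkout-flow|tuples "Chrome 29.0|Goods|larger-image"
	SADD "finch|checkout-flow|variant|Chrome 29.0|Goods|larger-image" b-8c2f41

	SMEMBERS finch|checkout-flow|tuples
	SCARD "finch|checkout-flow|variant|Chrome 29.0|Goods|larger-image"

SMEMBERS enumerates the tuples for an experiment, and SCARD counts the unique browser IDs for one tuple - both direct key lookups, no KEYS required.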

Finally, we needed to attribute the sales and who bought them, again based on these tuples:

	finch|<name>|orders|<browser>|<channel>|<variant>|orders => INT
	finch|<name>|orders|<browser>|<channel>|<variant>|qty => INT
	finch|<name>|orders|<browser>|<channel>|<variant>|revenue => FLOAT
	finch|<name>|orders|<browser>|<channel>|<variant>|consumers => SET of uuid
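Updating these for a single sale is just a handful of atomic commands - again a sketch, with a made-up experiment name, tuple, and consumer UUID:

	INCR "finch|checkout-flow|orders|Chrome 29.0|Goods|larger-image|orders"
	INCRBY "finch|checkout-flow|orders|Chrome 29.0|Goods|larger-image|qty" 2
	INCRBYFLOAT "finch|checkout-flow|orders|Chrome 29.0|Goods|larger-image|revenue" 39.98
	SADD "finch|checkout-flow|orders|Chrome 29.0|Goods|larger-image|consumers" 2b1e4d90-5a77-4c1e-9f3a-8d2c6e4b0a19

Every one of these is a single atomic server-side operation.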

As you can see, the lack of nested structures in Redis means a lot has to be accomplished by how you name your keys, which makes this all appear far more complicated than it really is. At the same time, we purposefully chose the atomic Redis operations for incrementing values to keep the performance up. This may seem like a lot of data to hold in Redis, but it led to very fast access to the shared state, and Redis' atomic operations meant that we could have all 30 instances of the bolt hitting the same Redis instance and updating the data concurrently. Performance was high - the analytics derived from this data could be generated in roughly 5 seconds - so the solution seemed to be working perfectly.

Until we had been collecting data for a few days.

The memory usage on our Redis machine seemed to be constantly climbing. First it passed 20 GB, then 40 GB, and then it crashed the 96 GB machine. The problem stemmed from the fact that while an experiment was active, we were accumulating data for it. While the integers weren't the problem, this one particular family of SETs was:

	finch|<name>|variant|<browser>|<channel>|<variant> => SET of browserId

Over time there would be millions of unique visitors, there were more than a hundred active experiments at any one time, and there could even be multiple browserIDs per consumer. Add it all up, and these Redis SETs would hold hundreds of millions of entries, and they would continue to grow as more visitors came to the site and experienced the experiments. What we needed was a much more efficient way to store this data.

Wondering what Redis users do when they want to optimize storage, we did some research and found a blog post by the Engineering group at Instagram. We also found a post on the Redis site that reinforces the same point and gives the tuning parameters for storing data efficiently in a HASH. Armed with this knowledge, we set about refactoring our data structures to see what gains we could get.
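The heart of the trick is that Redis stores a small HASH in a compact encoding (a ziplist) as long as it stays under two limits set in redis.conf. A sketch of the relevant settings - the values here are illustrative, not our production numbers:

	hash-max-ziplist-entries 512
	hash-max-ziplist-value 64

A HASH keeps the memory-efficient encoding only while it has fewer fields than the first limit and no field or value longer than the second, so the data has to be broken up to fit under both.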

Our first change was to pull the ‘counts’ into a HASH. Rather than using:

	INCR finch|<name>|counts|experiment
	INCR finch|<name>|counts|errors
	INCR finch|<name>|counts|null-b-cookies

we switched to:

	HINCRBY finch|<expr-name>|counts experiment 1
	HINCRBY finch|<expr-name>|counts errors 1
	HINCRBY finch|<expr-name>|counts null-b-cookies 1

Clearly, we were not the first to go this route, as Redis has an equivalent atomic increment command for HASH fields: HINCRBY (there is no one-argument HINCR, so the increment of 1 is spelled out). It was a very simple task of breaking up the original key and switching the command.

Placing the sales in a HASH (except the SET of consumerIDs, as a SET can't live inside a HASH) was also just a simple matter of breaking up the key and using HINCRBY (and HINCRBYFLOAT for the revenue). Continuing along these lines, we saw we could do a similar refactor and switch from a SET of browserIDs to a HASH whose keys are the browserIDs - just as unique, and we can use the Redis command HKEYS to get the complete list. Going further, we figured the values of this new HASH could carry some of the data that had been in other structures:

	finch|<browserID> => app-chan => <browser>|<channel>
	finch|<browserID> => trips|<expr-name>|<name_of_variant> => 0

where that zero was just a dummy value for the HASH field.
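Populating that per-browser HASH then looks like this - a sketch, with made-up browser ID, experiment, and variant names:

	HSET finch|b-8c2f41 app-chan "Chrome 29.0|Goods"
	HSET finch|b-8c2f41 trips|checkout-flow|larger-image 0

Because this HASH stays small, it gets the compact encoding described above, which is where the memory savings come from.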

With this new structure, we can count the unique browserIDs in an experiment by using the Redis EXISTS command to see if we have already seen a browserID in the form of the above HASH, and if not, we can increment the count of unique entries in:

	finch|<expr-name>|tuples => <browser>|<channel>|<name_of_variant> => INT

At the same time, we get control over the ever-growing set of browserIDs that was filling up Redis in the first place, by not keeping the full history of browserIDs - just the count. We realized we could have the browserID expire after a period of inactivity and let it get added back in as consumers return to use Groupon. Therefore, we can use the Redis EXPIRE command on the:

	finch|<browserID>

HASH, and then after some pre-defined period of inactivity, the browserID data would just disappear from Redis. This last set of changes - moving from a SET to a HASH, counting the visits as opposed to counting the members of a SET, and EXPIRE-ing the data after a time - made the most significant dent in the storage requirements.
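Putting the pieces together, the per-event update becomes a short command sequence - once more a sketch with made-up names, and the 30-day TTL is just an example value, not our actual setting:

	EXISTS finch|b-8c2f41
	HINCRBY finch|checkout-flow|tuples "Chrome 29.0|Goods|larger-image" 1
	HSET finch|b-8c2f41 trips|checkout-flow|larger-image 0
	EXPIRE finch|b-8c2f41 2592000

The HINCRBY runs only when the EXISTS returns 0 - a browserID we haven't seen recently - and the EXPIRE resets the inactivity clock on every visit.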

So what have we really done? We had a workable solution to our shared-state problem using Redis, but the space required was very large, and the cost of keeping it working was going to be a lot more hardware. So we researched a bit, read a bit, and learned about the internals of Redis storage. We then did a significant data refactoring of the information in Redis - careful to keep every feature we needed and, wherever possible, to reduce the data retained.

The end effect? The Redis CPU usage doubled, which was still very reasonable - about 33% of one core. The Redis storage dropped to 9 GB - less than 1/10th of the original. The latency in loading a complete experiment data set rose slightly - about 10% on average, depending on the size and duration of the experiment. Everything we liked about Redis - fast, simple, robust, and persistent - we were able to keep. Our new-found understanding of the internals of Redis enabled us to make it far more efficient. As with any tool, the more you know about it - including its internal workings - the more you will be able to do with it.

What Would I Build?

Monday, November 25th, 2013

Storm

I've been playing around with Storm for a while now, and while I don't think there are all that many folks in the world who are expert at it, I'm certainly an advanced novice, and that's good enough for the amount of time I've put into it. I've learned a lot about how they have tried to solve the high-performance computing problem in Clojure and on the JVM, and I've come away with an affirmation of the feelings I had when I was interviewed for this job and we discussed functional languages: Garbage Collection is the death of all functional languages, and certainly of Storm.

I like the simplicity of functional languages with a good library of functions. Face it, Java took off over C++ because C++ gave you only the base language, while Java had the rich class library that everyone built on. It made a huge difference in how fast people could build things. So if you want a functional language to gain a lot of traction fast, you need to make sure you don't send people off to re-invent the wheel for the most basic tasks.

But the real killer is Garbage Collection. I'm not a fan, and the reason is simple: if I'm trying to write performant code, I want to control when collection happens, and under what conditions. It's nice for novices to be able to forget about this and still write stable code, but when you want to move 1,000,000 msgs/sec, you can't do it without pools, lockless data structures, mutability, and solid resource control - none of which I get in the JVM, or anything based on it.

So what's a coder to do? Answer: Write another.

There used to be Xgrid from Apple, but they dropped that. They didn't see it as being in their best interests to keep building something that targets their machines as nodes in a compute cluster, and they aren't about to write something where you can use cheap Linux boxes and cut them out altogether. Sadly, this is a company, and they want to make money.

But what if we made a library that used something like ZeroMQ for messaging, with C++ for the Linux side and Obj-C++ for the Mac side, and made all the tools work like they do for Storm? Instead of Clojure, the JVM, and a ton of server-side tools to handle all the coordination and messaging, we'd use something far more tightly coupled to the toolset we're working with.

First, no Thrift. It's bulky, expensive, and it's being used as a simple remote procedure call. There are far better alternatives out there when you're using a single language. Stick with a recent version of ZeroMQ and decent bindings - like their C++ ones. Start small and build it up. Make a decent console - Storm is nice here, but there's a lot more that could be done, and the data in the Storm UI is not easily discernible. Make it clearer.

Maybe I'll get into this... it would certainly keep me off the streets.

Chasing the Magic Tool

Monday, October 28th, 2013

Storm

I'm in the midst of a new project here at The Shop, and I understand that it's built on really new technology and, as such, very little is really known about it. Sure, if you listen to the conference talks, Storm is old news, but put it into production and all of a sudden a lot of hands in the crowd go down, because it's just so bloody new. I'm trying to make it work.

But at the same time, I'm seeing emails about new distributed systems frameworks -- sounds a lot like what Storm is about, and management is asking for opinions. My initial opinion is pretty simple: Pick one and get good at it.

I'm here, but I'm worried that this place is the exact opposite of "Enterprise Tools" - they are the "Always Shiny". We have a tool for distributed, fault-tolerant computing - so why are we looking at another? Should we assume that the selection we made was premature, and that based on what we have found, we need something better?

I'm not against competition, but then, you have to allow for the fact that you're going to have a hodgepodge of all kinds of systems in the end, as no one goes back and converts a working production system from one working tech to another, different, working tech. There's never time.

So why the search? Why not just get good at one of the leaders in the space, and then gain the critical experience to be able to really make it work?

I fear the answer is that too many people think the tool is the real power.

Nothing could be further from the truth. I've seen it done over and over again - what might be considered antique tech building some of the most amazing things, because the people who used it knew it so well that they could overcome its problems and build amazing things where a newcomer to the tech would see only the impossible.

I hope I'm wrong. I fear I'm not.

Oh, I am SO Guilty of This…

Friday, September 20th, 2013

I just saw this on twitter:

A common fallacy is to assume authors of incomprehensible code will somehow be able to express themselves lucidly and clearly in comments.

— Kevlin Henney (@KevlinHenney) September 20, 2013

…and for the first time in weeks it made me want to post something.

I'm so horribly guilty of this that I don't even realize it. When I look at poorly documented code, I think the author was just lazy - because he's as smart as I am - right? Maybe not.

In fact, probably not.

To this day I don't see myself as any smarter than a lot of the professional developers I have worked with. Sure, there are some really junior folks, but I'm talking about the seasoned professionals - the guys who may have been working in the web space for a while, or working on back-end systems, or building libraries… they are all just as smart as I am. The only difference between them and me, or so I thought, was that I worked so much harder - that it was just a matter of effort.

This little gem of a tweet says in 140 characters what I keep missing over and over again - that when you look at really bad code, it's often more likely that the author didn't know any better, or was leaning too hard on StackOverflow, and really had no idea what they were doing. So adding comments to this mess would only increase the line count and not really add value to the work.

I need to remember this more often.

Breaking Production – Not Good Leadership

Friday, August 9th, 2013

Bad Idea

This week has been a little stressful for me - I've spent a few days off work getting the last of my things out of the house and into storage, and then signing some papers to sell the house. It's all a necessary part of life, I know, but it's stressful, and so I have to push through it.

What I didn't expect to have to deal with was a broken production app that supports some of the capabilities of the main project I'm on at The Shop. It's not a big app, and it doesn't look all that great, but it's really useful to me in what I'm doing, and I depend on it every morning for writing up a status email that I send to the group about the overnight runs.

Anyway, for two days in a row, one of the senior developers in the group - a relative newcomer - has broken production. The first day, I was pretty nice about it - just asking him if he had checked production once he deployed the changes, knowing full well he hadn't. The next day I was not as happy, and it started a significant email chain between him, the group manager, and myself about what we should be doing, and about the qualities of leadership in general.

The problem is that this guy was hired to be the Tech Lead of the group, but he's never really led in a way that I felt was worth following. He could certainly command, but that's not how groups at The Shop are run - it's meant to be a consensus of smart guys arriving at a good decision for the good of the team and the business. There will certainly be differences of opinion, and our group has had many, but after a good talking session, we understand everyone's position, and consensus is reached. It might not leave everyone happy, but it works.

At least it used to.

Now it's not working, and I've tried to give it several months to work itself out. But after the second day in a row where no testing was done after deploying changes to production, I felt it was time to point out that this casual approach to production has to stop. It's very simple to test once it's in production, and the lack of even the simplest testing is really a sign of a much larger problem.

I could try to make light of the real problem, but it boils down to attitude. Always does, doesn't it? If you have the proper attitude about your work, then you care about how it's seen by others. You are careful about changes. You watch the details and make sure they are all covered.

Basically, you do a good job. Regardless of the job. Carpenter, Dentist, Doctor, Coder - all are the same. If you take care in what you are doing, the end result may not be perfect, but it's at least something you can defend as being the very best you can do.

In this case, he knew it was a mistake. And to do it two days in a row was - well… really inexcusable. So I pointed out that leadership is an isolated job - it's up to others whether they choose to follow you. Command is an entirely different thing, and I think we have a problem with the words and definitions we're using for this position. He may have been hired as the lead, but that presumed he was capable of doing the job. For me, at least, he can't.

I don't know what will happen. I doubt The Shop will re-arrange staff to suit me, but it's possible that my project can be separated out so I don't have to face the daily friction of dealing - or in my case, not dealing - with him. I hope that's the case, but I don't know that they will do that. If not, it's clear that there are other groups in The Shop that would be glad to have me help them, so it's not all that bad. It's uncomfortable now, but I've been able to keep it very professional and positive.

What gets me is that the original members of this group would have laughed a bit at the first day, and then roasted him alive on the second. That we have gotten to this point is very sad to me. I miss the old team.

Letting Go – Regardless of Consequence

Wednesday, July 31st, 2013

I like what I do - I really do. I like the company I work for - there are a lot of nice folks here, and I generally like the decisions that management makes. But as into every life a little rain must fall, there are times when your time in a group is done, and it's best to move on. The ideas that shaped the group and got it to this point were necessary and good, but now it's time to let someone else take over and take it from here.

Of course, that's not how it feels.

It feels like the folks new to the business think they have a monopoly on the project even though they just joined the company. It feels like they have no respect for the ideas the project was built on, so their changes to the codebase make no sense, and in fact run counter to the goals the project was founded on.

It feels like they are being jerks.

And who knows… maybe they are. Maybe they aren't. It's not only impossible to tell, it's also completely unimportant. You find yourself in the minority, and it's time to move on. No anger, no grief… maybe a bit of sadness for what's been lost, but loss is part of life. The project can't become what the new blood wants it to be - what they see it to be in their minds - if you're there holding them back.

It's also not really fair to just sit in the group and let the changes occur around you. That's just goldbricking. Yeah, you know the code; yeah, you like the project; but it's all going in a different direction and it's time to just cut the cord. Allow the project to be what it will be under their stewardship.

It's time for me to move out of this group. As much as I'd like to keep working on what I'm doing, it's not good for the group or me.

The Amazing Arrogance of Youth

Tuesday, May 28th, 2013

Code Monkeys

It probably shouldn't be surprising to me at all that the event that got me back to writing is the arrogance of Code Monkeys. These are the young Rock Stars of the community who think that Ruby is the only language you need to know, and everything else is so much less that it's hardly worth their time. Bash, C++, Python - all "toy" languages to the Code Monkey, as the One True Language is Ruby.

Of course, they are pumped up by people telling them they are amazing. They solve a few problems by throwing hardware at them, and they think they are the Oracle at Delphi of everything software, and as long as they can do just a little bit more than the next guy, their confidence and assurance grow. It's kinda sad.

I can remember several humbling experiences in grad school, and for those I am eternally grateful. I have no desire to ever be like these guys, but I realize I probably was like that before grad school. That was a long time before I got out into "The World". By then I knew… I was not all that. I was just a hard worker.

So this came to mind this morning when I looked at a failed job in a cron email. I sent the guys responsible for the job an email saying they might want to up the memory for the JRuby JVM, as that was the cause of the failure. The Code Monkey responsible for the job said that he upped it for the run, but didn't change it in the code in the git repo because it wasn't needed.

Now in my mind this isn't possible. If the code is a one-time (a.k.a. throw-away) script, then it doesn't belong in the repo, as it has no value past its one use. But if it has value past its one use, then it belongs in the repo. And if it has value, then it should be fixed for the memory problem. So he's either wrong for putting it in the repo, or wrong for not updating it. I know this, but I try to be nice and ask him, "What's the harm in updating it?"

His response I could have guessed: "It's done, there's no reason to update it." Spoken like a true Code Monkey. They hate comments. Why? Because all Ruby code is self-documenting. What they really mean is that everyone should get used to re-reading and re-understanding the code every time they approach it. Documentation just gets in the way of that constant process of re-learning. He had no belief that the script would ever be used again, but that's because he's got a time horizon of two weeks. And because he can't see it, it must not be needed. Done.

So I'm going to let it go. He's convinced he's shown me a thing or two. In reality, he's shown his inability to be a tech lead for me. That's his position in the group - Tech Lead. Not a chance in the world. I respect my boss - he's a sharp guy, and I respect his skills. But if he really thinks this guy has serious skills, then he's fooled. But hey… anyone can be fooled - look at me - fooled for 27 years.

Anyway, it's just amazing to me that these Code Monkeys are as plentiful as they are. Sure, I knew plenty of C++ coders that weren't any good. I knew plenty of Java coders that weren't any good, either. But it's the consistency of these Code Monkeys that's really throwing me for a loop.

But you know… maybe that's a good thing. After all, he got me to write something. Maybe I'm making a little progress?

Google Chrome dev 27.0.1423.0 is Out

Wednesday, February 27th, 2013

Well, it looks like the major version of Google Chrome dev just jumped to 27.0.1423.0, and based on the release notes it looks more like a semantic version bump than a really significant update. Still… it's nice to see that they are keeping to their schedule of moving things along as they promote from dev to beta to release.

Google Chrome dev 26.0.1410.12 is Out

Friday, February 22nd, 2013

This morning I noticed that Google Chrome dev 26.0.1410.12 is out with what appears to be a good set of updates per the release notes, though the minor version bump indicates that they see these as minor bug fixes. Interesting take on how they view these changes. In any case, the new maintainer seems to be generating good release notes, and I'm all for that. Maybe the quick succession of bug releases is signaling a shift to 27.*… we'll have to wait and see.

Doing a Lot of Scut Work

Thursday, February 21st, 2013

Code Clean Up

Today has been a lot of scut work - clean-up stuff that has been sitting in the queue for months because no one wants to do it. But if the project is going to really work, someone actually has to do it. So since I finished up a lot of tasks today, it seemed natural to just get to it and clear the decks.

None of this is hard stuff, it's just not very fun, and it takes time.

First off, I followed up on a request for backups to be made of all the database machines we use in the group. This includes CouchDB as well as PostgreSQL. It's nice in that the install of each of these packages places the data files in the largest partition on our boxes - /var/groupon/ - so it's simple to just back up that partition. I submitted the request a few days ago but hadn't heard anything back, so I followed up, asking if I was going to get a completion notice when the backups were working.

Response was: "Yup, likely tomorrow". Good enough.

Next, we needed to get Nagios monitoring of the free disk space on the boxes as well - so that should a process go crazy and start to fill up the disk, we can fix it before it becomes a database killer. This has happened to us on several occasions, and it's something to be avoided as the main processes can't run if the database is offline.

Finally, I needed to do what I could to compact the CouchDB databases on the production and UAT hosts, because we're at 93% disk space used and there's very little headroom left. If the compaction of the views doesn't work, then I'm going to just drop the database and start fresh. We have a replica of the production data, and with the backups (above) we'd be able to fall back to it anyway. It's something I'd rather not do, but it's certainly a sure-fire way to get the space back.

It's not glamorous work, but it needs to be done, and no one else is picking it up, so I might as well just do it all and have it done.