Archive for the ‘Coding’ Category

Optimizing SQL for Execution Speed

Wednesday, September 24th, 2014


Today I spent a good bit of my day speeding up a few postgres stored procedures (functions) to get them fast enough that we can do what we need to do in the time we have allotted. The original implementation created a temp table, populated it with the complete history of the time-series, and then filtered it down with a clever use of row_number() to pick just one value from each repeating sequence.

This was fine as long as the time-series didn't get too big, but once it did, the temp tables were crushing us and the run times got insanely long. What really helps is to not build the temp table at all - just get everything ready for a single SELECT where sub-queries do the caching for you.

Like I said, it took most of the day, but the difference was more like 50x to 100x - well worth the time.
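For a sense of the shape of the change, here's a minimal sketch - the table, columns, and sampling rule are made up for illustration, not the actual production function:

  -- before: stage the full history in a temp table, then filter it
  CREATE TEMP TABLE history AS
    SELECT * FROM ticks WHERE series_id = 42;

  SELECT *
    FROM (SELECT h.*,
                 row_number() OVER (PARTITION BY date_trunc('hour', h.ts)
                                    ORDER BY h.ts DESC) AS rn
            FROM history h) x
   WHERE x.rn = 1;

  -- after: one SELECT, with the row_number() filter pushed into a sub-query
  SELECT *
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY date_trunc('hour', t.ts)
                                    ORDER BY t.ts DESC) AS rn
            FROM ticks t
           WHERE t.series_id = 42) x
   WHERE x.rn = 1;

The second form never materializes the intermediate table, which is where the big win came from.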

Cloning Redis Servers for a Roll-Out

Monday, September 22nd, 2014


This morning I was getting ready for a roll-out of some new code, and I didn't want to have to re-populate the production cache with a lot of data that I already had in my UAT cache - which has been running for more than a week, populating data as it goes. The script isn't all that hard - it just runs through all the servers that need to be cloned, stops each one, copies over the data file, and then restarts the redis server.

  #!/bin/bash
 
  for b in {1..8}; do
    echo "working on redis_${b}..."
 
    echo "  stopping redis${b} server..."
    pid=$(pgrep -f "redis_${b}.conf")
    sudo kill -9 $pid
 
    echo "  copying data from caliban over..."
    scp opt-analytics-redis4-uat.snc1:/var/groupon/redis_${b}/dump.rdb .
    sudo cp dump.rdb /var/groupon/redis_${b}/
    sudo chown nobody:nobody /var/groupon/redis_${b}/dump.rdb
    rm dump.rdb
 
    echo "  restarting redis_${b} server..."
    sudo /usr/local/etc/init.d/redis_${b} start
  done

Yeah, it's nothing special, but I'm trying to be a little more diligent about posting, and this was kinda fun to write - and it works perfectly.

The Latest Atom 0.127.0 is Closer

Tuesday, September 16th, 2014


I've been looking at Atom as a potential replacement for Sublime Text 3 - my current editor - and it's getting closer, but it's not there yet. They have come up with several new themes - and at least one of them is very nice. They have put in the vertical space option, which was missing for me. They have even put in enough hidden defaults to make it possible to hide the tabs and the left-side bar.

So it's close. Really close.

But it doesn't remember its state on exit. That's a huge feature of Sublime Text 3, and something I just feel is too important to leave behind. I expect the "find anything" that Sublime Text has - and BBEdit doesn't - is coming, if it's not there already, but that really doesn't matter to me at this point. The lack of restart state is a killer.

But it's nice to see that the GitHub crew isn't stopping with the initial release. I've read that they are looking to add this back in - which will be nice. But they'll also have to do something about the memory footprint - Sublime Text 3's is a lot smaller.

There’s Nothing Worse than Process Without Benefit

Monday, September 8th, 2014


Today I had my first real run-in with Process Over Progress at The Shop, and I have to say, this is really the kind of thing that makes me enjoy leaving jobs. Nothing is a clearer sign of the decay of an organization than process being seen as an end unto itself - where things like the form of a document are more important than its content, and the reviewers of a document can't be bothered to read what they are reviewing, yet still feel perfectly justified sitting in judgement of its author.

I know this is a people problem, which is exactly why I'm not in management at an organization I don't own. The trouble comes when the people problem rots the management structure to the point that otherwise good managers can't be bothered to police the process monitors, and the lunatics start running the asylum.

There is no solution that I've found - short of leaving the organization - so I guess that's the next step. There is, of course, blatant disrespect of authority, and while that works in the short term, it really isn't sustainable in the long run. The trouble-maker can point out the problem, but then has to get back in line, or he's a real danger to the authority of those in power.

I'll look back on this post in the future and point to this being a significant turning point in my career at The Shop.

Annoying Feature in Google Chrome

Tuesday, September 2nd, 2014


I like Google Chrome. I use it every day for all kinds of things. But every now and then they make a really silly change that I find amazingly annoying, and there's little I can really do about it. A few versions ago, they added a drop-down in the tab bar for the logged-in username. This used to be a little image in the location bar, but they moved it out of there and into the tab bar.


I know it's me - but it seems the simple change would be to look at how many profiles are defined for the user, and if it's fewer than two, don't show a thing - like it used to be.

I must be in such a minority that it's never been thought worth fixing. But it's ugly.

PHP on Heroku

Thursday, May 1st, 2014


I'm looking at standing up an XML-RPC service written in PHP on the web, and after a lot of looking around I've found that Heroku really is about the best place for us at this point in time. It's got all the scalability we need, it's got Postgres, it's got redis - it's got all the pieces to mix and match for the work I need to do, and it's all in one place. I couldn't think of a better fit.

But they haven't been doing PHP... until now.

This post talks about it, and it's really exceptionally simple... Start with a new directory and a simple index.php file:

  <?php
    echo "Hello World!";
  ?>

and in the same directory simply create a blank composer.json file:

  $ touch composer.json

Finally, create the git repo, commit the new files, create the heroku app, and deploy:

  $ git init
  Initialized empty Git repository in ~/hello_heroku_php/.git/
  $ git add .
  $ git commit -m "Initial import of Hello Heroku"
  [master (root-commit) 06ba0a7] Initial import of Hello Heroku
   2 files changed, 5 insertions(+)
   create mode 100644 composer.json
   create mode 100644 index.php
  $ heroku create 
  Creating safe-ridge-5356... done, stack is cedar
  http://safe-ridge-5356.herokuapp.com/ | git@heroku.com:safe-ridge-5356.git
  $ git push heroku master
  ...
  $ heroku open

This is amazingly simple! Just what I needed.

Getting PostgreSQL 9.3 Support into PHP 5.4.24 on Mac OS X 10.9

Monday, April 21st, 2014


A while back I wrote about getting PostgreSQL support into PHP as it was packaged with Mac OS X 10.6. Since then, a lot has happened with Mac OS X and PostgreSQL, and I find that I once again need to develop with PHP and PostgreSQL on my MacBook Pro. So I wanted to refresh this list and make it a little simpler at the same time.

Getting PostgreSQL 9.3

The first big difference in this post is that I'm switching to Homebrew, and that's made the entire operation a lot smoother. There's no 32-bit/64-bit issue since Homebrew handles both, and it builds right on your box - so again, a lovely customized solution with a simple command:

  $ brew install postgresql
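Once it's installed, starting it by hand looks something like this - assuming Homebrew's default data directory; adjust the paths for your setup:

  $ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start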

It even has a nice, simple informational blurb about how to start/stop and upgrade. Very nice. But now that it's on the box and ready to roll, let's add PostgreSQL support to the PHP 5.4.24 that's installed with Mac OS X 10.9.

Activating PHP in Apache

The first thing to do is to activate PHP in the Apache 2 that's supplied with OS X 10.9. This is a single line in a single file - /etc/apache2/httpd.conf. There's a line in there that just needs to be uncommented:

  LoadModule php5_module libexec/apache2/libphp5.so

and then add a new file called /etc/apache2/other/php5.conf and have it contain:

  <IfModule php5_module>
    AddType application/x-httpd-php .php
    AddType application/x-httpd-php-source .phps
 
    <IfModule dir_module>
      DirectoryIndex index.html index.php
    </IfModule>
  </IfModule>

which keeps all the other PHP configuration in a separate file to make upgrades easier.

Building the PostgreSQL Extension for PHP

At this point we need the source code for the exact version of PHP that Apple ships. This is pretty easy: grab the PHP 5.4.24 source from a mirror and unpack it into a directory. Then just run the following commands:

  $ cd php-5.4.24/ext/pgsql/
  $ phpize
  $ ./configure
  $ make

At the end of this process, you'll have a file - php-5.4.24/ext/pgsql/.libs/pgsql.so - that needs to be placed in the right directory and referenced in the php.ini file.

For Mac OS X 10.9.2, this is accomplished with:

  $ sudo cp .libs/pgsql.so /usr/lib/php/extensions/no-debug-non-zts-20100525/

and then edit the php.ini file to set the extension directory:

  extension_dir=/usr/lib/php/extensions/no-debug-non-zts-20100525/

and then add the line:

  extension=pgsql.so

Finishing Up

At this point, a simple restart of apache:

  $ sudo apachectl restart

and everything should be in order. Hit a URL that maps to a file with the contents:

  <?php
    phpinfo();
  ?>

and you should see all the details about the PHP install - including the PostgreSQL section with the version of Postgres indicated. It's all up and running now.
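To go one step further and actually touch the database from PHP, a tiny script along these lines works - this is just a sketch, with a made-up database name and user; adjust the connection string for your setup:

  <?php
    // minimal connectivity check for the pgsql extension
    // (dbname and user here are placeholders - use your own)
    $db = pg_connect("host=localhost dbname=test user=you")
          or die("connection failed");
    $res = pg_query($db, "SELECT version()");
    echo pg_fetch_result($res, 0, 0);
    pg_close($db);
  ?>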

Bash Safety Tip: Check for Terminal Prompt

Wednesday, February 19th, 2014


I was having a pretty painful issue with SCP the other day and it took me a few hours to get to the bottom of it, but there's a lesson to be learned here. The problem was that I was able to SSH to a host, but was not able to SCP to that same host. The keys were good, the permissions on the key files were good - and SSH was just fine. It was just SCP.

Finally, I figured out it was my .bashrc file. I had added some functions in there, and they were doing fine in a terminal session, but they were causing the non-interactive ("head-less") scp session to hang. Horribly. And that's the Safety Tip for the day:

Add this after the alias commands at the top of your .bashrc:

  # skip the rest if we're not interactive
  if [ -z "$PS1" ]; then
      return
  fi

and then you'll have the aliases if you need them, but you won't have the rest that could foul up the SCP session.
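Another common way to write the same guard is to check the shell's own option flags rather than the prompt variable - either form works:

  # skip the rest if the shell isn't interactive
  case $- in
    *i*) ;;
      *) return;;
  esac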

Very useful tip. Do it.

Optimizing Redis Storage

Sunday, January 19th, 2014


[Note: I created this post for The Shop, and while I'm not sure if it'll ever see the light of day, it's useful, and I wanted to see it posted. So here it is.]


The Optimize Group

One of the tasks of the Optimize Team where I work is to build a real-time analytics engine for our A/B testing framework, which involves analyzing the consumers experiencing each of the variants of each experiment. Along with this, we need to look at each deal sold in the system and properly attribute each sale to the experiments the consumer visited on their way to that sale that might have influenced their buying decision. Based on this visiting and buying data, the different product teams can then determine which of the experiment variants they want to keep, and which they don't.

In order to improve any consumer-facing Groupon product, experiments are run in which a random sample of consumers is placed into a testing group and shown one or more variants of the original (control) experience, and then their responses are tallied. This A/B testing data comes to our cluster as several separate messages: some indicate the consumer, browser, and device when an experiment variant is encountered; others indicate when a consumer purchased a deal. It is then the job of this cluster to correlate the actions taken by that consumer to see if the variant is better than the control. Did the larger image lead to more purchases? Did the location of the button cause more people to click on it? All these experiments need to be classified and the consumer actions attributed.

Recently, several production systems started using Clojure, and given that Storm is written primarily in Clojure, it seemed like a very good fit for the problem of real-time message processing. There are several topologies in our cluster - one that unifies the format of the incoming data, another that enriches it with quasi-static data, and a simple topology that counts these events based on the contents of the messages. Currently, we're processing more than 50,000 messages a second, but with Storm we have the ability to easily scale that up as the load increases. What proved to be a challenge was maintaining the shared state: it could not be stored in any one of the bolts, since there are 30 instances of them spread out across five machines in the cluster. So we had to have an external shared state.

All of our boxes are located in our datacenter, and because we're processing real-time data streams, we're running on bare metal boxes - not VMs. Our tests showed that with the traditional Redis persistence settings - snapshotting after a given time or number of updates - a Redis box in our datacenter with 24 cores and 96 GB of RAM was more than capable of handling the load from these 30 bolts. In fact, the CPU usage was hovering around a consistent 15% - of one of the 24 cores. Plenty of headroom.

Redis is primarily a key/value store, with the addition of primitive data types - HASH, LIST, and SET - that allow slightly nested structures and operations in the cache. And while its ability to recover after a crash with its data intact is a valuable step up over Memcached, it really makes you think about how to store data in a useful and efficient layout. The initial structure we chose for Redis was pretty simple. We needed a Redis SET of all the experiment names that were active. It turns out that there can be many experiments in the codebase, but only some are active; others may have completed and just haven't been removed from the code. To support this active list, we had a single key:

	finch|all-experiments => SET (names)

and then for each active experiment name, we had a series of counts: How many consumer interactions have there been with this experiment? How many errors were there on the page when dealing with an experiment? and even a count for the basic errors encountered in the stream itself - each updated with Redis’ atomic INCR function:

	finch|<name>|counts|experiment => INT
	finch|<name>|counts|errors => INT
	finch|<name>|counts|null-b-cookies => INT

The next step was to keep track of all the experiments seen by all the consumers. As mentioned previously, this includes the browser they were using (Chrome 29.0, IE 9.0, etc.), the channel (a.k.a. line of business) the deal is from (Goods, Getaways, etc.), and the name of the variant they experienced. The consumer is represented by their browser ID:

	finch|<name>|tuples => SET of [<browser>|<channel>|<variant>]
	finch|<name>|variant|<browser>|<channel>|<variant> => SET of browserId

The Redis SET of tuples - the browser name and version, the channel, and the name of the variant they saw - was important so that we didn't have to scan the key space looking for the SETs of browser IDs. Redis is very efficient at looking up a value by its key, but it is horribly inefficient if it has to scan all the keys - and while the KEYS command exists, the documentation is very clear that it should not be used in a production system because of the performance implications.
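In raw Redis terms, the lookup pattern looks roughly like this - the experiment name and tuple here are purely illustrative:

	# enumerate the known tuples for an experiment...
	SMEMBERS finch|bigger-image|tuples
	# ...then fetch each browser-ID SET directly by its fully-formed key
	SMEMBERS "finch|bigger-image|variant|Chrome 29.0|Goods|variant-a"
	# instead of the expensive full scan:  KEYS finch|bigger-image|variant|*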

Finally, we needed to attribute the sales and who bought them, again based on these tuples:

	finch|<name>|orders|<browser>|<channel>|<variant>|orders => INT
	finch|<name>|orders|<browser>|<channel>|<variant>|qty => INT
	finch|<name>|orders|<browser>|<channel>|<variant>|revenue => FLOAT
	finch|<name>|orders|<browser>|<channel>|<variant>|consumers => SET of uuid

As you can see, the lack of nested structures in Redis means a lot has to be accomplished by how you name your keys, which makes this all appear far more complicated than it really is. At the same time, we purposefully chose the atomic Redis operations for incrementing values to keep the performance up. Consequently, this may seem like a lot of data to hold in Redis, but it led to very fast access to the shared state, and Redis' atomic operations meant that we could have all 30 instances of the bolt hitting the same Redis instance and updating the data concurrently. Performance was high - the analytics derived from this data could be generated in roughly 5 seconds - so the solution seemed to be working perfectly.

Until we had been collecting data for a few days.

The memory usage on our Redis machine seemed to be constantly climbing. First it passed 20 GB, then 40 GB, and then it crashed the 96 GB machine. The problem stemmed from the fact that while an experiment was active, we were accumulating data for it. While the integers weren't the problem, this one particular SET was:

	finch|<name>|variant|<browser>|<channel>|<variant> => SET of browserId

Over time there would be millions of unique visitors, more than a hundred active experiments at any one time, and even multiple browserIDs per consumer. Add it all up, and these Redis SETs would hold hundreds of millions of entries, and they would continue to grow as more visitors came to the site and experienced the experiments. What we needed was a much more efficient way to store this data.

Wondering what other Redis users do to optimize storage, we did some research and found a blog post by the Engineering group at Instagram, as well as a post on the Redis site that reinforces the point and gives tuning parameters for storing data efficiently in a HASH. Armed with this knowledge, we set about refactoring our data structures to see what gains we could get.

Our first change was to pull the ‘counts’ into a HASH. Rather than using:

	INCR finch|<name>|counts|experiment
	INCR finch|<name>|counts|errors
	INCR finch|<name>|counts|null-b-cookies

we switched to:

	HINCRBY finch|<expr-name>|counts experiment 1
	HINCRBY finch|<expr-name>|counts errors 1
	HINCRBY finch|<expr-name>|counts null-b-cookies 1

Clearly, we were not the first to go this route, as Redis has equivalent atomic increment commands for HASH fields. It was a very simple task of breaking up the original key and switching to the HASH form of the command.

Placing the sales in a HASH (except the SET of consumerIDs, since a SET can't be nested inside a HASH) was also just a simple matter of breaking up the key and using HINCRBY (and HINCRBYFLOAT for the revenue). Continuing along these lines, we saw we could do a similar refactor and switch from a SET of browserIDs to a HASH whose fields are the browserIDs - just as unique, and we can use the Redis command HKEYS to get the complete list. Going further, we figured the values of the new HASH could hold some of the data that was in other structures:

	finch|<browserID> => app-chan => <browser>|<channel>
	finch|<browserID> => trips|<expr-name>|<name_of_variant> => 0

where that zero is just a dummy value for the HASH field.

With this new structure, we can count the unique browserIDs in an experiment by using the Redis EXISTS command to see if we have already seen this browserID - that is, whether the above HASH exists - and if not, we increment the count of unique entries in:

	finch|<expr-name>|tuples => <browser>|<channel>|<name_of_variant> => INT

At the same time, we get control over the ever-growing set of browserIDs that was filling up Redis in the first place, because we no longer keep the full history of browserIDs - just the count. We realized we could let a browserID expire after a period of inactivity and have it added back in when that consumer returns to Groupon. Therefore, we can use the Redis EXPIRE command on the:

	finch|<browserID>

HASH, and then after some pre-defined period of inactivity, the browserID data just disappears from Redis. This last set of changes - moving from a SET to a HASH, counting the visits rather than counting the members of a SET, and EXPIRE-ing the data after a time - made the most significant difference in the storage requirements.
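Putting the pieces together, the per-visit update looks roughly like this in raw Redis commands - the browserID, experiment name, and 30-day expiry here are illustrative, not the production values:

	# have we seen this browser before? (0 means the HASH isn't there yet)
	EXISTS finch|a1b2c3d4
	# if not, bump the unique-visitor count for the tuple it falls into...
	HINCRBY finch|bigger-image|tuples "Chrome 29.0|Goods|variant-a" 1
	# ...record what we know about the browser, and let it age out
	HSET finch|a1b2c3d4 app-chan "Chrome 29.0|Goods"
	HSET finch|a1b2c3d4 trips|bigger-image|variant-a 0
	EXPIRE finch|a1b2c3d4 2592000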

So what have we really done? We had a workable solution to our shared state problem using Redis, but the space required was very large and the cost of keeping it working was going to be a lot more hardware. So we researched a bit, read a bit, and learned about the internals of Redis storage. We then did a significant data refactoring of the information in Redis - careful to keep every feature we needed, and whenever possible, reduce the data retained.

The end effect? The Redis CPU usage doubled, which was still very reasonable - about 33% of one core. The Redis storage dropped to 9 GB - less than 1/10th of the original. The latency in loading a complete experiment data set rose slightly - about 10% on average, depending on the size and duration of the experiment. Everything we liked about Redis - fast, simple, robust, and persistent - we were able to keep. Our new-found understanding of the internals of Redis has enabled us to make it far more efficient. As with any tool, the more you know about it - including its internal workings - the more you will be able to do with it.

What Would I Build?

Monday, November 25th, 2013


I've been playing around with Storm for a while now, and while I don't think there are all that many folks in the world who are expert at it, I'm certainly an advanced novice, and that's good enough for the amount of time I've put into it. I've learned a lot about how they have tried to solve the high-performance computing problem in clojure and on the JVM, and I've come away with an affirmation of the feelings I had when I interviewed for this job and we discussed functional languages: Garbage Collection is the death of all functional languages - and certainly Storm.

I like the simplicity of functional languages with a good library of functions. Face it, Java took off over C++ because C++ was the base language, and Java had the rich object set that everyone built on. It made a huge difference in how fast people could build things. So if you want a functional language to gain a lot of traction fast, you need to make sure you don't send people off to re-invent the wheel for the most basic tasks.

But the real killer is Garbage Collection. I'm not a fan, and the reason is simple - if I'm trying to write performant code, I want to control when collection happens, and under what conditions. It's nice for novices to be able to forget about this and still write stable code, but when you want to move 1,000,000 msgs/sec, you can't do it without pools, lockless data structures, mutability, and solid resource control - none of which I get in the JVM, or anything based on it.

So what's a coder to do? Answer: Write another.

There used to be Xgrid from Apple, but they dropped that. They didn't see that it was in their best interests to write something that targets their machines as nodes in a compute cluster, and they aren't about to write something where you can use cheap linux boxes and cut them out altogether. Sadly, this is a company, and they want to make money.

But what if we made a library that used something like ZeroMQ for messaging, C++ for the linux side, and Obj-C++ for the Mac side, and made all the tools work like they do for Storm? Instead of using clojure, the JVM, and a ton of server-side tools to handle all the coordination and messaging, we'd use something far more tightly coupled to the toolset we're working with.

First, no Thrift. It's bulky, expensive, and it's being used for nothing more than simple remote procedure calls - there are far better alternatives when you're using a single language. Stick with a recent version of ZeroMQ and decent bindings - like its C++ ones. Start small and build it up. Make a decent console - Storm is nice here, but there's a lot more that could be done, and the data in the Storm UI is not easily discernible. Make it clearer.
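Just to make the messaging piece concrete, here's the kind of plumbing I'm picturing - a minimal sketch using the classic cppzmq binding (zmq.hpp, the older-style send/recv calls), with the socket types and port chosen purely for illustration:

  // a tiny PUSH/PULL pair - a "spout" handing a tuple to a "bolt" over ZeroMQ
  #include <zmq.hpp>
  #include <cstring>
  #include <string>
  #include <iostream>

  int main() {
    zmq::context_t ctx(1);

    // the producer binds and PUSHes work items...
    zmq::socket_t spout(ctx, ZMQ_PUSH);
    spout.bind("tcp://*:5557");

    // ...and a worker connects and PULLs them off the wire
    zmq::socket_t bolt(ctx, ZMQ_PULL);
    bolt.connect("tcp://127.0.0.1:5557");

    std::string payload = "tuple-goes-here";
    zmq::message_t out(payload.size());
    std::memcpy(out.data(), payload.data(), payload.size());
    spout.send(out);

    zmq::message_t in;
    bolt.recv(&in);
    std::cout << std::string(static_cast<char*>(in.data()), in.size())
              << std::endl;
    return 0;
  }

The coordination and topology layers are the real work, of course - but the raw messaging really is that simple.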

Maybe I'll get into this... it would certainly keep me off the streets.