Archive for October, 2014

Adding Exclusions to Storm Topology Processing

Monday, October 6th, 2014


The Shop now has a group that's continually hitting the different apps and APIs for application security, and as a consequence, we're getting a lot of messages with completely bogus information in them - like a country field with the value " SLEEP(10) " - as if that's a way to hack the system.

This all makes sense, and I can certainly respect their job, but it does mean that someone needs to filter these out, or we are all going to be dealing with partially corrupted data, and getting wrong answers from it. That someone turned out to be me.

The test wasn't all that hard - we're just looking for a few characters in the field that would be strictly no-good, and then excluding the entire message based on that. The test is really a very simple predicate: a Clojure set of the bad characters:

  ;; the set of characters that mark a message as bogus
  (def bad-chars (set "()%\"'"))

and then it's used very simply:

  ;; keep the message only if the field has none of the bad characters
  (if (not-any? bad-chars (:field msg))
    msg)

and we can also use some for the opposite logic - checking that the field does contain one of the bad characters - if needed.
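Putting the two together, a minimal sketch - the msg map and :field key are just stand-ins for the real tuple fields:

  (def bad-chars (set "()%\"'"))

  ;; true when the field contains none of the bad characters
  (defn clean? [msg]
    (not-any? bad-chars (:field msg)))

  ;; true when the field contains at least one bad character
  (defn suspect? [msg]
    (boolean (some bad-chars (:field msg))))

  (clean?   {:field "United States"})   ;; => true
  (suspect? {:field " SLEEP(10) "})     ;; => true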

There was an additional test - looking at the userAgent to see if it was one of the security group's tests - again, pretty simple, and not too hard to add.
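Something along these lines does the trick - the pattern here is purely illustrative, since the real user-agent strings belong to the security group's tools:

  ;; hypothetical pattern - the actual scanner user-agents differ
  (def scanner-re #"(?i)appscan|securitytest")

  (defn security-test? [msg]
    (boolean (re-find scanner-re (:userAgent msg ""))))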

Refactor DDO Loader for Long-Term Usage

Monday, October 6th, 2014


This weekend it became clear that the demand-data files we load five times a day were becoming a problem: they were blowing out the 16GB JVM process. When I first wrote the loader, we had files of 1.7 million records, and that fit very nicely in 16GB - with room to spare. Now we're pushing 6 million records a file, the 16GB isn't able to do the job, and the ensuing garbage collection was eating up more than 10 cores and bringing the box to its knees.

Very bad.

So I needed to look at how I was loading the data and break it up so that the bulk of the work is done on a separate machine, leaving the database server with just the COPY of the CSV file into the target table. That's by far the fastest way to load 6 million rows while keeping the database online and taking requests all the while.

I realized it wasn't all that hard - moving the bulk of the processing to a new box was easy, and then I just had to change some crontabs and scripts to point at the new locations of the files. Then I simply scp the file from the processor to the database server and use ssh to kick off the loader.
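The real cycle lives in shell scripts and crontabs, but sketched in Clojure - with hypothetical hostnames, paths, and table name, and assuming the database is PostgreSQL, which is what COPY suggests - it's just two commands:

  (require '[clojure.java.shell :refer [sh]])

  ;; push the finished CSV from the processing box to the db server
  (sh "scp" "/data/ddo/demand.csv" "dbhost:/tmp/demand.csv")

  ;; kick off the load on the db server itself, so the COPY runs
  ;; locally and the database stays online the whole time
  (sh "ssh" "dbhost"
      "psql -d deals -c \"COPY demand FROM '/tmp/demand.csv' WITH CSV\"")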

Really not that bad. I still want to walk through a complete cycle, but that shouldn't be a problem.

Adding Hadoop Credentials to Storm Library

Monday, October 6th, 2014


Last Friday, we received the credentials for saving data to the shared Hadoop cluster at The Shop. I wasn't close to my machine, so I couldn't add them easily, and I held off until this morning. It was pretty easy to add using a Clojure library I'd built here for a previous project. It's not perfect, but it's pretty good, and using the Clojure function partial it was very easy to add the configuration as a 'hidden' argument and let the others "Just Work".
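As a sketch of the trick - save! and hadoop-config here are stand-ins for the library's real writer and configuration:

  ;; illustrative config map - the real credentials look different
  (def hadoop-config {:user "svc-account" :keytab "/path/to/keytab"})

  (defn save!
    "Writer that takes the Hadoop config as its first argument."
    [config msg]
    (println "writing" msg "as" (:user config)))

  ;; partial bakes the config in as a hidden first argument, so
  ;; existing callers keep their old one-argument shape
  (def write! (partial save! hadoop-config))

  (write! {:deal-id 42})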

Didn't take me very long, but I know a couple of guys on the team needed this, so I wanted to get it done first thing this morning.