Archive for August, 2012

Agile Workflow and Gobs of Stories

Thursday, August 23rd, 2012

Agile Methodology Kool-Aid

Over the course of the last few weeks I've tried very hard to embrace the Agile methodology of writing a bunch of stories for simple, isolated tasks, and putting them into PivotalTracker. This morning is no exception, but I wonder when they'll ask me to stop! After all, I just put in about 10 stories for different CouchDB views and visualizations based on those views, and while it makes perfect sense to have them as individual stories, it's also a lot of stories to wade through.

Lots of guys talk about a "scrollbar tax" with large methods - hence the "ruby way" of having methods with no more than 10 lines in them. But then they pay that same "scrollbar tax" with dozens of stories in Tracker, all basically doing the same thing but about different data and different views. So it's a bit of a head-scratcher.

Is scrolling bad? Or are you just looking for a reason to have exceptionally tiny methods in your exceptionally tiny classes?

I'm not saying that I prefer 200+ line methods in 1000+ line classes - there's a limit, to be sure. But there's a limit on the low end as well. I've honestly seen this class in the codebase:

  require 'pipeline'
  require 'pinner'
  require 'app_log'
  require 'json'
 
  class PinnerWorker
    def self.perform(data)
      merchant = data[:merchant]
      otcs = Pinner.pin(merchant)
      Pipeline.notify(self.name, data.merge(otcs: otcs))
    end
  end

Excluding the two lines that define the class, and therefore must be there, the actual functional code is three lines! There are more require statements than that! Sure, I can see why they did it - because there used to be something here, and they didn't want to retrofit all the code if they removed this class. But that's just being lazy.

Anyway… I'm trying to be a Good Citizen and make all the stories and then check them off as I go. It's kinda interesting, but it's amazing how much work there is in the Agile Methodology that has nothing to do with coding. You make all these stories, but then you can't reuse them in the documentation - they live in the tracker. Or you could use them in the docs, but people don't write docs (except me), etc.

I'd be all for a scheme where making something meant that it stayed around! Then there would be a reason for doing a good job of writing up the "need" initially, as it'd be part of the eventual docs for the application or feature.

That would be nice!

Working with CouchDB’s Map/Reduce Framework

Wednesday, August 22nd, 2012

CouchDB

This afternoon I've been doing a lot with CouchDB's map/reduce framework for querying data out of CouchDB. The terminology is pretty simple: a design document can hold multiple Views, where each View has a Map function that looks at each document in the database and returns something based on an inspection of its data, and an optional Reduce function that takes all the results of the Map calls and reduces them to a smaller dataset.

It's a pretty standard pattern in a lot of languages: first you operate on the individual elements in a collection, and then you summarize those values. In CouchDB it's all in Javascript. That's not bad - I've done a lot of that in my day, so it's pretty easy to get back into the swing of things.
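
Ruby's Enumerable has the same two-step shape built right in, for what it's worth. As a trivial analogy - this is plain ruby, nothing CouchDB-specific, and the data is made up - counting documents per division is a map step followed by a reduce step:

  # the "map" step transforms each element; the "reduce" step summarizes
  docs = [{ division: 'east' }, { division: 'west' }, { division: 'east' }]
  counts = docs.map { |d| d[:division] }
               .reduce(Hash.new(0)) { |acc, div| acc[div] += 1; acc }
  puts counts.inspect   # => {"east"=>2, "west"=>1}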

One interesting note is that CouchDB is written in Erlang, and while I don't see myself digging into the guts of this thing, it's interesting to know where it all comes from, as it makes it a lot easier to understand why they chose Javascript for the views, for instance.

Anyway, let's say I want to see all the merchants that have no OTCs assigned to them. I'd create a Temporary View in CouchDB's web interface, and then for the View Code I'd have something like this:

  function(doc) {
    if (doc.meta.label == "QuantumLead.results" &&
        doc.otcs.length == 0) {
      var key = [doc.division,
                 doc.meta.created];
      var blob = { name: doc.merchant.name,
                   sf_id: doc.merchant.sf_id };
      emit(key, blob);
    }
  }

The interesting part here is that the emit() function is really the action item. When we want to add something to the output of this Map function, we call emit() with the key as the first argument and the value as the second. The key, as shown here, can be a multi-part key, and the value can be any Javascript object.

The thing I like about the use of Javascript here is that the attributes look like "dotted methods" and not hash members. This makes it so much easier to reference the data within a doc by just using the key names and dots. Very nice use of Javascript.

So now that I have my first few Views and Documents in the system, I need to work on getting things out of these calls, and into some nicely formatted output for the important demo that's coming up.
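
Once a view like that gets saved into a design document (instead of staying a temporary view), pulling the rows back out from ruby should be simple with CouchRest. A rough sketch - the 'metrics' design document and 'missing_otcs' view names here are placeholders, nothing that exists yet:

  require 'couchrest'

  db = CouchRest.database('http://localhost:5984/megafun')
  # query the saved view - the multi-part key lets us slice by division
  result = db.view('metrics/missing_otcs',
                   startkey: ['east'], endkey: ['east', {}])
  # each row carries the key and value that the map function emitted
  result['rows'].each do |row|
    puts "#{row['key'].last}: #{row['value']['name']}"
  end

The {} in the endkey is the usual CouchDB trick for "everything starting with this division", since {} sorts after every other value in a key.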

Getting Ready for an Important Demo

Wednesday, August 22nd, 2012

I just got an email from our project manager about a demo he's set up for the COO. The email included a response from the COO about the significance and importance of this project, and how it'll play into the long-term plans for this place. It's pretty scary to think of.

So all of a sudden, I'm feeling that same pressure to perform that I felt for 16 years in Finance. It's the first real demo with this level of visibility since I joined The Shop, and while it might be ho-hum for a lot of the guys, for me it's the "first impression" this guy is going to have of me and my additions to the team. It's not life-or-death, but it's important, and I want it to go well.

So I'm a little nervous… So many things to get finished and in place for the demo… and it's not like we'll have time to run through it beforehand - it'll be wing-it all the way.

Yikes!

Problems Deploying CouchDB to EC2 Servers

Wednesday, August 22nd, 2012

Amazon EC2 Hosting

This morning Jeff is still having problems getting CouchDB deployed to our Amazon EC2 machines, and it's almost certainly due to the deployment system that's in place at The Shop. It's something I completely understand, but it's also based on the idea that you can't trust anyone. That, and the machines run an old RedHat-based distro that I know from experience isn't as easy to deal with as a more recent Ubuntu.

Still, it's just the way it has to be, as that's the only way Prod Ops can deal with things, so there's no real way around it. The idea is that you need to be able to build the code on one box, package it up - similar to an RPM or a deb package - and then deploy it across a lot of machines. All well and good, but Jeff is having a horrible time getting CouchDB 1.2.0 compiled on his build box.

He's trying a few things, and even seeing if the other folks around here have any ideas. The latest attempts have left something that looks like CouchDB running on the server, but when I go to add documents to it, I get a nasty stack trace about 'Connection refused' after some kind of timeout. I get through about 1500 of the 2500 documents I need to insert, and then it stops.

At the same time, I was able to use Homebrew to simply:

  $ brew install couchdb

and then follow a few instructions about getting it to run on my login startup, and that's it. It Just Works.

I would say that this would also be the case if we were looking at standard Ubuntu boxes in EC2 or Rackspace, and using yum or apt-get. The real question is why we need to build these custom packages for Open Source software when it's so easy to just install.

Again… no way to know… no way to answer. It just is and that's it.

Getting Acquainted with CouchRest

Tuesday, August 21st, 2012

CouchDB

Jeff, the guy recommending CouchDB as a document database for our app, suggested that I look at CouchRest as a nice ruby client for CouchDB. And the docs look impressive - so far as they go. It's pretty easy to open up and use a database:

  require 'couchrest'
 
  @db = CouchRest.database('http://localhost:5984/megafun')

and then saving a document is pretty easy as well:

  @db.save_doc({ one: 1, two: 2 })

even doing a bulk store of multiple documents is easy:

  @db.bulk_save([{ one: 1, two: 2 },
                 { one: 1, three: 3 },
                 { one: 1, four: 4 }])
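
And getting a document back out looks just as easy: save_doc hands back the generated id, and get fetches the document by it:

  # save_doc returns a hash with the new document's 'id' and 'rev'
  result = @db.save_doc({ one: 1, two: 2 })
  doc = @db.get(result['id'])
  doc['one']   # => 1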

But the main docs don't really say anything about using a proxy, and in The Shop, with lots of hosts in Amazon's EC2, there's a lot of proxy work that we have to do.

Specifically, to get to Amazon's East datacenter we have to use a redirecting proxy on our laptops, and there was just nothing in the docs about using one. So I had to dig into the code for CouchRest - thankfully, I've learned a bit of ruby in the last few weeks - and it turned out the support was already there!

Because we also have servers in EC2 east (which don't need the proxy), I couldn't hard-code the proxy usage. But using the pattern we've used for other proxy-based access, I was able to very quickly set up the config files for the CouchDB databases, and then in the code say:

  class Database
    def self.database
      # only route through the proxy when the config for this environment says to
      CouchRest.proxy(AppConfig.database.proxy_uri) if AppConfig.database.use_proxy?
      # create the connection once and cache it
      @db ||= CouchRest.database(AppConfig.database.uri)
    end
  end

and then in the rest of the Database class I could just reference this database method.
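
For instance, a save helper on that class could be as simple as this - the save method and the document contents here are just a sketch of how I'd use it, not anything in the codebase yet:

  class Database
    def self.save(doc)
      # all the proxy handling comes along for free
      database.save_doc(doc)
    end
  end

  Database.save({ merchant: 'Example, Inc.', otcs: [] })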

The things I'm doing have a lot more to do with how we want to organize the data than with the logistics of CouchDB itself. We'll have to come up with some standards on the document format to enable the selection, aggregation, etc. that we're going to need in this project. Not a bad start, and it's looking pretty good right now.

I just need Jeff to get the server spun up on one of our EC2 boxes.

Placing my WordPress CodeHighlighterPlus on GitHub

Tuesday, August 21st, 2012

This morning I thought I'd spend a few minutes getting my fixes to the existing WordPress plugin - CodeHighlighter - up onto the WordPress site so that I could easily update it, etc. After all, there might be several folks looking for something like I wanted and not finding it in the existing tools. I downloaded the version I'd hacked up on my site, placed it into a git repo, added a README.md, and pushed it up to a new GitHub repo. I was then hoping to simply publish it on the WordPress site and be done with it.

Silly me… Why should it be that easy?

Turns out, the WordPress plugin site is backed by SVN: you submit your plugin, and then they give you access to an SVN repo (Sourceforge?) where you put your code. A few years ago, I wouldn't have minded, but now… SVN… really?!? Nah… I think I can do just fine without that cruft.

I can simply use GitHub and clone the repo into any WordPress install that I have. There's no need for anything fancier than that. In any event, I'm guessing that in a little while the WordPress team will switch to GitHub anyway, as the number of SVN users is going to dwindle like Perforce, Visual SourceSafe, PVCS, etc. all have. There's just no way to keep a project looking up to date with SVN.

So it's there, and it's easy to use - you just have to be a little smarter than the average WordPress blogger, but that's OK. I am, and that's all that really matters.

UPDATE: by simply going into the wp-content/plugins/ directory and doing:

  $ git clone git@github.com:drbobbeaty/CodeHighlighterPlus.git

and then using the WordPress Plugins page, I can disable the old version, enable this new clone, and then delete the old one. After this, everything is OK, and it's all controlled by the GitHub repo.

To be sure, this isn't going to auto-update from the WordPress Plugins page, but I didn't have to mess with the SVN repo either - and that's a win for me.

Starting to Use CouchDB

Monday, August 20th, 2012

CouchDB

The decision was made late last week that we really should try to use a document database - like CouchDB - for saving our run state for metrics extraction and historical analysis. We had initially planned on using MySQL, as it's the dominant database in The Shop, but that was going to require that we flatten the ruby objects into some set of key/value pairs, or come up with some ORM that would properly store and load things. Neither was really super attractive, so we had Jeff in the group take a look at CouchDB. I knew MongoDB wasn't the answer, because I'd used it previously, but there were a few nice-sounding things in CouchDB that could really tip the scales in its favor.

Most notable were the views. These are basically the same thing you'd expect in a SQL database, but in the document world they can be arbitrary map/reduce schemes implemented in Javascript and stored in CouchDB itself. This means that we can make some interesting views for the metrics that gather data across different documents and make it presentable in a very simple and efficient way.

I'm thinking that generating the majority of the metrics is possible this way, and then stuffing those values into a visualization system shouldn't be too bad. We'll have to see, as I'm still in the very early stages of using this, but it certainly has some interesting potential for what we're trying to do.

The Power of Positive Attitude? I’d Like to Think So

Friday, August 17th, 2012

I don't know… maybe positive thinking really does work. This week I had a run-in with someone who was amazingly uninterested in being flexible. The next day, they changed the API on their system. I was starting to believe that this was just going to be the status quo when dealing with them, but then today I received an email that amazed me.

Their leader totally reversed his position and was asking me how I wanted the data. This was a real shock, as I had never expected it - I was settling in for a series of constant changes to the API and fixes to the ETL to keep things working. Very nice to see.

I spent about 5 minutes thinking about it, and decided that what we originally had was a good plan, and we just needed to keep going. The original plan was to have an array of maps (this is all JSON) where each map represented a possible match and the array was a logical OR. This allows them to change the nature of the individual maps, and the logical OR can include a region and a zip code… or a series of zip codes… or regions… all of this was very well thought-out for the geo-tagging.

I wanted that back, and then asked that for the taxonomy of the demand we do something very similar - an array of maps where each is a tuple classifying the demand against the default taxonomy. This will also be logically OR-ed to get the possible classifications that this demand can fulfill.
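
To make that concrete, here's a sketch of the JSON shape I have in mind - the field names are just placeholders, not the agreed-upon schema:

  {
    "locations": [
      { "region": "Midwest" },
      { "zip": "60601" },
      { "zip": "60602" }
    ],
    "taxonomy": [
      { "category": "restaurants", "subcategory": "pizza" },
      { "category": "restaurants", "subcategory": "italian" }
    ]
  }

Each array reads as a logical OR of its maps: this demand matches the Midwest region or either zip code, and can fulfill either classification.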

In short, I think it's clean and clear, and makes a lot of sense. I hope they accept it.

In any case, I'm shocked that I might have had an effect on the change of heart, but who knows? Maybe I'll step it up with the people on the train at night and see if they get a little nicer!

Upgraded to VoodooPad 5 and Dropbox Syncing

Friday, August 17th, 2012

I decided that I needed to take the plunge and upgrade my VoodooPad Pro 4 to VoodooPad 5 - primarily to support Flying Meat, but also to get the latest features, etc. There are always a few things you want in an upgrade, and this time there certainly were.

Once I got the app downloaded (bought it directly) and the license key installed, I updated my VoodooPad docs very cleanly and easily. No problems there. Then I decided that I'd like to be able to really sync this one with Dropbox. So I simply did:

  $ cd ~/Dropbox
  $ mkdir Work
  $ cd Work
  $ ln -s ~/Documents/Work\ Info.vpdoc .

and the symlink will be sufficient for Dropbox to mirror the document to all my other machines. On those machines I could set up the symlinks the other way as well, but I don't really need to - it's just my main box, which has the docs in their "original" locations, that needs the link.

I'm digging Dropbox for sure, and I can't wait to see how I can make use of it for more things like this.

Going to be Digging into MySQL

Friday, August 17th, 2012

Today I'm going to be spinning up a MySQL instance in Amazon's EC2 for holding the metrics of our application. The goal is to have historical records for detecting trends and for A/B testing to see the effects of changes. It's all about "save it - report it", and I'm looking forward to using MySQL in this case. I've not done a ton with MySQL, and it's a huge deal here at The Shop, so it'll be nice to see it all done right, with good tools, monitoring, and good backups.

I've found an interesting OS X client for MySQL, and I'll be digging into it as soon as I get things set up. Or maybe I'll just stick to the command line, as I like that best with PostgreSQL. Who knows… it's exciting to be moving in this direction this morning. I just hope I can make some real progress today.