Archive for the ‘Coding’ Category

Plenty of Production Problems – Argh!

Monday, November 26th, 2012


This morning has been a really tough one. It started with me checking on the overnight runs while I was still at home (4:00 am) and seeing that they had failed due to problems I introduced over the latter part of the week. I really hate that. It was my fault, that's for sure, and it was brought on by a very inconsistent API in CouchRest. No excuse, it was me, and it really bugs the crud out of me when I do that.

No errors, just failed writes to Couch. Argh!

The next really nasty thing was that with the new divisions, I was getting new data, and in that data, we had some bad data, and the optimistic coding that is the hallmark of the ruby devs I know simply started erroring out on nil pointers. Argh!
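The fix is plain defensive coding: stop assuming every field in the feed is present. A minimal sketch of the kind of guard that had to go in - the :merchant and :division keys here are hypothetical stand-ins, not the real feed's fields:

```ruby
# Hedged sketch: defensively cleaning incoming records that may be nil
# or missing fields. The keys (:merchant, :division) are hypothetical
# stand-ins for whatever the real feed carries.
def clean_records(records)
  records.reject { |rec| rec.nil? || rec[:merchant].nil? }
         .map do |rec|
           {
             # fall back to a sentinel instead of blowing up on nil later
             :merchant => rec[:merchant],
             :division => rec[:division] || 'unknown'
           }
         end
end
```

Feeding it a list with a nil entry, or a record missing :merchant, just drops those rows up-front instead of erroring out on a nil pointer deep in the run.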

In the end, I was able to get things re-run and it was OK, but it was a very stressful morning, and there doesn't seem to be a decent payoff for all this stress and work.

It just doesn't seem worth it.

Activated the Write-Back for Production Pilot

Saturday, November 24th, 2012

Come Monday, we have a new pilot to start - even though we really haven't solved any of the scaling issues, we press for more features. It's getting kinda old. But hey, a promise is a promise, and I hope it doesn't bury us.

Thankfully, it's only two divisions, and I added them to the whitelist in the config files, and we should be good to go for Monday. I've got my fingers crossed.

Fixed up the Retry Code and Added Instrumentation

Friday, November 23rd, 2012


I've been having plenty of issues with one of the processes in the application, and I needed to really bolster up this optimistic code with some solid defensive coding - including handling timeouts and putting in some solid New Relic instrumentation to boot. These latter phases of the project have really been glossed over until recently - little to no logging, no instrumentation, no real careful, thoughtful coding.

So I have to go back and do it now.
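The New Relic side is mostly boilerplate once the agent is in place. A hedged sketch of the method-tracer pattern - the class, method, and metric names here are made up, and the begin/rescue keeps the sketch runnable even without the newrelic_rpm gem installed:

```ruby
# Hedged sketch: timing a hot method with newrelic_rpm's method tracer.
# AccountProcessor and the metric name are hypothetical examples.
begin
  require 'newrelic_rpm'   # the real project reports through this gem
rescue LoadError
  # no agent available - the sketch still runs, just untraced
end

class AccountProcessor
  # stand-in for one of the real processing methods
  def process(accounts)
    accounts.size
  end

  if defined?(NewRelic)
    include ::NewRelic::Agent::MethodTracer
    # records time spent in #process under a custom metric name
    add_method_tracer :process, 'Custom/AccountProcessor/process'
  end
end
```

With that in place, the timing shows up in the New Relic dashboards without touching the method body itself.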

Ideally, it'd be great to see a change in the hearts and minds of my team-mates, but I'm not counting on that. I think it's just not in how they see themselves and their jobs. So it's up to me to do it.

It's not horribly hard, and it keeps me off the streets.

Coding on Thanksgiving – Trying to Get Performance Up

Thursday, November 22nd, 2012


Well… the addition of the new divisions (added just to meet the crazy deadline) didn't go as well as I'd hoped. Thankfully, I had good New Relic data to look at to see what was happening in the process(es). What it looks like is that there are large sections of code that aren't really taking advantage of the machine and are doing too many things serially. So I set about attacking them.

On Thanksgiving.

First, there was one process that was doing a lot of writing to Couch serially. That was easy enough to fix with a simple Java Executor and a couple of threads. I also moved all the single-document writes to Couch over to bulk stores so that we got much better performance when we had all the data to write up-front.
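The parallel writes look roughly like this - a hedged plain-Ruby sketch of the idea (the real code used a java.util.concurrent Executor under JRuby; the writer block here is a stand-in for the actual Couch call):

```ruby
# Hedged sketch: draining a queue of Couch payloads with a small pool of
# threads instead of writing serially. Plain Ruby threads stand in for
# the Java Executor the real code used under JRuby.
require 'thread'

def parallel_store(payloads, workers = 4)
  queue = Queue.new
  payloads.each { |p| queue << p }

  threads = Array.new(workers) do
    Thread.new do
      until queue.empty?
        begin
          doc = queue.pop(true)   # non-blocking pop
        rescue ThreadError
          break                   # queue drained between checks
        end
        yield doc                 # e.g. the actual write to Couch
      end
    end
  end
  threads.each(&:join)
end
```

Called as `parallel_store(payloads) { |doc| Database.store('Final results') { doc } }` (hypothetical call), the writes fan out over the pool instead of running one after another.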

The next thing was to try adding timeouts to the CouchRest API just to see how that would go. I'm hoping that these REST calls that simply don't return can be trapped in a simple "total timeout" and then retried. As it is now, some of them simply never return, and that's no good at all.
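The "total timeout" idea can be sketched with nothing but the stdlib - this is an assumption about the shape of the fix, not the actual CouchRest patch; the block stands in for the REST call:

```ruby
# Hedged sketch: wrapping a REST call that may never return in a total
# timeout with a bounded number of retries.
require 'timeout'

def with_total_timeout(seconds, attempts = 3)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(seconds) { yield }
  rescue Timeout::Error
    retry if tries < attempts
    raise   # give up after the last attempt
  end
end
```

So a call like `with_total_timeout(5) { couch.save_doc(doc) }` (hypothetical client call) either finishes inside the window or gets retried a bounded number of times instead of hanging forever.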

In the end, I had to get the speed up. I'll see how these changes work tonight and make any needed adjustments in the morning.

Working at Home – Just Won’t Miss a Deadline

Wednesday, November 21st, 2012


I'm working at home on something that I really shouldn't be working on - trying to meet a deadline that I told the project manager we weren't going to meet because we had been having scaling issues, and it just wasn't feasible to get it done by Monday. But here I am… a little bit of spare time, and I'm a sucker for not missing deadlines.

So I'm just going to add in the new divisions and see how it goes. If I have to make adjustments to the code to make it fit in the time allowed, so be it. It should work, and the only question in my mind is do we have the time?

I've got my fingers crossed.

Winding Down for Thanksgiving Break

Tuesday, November 20th, 2012

Today I didn't do a lot - changed my log compactor to look at the modification time as opposed to the access time of the log files - which should make it a lot more consistent from day to day, but not a lot else.
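The change itself is a one-liner in spirit - a hedged sketch of what the selection looks like (the directory layout and the seven-day cutoff are made up, not from the real compactor):

```ruby
# Hedged sketch: picking log files to compact by modification time rather
# than access time, since atime gets bumped by backups, greps, etc. and
# makes the selection inconsistent from day to day.
def stale_logs(dir, max_age_days = 7)
  cutoff = Time.now - max_age_days * 24 * 60 * 60
  Dir.glob(File.join(dir, '*.log')).select do |path|
    File.mtime(path) < cutoff   # was File.atime(path)
  end
end
```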

It's really winding down for the long weekend.

I'll be taking my laptop home to work over the five days, just to have something to do. It's so convenient, it's just the extra weight that keeps me from doing it more often.

Interesting RSpec Tips

Monday, November 19th, 2012


This afternoon I found a set of tests in the code that weren't implemented, and should be. They were stubbed out by one of the guys on the team - he didn't have time to really implement them, he just wanted to remind himself that we needed these tests, so he stubbed them out and went on about what he needed to do. I noticed them, and decided that since I didn't have a lot going on at this time, it made sense to have a go at implementing them.

After all, I know there's a lot about rspec I don't know, and this would be a nice way to learn about it.

Some of the tests were really pretty clear: make sure the main routine returns something. So how to do that - simply? Well… we can always stub out the methods with simple return values and then just make sure that you get back what you expect.

  require 'lead_assignment/entry_point'

  describe LeadAssignment::EntryPoint do
    describe ".unassign_and_reassign" do
      before(:each) do
        class FauxRepClient
          def get_reps(division)
            []
          end
        end
        LeadAssignment::EntryPoint.stub(:reps_client => FauxRepClient.new)
        LeadAssignment::EntryPoint.stub(:fetch_accounts => [])
        LeadAssignment::EntryPoint.stub(:add_accounts => [])
      end

On this first test, I noticed that I wanted to start all my tests with this little configuration, so it was easy to put it into a before() block, and then it would be run before each test within the scope of the enclosing describe. That's nice to remember.

Then I can do the end-to-end test:

     it "returns a result" do
       LeadAssignment::EntryPoint.stub(:sink => nil)
       results = LeadAssignment::EntryPoint.unassign_and_reassign('cleveland')
       results.should == { :unassignments => [], :assignments => [] }
     end

and my first test is done!

I learned a lot about testing writing that, and it was going to pay off as I did the others. They all looked about the same - you stub out certain methods, run certain sections, and then check the output. Not bad at all. Just need to be careful and methodical about what you're doing.

Then I came to a more challenging problem: I needed to know when a specific instance was being called, and with a certain set of arguments. That's not too bad - you have to make the instance, and then you can use it:

     it "writes the results to salesforce" do
       class FauxSFClient
         def bulk_store(accounts)
           nil
         end
       end
       my_store = FauxSFClient.new
       LeadAssignment::EntryPoint.stub(:send_assignments_to_sf? => true)
       LeadAssignment::EntryPoint.stub(:store => my_store)
       my_store.should_receive(:bulk_store).with([]).exactly(2).times
 
       LeadAssignment::EntryPoint.unassign_and_reassign('cleveland')
     end

and again, this works great! I like that I can specify the args to the method, and the instance doesn't need to be exactly what's in the code - my faux class is just as good for this as anything.

The final trick I learned has to do with passing blocks to methods and testing the contents of those blocks. You can't actually tell what's in a block, but you can evaluate it and then test that value:

     it "writes log messages properly for summary script" do
       LeadAssignment::EntryPoint.stub(:send_assignments_to_sf? => false)
       LeadAssignment::EntryPoint.stub(:unassign => [])
       LeadAssignment::EntryPoint.stub(:assign => [])
       LeadAssignment::EntryPoint.stub(:sink => nil)
 
       QuantumLead::Application.logger.should_receive(:info) do |method, &block|
         method.should == "LeadAssignment::EntryPoint.unassign_and_reassign"
         block.call.should == "Starting LeadAssignment in cleveland"
       end
 
       LeadAssignment::EntryPoint.unassign_and_reassign('cleveland')
     end

and you can have as many of those logger-checking blocks as you believe you will have calls to the logger. It's pretty nice.

With all this in place, I was able to whip up the necessary tests for the code in short order. Pretty nice tools.

Reading More on Clojure

Monday, November 19th, 2012


We are going to be writing a project in Clojure, and I've been told I'm going to be on this project, so I need to get up to speed on Clojure. Thankfully, I had the Pragmatic Programmer's book on Programming Clojure, so I've been re-reading it these past few days.

Interestingly, it's a lot like Erlang, which I had to learn at a previous position for some work I was doing there. In fact, that's when I picked up the Clojure book, but I also had to pick up the Erlang book, and since there was real work to do in Erlang, it took top priority for my time, so I learned it more completely.

Now I'm looking at Clojure, and it's really a lot like Erlang in a bunch of ways. The record data structure, the gen_server ideas, and the functional code all make for a pretty quick learn for me. There's still a lot I need to read - I'm only about one-third of the way through, but it's something that I've been working on these days. Just to be ready.

Decided to Switch to Homebrew

Thursday, November 15th, 2012


I've got old installs of Erlang and Clojure, and I need to update them for work I'm about to do, but I don't feel like repeating the same old manual installs… I'm going to try Homebrew for package management because it's working so well on my work laptop. So I cleared out the old installs of these packages, which was a chore - basically, whole directories in /usr/local/ or in ~/Library/ - and I also took the time to clean up my .bash_login and .bashrc because they had additions to the PATH, and even DYLD_LIBRARY_PATH, that needed to be removed as well.

Once I had the old stuff removed, I installed Homebrew with the simple command:

  $ ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"

and it installed itself just fine. Having done this already, I knew what to expect, but the next steps were really nice:

  $ brew install erlang
  $ brew install leiningen

where Leiningen is the Clojure build and dependency tool with a built-in REPL. Once I had this installed, I noticed that /usr/local/bin wasn't early enough in my PATH to make sure that I picked up the Homebrew commands and not the native OS X commands.

Actually, Homebrew itself pointed this out to me. Nice installer. So I had to track down where this was happening. Interestingly enough, I wasn't adding /usr/local/bin to my PATH - the system was! In /etc/paths there's a list of paths to add:

  /usr/bin
  /bin
  /usr/sbin
  /sbin
  /usr/local/bin

and I needed to change it to:

  /usr/local/bin
  /usr/bin
  /bin
  /usr/sbin
  /sbin

to get things right. Now, I had the PATH right, and both Erlang (erl) and Clojure (lein repl) started up just fine. Sounds like a no-op, but I'm on more recent versions, and for the work I'm about to get into, switching to Leiningen is a must.

But I didn't stop there… Oh no… I kept on cleaning things up. I don't even have Qt on this box, but it was in my PATH, and so were Groovy and a lot of other things that I don't have and don't need. All cleaned up.

By now my .bash_login and .bashrc are looking almost spartan. But then I was wondering about PostgreSQL. Was that on Homebrew? Would it work with Apache2 on my OS X box? Since I had the time, I decided to try it. So once again, I followed the simple steps to migrate from one package to the other:

Step 1 - make a complete backup. I went into my home directory and backed up everything in my server:

  $ /usr/local/pgsql/bin/pg_dumpall -U _postgres -o > pgbackup

Step 2 - shut down the old version, and remove its startup script from the system-wide install location:

  $ sudo launchctl unload \
      /Library/LaunchDaemons/org.postgresql.postgres.plist
  $ sudo rm /Library/LaunchDaemons/org.postgresql.postgres.plist

Step 3 - remove the old install, along with all the symlinks into the man pages and the /usr/local/bin directory that I had made myself for this install:

  $ cd /usr/local
  $ sudo rm -rf pgsql-9.1

there was some shell magic in removing the links - like an ls piped into a grep for 'pgsql' and then removing the matches. Nothing fancy, but it took a little time.

Now that the old PostgreSQL install was really gone - even from my .bash_login and .bashrc - I was ready to install PostgreSQL from Homebrew. One of the reasons was that Homebrew's version was 9.2.1 and the previous install was 9.1.

Step 4 - install PostgreSQL:

  $ brew install postgresql

Step 5 - create initial database for Homebrew PostgreSQL install:

  $ initdb /usr/local/var/postgres -E utf8

Step 6 - set it to start on my login, and start it now:

  $ mkdir -p ~/Library/LaunchAgents
  $ cp /usr/local/Cellar/postgresql/9.2.1/homebrew.mxcl.postgresql.plist \
        ~/Library/LaunchAgents/
  $ launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

Step 7 - reload databases from initial dump:

  $ psql -d template1 -f ~/pgbackup

At this point, I could run psql and access the databases, so I was sure I was up and running. Next, I needed to see about the integration with Apache2 - I have to have that working for some projects I've done and am still working on.

Step 8 - activating PHP in Apache2 config on my box. Edit the file: /etc/apache2/httpd.conf and uncomment the line that looks like:

  LoadModule php5_module libexec/apache2/libphp5.so

and restart apache:

  $ sudo apachectl graceful

Step 9 - make my ~/Sites directory accessible again. Create the file /etc/apache2/users/drbob.conf:

  <Directory "/Users/drbob/Sites/">
    Options FollowSymLinks Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
  </Directory>

and at this point, I had the quite familiar PHP info screen up and my simple database accessing page worked like a charm. I'd successfully completed the migration!

But was I done? No!

I've been running boost 1.49.0 for a while, and I like that I figured out how to do universal binaries of the libraries. Very nice. But then I checked Homebrew:

  $ brew info boost
  boost: stable 1.52.0 (bottled), HEAD
  www.boost.org
  Not installed
  github.com/mxcl/homebrew/commits/master/Library/Formula/boost.rb
  ==> Options
  --with-icu
    Build regexp engine with icu support
  --without-python
    Build without Python
  --with-mpi
    Enable MPI support
  --universal
    Build a universal binary

so I could update to boost 1.52.0 and get the same universal binaries without missing a beat! This might be really nice. So I removed my own boost install:

  $ cd /usr/local/include
  $ sudo rm -rf boost
  $ cd /usr/local/lib
  $ sudo rm -rf libboost_*

and then I installed boost from Homebrew:

  $ brew install boost --universal

Odd… I got:

  ...failed updating 22 targets...
  ...skipped 12 targets...
  ...updated 10743 targets...
 
  READ THIS: github.com/mxcl/homebrew/wiki/troubleshooting
 
  These open issues may also help:
    github.com/mxcl/homebrew/issues/14749

The hint was to run brew doctor and correct all the errors. Well… I had a lot of them - all from my manual boost and gfortran installs. So I ditched my old gfortran install, cleaned up all the problems, and then re-ran the install:

  /usr/local/Cellar/boost/1.52.0: 9086 files, 362M, built in 6.1 minutes

When I looked in /usr/local/include and /usr/local/lib I saw all the boost code, and I even checked that I got the universal binaries:

  $ file /usr/local/lib/libboost_wave-mt.dylib 
  /usr/local/lib/libboost_wave-mt.dylib: Mach-O universal binary with 2 architectures
  /usr/local/lib/libboost_wave-mt.dylib (for architecture i386): Mach-O dynamically
    linked shared library i386
  /usr/local/lib/libboost_wave-mt.dylib (for architecture x86_64): Mach-O
    64-bit dynamically linked shared library x86_64

Excellent!

Now to put back gfortran from Homebrew:

  $ brew install gfortran

and after cleaning up more cruft from the old gfortran install, it installed and worked just fine!

I have now successfully replaced all the third-party builds I once did by hand with Homebrew packages. This is amazing stuff.

Starting to See a Little Light – Maybe it’s the Train?

Wednesday, November 14th, 2012


Today I got in early and really started hammering the last few issues for the Pilot starting Monday. I have 10 hours. It's got to be deployed this afternoon and that's it. We have to run in test mode for two days and then turn it on. So it was time to get down to business.

I was able to fix up the last few issues pretty easily, and by 9:00 am I was looking pretty good. People started coming in and things were looking even better, and stand-up went smoothly.

Maybe a little light?

Then I started to polish a little of the code as we needed to clean some stale code out, and clear out a few things with Salesforce - and promote something else. All looking very good. I'm hours away from the deployment for tonight, and things are looking good. I hate to say this - for fear of jinxing myself, but it was looking good.


I then had the time to look at one of the CouchDB problems I've been having: socket errors. What I was doing in the code was writing the documents one-by-one:

  payloads.each do |data|
    Database.store('Final results') { data[:merchant] }
  end

and while it works, it's making thousands of REST calls to Couch, and that's not efficient at all. There's a bulk store API to Couch in the client we're using, and if I just change the code to be:

  essentials = payloads.map do |data|
    data.select { |k,_| k != :demand_pool }
  end
  Database.store('Final results') { essentials }

then we're making one connection for every 2000 documents instead of one per document, and all of a sudden, things are under control with Couch!
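The client's bulk API does the batching internally, but the idea is easy to sketch by hand - group the documents and make one call per group (the 2000-document batch size mirrors what the client uses; the writer block is a stand-in for the actual bulk call):

```ruby
# Hedged sketch: manual batching of documents into bulk writes - one
# call per 2000 docs instead of one call per doc. Returns the number of
# calls made so the savings are visible.
def bulk_in_batches(docs, batch_size = 2000)
  calls = 0
  docs.each_slice(batch_size) do |batch|
    yield batch   # e.g. one bulk store to Couch
    calls += 1
  end
  calls
end
```

For 4500 documents that's three calls instead of 4500, which is exactly why the socket problems went away.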

This is GREAT news! I'm super happy about this. It means we may not have to ditch Couch, and that's nice, but certainly we have plenty of time to switch it out based on what we're doing and needing, and not on some socket problem with the server.

Very nice news!

The day is looking up… maybe the light isn't the oncoming train, after all!