Archive for the ‘Coding’ Category

Nothing Like Riding the Beast at Night – At Work

Tuesday, September 18th, 2012

Code Clean Up

I love Kings Island's roller coasters, and The Beast is the best. If you ride it at night - right before the park closes - it's chilling and thrilling, and no matter how many times I ride it, it's always a new thrill for me. Today was kinda like that - but at work.

So we had 'Launch Day' on Monday, and so as you might expect, there were a bunch of "little things" from the users that needed to be attended to. Certainly understandable, and I've done this enough to know that this is just the way things are… still, like riding The Beast, it's something that certainly gets your heart racing.

First and foremost - it's about time. You have a day to get as many of the "bad" problems fixed, tested and into production as possible. This is because the first impression you gave the users today is going to be supported by the changes you can give them tomorrow. Speed of fixes is crucial in the first days of a project.

Then it's about quality. You can't introduce problems with the fixes, so you have to be careful. This is most often where I have seen a lot of folks stumble. They fix one thing and then break another, and the result is that the users think the development staff is asleep at the wheel. Not good.

So today has been a lot of very fast, very careful work highlighted by the fact that we're really doing the vast majority of the work to deal with crummy sources and sinks of data. I was thinking about this a few days ago, and wondering why this seemingly simple project had so many developers on it and was taking so much time. True, we're ahead of schedule, but that just reinforces my concern.

In Finance, this would have been a few weeks of one or two people - tops. Here it's much more than that. Why? Well… I came to the realization that about 90% of the code we've written is about dealing with bad sources and sinks of the data. I know the purported benefits of SOA… it's supposed to be great. But that's only true for little things. There's a reason that databases are useful and valuable.

We read a lot of merchant data from an unreliable source. If this were in a database in the data center, then this would be trivial. Likewise for reading the demand data. Updating data would be super easy as we'd have some stored procedure or something to update it.

Yet due to SOA, all this is taking tons of time, and then there's the SSL layer on top due to the fact that a database offers security, but a web service doesn't. It's a mess. But I understand why they are doing it - it's "the way". Still doesn't make it efficient.

I'd love to trade in the sources and sinks for a few databases. The code would be 100x faster, far more reliable, and far less complicated. When I write stuff, I'm not going to let the Kool-Aid get in the way of making a good system.

Launch Day!

Monday, September 17th, 2012

Well, I didn't realize what today was, so I was a little surprised to see that today was Launch Day for the project I've been working on at The Shop. It's nothing that I'm worried about, but it's been a very busy day considering what's going on.

The day started out with problems in the production runs, but I was in early enough to make sure that things worked out, and then we were off to the races. There were a lot of things still to do, and today I started a few of them and passed off a few to some of the guys in the group.

First, we need to look at the performance. It's horrible. I mean really bad. It's taking an hour to run a division on the production "box" at EC2. Compare that to running it on my MacBook Pro at about 5 min. and the problem is evident. We need to fix the I/O issues. Maybe it's all in the resources of the EC2 machine, but maybe we have what we have and need to change the code around to make it more efficient.

Right now, we need to up the resources, which I did, and let the runs go again and see what the performance differences are. I'm hoping to see significant improvements, but I'm guessing we won't see real improvements until we get the dedicated hardware at the SNC1 datacenter when that arrives. Still… cutting down the hour to 30 mins or so would really be nice.

Second, we need to have a lot better handle on the performance metrics. We have NewRelic wired in, but not nearly enough. We need to make it a lot more interwoven. So we took the time today and added a lot more instrumentation in the code so that come tomorrow we will be able to see a lot more of the breakdown of where the time is being spent.

Third, we needed to fix a bit in the code regarding merchants with closed deals that haven't run yet. In the existing code, these were considered merchants that should be contacted again, but in reality they shouldn't. So I needed to modify the Salesforce endpoint, and then put that to Staging and write the code to pick it up and do the math that was needed. It's not hard, but considering that we're live, it's something that needs to be done today.

So I spent a bit of time this afternoon getting the Salesforce changes in, checked into git, tested, and going in UAT. Then I dealt with the Salesforce guys to get it pushed to production. I needed it there in order to make sure it was going to run well tonight at midnight. I'm glad to see that it all got done.

Finally, there were a ton of little issues from the initial user feedback. Nothing major, but lots of little things that needed to be cleaned up before tomorrow. It's been a heck of a day!

Hopefully, it's going to work well tonight. But I'll be sure to be in good and early (like usual) to make sure that things are ready for the day.

Working Feverishly to Build a New View

Friday, September 14th, 2012

Today I've been trying very hard to get a new view done before I have to leave early to catch a bus, to catch a train to get home in time to go to the Car Dealership to pick up the car we got for Marie to drive. It's a 1989 Volvo 740GL with 198,000 miles on it. That's a lot of miles, but I know she could not be happier with the car.

In any case, I needed to get this visualization done, and it meant that I needed to add a new view to the CouchDB as well as making it another drop-down on the page to allow the user to pick the sales rep to view. The basic idea of this page is to show the user the Top 100 merchants assigned to each sales rep in the division, ordered by the merchant's sales value.

It's not really all that hard, but in keeping with the style I've changed to with these pages, it's a lot of little calls, as opposed to a few big calls and then manipulating the data once it's on the client. It's a different style, that's to be sure, but it's workable and I just needed to get the queries done, and the calls set up so that things looked reasonable.

Thankfully, I was able to just get things done before I had to leave.

Whew!

The Death of Imperative Programming

Thursday, September 13th, 2012

I got a note from my manager, a link to this article and the words:

Found something that might stir you up

and sure enough, it did.

I can certainly understand that the rise of the functional languages is going to affect the traditional uses of imperative languages. Certainly for a lot of the uses - I'll even say the majority of cases - performance isn't an issue, and functional languages have the benefit of immutability and are therefore a lot less error-prone. While developing in a functional language is vastly different from developing in an imperative one, there are still reasons to learn the functional approach, and make use of immutable data types - even in imperative languages.

But to think that imperative languages are going to die?

Yeah… No. Not gonna happen anytime soon.

Functional languages - even the most mature ones - suffer from the garbage collection problem. If you're going to be creating immutable objects and doing anything with them, you're going to be creating new ones all the time. Kinda goes without saying. And this is where they all suffer.
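To make the allocation point concrete, here's a tiny sketch in Ruby (the language used elsewhere on this page). It's only an illustration with made-up variable names, not a benchmark: every "update" to a frozen array has to build a whole new array, and in real code the intermediate ones are left for the garbage collector.

```ruby
# Illustrative only: "append" three items to a frozen (immutable) array.
snapshots = [[].freeze]
[1, 2, 3].each do |item|
  # Array#+ never modifies its receiver; it allocates a brand new array.
  snapshots << (snapshots.last + [item]).freeze
end

final = snapshots.last                                # the fully built array
distinct = snapshots.map(&:object_id).uniq.size       # one object per version

# Updating in place is refused, so there is no way around the allocations.
begin
  final << 4
rescue RuntimeError => error   # FrozenError on newer Rubies (a RuntimeError subclass)
  refused = error.class
end
```

Three appends leave four separate arrays behind. Scale that up to a busy production system and the collector has plenty to do.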

Java is great, save the GC pauses. Erlang too, with its smaller GC pauses, but they are still there.

For the rest of my life, to be sure, there is going to be a need for developers who are capable of doing the "hard stuff" - regardless of how many improvements the functional languages make.

Performance is going to matter.

Interesting Sublime Text 2 Package – MavensMate

Thursday, September 13th, 2012

Sublime Text 2

This morning I got a note about a Sublime Text 2 plugin for working with Salesforce.com APEX code as well as Git integration - MavensMate-SublimeText. This is a pretty amazing project for working with Salesforce.com code. While I haven't installed it on my laptop, it looks to be the thing if you're doing any significant amount of Salesforce.com coding. There are pages for running tests as well as immediate mode execution, and then of course, the highlighting and 'Code Assist'.

Very interesting. If I get into some real Salesforce.com code, I'll have to look harder at this.

Code Should be Simple – Not Hidden

Wednesday, September 12th, 2012

I was talking to a couple of guys in The Group today, and I heard a few of them talk about extracting the logging and timing metrics from the code itself and having them simply meta-programmed into Ruby, such that all methods of a certain class would be logged and timed. Now I'm all for simplifications - to a point - but this is really, in my mind, going way too far. There's something about minimalism that I think is attractive to the Math types in a group, and we have them, but that's not at all realistic. Code needs to function in the real world, which means that it has to check inputs, log lots of intermediate state, and in general do all those things that rookie coders don't do, which gets them into trouble when their code doesn't perform well in production.

Simple is one thing. Hidden is another.

Don't hide the complexity of logging. It's in your face, and it's meant to be. There's no way some meta-programmed log system is going to know where I want to put every one of the log messages I want. Timings are a little easier, but it's still a mistaken assumption that simply timing method calls is sufficient, as if I'll never need to sub-divide a method call.

Fiddlesticks.
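Here's a rough sketch of what I mean. None of this is the code the guys proposed; TimedMethods, Importer, and the fetch/score stages are hypothetical names I made up to show the two styles side by side. The meta-programmed wrapper gives one number per method call, while the explicit version can time the stages inside the method.

```ruby
# Hypothetical sketch: TimedMethods and Importer are illustrative names only.
module TimedMethods
  # Recorded durations in seconds, keyed by method name.
  TIMINGS = Hash.new { |hash, key| hash[key] = [] }

  # Re-define each named instance method so every call is timed as a whole.
  def timed(*names)
    names.each do |name|
      original = instance_method(name)
      define_method(name) do |*args, &block|
        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        begin
          original.bind(self).call(*args, &block)
        ensure
          finished = Process.clock_gettime(Process::CLOCK_MONOTONIC)
          TIMINGS[name] << (finished - started)
        end
      end
    end
  end
end

class Importer
  extend TimedMethods

  def fetch
    [1, 2, 3]                             # stand-in for a slow read from a service
  end

  def score(rows)
    rows.reduce(0) { |sum, n| sum + n }   # stand-in for the real math
  end

  # Meta-programmed: one number for the whole call, no idea which stage was slow.
  def run
    score(fetch)
  end
  timed :run

  # Explicit: the timing sits in the code, so each stage gets its own number.
  def run_explicit
    t0 = clock
    rows = fetch
    t1 = clock
    total = score(rows)
    t2 = clock
    { total: total, fetch_secs: t1 - t0, score_secs: t2 - t1 }
  end

  private

  def clock
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

Calling Importer.new.run adds a single entry to TimedMethods::TIMINGS[:run], whereas run_explicit hands back separate fetch and score timings. If the slow part is the fetch, only the second one tells you so.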

You need to have logging and timings sprinkled in your code. It's not homework coding, but I'm coming to believe that there's a lot to be said for the C++/Java world that I come from. Yes, it's not Ruby, and there are a lot of things to like in Ruby, but there's a lot that I think these guys take for granted and don't properly write code to guard against.

Make code simple, yes. But if you hide too much, then you'll forget what's really being done, and you're going to get hit in the rear pretty soon. Performance is a huge blind spot for the majority of these guys. They just don't see it, and don't see it as needed. Couldn't be more wrong. It's always important in a production system.

So I'm going to try and guide them away from this decision, as I am a firm believer that it's going to get them, and me, into hot water. We just don't need it.

Ruby include vs. extend

Wednesday, September 12th, 2012

Ruby

I learned a valuable lesson about Ruby this morning - if you have a module of shared code in Ruby:

  module SharedStuff
    def log
      puts "log"
    end
  end

then:

  • include makes the module's methods available to an instance of the class
  • extend makes the methods available to the class itself

This means that by using the same base code, you can add them in as class methods with the extend directive, and as instance methods with the include directive.
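A minimal sketch of the difference, reusing the SharedStuff module from above. The Worker and Registry classes are made-up names for illustration, and log returns its message here instead of printing it, only so the result is easy to check:

```ruby
module SharedStuff
  # Returns the message rather than printing it (a change from the snippet
  # above, made only so the result can be inspected).
  def log
    "log"
  end
end

# Hypothetical class: include adds log as an instance method.
class Worker
  include SharedStuff
end

# Hypothetical class: extend adds log as a method on the class itself.
class Registry
  extend SharedStuff
end

instance_result = Worker.new.log   # callable on an instance
class_result    = Registry.log     # callable on the class

# The reverse directions are not available:
Worker.respond_to?(:log)           # false - include did not touch the class object
Registry.new.respond_to?(:log)     # false - extend did not touch instances
```

So the same module body ends up in two different places depending on which directive pulls it in.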

Totally 'Ruby', as it's a strange, massive difference that might be missed by new users of the language.

Glad I learned it.

Google Chrome dev 23.0.1262.0 is Out

Tuesday, September 11th, 2012

This morning I noticed that again, Google Chrome dev is bumped to 23.0.1262.0 with some more good release notes. There's an update to WebKit (537.10), and the V8 JavaScript engine (3.13.6.0), and at least one Mac-specific fix. Nice! The page refresh speed is really quite amazing, and has been for the last two releases. It's really impressive. I'm hoping they keep it up!

Refactoring the Web Page for Faster Loading

Monday, September 10th, 2012

This morning I finished a refactoring that I'd started on Friday when I realized that the UX of the web pages I was making (to be thrown away) was really all wrong. I was doing the more direct scheme - loading a lot of data into memory and then manipulating it quickly to render different graphs. The problem with this is that the data sets are really very large, and the load times for them are longer than you'd want to sit for.

One of the guys on the team was saying "Break it up into a bunch of small requests." And while I could see that approach, I thought that the overhead of all the calls was going to really kill performance. I never even really considered it.

But about 4:00 pm on Friday, when I was really very frustrated with the code I was working on for that page, I decided to give it a go. What made most sense to me was to break the requests into a few stages:

  • Get the list of all runs - based on the selected database, get the division/timestamp pairs for all runs in that database. We'll be able to parse them next, but this is a nice, short little method.
  • Parse the executions for divisions and timestamps - take the division/timestamp data for all runs in a database and create a data structure where the list of timestamps will be stored in a descending order for a given division.
  • Set HTML run options for a given division - when the user selects a division, take the parsed data and create the drop-down for the runtimes for that guy.
  • Query for the specific data - taking the data from the options - the database, the division, the timestamp, hit CouchDB for the exact data we need to visualize. In many cases this is less than 100 documents.
  • Parse the documents - once we have the targeted data from CouchDB, parse it into Google DataTable or ZingChart series.
  • Render the data - last step.

I was surprised to see that the resulting code was smaller than what I'd had. The parsing of the data structures in the old version was really a lot more than I thought. Starting at the top of the list, the code to get the list of all runs is really simple:

  function reload_executions() {
    // hit CouchDB for the view of all executions it knows about
    var svr_opt = document.getElementById('server_opt');
    var url = svr_opt.value + '/_design/general/_view/executions?' +
              opts + '&callback=?';
    $.getJSON(url, function(data) {
      parse_execution_tags(data);
    });
  }

Once again, jQuery really helps me out. Next, I need to parse this data into a structure of all the runs by division:

  function parse_execution_tags(data) {
    divisions = new Array();
    runtimes = new Object();
 
    for(var i in data.rows) {
      // get the execution_tag and exclude the very early ones
      var exec_tag = data.rows[i].key;
      if (!/-\D+$/i.test(exec_tag) || (exec_tag.substring(0,10) < weekAgo)) {
        continue;
      }
      // now get the timestamp and division from the execution_tag
      var runtime = exec_tag.replace(/-\D+/g, '');
      var division = exec_tag.replace(/^.*\.\d\d\d-/g, '');
      if (typeof(runtimes[division]) == 'undefined') {
        runtimes[division] = new Array();
        divisions.push(division);
      }
      runtimes[division].push(runtime);
    }
 
    // sort the divisions and create the contents of the drop-down
    if (divisions.length > 0) {
      divisions.sort();
      var div_opt = document.getElementById('division_opt');
      div_opt.options.length = 0;
      for (var d in divisions) {
        div_opt.options[div_opt.options.length] =
            new Option(divisions[d], divisions[d]);
      }
    }
 
    // given the default division, load up the run times we just parsed
    set_runs_for_division(divisions[0]);
  }

where I'd created the variable weekAgo to be able to let me know what the "recent" data was:

  // get the date a week ago formatted as YYYY-MM-DD
  var when = new Date();
  when.setDate(when.getDate() - 7);
  var weekAgo = when.getFullYear()+'-'
                +('0'+(when.getMonth()+1)).substr(-2,2)+'-'
                +('0'+when.getDate()).substr(-2,2);

Once the data is all parsed into the structures we can then build up the drop down for the runs for a selected division with the function:

  function set_runs_for_division(division) {
    division = (typeof(division) !== 'undefined' ? division :
                document.getElementById('division_opt').value);
    runtimes[division].sort();
    runtimes[division].reverse();
    var run_opt = document.getElementById('run_opt');
    run_opt.options.length = 0;
    for (var i in runtimes[division]) {
      var tag = runtimes[division][i];
      run_opt.options[run_opt.options.length] = new Option(tag, tag);
    }
    // at this point, call back for the data we need, and then render it
    reload();
  }

Calling to get the actual data is pretty simple:

  function reload() {
    // hit CouchDB for the view we need to process
    var svr_opt = document.getElementById('server_opt');
    var view_opt = document.getElementById('view_opt');
    var run_opt = document.getElementById('run_opt');
    var div_opt = document.getElementById('division_opt');
    var et = run_opt.value + '-' + div_opt.value;
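    // URL-encode the JSON keys so quotes and spaces survive the query string
    var url = svr_opt.value + '/' + view_loc + view_opt.value + '?' +
              'startkey=' + encodeURIComponent(JSON.stringify([et,{}])) +
              '&endkey=' + encodeURIComponent(JSON.stringify([et])) +
              opts + '&callback=?';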
    $.getJSON(url, function(data) {
      var tbl = parse_series(data);
      render(tbl);
    });
  }

but parsing it into a Google DataTable is not nearly as simple. The code is complicated by the different views we need to handle properly:

  function parse_series(data) {
    // build a DataTable whose second column depends on the selected view
    var view_opt = document.getElementById('view_opt');
 
    var table = new google.visualization.DataTable();
    table.addColumn('string', 'Division');
    switch (view_opt.value) {
      case 'merchants_by_existing_merchant':
        table.addColumn('number', 'Deals');
        break;
      case 'merchants_by_research_ranking':
        table.addColumn('number', 'Rank');
        break;
      case 'merchants_by_status':
        table.addColumn('string', 'Status');
        break;
      case 'merchants_by_rep':
        table.addColumn('string', 'Rep SF ID');
        break;
    }
    table.addColumn('string', 'Merchant');
    table.addColumn('number', 'Sales Value');
 
    for(var i in data.rows) {
      var row = data.rows[i];
      var name = (row.value.name.length > 60 ?
                   row.value.name.substring(0,60)+'...' : row.value.name);
      var table_row = new Array();
      table_row.push(row.value.division);
      switch (view_opt.value) {
        case 'merchants_by_existing_merchant':
        case 'merchants_by_research_ranking':
        case 'merchants_by_status':
        case 'merchants_by_rep':
          table_row.push(row.key[1]);
          break;
      }
      table_row.push(name);
      table_row.push(row.value.sales_value);
      table.addRow(table_row);
    }
 
    // now let's apply the formatter to the sales value column
    var fmt = new google.visualization.NumberFormat(sv_format);
    switch (view_opt.value) {
      case 'merchants_by_existing_merchant':
      case 'merchants_by_research_ranking':
      case 'merchants_by_status':
      case 'merchants_by_rep':
        fmt.format(table, 3);
        break;
      default:
        fmt.format(table, 2);
        break;
    }
 
    return table;
  }

but the rendering is very simple:

  function render(tbl) {
    var dest = document.getElementById('table_div');
    var table = new google.visualization.Table(dest);
    table.draw(tbl, table_config);
  }

When I put it all together I was amazed to learn that the hits were exceptionally fast. The page is far more responsive, and in short - I could not possibly have been more wrong. The human lag is sufficient to make the calls invisible, and the sluggishness of the memory load on the old version was horrible. This is a far better solution.

I'm going to remember this for the future.

Spiffy Bash Prompt in Python

Monday, September 10th, 2012

This morning a co-worker tweeted that he found this spiffy Bash prompt generator built in Python. Now I'm not normally one to adorn my shells with command prompts like this, but Wow! this is impressive. I mean it's got your name, the path, the git branch you're on, and even history with a red background in case of an error. That's pretty impressive.

Then again, it's probably a serious Python script that takes some time to run, but for those that want a pretty prompt, this looks pretty amazingly stylish. I gotta hand it to him. It's really close to something that's nice enough for me to start using it.

So it goes… I'll keep it in mind for now.