Archive for September, 2012

Refactoring the Web Page for Faster Loading

Monday, September 10th, 2012

WebDevel.jpg

This morning I finished a refactoring that I'd started on Friday, when I realized that the UX of the web pages I was making (to be thrown away) was really all wrong. I was using the more direct scheme - loading a lot of data into memory and then manipulating it quickly to render different graphs. The problem with this is that the data sets are really very large, and the load times for them are longer than you'd want to sit through.

One of the guys on the team kept saying "Break it up into a bunch of small requests." And while I could see that approach, I thought that the overhead of all the calls was going to really kill performance. I never even seriously considered it.

But about 4:00 pm on Friday, when I was really very frustrated with the code I was working on for that page, I decided to give it a go. What made most sense to me was to break the requests into a few stages:

  • Get the list of all runs - based on the selected database, get the division/timestamp pairs for all runs in that database. We'll be able to parse them next, but this is a nice, short little method.
  • Parse the executions for divisions and timestamps - take the division/timestamp data for all runs in a database and create a data structure where the list of timestamps will be stored in a descending order for a given division.
  • Set HTML run options for a given division - when the user selects a division, take the parsed data and create the drop-down for the runtimes for that guy.
  • Query for the specific data - taking the data from the options - the database, the division, the timestamp, hit CouchDB for the exact data we need to visualize. In many cases this is less than 100 documents.
  • Parse the documents - once we have the targeted data from CouchDB, parse it into Google DataTable or ZingChart series.
  • Render the data - last step.

I was surprised to see that the resulting code was smaller than what I'd had. The parsing for the old, monolithic data structures had been a lot more code than I'd realized. Starting at the top of the list, the code to get the list of all runs is really simple:

  function reload_executions() {
    // hit CouchDB for the view of all executions it knows about
    var svr_opt = document.getElementById('server_opt');
    var url = svr_opt.value + '/_design/general/_view/executions?' +
              opts + '&callback=?';
    $.getJSON(url, function(data) {
      parse_execution_tags(data);
    });
  }

Once again, jQuery really helps me out. Next, I need to parse this data into a structure of all the runs by division:

  function parse_execution_tags(data) {
    divisions = new Array();
    runtimes = new Object();
 
    for(var i in data.rows) {
      // get the execution_tag and exclude the very early ones
      var exec_tag = data.rows[i].key;
      if (!/-\D+$/i.test(exec_tag) || (exec_tag.substring(0,10) < weekAgo)) {
        continue;
      }
      // now get the timestamp and division from the execution_tag
      var runtime = exec_tag.replace(/-\D+/g, '');
      var division = exec_tag.replace(/^.*\.\d\d\d-/g, '');
      if (typeof(runtimes[division]) == 'undefined') {
        runtimes[division] = new Array();
        divisions.push(division);
      }
      runtimes[division].push(runtime);
    }
 
    // sort the divisions and create the contents of the drop-down
    if (divisions.length > 0) {
      divisions.sort();
      var div_opt = document.getElementById('division_opt');
      div_opt.options.length = 0;
      for (var d in divisions) {
        div_opt.options[div_opt.options.length] =
            new Option(divisions[d], divisions[d]);
      }
    }
 
    // given the default division, load up the run times we just parsed
    set_runs_for_division(divisions[0]);
  }

where I'd created the variable weekAgo to let me identify what counted as "recent" data:

  // get the date a week ago formatted as YYYY-MM-DD
  var when = new Date();
  when.setDate(when.getDate() - 7);
  var weekAgo = when.getFullYear()+'-'
                +('0'+(when.getMonth()+1)).substr(-2,2)+'-'
                +('0'+when.getDate()).substr(-2,2);

Once the data is all parsed into the structures, we can build up the drop-down of runs for a selected division with this function:

  function set_runs_for_division(division) {
    division = (typeof(division) !== 'undefined' ? division :
                document.getElementById('division_opt').value);
    runtimes[division].sort();
    runtimes[division].reverse();
    var run_opt = document.getElementById('run_opt');
    run_opt.options.length = 0;
    for (var i in runtimes[division]) {
      var tag = runtimes[division][i];
      run_opt.options[run_opt.options.length] = new Option(tag, tag);
    }
    // at this point, call back to get the data we need, and then render it
    reload();
  }

Calling to get the actual data is pretty simple:

  function reload() {
    // hit CouchDB for the view we need to process
    var svr_opt = document.getElementById('server_opt');
    var view_opt = document.getElementById('view_opt');
    var run_opt = document.getElementById('run_opt');
    var div_opt = document.getElementById('division_opt');
    var et = run_opt.value + '-' + div_opt.value;
    var url = svr_opt.value + '/' + view_loc + view_opt.value + '?' +
              'startkey=' + JSON.stringify([et,{}]) +
              '&endkey=' + JSON.stringify([et]) + opts + '&callback=?';
    $.getJSON(url, function(data) {
      var tbl = parse_series(data);
      render(tbl);
    });
  }

but parsing it into a Google DataTable is not nearly as simple. The code is complicated by the different views we need to handle properly:

  function parse_series(data) {
    // now put all the data into an object keyed by the execution_tag
    var view_opt = document.getElementById('view_opt');
 
    var table = new google.visualization.DataTable();
    table.addColumn('string', 'Division');
    switch (view_opt.value) {
      case 'merchants_by_existing_merchant':
        table.addColumn('number', 'Deals');
        break;
      case 'merchants_by_research_ranking':
        table.addColumn('number', 'Rank');
        break;
      case 'merchants_by_status':
        table.addColumn('string', 'Status');
        break;
      case 'merchants_by_rep':
        table.addColumn('string', 'Rep SF ID');
        break;
    }
    table.addColumn('string', 'Merchant');
    table.addColumn('number', 'Sales Value');
 
    for(var i in data.rows) {
      var row = data.rows[i];
      var name = (row.value.name.length > 60 ?
                   row.value.name.substring(0,60)+'...' : row.value.name);
      var table_row = new Array();
      table_row.push(row.value.division);
      switch (view_opt.value) {
        case 'merchants_by_existing_merchant':
        case 'merchants_by_research_ranking':
        case 'merchants_by_status':
        case 'merchants_by_rep':
          table_row.push(row.key[1]);
          break;
      }
      table_row.push(name);
      table_row.push(row.value.sales_value);
      table.addRow(table_row);
    }
 
    // now let's apply the formatter to the sales value column
    var fmt = new google.visualization.NumberFormat(sv_format);
    switch (view_opt.value) {
      case 'merchants_by_existing_merchant':
      case 'merchants_by_research_ranking':
      case 'merchants_by_status':
      case 'merchants_by_rep':
        fmt.format(table, 3);
        break;
      default:
        fmt.format(table, 2);
        break;
    }
 
    return table;
  }

but the rendering is very simple:

    function render(tbl) {
      var dest = document.getElementById('table_div');
      var table = new google.visualization.Table(dest);
      table.draw(tbl, table_config);
    }

When I put it all together, I was amazed to find that the hits were exceptionally fast. The page is far more responsive - in short, I could not possibly have been more wrong. Human reaction time is enough to make the individual calls invisible, and the sluggishness of the memory load in the old version was horrible. This is a far better solution.

I'm going to remember this for the future.

Spiffy Bash Prompt in Python

Monday, September 10th, 2012

Terminal.gif

This morning a co-worker tweeted that he'd found this spiffy Bash prompt generator built in Python. Now I'm not normally one to adorn my shells with command prompts like this, but wow! I mean, it's got your name, the path, the git branch you're on, and even the history number on a red background in case of an error. That's pretty impressive.

Then again, it's probably a serious Python script that takes some time to run, but for those who want a pretty prompt, this looks amazingly stylish. I've got to hand it to him. It's really close to being nice enough for me to start using.

So it goes… I'll keep it in mind for now.

Lots of Web Stuff – And it’s All Going to be Tossed

Friday, September 7th, 2012

WebDevel.jpg

I did it again today. I pushed myself to get a lot of stuff done for a big, important demo (again), and along the way a few people interrupted me to ask me to look at a few things. Had I been smarter… had I been wiser… I'd have said "I'm sorry, I'm on this push for the demo, how about I get to it on Monday?" But I didn't.

What I did was to push myself to the point that I was very upset with the things I was working on. Oh, it didn't start out that way. It started out with me thinking that I could easily track down this problem that one of the data science guys pointed out. It was in my code for grouping the merchants where I wanted to be smart and clever. I should have known.

First, I wasn't accounting for the case where two groups of overlapping merchants are built and then a single merchant bridges both groups. I messed up. So I needed to go back and fix a few things. First off, I didn't have a general service-overlap method, so I took the old one and expanded it:

  # this method returns true if ANY service is shared between the two
  # merchants. ANY.
  def self.services_overlap?(ying, yang)
    if ying.is_a?(Array)
      # explode the calls for an array in the first position
      ying.each do |d|
        return true if services_overlap?(d, yang)
      end
      return false
    elsif yang.is_a?(Array)
      # explode the calls for an array in the second position
      yang.each do |d|
        return true if services_overlap?(ying, d)
      end
      return false
    end
    # this is the simple call for a 1:1 check
    !(get_services(ying) & get_services(yang)).empty?
  end

Basically, I just allowed arrays to be passed in, and where necessary, I exploded them to allow the basic logic to be applied to the individual merchants. It's not hard, but after this, I no longer needed to worry about what I was passing in to check for an overlap.

The next thing was to completely redo the group_by_service() method as it was far too complex, and it wasn't even working. I didn't like the fact that it was doing a lot of extra checks, etc. but that seemed to be the Ruby Way. Poo. I changed it into a simple single-pass loop that's far simpler and far faster:

  def self.group_by_service(otcs)
    # start with the array of groups that we'll be returning to the caller.
    groups = []
    otcs.each do |d|
      added = []
 
      # add the OTC to all groups that it has some overlap with
      groups.each_with_index do |g, i|
        if services_overlap?(d, g)
          g << d
          added << i
        end
      end
 
      # if we added it to more than one group, then consolidate those groups
      if added.size > 1
        added[1..-1].each do |i|
          groups[added[0]].concat(groups[i][0..-2])
          groups[i] = nil
        end
        groups.compact!
      end
 
      # if he hadn't been added to anything, make a new group for him
      groups << [d] if added.empty?
    end
 
    groups
  end

The ideas here are a lot clearer - add a new guy in to all the groups he'd match, then look for multiple matches. If there are any, simply and cleanly consolidate the groups, and continue. My co-worker in Palo Alto liked this code a lot more as well. I do too. It's ruby, it's just not incomprehensible ruby. And it's right.
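To see the bridging case in action, here's a minimal, self-contained sketch of the same idea. It uses plain hashes of {name, services} as stand-ins for the real merchant objects, and inlines the services lookup instead of calling a real get_services() - the data and names here are made up purely for illustration:

```ruby
# Standalone sketch of the grouping logic above; the merchant's
# services are inlined as a :services array for the example.
def services_overlap?(ying, yang)
  # explode arrays in either position down to 1:1 checks
  return ying.any? { |d| services_overlap?(d, yang) } if ying.is_a?(Array)
  return yang.any? { |d| services_overlap?(ying, d) } if yang.is_a?(Array)
  !(ying[:services] & yang[:services]).empty?
end

def group_by_service(otcs)
  groups = []
  otcs.each do |d|
    added = []
    # add the merchant to every group it overlaps with
    groups.each_with_index do |g, i|
      if services_overlap?(d, g)
        g << d
        added << i
      end
    end
    # consolidate all the groups this merchant bridged
    if added.size > 1
      added[1..-1].each do |i|
        groups[added[0]].concat(groups[i][0..-2])
        groups[i] = nil
      end
      groups.compact!
    end
    groups << [d] if added.empty?
  end
  groups
end

a = { name: 'A', services: [1] }
b = { name: 'B', services: [2] }
c = { name: 'C', services: [1, 2] } # bridges A's and B's groups
groups = group_by_service([a, b, c])
# a and b land in separate groups first; c merges them into one
```

Without the consolidation step, a and b would stay in two separate groups even though c overlaps both - which is exactly the bug described above.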

But in the midst of this, I'm trying to get more web stuff done for another interruption for the demo, and it's nothing I'm happy with. It's all very lame. I don't like the interface I have to CouchDB, I don't have a lot of time, and it's making me very cranky.

I've run into this before, and it's not the first time I've not been able to say "No", and it's cost me something I didn't want to pay. I know I'm no great graphics designer. I know I can make things that work, but they aren't going to be "Wow!" with anyone but someone looking just at the functionality. That's just not in my DNA. And it is exceptionally frustrating to be in a situation where I'm forced to do this work.

I know what is good. I can appreciate it. But I can't generate it. And to be forced to do it is hard. Because I know they are going to throw it all away. Any half-decent designer will look at what I've done and say "Nice, but let's take out the data collecting and put it in a nice design" - as they should.

So I've got to work harder at keeping my cool.

I've got to say "No" more often.

Or I just won't last.

Code Monkeys

Friday, September 7th, 2012

Code Monkeys

I was talking with a good friend this morning and came up with a name for a lot of the ruby devs I've run into - though to be fair, it's not just for a good chunk of the ruby devs I've met - it's for a general class of developers. Let me try to be a little more precise about this:

Code Monkey - a developer that is more interested in learning a language and how to solve a few problems in it, than using it to solve real-world problems. This includes, but is not limited to, the clojure devs that have never written a comment, and only solved the zebra/water puzzle, as well as devs that never code defensively, or even think that production is important.

This came up because I'd been battling code that wasn't written defensively at all. It was basically assumed to be run by a person, with that person fixing any problems as they occurred. It's like a glorified Excel spreadsheet - I'm going to hit 'Go', and fix things that come up.

But this doesn't really work for real life, does it? Who wants a system that runs at night and has to be constantly monitored to make sure it doesn't get bad data, etc.?

Yet they are the first ones onto a new language - like clojure - saying that the real solution is to use a language that doesn't need all that checking, as if a functional language simply didn't require it.

What world are they living in?

How is a language able to do ETL on its own? Answer: it can't. You still have to do it. But the Code Monkeys are really skipping all that, because they start with good data, and then the process is clean and simple.

No kidding? Really? Well, of course it is! The same is true for C++, Java, and any other language you want to pick. Start with clean data, don't worry about exceptions and potential problems, and you're going to be able to write amazingly clean code. But that's not how life really works.

We agreed that there were guys with language knowledge, and skills, but they never really dug in and made it work. It's nice to talk to Code Monkeys, but it's not nice to have to work with them. You're always cleaning up their messes.

Crazy Tired from Crazy Hard Work

Thursday, September 6th, 2012

cubeLifeView.gif

Once again, we're gearing up for a big demo tomorrow with some of the users, so today has been full of a lot of things that needed to get done in order to have a successful presentation. I had to make quite a few new views for CouchDB, and then work those into a web page and publish everything up to UAT and production for a run. I'm becoming a big fan of CouchDB and its views and reductions… those are some very powerful tools for looking at the JSON data in CouchDB. Very nice.

I've also done a little fiddling with the Sublime Text 2 styles, and I'm exceptionally happy with how that's all turned out. It's made it much nicer to work with. I can't believe I hadn't tried it before this point, and I can't imagine a better editor for me. I'm going to have to brush up on my Python and write some packages some day.

Finally, I'm just dead tired. Great feeling. Lots of really good work done, and the team is really working together far better than I'd have thought a month ago. This is really quite fun.

Hacking on the Sublime Text 2 Syntax Highlighting

Thursday, September 6th, 2012

Sublime Text 2

This morning I was getting tired of the pretty lame syntax highlighting of YAML files in Sublime Text 2 - and I knew it could be better. So I started digging. The first thing I looked at was the tmTheme file that I had cloned from the Eiffel theme in the standard release package. It's close… white background, nice colors, but it's not perfect, and I wanted perfect. So here's what I found out.

The matching of the language is really in the tmLanguage files in the packages. These are a bunch of regexes, and that's fine, but each pattern match then pins the color to use to some "classification" - a dotted notation similar to a Java package name. The idea is that if you specify only the first part or parts, then the last parts are up for specialization.

For instance, if you want to have a numeric constant style, it makes sense to build them hierarchically: constant -> numeric -> yaml, this leads to the classification: constant.numeric.yaml. But if you want all constants to be a certain style (by default), you can simply specify the constant style in your tmTheme file.

Alternatively, if you want all your numeric constants to be a certain style except those in java, you make a style for constant.numeric and then a new one for constant.numeric.java. Simple. But certainly not simple to figure out by looking at the files.

So I realized that for YAML, I didn't want the 'Embedded source' to have a colored background. So I added:

  <dict>
    <key>name</key>
    <string>Embedded source</string>
    <key>scope</key>
    <string>source.php.embedded.block.html, string.unquoted.yaml</string>
    <key>settings</key>
    <dict>
      <key>background</key>
      <string>#FFFFFF</string>
    </dict>
  </dict>

so now it's got a white background. Nice.

The next thing I noticed was that I didn't like the keys in YAML being red like almost all the rest of the text (strings, constants, etc.), so I wanted to make those keys blue:

  <dict>
    <key>name</key>
    <string>Markup name of tag</string>
    <key>scope</key>
    <string>entity.name.tag.yaml</string>
    <key>settings</key>
    <dict>
      <key>fontStyle</key>
      <string>bold</string>
      <key>foreground</key>
      <string>#1C02FF</string>
    </dict>
  </dict>

and now the keys are a nice blue. Much better!

All this is just in my clone of the Eiffel theme in the Packages/User/ directory in the Application Support for Sublime Text 2. Very nice.

UPDATE: I realized it should be easy to do the same for PHP - which has the annoying background color - and it was! You simply have to look into the tmLanguage file, see the tag name that's used, and place it in the string in a simple comma-delimited list. Very slick!

UPDATE: I noticed a few more that I wanted to add - all from the HTML syntax highlighting. The code became:

  <dict>
    <key>name</key>
    <string>Embedded source</string>
    <key>scope</key>
    <string>
      source.php.embedded.block.html,
      source.css.embedded.html,
      source.js.embedded.html,
      source.python.embedded.html,
      source.ruby.embedded.html,
      string.unquoted.yaml
    </string>
    <key>settings</key>
    <dict>
      <key>background</key>
      <string>#FFFFFF</string>
    </dict>
  </dict>

Java 1.6.0_35 Out on Software Updates

Thursday, September 6th, 2012

Software Update

This morning I noticed that Java 1.6.0_35 was pushed out via Software Update, most likely due to a security issue that's been patched. Given the way Oracle is handling Java, I'm really wishing that Apple would retain control of Java for OS X. Right now, I'm wishing they had it slightly better integrated into the OS, such that starting a JVM instance wasn't so time-consuming. Linux handles this with all the shared libs already loaded, so it's a very lightweight thing to spin up the JVM. On OS X, it's a lot more work.

Still, it's nice to see that they have at least one more update. Maybe cooler heads will prevail in the coming months? Most likely not, but a guy can wish, can't he?

Changing Versions of Gems with Bundler

Wednesday, September 5th, 2012

RVM - Ruby's Manager

I had to build a special gem today, and I wanted to write down how it was built and deployed, because I'd already done this once and didn't remember it. So here goes.

First off, make sure that you have the skeleton of a gem - including the gemspec file - it comes with all GitHub repos, and if you make a new gem with the Bundler, it gives you a simple skeleton as well.

Next, write the code and then build it:

  $ rake build

this should deposit it in your pkg directory.

Upload it to a site - like RubyGems. Simple.

It's worth noting that it might help to delete older versions of the gem. This is easily done with:

  $ gem uninstall couchrest

If there are multiple versions of the gem installed, it'll give you a choice. If there's only one, it's gone.

Fix your Gemfile and then reload all the gems with:

  $ bundle install
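The Gemfile change itself is just re-pinning the version. It might look something like this - the gem name matches the example above, but the version number here is only an example:

```ruby
# Gemfile -- point at the newly published version
# (the version shown is just an example)
source 'https://rubygems.org'

gem 'couchrest', '1.1.2'
```

With that in place, bundle install pulls down the new version and updates Gemfile.lock.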

I’m Loving CouchDB More by the Day

Wednesday, September 5th, 2012

CouchDB

Today I was really battling a nasty problem with the CouchRest client for CouchDB. It's a ruby gem, and in general, it's pretty decent, but it really starts to fall down when you need to use a proxy to get to CouchDB - then this guy starts having all kinds of problems.

There were timeout issues, so I decided to try to make it work. I forked the project on GitHub and got to work. The major point was that the underlying RestClient gem has the ability to set things like the timeout for creating the connection, as well as timeouts for reading, etc. It's really very flexible. My goal was to allow these settings to be applied on a per-database basis. Then, for every command, use them as the defaults, but overlay any call-time options as well.
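The patch itself didn't survive the proxy problems, but the option-merging pattern is worth sketching. This is a hypothetical stand-in, not the real CouchRest API or my actual patch: per-database defaults are stored once, and each call overlays its own options on top:

```ruby
# Hypothetical sketch of the per-database defaults idea -- not the
# actual CouchRest patch or its API.
class Database
  attr_reader :defaults

  def initialize(defaults = {})
    # e.g. timeouts for opening the connection and for reads
    @defaults = defaults
  end

  # every command funnels its options through here; the call-time
  # options win over the per-database defaults
  def request_options(call_opts = {})
    defaults.merge(call_opts)
  end
end

db = Database.new(open_timeout: 2, read_timeout: 30)
db.request_options(read_timeout: 5)
# => { open_timeout: 2, read_timeout: 5 }
```

The nice property is that a one-off slow query can stretch its own read timeout without touching the defaults every other call relies on.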

The idea was really nice. I was planning on submitting a pull request for this as it only took me about an hour to do. But when I went to test it, it failed with some 502 Bad Gateway error.

Argh!

More proxy problems!

Then I was talking to one of the guys in the group about this issue and he brought up that I could write to my local CouchDB, and then replicate it to a different database on a different server!

BING!

This is exactly what I'd been looking for. I get fast and efficient writes to my local CouchDB, and it gets replicated up to the shared server whenever I'm connected to the network. This is great!

The configuration is simple - it's a simple doc in the _replicator database, and I'm in business. This is really quite amazing. First, go to the overview in the CouchDB web page, and select the _replicator database:

Replicator database

then create a new document:

New Document

Finally, populate it with these essential key/value pairs:

replication doc

  • source - this is the source for replication - it only goes one-way, so this can be a local database, or a remote one. But it's where the data is coming from
  • target - this is the destination for replication - again, local or remote, it makes no difference. Make sure to put in the port and the database name in the URL
  • proxy - if needed, put in the URL of the proxy to get to either of these databases
  • continuous - set to true if you want it to always be replicating
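Put together, the replication document is just JSON. Here's roughly what one looks like, sketched as the Ruby hash you'd serialize - the database name, server URL, and proxy here are all made-up examples:

```ruby
require 'json'

# a sketch of a _replicator document -- names and URLs are examples
replication = {
  'source'     => 'local_runs',                                 # local db: just the name
  'target'     => 'http://shared.example.com:5984/local_runs',  # remote: port + db name in the URL
  'proxy'      => 'http://proxy.example.com:8080',              # only if you need one
  'continuous' => true                                          # keep replicating
}

puts JSON.pretty_generate(replication)
```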

Save this document and then look at the CouchDB status page to see it replicate. It's like magic! OK, not really, but the handling of the proxy is so far superior to the way that CouchRest was dealing with it that it's not even funny. This just works!

I'm more and more convinced that CouchDB is an amazing product.

Creating Software Plumbers

Wednesday, September 5th, 2012

I just read this tweet this morning:

Twitter / davehoover: Young people: consider ...

which leads to this article advocating that young people look into entering an apprenticeship program rather than continuing on to college. It says, in part:

Universities are the typical place that established businesses expect to find these high-potential beginners. While many software developers finish college with a good education, they’re often burned out, deep in debt, and understandably eager to cash in on their hard work. Apprentices, on the other hand, inject enthusiasm, hard work, and a thirst for knowledge into your teams. They will consistently launch from your apprenticeship program with context, momentum, and loyalty to your organization.

I can understand the point of the article - and you should read it to see that it's not saying people shouldn't pursue higher education. It's saying that you, as a business owner, can capitalize on the cost of higher education and pull people who might otherwise go to college straight into the workforce.

But is that what we want to have happen, as an industry? I don't think so. I think it's robbing the future to staff the present, and that's a mistake. A big one.

I'm biased. I've got the higher education and the advanced degrees, and I think they are the right thing to do. But even if you discount my position, and do what the author suggests, aren't we just creating a bunch of Software Plumbers? They'll know what they see, and will be able to work with it, but their understanding of how to solve new and unusual problems will be very limited. Oh sure, you'll have a few percent that naturally think outside the box, but their exposure to new things and new ideas will be incredibly limited.

This is the exact purpose of those liberal arts classes for engineers - to broaden a student's horizons. If we just allow people to learn what we want them to learn, aren't we really just forcing ourselves to re-train them when we want to change technologies? Of course we are.

While there are times to have an apprenticeship program - for those that can't make it into college - I think it'll be overused, and it will draw the real future of the profession into one where only a few can really think creatively. And that would be very bad.