Archive for the ‘Coding’ Category

Perspective – I needed a little…

Thursday, September 27th, 2012

This morning I've been fighting off a few things, and while I'm doing better at handling the slings and arrows these days, I was given a wonderful reminder from twitter:

100 years from now

In 100 years I won't be here. My kids won't either. Their kids? Probably, but that's only if they live well. There may not be a soul on this planet that remembers what I'm doing in this life, and that's OK.

Who was working in the machine shop - the high-tech job of its day - in GE's plant 100 years ago? No idea. Those men worked hard, tried to raise families, tried to be good husbands - some did well, others not so much. But today, they are long forgotten.

Carpe Diem. 'Nuff said.

Leading People to See the Bigger Picture

Wednesday, September 26th, 2012


I love it when I'm able to work with people who see the same Big Picture as I do. It doesn't happen often, but when it does, it's almost magical. The next best thing is to work with someone who can see some Big Picture - even if it's different from mine - and then we can hash out the differences and come to some accord on how to get to that endpoint.

Some of the most frustrating people to work with are those who are simply incapable of seeing the Big Picture. Maybe they don't think in those abstract terms. Maybe they don't think there is such a thing. Maybe they aren't looking at what's being done as much as how it's done. For whatever reason, I'm in the midst of trying to get someone to see the Big Picture, and before too long it's something I'm just going to give up on.

This isn't a critique of the person… it's the old adage:

Don't try to teach a pig to sing - it only frustrates you and annoys the pig.

If a person isn't going to see what you want them to see - for whatever reason - then it's time to just stop and let them be the person they want to be. If they have the capability of seeing it and just don't want to, then maybe, someday, they'll change their mind and come seeking your advice.

If they can't see it at all, you're only annoying them, and if at some point in the future they want to try again, they will seek you out.

But until then, it's just a problem - for you and for them. Better to accept them as they are and move on. No amount of cajoling, pleading, or arguing is going to make an adult change their mind. They have to come to that decision themselves.

So I'm trying to convince myself that the right time to let go is now.

Right now.

Loads of Production Problems with Salesforce

Wednesday, September 26th, 2012


I spent all of this morning struggling with some production issues. The runs didn't complete, and I had to dig into the logs to find out why. Here, again, the way a lot of the Ruby devs work really hurts maintenance. This optimistic coding is something I've fought for a great number of years, and it seems that it's really systemic, or maybe endemic, to the industry. People want to think "This works… and if it doesn't, then it's not my fault." That might be true, but it doesn't make it right.

So the first thing was figuring out what was wrong with the data. It seemed to be a data problem, so that's where I started digging. Pretty soon I realized that the source of the data - Salesforce.com - wasn't returning it, saying that the HTTP GET was invalid but a POST would be acceptable. I looked at the code, saw where we were doing GETs, figured out that we had the ability to do POSTs as well - changed them, retried, and still no good.

Got onto Campfire to explain the situation and try to find help. Clearly, something with Salesforce.com changed overnight and it was now no longer accepting the calls that were working yesterday.

After a lot of failed attempts, I was finally able to convince myself that there was nothing wrong with our code - that it was Salesforce.com simply refusing the API calls that had worked yesterday. I was able to confirm this with one of our Salesforce support guys; he thought he knew the problem, but not the solution. So off he went to figure it out.

In the end, it turns out that when you deploy code to Salesforce, you have to manually recompile everything - or manually run all the tests - to activate all the URLs in the code. Interesting.

Once that was fixed, the calls worked and everything was able to run. I finished the production runs at about 11:00 am.

What a morning.

Simple CSV Exporting of Google DataTable

Tuesday, September 25th, 2012


Today I did a little digging on the idea of exporting a Google Visualization Table to CSV entirely in javascript. Face it - the table is already there… it's got the data… the trick is getting it back out as a CSV export. Well… as it turns out, it's not all that hard. I was pretty surprised.

The core of it is really the Google Visualization DataTable. Since that's the heart of most of the Visualizations, it's a great universal starting point. What we're really doing in the code is writing a simple javascript function that builds a data URI and encodes it, such that when it's opened, the browser treats it as a download and saves it as a file.

The first part is to save the DataTable when you render the Google Table on the page:

  // this is the Google DataTable we'll be creating each time
  var dtable = null;
 
  // This method looks at the selected data set and loads that into
  // a new table for the target div and redraws it.
  function render(tbl) {
    // save this data table for later
    dtable = tbl;
    // now create a Google Table and populate it with this data
    var dest = document.getElementById('table_div');
    var table = new google.visualization.Table(dest);
    table.draw(tbl, table_config);
  }

At this point, we have the DataTable, and we can place the button anywhere on the page; I happened to center it at the bottom of the page:

  <p align="center">
    <input type="button" id="toCSV" value="Click to download data as CSV"
     onclick="toCSV()" />
  </p>

When the user clicks on the button, the following code will be run:

  // this downloads the current data table as a CSV file to the client
  function toCSV() {
    var data = dtable;
    var csvData = [];
    var tmpArr = [];
    var tmpStr = '';
    for (var i = 0; i < data.getNumberOfColumns(); i++) {
      // replace double-quotes with double-double quotes for CSV compatibility
      tmpStr = data.getColumnLabel(i).replace(/"/g, '""');
      tmpArr.push('"' + tmpStr + '"');
    }
    csvData.push(tmpArr.join(','));
    for (var i = 0; i < data.getNumberOfRows(); i++) {
      tmpArr = [];
      for (var j = 0; j < data.getNumberOfColumns(); j++) {
        switch(data.getColumnType(j)) {
          case 'string':
            // replace double-quotes with double-double quotes for CSV compat
            tmpStr = data.getValue(i, j).replace(/"/g, '""');
            tmpArr.push('"' + tmpStr + '"');
            break;
          case 'number':
            tmpArr.push(data.getValue(i, j));
            break;
          case 'boolean':
            tmpArr.push((data.getValue(i, j)) ? 'True' : 'False');
            break;
          case 'date':
            // decide what to do here, as there is no universal date format
            break;
          case 'datetime':
            // decide what to do here, as there is no universal date format
            break;
          case 'timeofday':
            // decide what to do here, as there is no universal date format
            break;
          default:
            // should never trigger
        }
      }
      csvData.push(tmpArr.join(','));
    }
    var output = csvData.join('\n');
    var uri = 'data:application/csv;charset=UTF-8,' + encodeURIComponent(output);
    window.open(uri);
  }
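
The date, datetime, and timeofday cases are left open on purpose - there's no universal format, so pick whatever your consumers expect. If an ISO-8601 string is acceptable, one option (my own assumption, not what this code actually does) is to slot something like this into the switch above. getValue() hands back a javascript Date for the 'date' and 'datetime' column types, while 'timeofday' returns an array and needs its own handling:

          case 'date':
          case 'datetime':
            // getValue() returns a Date for these column types, so one
            // reasonable choice is a quoted ISO-8601 string
            var d = data.getValue(i, j);
            tmpArr.push(d ? '"' + d.toISOString() + '"' : '""');
            break;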

You can see the entire page here:

  <html>
    <head>
      <title>Unpinned Merchants</title>
      <script type='text/javascript' src='https://www.google.com/jsapi'></script>
      <script type='text/javascript' src='zingchart/resources/jquery.min.js'></script>
      <script type='text/javascript'>
        google.load('visualization', '1', {packages:['table']});
        google.setOnLoadCallback(reload_executions);

        // set up the fixed locations and paths for this metric visualization.
        // we need to be able to pick the server (prod, uat, dev).
        var view_loc = '_design/pinning/_view/unpinned';
        var opts = '&reduce=false';

        // get the date a week ago formatted as YYYY-MM-DD
        var when = new Date();
        when.setDate(when.getDate() - 7);
        var weekAgo = when.getFullYear()+'-'
                      +('0'+(when.getMonth()+1)).substr(-2,2)+'-'
                      +('0'+when.getDate()).substr(-2,2);

        // these will be the data sets we can get from the selected database
        var divisions = new Array();
        var runtimes = new Object();

        // this is the Google DataTable we'll be creating each time
        var dtable = null;

        // the Google Table needs to have a few config parameters to make it
        // look like we want it to look.
        var table_config = {
          showRowNumber: true,
          width: '700px',
          height: '503px'
        };

        // we can use this format spec to format the sales value column once
        // we have loaded up the table.
        var sv_format = {
          prefix: '$',
          pattern: '#,###.00'
        };

        // This method looks at the selected data set and loads that into
        // a new table for the target div and redraws it.
        function render(tbl) {
          // save this data table for later
          dtable = tbl;
          // now create a Google Table and populate it with this data
          var dest = document.getElementById('table_div');
          var table = new google.visualization.Table(dest);
          table.draw(tbl, table_config);
        }

        // this downloads the current data table as a CSV file to the client
        function toCSV() {
          var data = dtable;
          var csvData = [];
          var tmpArr = [];
          var tmpStr = '';
          for (var i = 0; i < data.getNumberOfColumns(); i++) {
            // replace double-quotes with double-double quotes for CSV compatibility
            tmpStr = data.getColumnLabel(i).replace(/"/g, '""');
            tmpArr.push('"' + tmpStr + '"');
          }
          csvData.push(tmpArr.join(','));
          for (var i = 0; i < data.getNumberOfRows(); i++) {
            tmpArr = [];
            for (var j = 0; j < data.getNumberOfColumns(); j++) {
              switch(data.getColumnType(j)) {
                case 'string':
                  // replace double-quotes with double-double quotes for CSV compat
                  tmpStr = data.getValue(i, j).replace(/"/g, '""');
                  tmpArr.push('"' + tmpStr + '"');
                  break;
                case 'number':
                  tmpArr.push(data.getValue(i, j));
                  break;
                case 'boolean':
                  tmpArr.push((data.getValue(i, j)) ? 'True' : 'False');
                  break;
                case 'date':
                  // decide what to do here, as there is no universal date format
                  break;
                case 'datetime':
                  // decide what to do here, as there is no universal date format
                  break;
                case 'timeofday':
                  // decide what to do here, as there is no universal date format
                  break;
                default:
                  // should never trigger
              }
            }
            csvData.push(tmpArr.join(','));
          }
          var output = csvData.join('\n');
          var uri = 'data:application/csv;charset=UTF-8,' + encodeURIComponent(output);
          window.open(uri);
        }

        // This function takes the data coming from CouchDB and formats it
        // into a series of nice DataTable objects for Google's tools.
        // There will be one set per run (execution_tag), and we'll organize
        // it that way for easy retrieval.
        function parse_series(data) {
          var table = new google.visualization.DataTable();
          table.addColumn('string', 'Merchant');
          table.addColumn('string', 'Category');
          for(var i in data.rows) {
            table.addRow([data.rows[i].value.name,
                          data.rows[i].value.taxonomy.category]);
          }
          return table;
        }

        // This method simply hits the selected database (on the server)
        // for the proper CouchDB view, and then processes it into a series
        // of ZingCharts data sets, the first of which we then render.
        function reload() {
          // hit CouchDB for the view we need to process
          var svr_opt = document.getElementById('server_opt');
          var div_opt = document.getElementById('division_opt');
          var run_opt = document.getElementById('run_opt');
          var et = run_opt.value + '-' + div_opt.value;
          var url = svr_opt.value + '/' + view_loc + '?' +
                    'startkey=' + encodeURI(JSON.stringify([et])) +
                    '&endkey=' + encodeURI(JSON.stringify([et,{}])) + opts + '&callback=?';
          $.getJSON(url, function(data) {
            var series = parse_series(data);
            render(series);
          });
        }

        // When we change divisions we need to update the available run times
        // for the new division, and in order to do that, we have this method.
        function set_runs_for_division(division) {
          division = (typeof(division) !== 'undefined' ? division : document.getElementById('division_opt').value);
          runtimes[division].sort();
          runtimes[division].reverse();
          var run_opt = document.getElementById('run_opt');
          run_opt.options.length = 0;
          for (var i in runtimes[division]) {
            var tag = runtimes[division][i];
            run_opt.options[run_opt.options.length] = new Option(tag, tag);
          }
          // at this point, call back to get the data we need, and then render it
          reload();
        }

        // This function takes the list of executions currently loaded on the database
        // and parses their 'execution_tag's into divisions and times and places them
        // in the datastructure to make it much easier to manipulate.
        function parse_execution_tags(data) {
          divisions = new Array();
          runtimes = new Object();
          for(var i in data.rows) {
            // get the execution_tag and exclude the very early ones
            var exec_tag = data.rows[i].key;
            if (!/-\D+$/i.test(exec_tag) || (exec_tag.substring(0,10) < weekAgo)) {
              continue;
            }
            // now get the timestamp and division from the execution_tag
            var runtime = exec_tag.replace(/-\D+/g, '');
            var division = exec_tag.replace(/^.*\.\d\d\d-/g, '');
            if (typeof(runtimes[division]) == 'undefined') {
              runtimes[division] = new Array();
              divisions.push(division);
            }
            runtimes[division].push(runtime);
          }
          // sort the divisions and create the contents of the drop-down
          if (divisions.length > 0) {
            divisions.sort();
            var div_opt = document.getElementById('division_opt');
            div_opt.options.length = 0;
            for (var d in divisions) {
              div_opt.options[div_opt.options.length] = new Option(divisions[d], divisions[d]);
            }
          }
          // given the default division, load up the run times we just parsed
          set_runs_for_division(divisions[0]);
        }

        // When we change a database, we need to reload all the known runs (executions)
        // that exist on that database. Then, we can populate the 'division' and 'run'
        // in a nested datastructure so that it's easy to update the run times for a
        // given division.
        function reload_executions() {
          // hit CouchDB for the view of all executions it knows about
          var svr_opt = document.getElementById('server_opt');
          var url = svr_opt.value + '/_design/general/_view/executions?descending=true&callback=?';
          $.getJSON(url, function(data) {
            parse_execution_tags(data);
          });
        }
      </script>
    </head>
    <body>
      <p align="center">
        Database:
        <select id="server_opt" onchange="reload()">
          <option value='/db/production' selected="selected">Production</option>
          <option value='/db/uat'>UAT</option>
          <option value='/db/dev'>Dev on UAT</option>
        </select>
        Division:
        <select id="division_opt" onchange="set_runs_for_division(this.value)">
        </select>
        Run:
        <select id="run_opt" onchange="reload()">
        </select>
      </p>
      <div id='table_div' style="width:700px; margin-top:10px; margin-left:auto; margin-right:auto;"></div>
      <p align="center">
        <input type="button" id="toCSV" value="Click to download data as CSV" onclick="toCSV()" />
      </p>
    </body>
  </html>

The downside of this is that the file will have an unusual name. On Mac OS X with Safari 6.0.1, it's "Unknown". On other platforms, I'm sure it's something nearly as odd and useless, but that's the name of the game. There's seemingly no way to specify the filename in the data URI or the window.open() call.

Still… I'm pretty pleased. We're looking at a 100% client-side, javascript solution to the CSV generation problem. That's pretty nice. If you look at the code, there's really very little that's exclusive to the Google DataTable - it's really just the means to get the headers, and the row and column data. We could have easily built this from any regular data source and made that work as well.
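
To make that concrete, here's a quick sketch of the same idea driven by a plain array of arrays instead of a DataTable. The names here (rowsToCSV, downloadCSV) are mine, not from the page above:

  // build a CSV string from a header array and an array of row arrays
  function rowsToCSV(header, rows) {
    var quote = function(v) {
      // quote every cell and escape embedded double-quotes
      return '"' + String(v).replace(/"/g, '""') + '"';
    };
    var lines = [header.map(quote).join(',')];
    for (var i = 0; i < rows.length; i++) {
      lines.push(rows[i].map(quote).join(','));
    }
    return lines.join('\n');
  }

  // open the CSV as a download, just like toCSV() does above
  function downloadCSV(csv) {
    window.open('data:text/csv;charset=UTF-8,' + encodeURIComponent(csv));
  }

  // for example:
  //   downloadCSV(rowsToCSV(['Merchant', 'Category'],
  //                         [['Bob\'s "Big" Subs', 'Food & Drink']]));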

Sweet.

Replacing a Write-Back Process with a Simple Report

Monday, September 24th, 2012


While we were messing with getting merchant data from Salesforce.com, we had another little story in Pivotal Tracker about replacing the write-back of demand gap analysis with a simple report on our Metrics Site - a series of web pages we've put together for viewing the metrics of the runs.

Given that we've had a lot of timeouts in these specific write-backs, it made a lot of sense to me to get rid of the write-backs and just have a report for the merchant researchers to use. It was a pretty simple page to throw together - I had to update the CouchDB view to change the key of the view from:

  ["2012-09-24_14:51:21.232_cleveland", "cleveland", "Food & Drink"]

to the simpler:

  ["2012-09-24_14:51:21.232_cleveland", "Food & Drink"]

because we didn't actually need the division in the key, but we did need to be able to specify the key as well as have a nice reduced set of counts by date/time and category of merchant. That took a bit to regenerate, but when it was done, I had everything I needed - or so I thought.
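
For reference, a map/reduce pair along these lines would produce that key and those counts - just a sketch with assumed field names (execution_tag, taxonomy.category), not the actual view definition:

  // map: emit [execution_tag, category] so we can key on a run and a category
  function(doc) {
    if (doc.execution_tag && doc.taxonomy) {
      emit([doc.execution_tag, doc.taxonomy.category], 1);
    }
  }
  // reduce: the built-in _count gives the counts by date/time and category
  _count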

The code for generating the URL was pretty simple:

  var et = run_opt.value + '_' + div_opt.value;
  var cat = cat_opt.value;
  var url = svr_opt.value + '?key=' + JSON.stringify([et,cat]);

and it wasn't obvious to me at first, but the ampersand in Food & Drink was going to be a real pain in the rump, because while it's a perfectly valid character in the JSON key, it's also the argument separator in the URL's query string. So I had to do a little modification:

  var et = run_opt.value + '_' + div_opt.value;
  var cat = cat_opt.value.replace("&", "%26");
  var url = svr_opt.value + '?key=' + JSON.stringify([et,cat]);

to get the ampersand percent-encoded as a hex value before sending it to the server.
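
A slightly more general approach - again just a sketch, not what actually went into the page - is to percent-encode the entire JSON key with encodeURIComponent(), which takes care of ampersands, spaces, quotes, and anything else that might show up in a category name:

  var et  = run_opt.value + '_' + div_opt.value;
  var cat = cat_opt.value;
  // encode the whole key so every special character is safe in the query string
  var url = svr_opt.value + '?key=' + encodeURIComponent(JSON.stringify([et, cat]));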

With this, I was all ready to go. I pushed the view changes to the prod and UAT CouchDBs, so in the morning, first thing, I'll be able to check it all in and then push the new pages to the server.

I sent a note to the project manager asking if he'd really prefer this, and explaining why it would be great for us not to have the write-back to Salesforce.com… I'm hoping this is going to work out really well.

Great Way to Skin a Cat

Monday, September 24th, 2012


Today we've once again been fighting Salesforce.com's Apex APIs for getting Merchant data from their servers. It's a key part of our processing, and they have an API, but it has some major limitations, and one of the problems is getting all the merchants for a division (like Cleveland, Cincinnati, etc.) of The Shop. As we were kicking around ideas, one of my co-workers came up with the idea of fetching all the IDs first - and because there's not a lot of data in just the IDs, it's possible to get all the merchant IDs in one hit.

Then, once we have the IDs, we can make a series of parallel requests, chopping the list up into requests of 100 or 1000 merchants apiece. There are a lot of benefits to this plan. First, we know the IDs right off the bat, and if we don't get valid data for them, at least we know what to clear if we need to. Second, by knowing all the IDs up front, we can make a series of async, parallel requests that will make the data loading a lot faster - something like the sketch below.
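
In Ruby, the shape of that plan is roughly this - a sketch only, where fetch_all_merchant_ids and fetch_merchants are stand-ins for the real Salesforce calls, and the batch size is just an example:

  # grab the cheap list of IDs in one hit, then fetch the details in
  # parallel batches and collect the results
  ids = fetch_all_merchant_ids(division)
  merchants = ids.each_slice(1000).map { |batch|
    Thread.new { fetch_merchants(batch) }
  }.flat_map(&:value)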

It's days like this, working with pretty smart guys, that I get a giggle. This is going to take a few hours to do, but in the end, it'll be a huge improvement in the processing of the data.

Fantastic!

Optimizing the Pinning Process (cont.)

Saturday, September 22nd, 2012


This evening I finished up the work I needed to do on the optimization of the Pinning process to use the indexed demand data, and I have to say that I'm very happy with the results. When I really dug into it, I was happy to realize that as long as I kept the complete set of matching criteria in the key, I would not have to do anything other than make the appropriate keys, look up and collect the data, and then unique it so as not to have any duplicates.

Thankfully, it wasn't really that hard. The key generation was pretty simple, and the collection and uniquing was not bad. In the end, a lot of code got thrown away because we simply weren't doing anything nearly as complicated as we had been doing, and that was great news!

I had to write a few tests on the indexing of the demand data, but that was to be expected, and once it was indexed, that was all I needed to be sure of.

It took me a while, but not too bad, and Liza and the girls were at a movie I didn't want to see, so it all really worked out.

Nice!

Optimizing the Pinning Process

Friday, September 21st, 2012


Late today I was talking with Phil, a co-worker, who had placed some New Relic instrumentation in the code, and from the results of a few runs it was clear that the way we were pinning the demand points to the merchants was a real problem - the single largest chunk of time spent in the code. And in a way, it makes sense - we're scanning all demand points for each of several thousand merchants, and that's going to take some time.

What's the better solution? Well… indexes, of course!

What we decided to do was something like what we'd done in a different part of the code, and I decided that I wanted to do it. The plan was to index all the demand data very much like a HashMap does - but the key for each 'bin' is the same set of values that we're trying to match in the scanning code.

For example, if we're matching the City, State, and Service, then we make the key:

  key = [city, state, service]

and then use that as the key in a Hash to hold an array of all the demand points that have those three values. Of course, for a demand point with multiple services, we'll have multiple entries in the indexing data, but that's OK, as Ruby does all this with references and not deep copies.
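
To make the idea a little more concrete, the indexing might look something like this in Ruby - a sketch with made-up names (demand_points, services, and so on), not the actual pinning code:

  # build the index: one bin per (city, state, service) combination
  index = Hash.new { |h, k| h[k] = [] }
  demand_points.each do |pt|
    pt.services.each do |svc|
      # a point with several services lands in several bins - cheap, since
      # Ruby stores references here, not deep copies
      index[[pt.city, pt.state, svc]] << pt
    end
  end

  # pinning a merchant then becomes a single lookup instead of a full scan
  matches = index[[merchant.city, merchant.state, merchant.service]]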

I'm just getting started, and I'm planning on working on it this weekend, but it's looking very exciting, and I'm really happy with the direction this is going.

UPDATE: I'm happy to say that I got a lot of good work done on the train. I still need to finish a lot of loose ends, but the bulk of the work should be ready to go. Lots of testing to do, but it's looking good.

Simplifying Code – Finally Killed the Facade Classes

Friday, September 21st, 2012


Today I finally decided to take an axe to three rather silly facade classes in the code. These were really less than 10 lines each for the entire class, and it was very simple to move the one or two useful lines from each into the calling method of another class and toss the entire class away.

It's been bothering me for a while, but I'm new to Ruby, and I didn't want to make waves in the group, but I got enough support from a few of the old guard Ruby guys that I decided to trash the silly classes.

It was really nice! I felt like the code I was getting rid of was just a red herring for someone trying to understand the codebase. The tests were unnecessary and burning up time, and there was really no reason for any of it.

Much better.

Fun with Bash – Making Robust Scripts

Friday, September 21st, 2012


I was looking at the results of the overnight runs and realized that I really needed to work on making the nightly runs far more robust by retrying the complete division run if there were errors in the processing. The trick was - How?

So I came up with the idea that the summary script that's grepping the log of the process could, in fact, look at the log and determine whether the run was a "success". Then I could have it return a simple exit code, and the nightly run script could detect that and place the division into a retry queue if it failed.

Great idea, but getting the Bash scripts to do it was a little more work than I'd expected. Oh sure, the exit value from the summary script was easy - I just had to pick the right thing to test in that script, and after a few tries, I got what I needed. No… this was about dealing with the string vs. integer nature of bash variables.

For example, this is OK:

  if [ $pass -gt 3 ]; then
    break
  fi

unless $pass holds a non-numeric string, in which case you get an error and the script dies. So it's possible to say:

  if [[ $pass > 3 ]]; then
    break
  fi

but this seems to cause me grief:

  if [[ $pass >= 3 ]]; then
    break
  fi

It's not easy to figure out all the edge cases until you've written more than a few bash scripts using a feature, but hey… that's the power and grief of using bash.

In the end, it was really much easier to force the type of the variables and make sure they are OK than it was to try to figure out why 'greater than' was OK but 'greater than or equal to' wasn't. The goal was to have a couple of scripts tied together that allowed me to ensure that the nightly runs worked, and worked well, and I've got that now.
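
For what it's worth, part of the grief is that inside [[ ]] the > operator does a lexicographic string comparison, and >= isn't a valid operator there at all. One way to force the variable into integer territory - a sketch, not necessarily the exact fix that ended up in the nightly scripts - looks like this:

  # strip anything that isn't a digit, and default to 0 if nothing is left
  pass=${pass//[^0-9]/}
  pass=${pass:-0}
  # (( )) is an arithmetic context, so >= behaves numerically here
  if (( pass >= 3 )); then
    break
  fi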

Problem solved.