Today I did a little digging on the idea of exporting a Google Visualization Table to CSV entirely in javascript. Face it - the table is already there… it's got the data… the trick is how to get it all wired up for the CSV export. Well… as it turns out, it's not all that hard. I was pretty surprised.
The core of it is really the Google Visualization DataTable. Since that's at the heart of most of the Visualizations, it's a great universal starting point. What we're really doing in the code is writing a simple javascript method that builds a data URI and encodes it, so that when it's opened, the browser treats it as a download and saves it as a file.
The first part is to save the DataTable when you render the Google Table on the page:
// this is the Google DataTable we'll be creating each time
var dtable = null;

// This method looks at the selected data set and loads that into
// a new table for the target div and redraws it.
function render(tbl) {
  // save this data table for later
  dtable = tbl;
  // now create a Google Table and populate it with this data
  var dest = document.getElementById('table_div');
  var table = new google.visualization.Table(dest);
  table.draw(tbl, table_config);
}
At this point we have the DataTable, and we can place the button anywhere on the page; I happened to center it at the bottom of the page:
<p align="center">
<input type="button" id="toCSV" value="Click to download data as CSV"
onclick="toCSV()" />
</p>
So when the user clicks the button, the following code runs:
// this downloads the current data table as a CSV file to the client
function toCSV() {
  var data = dtable;
  var csvData = [];
  var tmpArr = [];
  var tmpStr = '';
  for (var i = 0; i < data.getNumberOfColumns(); i++) {
    // replace double-quotes with double-double quotes for CSV compatibility
    tmpStr = data.getColumnLabel(i).replace(/"/g, '""');
    tmpArr.push('"' + tmpStr + '"');
  }
  csvData.push(tmpArr.join(','));
  for (var i = 0; i < data.getNumberOfRows(); i++) {
    tmpArr = [];
    for (var j = 0; j < data.getNumberOfColumns(); j++) {
      switch (data.getColumnType(j)) {
        case 'string':
          // replace double-quotes with double-double quotes for CSV compat
          tmpStr = data.getValue(i, j).replace(/"/g, '""');
          tmpArr.push('"' + tmpStr + '"');
          break;
        case 'number':
          tmpArr.push(data.getValue(i, j));
          break;
        case 'boolean':
          tmpArr.push((data.getValue(i, j)) ? 'True' : 'False');
          break;
        case 'date':
          // decide what to do here, as there is no universal date format
          break;
        case 'datetime':
          // decide what to do here, as there is no universal date format
          break;
        case 'timeofday':
          // decide what to do here, as there is no universal date format
          break;
        default:
          // should never trigger
      }
    }
    csvData.push(tmpArr.join(','));
  }
  var output = csvData.join('\n');
  var uri = 'data:application/csv;charset=UTF-8,' + encodeURIComponent(output);
  window.open(uri);
}
You can see the entire page here:
The downside of this is that the file will have an unusual name. On Mac OS X with Safari 6.0.1, it's "Unknown". On other platforms, I'm sure it's something nearly as odd and useless, but that's the name of the game. There's seemingly no way to specify the filename in the data URI or the window.open() call.
Still… I'm pretty pleased. We're looking at a 100% client-side, javascript solution to the CSV generation problem. That's pretty nice. If you look at the code, there's really very little that's exclusive to the Google DataTable - it's really just the means to get the headers, and the row and column data. We could have easily built this from any regular data source and made that work as well.
While we were messing with getting merchant data from Salesforce.com, we had another little story in Pivotal Tracker about replacing the write-back of demand gap analysis with a simple report on our Metrics Site - a series of web pages we've put together for viewing the metrics of the runs.
Given that we've had a lot of timeouts in these specific write-backs, it made a lot of sense to me to get rid of the write-backs and just have a report for the merchant researchers to use. It was a pretty simple page to throw together - I had to update the CouchDB view to change its key, because we really didn't need the division in the key, but we did need to be able to specify the key as well as get a nice reduced set of counts by date/time and merchant category. That took a bit to regenerate, but when it was done, I had everything I needed - or so I thought.
The code for generating the URL was pretty simple:
var et = run_opt.value + '_' + div_opt.value;
var cat = cat_opt.value;
var url = svr_opt.value + '?key=' + JSON.stringify([et, cat]);
It wasn't obvious to me at first, but the ampersand in "Food & Drink" was going to be a real pain in the rump because it's a perfectly valid character in the JSON key, but it's also the argument separator in a URL. So I had to do a little modification:
var et = run_opt.value + '_' + div_opt.value;
var cat = cat_opt.value.replace("&", "%26");
var url = svr_opt.value + '?key=' + JSON.stringify([et, cat]);
to get the ampersand percent-encoded before sending it to the server.
With this, I was all ready to go. I pushed the view changes to the prod and UAT CouchDBs, so in the morning, first thing, I'll be able to check it all in and then push the new pages to the server.
I sent a note to the project manager asking if he'd prefer this, and explaining why it'd be great for us not to have the write-back to Salesforce.com… I'm hoping this is going to work out really well.
Posted in Coding, Cube Life | Comments Off on Replacing a Write-Back Process with a Simple Report
Today we've once again been fighting Salesforce.com's Apex APIs to get Merchant data from their servers. It's a key part of our processing, and they have an API, but it has some major limitations, and one of the problems is getting all the merchants for a division (like Cleveland, Cincinnati, etc.) of The Shop. As we were kicking around ideas, one of my co-workers came up with the idea of fetching all the IDs first - and because there's not a lot of data in just the IDs, it's possible to get all the merchant IDs in one hit.
Then, once we have the IDs, we can make a series of parallel requests where we chop the list up into requests of 100 or 1000 merchants apiece. There are a lot of benefits to this plan - first, we know the IDs right off the bat, so if we don't get valid data for some of them, at least we know what to clear if we need to. Second, by knowing all the IDs up front, we can make a series of async, parallel requests that will make the data loading a lot faster.
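Just to sketch what that might look like - this isn't our actual code, and fetch_all_merchant_ids / fetch_merchants are made-up stand-ins for the real Apex calls - the batching idea in Ruby is roughly:

# A rough sketch of the batching idea -- fetch_all_merchant_ids and
# fetch_merchants are hypothetical wrappers around the Apex calls.
require 'thread'

BATCH_SIZE = 1000   # placeholder; could be 100 or 1000 per request

def load_division_merchants(division)
  ids = fetch_all_merchant_ids(division)    # one cheap call for just the IDs
  merchants = []
  lock = Mutex.new
  threads = ids.each_slice(BATCH_SIZE).map do |batch|
    Thread.new do
      data = fetch_merchants(batch)         # one request per batch of IDs
      lock.synchronize { merchants.concat(data) }
    end
  end
  threads.each(&:join)
  merchants
end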
It's days like this, working with pretty smart guys, that I get a giggle. This is going to take a few hours to do, but in the end, it'll be a huge improvement in the processing of the data.
Fantastic!
Posted in Coding, Cube Life | Comments Off on Great Way to Skin a Cat
This evening I finished up the work I needed to do on the optimization of the Pinning process to use the indexed demand data, and I have to say that I'm very happy with the results. When I really dug into it, I was happy to realize that as long as I kept the complete set of matching criteria in the key, I would not have to do anything other than make the appropriate keys, look up and collect the data, and then unique it so as not to have any duplicates.
Thankfully, it wasn't really that hard. The key generation was pretty simple, and the collection and uniquing were not bad. In the end, a lot of code got thrown away because we simply weren't doing anything nearly as complicated as we had been, and that was great news!
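To give a feel for how little was left, here's a minimal sketch of the lookup-and-unique step - the names are made up, and demand_index stands in for the Hash of demand points keyed by the matching criteria:

# Minimal sketch (hypothetical names): build the keys from the merchant's
# matching criteria, collect the indexed demand points, and unique the result.
def pinned_demand(merchant, demand_index)
  keys = merchant.services.map { |svc| [merchant.city, merchant.state, svc] }
  keys.flat_map { |key| demand_index[key] || [] }.uniq
end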
I had to write a few tests on the indexing of the demand data, but that was to be expected, and once it was indexed, that was all I needed to be sure of.
It took me a while, but not too bad, and Liza and the girls were at a movie I didn't want to see, so it all really worked out.
Nice!
Posted in Coding, Cube Life | Comments Off on Optimizing the Pinning Process (cont.)
Late today I was talking with Phil, a co-worker, who had placed some New Relic instrumentation in the code, and from the results of a few runs it was clear that the way we were pinning the demand points to the merchants was a real problem - the single largest time sink in the code. And in a way, it makes sense - we're scanning all the demand points for each of several thousand merchants, and that's going to take some time.
What's the better solution? Well… indexes, of course!
What we decided to do was something like what we'd done in a different part of the code, and I decided I wanted to be the one to do it. The plan was to index all the demand data very much like a HashMap does - but the key for each 'bin' is the same set of values we're trying to match in the scanning code.
For example, if we're matching the City, State, and Service, then we make the key:
key = [city, state, service]
and then use that as the key in a Hash to hold an array of all the demand points that have those three values. Of course, for a demand point with multiple services, we'll have multiple entries in the indexing data, but that's OK, as Ruby does all this with references and not deep copies.
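Building that index might look something like this - demand_points and its accessors are hypothetical, but the key mirrors the matching criteria above:

# Sketch of building the index: one 'bin' per [city, state, service] key,
# holding references to every demand point that matches it.
demand_index = Hash.new { |h, k| h[k] = [] }
demand_points.each do |dp|
  dp.services.each do |svc|
    demand_index[[dp.city, dp.state, svc]] << dp   # references, not deep copies
  end
end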
I'm just getting started, and I'm planning on working on it this weekend, but it's looking very exciting, and I'm really happy with the direction this is going.
UPDATE: I'm happy to say that I got a lot of good work done on the train. I still need to finish a lot of loose ends, but the bulk of the work should be ready to go. Lots of testing to do, but it's looking good.
Posted in Coding, Cube Life | Comments Off on Optimizing the Pinning Process
Today I finally decided to take an axe to three rather silly facade classes in the code. Each was less than 10 lines for the entire class, and it was very simple to move the one or two useful lines from the class into the calling method of another class and toss the entire class away.
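For illustration only - the real classes were different, but the shape of it was something like this made-up example:

# A made-up facade of the kind that got removed...
class MerchantFetcher
  def self.fetch(id)
    SalesforceClient.new.get_merchant(id)
  end
end

# ...so callers went from this:
merchant = MerchantFetcher.fetch(id)
# ...to simply this, and the facade (and its tests) went away:
merchant = SalesforceClient.new.get_merchant(id)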
It had been bothering me for a while, but I'm new to Ruby and didn't want to make waves in the group. Still, I got enough support from a few of the old-guard Ruby guys that I decided to trash the silly classes.
It was really nice! I felt like the code I was getting rid of was just a red herring to someone trying to understand the code. The tests were unnecessary and burning up time, and there was really no reason for any of it.
Much better.
Posted in Coding, Cube Life | Comments Off on Simplifying Code – Finally Killed the Facade Classes
I was looking at the results of the overnight runs and realized that I really needed to work on making the nightly runs far more robust by retrying the complete division run if there were errors in the processing. The trick was - how?
So I came up with the idea that the summary script that's grepping the log of the process could, in fact, look at the log and determine whether the run was a "success". Then I could have it return a simple error code, and the nightly run script could detect that and place the division into a retry queue if it failed.
Great idea, but getting the Bash scripts to do it was a little more work than I'd expected. Oh sure, the exit value from the summary script was easy - I just had to pick the right thing to test in that script, and after a few tries I got what I needed. No… this was about dealing with the string vs. integer nature of bash variables.
For example, this is OK:
if [ $pass -gt 3 ]; then
  break
fi
unless $pass is a string, in which case you get an error and die. So it's possible to say:
if [[ $pass > 3 ]]; then
  break
fi
but this seems to cause me grief:
if [[ $pass >= 3 ]]; then
  break
fi
It's not easy to figure out all the edge cases until you've written more than a few bash scripts using a feature, but hey… that's the power and grief of using bash.
In the end, it was really much easier to force the type of the variables and make sure they were OK than it was to try to figure out why 'greater than' was OK (inside [[ ]] it's a string comparison), but 'greater than or equal to' wasn't ([[ ]] has no >= operator). The goal was to have a couple of scripts tied together that allowed me to ensure that the nightly runs worked, and worked well, and I've got that now.
There are several problems related to the underlying sockets in JRuby that are fixed in 1.7.0, so The Team was interested in moving to it as soon as realistically possible, and today was that day. Everything seemed to be working just fine, but when we tried to run the code in the CI environment we got all kinds of failures. While CI was running JDK 1.7 (I think we're all on JDK 1.6 on our laptops), the real culprit was that the glibc that ships with CentOS 5 is too old to run the JRuby 1.7.0 code.
I have used CentOS 5 for a long time. It's stable, but it's old. There's a lot to be said for stable, but when it's impossible to deploy things because the fundamental libraries are too old, or the packages are too old to run a recent package from someone else, then it's time to really consider a change.
Here, the solution seems to be to build a custom glibc and then force LD_LIBRARY_PATH for that particular application. But that has all kinds of stability problems in it. Glibc isn't just another library - it's a fundamental library of the OS. There are tie-ins to the compiler and a lot of other things that aren't easy to assume will be "OK" with just a new version of glibc.
But I've already asked, and it's just not going to be possible to update to a more recent OS. Politics and support make this just right out.
This afternoon I saw that Safari 6.0.1 was out in Software Update, along with iPhoto. I'm very interested in the improvements in Safari, so I had to update right away. Interestingly enough, this did not require a reboot - so they must have done something in 10.7.x to make it possible to update Safari without a reboot; it used to be required.
What's very exciting is that the memory usage has really improved. This is great news, as I was getting close to the limit recently. Now I've got a good bit of headroom and don't have to worry about things for a while.
This morning I noticed that Google Chrome dev 23.0.1270.0 was out, and there's a nice set of changes in this release. There's an updated V8 javascript engine (3.13.7.1), and quite a bit of codec/playback work as well. I don't typically do a lot of that, but I can see that others do, so it's a good thing to get nailed down.
The speed is nice, the redrawing is superb… very nice tool.