Archive for the ‘Cube Life’ Category

Moving Forward with Low-Priority Work

Wednesday, January 2nd, 2013

Building Great Code

This evening I did a little work on a few low-priority tasks in the project I'm on. Normally, I wouldn't worry with stuff like this, but a nice guy in the group in Palo Alto really wanted this stuff added to the code, and so I took the time to finish the work I started several weeks ago that he needed.

Normally, I agree with the priorities we have. It keeps my work at a manageable level. But there are times - like this, that I feel kinda bad that I don't have a few more hours in the day/week/month to push a few tangential things forward a bit. Yeah, it's gotten me scolded in the past, but my argument this time is simple - I'm at home, I already did a lot of work for the day on the stuff that is meant to be my high priority, and it was after 5:00 pm.

Basically, I did it on my own time.

Thankfully, it wasn't that hard - took me a few hours, and now it's on a pull request into the main codebase and we should have something ready to test in a day or so. I feel like I've been a "nice guy" today. Makes me feel good.

Refactored the Closed Deal Code

Wednesday, January 2nd, 2013

Dark Magic Demand

Today I spent a good chunk of the day refactoring the clojure code with regards to the Closed Deals out of Salesforce. Specifically, after I got all the original work done, I realized that as it stood, I didn't have a change of working. Why? Because the deals would change over time. The moment a deal popped up, it's have a fill count of zero and then over time, it'd increase to some final value.

This means we'd have to have mutable database records, and that's totally against the immutable concept that my co-worker had for the entire clojure-based project. So to make it mutable would make it impossible to re-run the code for any point in time - and that's no good. So it meant that I needed to re-do the code and get it to work with closed deal sets and then compare the sets I load in from Salesforce to the sets available in the database.

It wasn't horrific, but it wasn't trivial. I'm getting better at clojure, and that's nice, but it's also the korma library and all the other supporting tools that I need to get up to speed on to really be productive. In the end, there were a few issues, and yet I was able to get them resolved pretty easily.

In the end, I was able to get the imports running again in UAT and that was a great feeling. It was (albeit tiny) progress for the day.

Added Closed Deals to Dark Magic

Sunday, December 30th, 2012

Dark Magic Demand

This morning I've finally finished up on something I've been working on and off for several days now - adding Closed Deals from Salesforce into the Demand Service that we're building in clojure. At some point, I'll probably drop the statement about what it's written in, but it's still too soon, as I think it was picked for all the wrong reasons. But here nor there this morning, it is the tool for now, and the future will bring what it brings.

The reason for needing the Closed Deals from Salesforce is that we are still totally dependent on Salesforce for holding all the actual bookable deals and merchant data. If we want to adjust the demand forecast by what's in inventory, then we need to get the data from Salesforce on at least a nightly basis, and use that data to update the demand forecast and "back it off" by the deals that have been closed since the demand was generated.

So if there's a demand forecast point of 1000 units, generated three days ago, if the sales reps closed a deal for 500 unit yesterday, we need to really only show the demand of 500 units today. The problem with all this is that Salesforce is not known for being exactly useful data and effective schemas. It's all there, but it's by no means easy to get to, or easy to use.

The first thing to do was to spend a day or two on just getting the data from Salesforce. Not as easy as I'd have hoped, as everything seems to be a REST interface - what… these people never hear of sockets? Anyway… I had a lot of grief with the paging that you have to do with Salesforce as it can't (won't) send you all the data at once. And it's not a size-limit thing, though they may advertise that as the reason, I've gotten "pages" with three small elements in them, so it's more than that, and for whatever reason, it's there and I have to deal with it.

I thought I had it all figured out, but I was slightly mistaken on the functionality of the take-while function in clojure. It seems that it continues as long as the value returned is "truthy" in some sense of the word. Meaning, it automatically stops on hitting a nil, but I made a function to test for that. Simple mistake, and it worked, but it wasn't the "clojure way", and when in Rome…

After I was able to get the data, I spent a couple of days just figuring out the PostgreSQL database schema so that we can load up the data easily and then get it out of the database as easily. We also need to make sure that we create the clojure entities for these tables, and that they are related to one another in the proper way. It's a usable, but manual ORM for clojure, and when in Rome…

With the schema working, I then had to try to load the data into the tables. This started out OK, but then as soon as I tried to read it back out, I ran into problems. They way the code is structured, we read out the potentially matching data, compare it to the next one, and then based on the results of that comparison, we either stop what we're doing (it's already there) or we insert the new data.

My code was failing on the pulling out the data, as the comparisons weren't working as planned. What I saw was a nice opportunity to chagne the logic a bit, so I did. I created a function that simply looked in the database to see if the deal I had, in hand, was already in the database. If so, it returned the ID of that deal. If not, it returned nil. This was really nice in that I don't care to read it all out and then compare it. I just want to know if it's already there!

This made things a lot nicer, and then things really started working. Very nice. No duplicates are loaded, but we can run this script for an historical two week period every day and be assured that we're missing nothing. Very sweet.

Took a while, but I learned a lot, and it's working well now.

Added Checks for Database Version in Code

Friday, December 21st, 2012

PostgreSQL.jpg

In the clojure project I'm on, we have adopted the idea of migrations for the database. Basically, there's no one way to directly build the schema, you have to build the first version of the schema, and then migrate it through all the different changes until you get to the current, and final version. The idea here being that any version of the database can be brought up to snuff with the same code.

I look at it as kinda silly if we have to go through too many of these because it's unlikely that the databases will be out of sync for more than a day or two, and most likely updated at nearly the same time. But hey, what do I know? Right?

The issue with this migration scheme is that it's possible to have the code running against an old version of the schema. I suppose it's possible in the other way as well, but here it seems much more likely with all the changes that this migration strategy seems to empower. So I needed to make something that would look at the database we're about to run against, and then look to the migration path and see if this database was up to date. If not, then stop right there. Why? Because it'll make sure that we don't run the code against a database schema that the code wasn't intended to run against.

As an aside, this totally goes against the idea that the code should be more adaptive, but that seems to be not as well received in the group. They seem to want to know where the database schema is, and that it's where it should be for all the code - as opposed to using views and stored procedures to insulate such schema changes from the code. It's even possible to add another layer in the code to provide further insulation, but this works, and I'm certainly not going to change their minds on this. Pick you battles.

Now that we have this, it's safer to know that there's little chance of deploying code and running it with a mis-matched database underneath it. That's reassuring, however it's done.

Updated Demand Service to Time-Series Demand

Thursday, December 20th, 2012

Dark Magic Demand

Today I spent most of the day adding time-series demand to the existing system - server and client. This meant that I needed to migrate the database tables a bit - probably did it in a way that was a little more industrial strength than I needed - but it was a nice way to get back into the swing of things. While I could have gotten away with renaming the old column in the table, adding in the new, correctly structured column, then migrating the data from the old to the new column, and then dropping the old column, I choose to rename the table, drop all the foreign keys, make a new table, migrate all the data in the table, and then add back the keys as needed.

It took a little more time than I could have gotten away with, but it wasn't bad, and in the end, I'm glad I did it as it got me back into the swing of things with SQL and PostgreSQL so that when it comes time to do a much grander migration, and these steps will be required, I'll be ready.

Other than that, it was pretty straightforward. I needed to make sure that the old version of the API was unchanged, and that wasn't too hard, but then the new version had to be handled in the client (ruby) code, and that proved to be a little more challenging than I had thought.

The scheme we have is that if the demand is a time series of points, they will be in an array in the output, and the size of that array will determine the interval of the points - but always spanning a year. Twelve points on the array means the first point in the array is the demand for this month, and the second point is for next month, etc. If there are 52 points in the array, then the first point is for this week, and the next is 7 days out, etc.

Pretty simple, but then we needed to know the starting date for the series. After all, if this data is served up a week from now, how is the client to know which points to use? It makes sense to add a generated_at field in the API which is the starting point for the data in the time series. Once I had that in the ruby code, it was a matter of seeing what kind of data I was getting, it's length (if it's an array), and then looking forward from the generated_at time to the point in time that I'm interested in.

In all, not bad, and I'm glad I put this code into the main app now, as I want to pin this stuff down, and it's quite often the case that the guys in the group are kinda waffly about things like this. Get it in, get it done, and then their natural laziness keeps them from messing with it too much.

Simplifying SQL Arrays in Clojure

Wednesday, December 19th, 2012

Clojure.jpg

I've just spent several hours working with a co-worker on simplifying the way we are handling SQL Array values in the code. Previously, we had to make the INSERT statements and execute them directly - thus avoiding the korma generation of the PreparedStatement and setting it's values. This is OK, but it'd be nicer to be able to update korma to have it do all this properly.

So we did. It was a little frustrating because we got off-track several times, but in the end, we have something that will make it vey nice to use for the other times we need to use arrays in the code.

So time well spent.

Dealing with PostgreSQL Arrays in Clojure

Wednesday, December 19th, 2012

Clojure.jpg

One of the things I want to use a little more in PostgreSQL is the ARRAY datatype. It's pretty simple to imagine why it's a good thing -- face it, I can have the data flattened out:

  CREATE TABLE seasonality (
    id              uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    service         VARCHAR,
    jan             INTEGER,
    feb             INTEGER,
    mar             INTEGER,
    apr             INTEGER,
    may             INTEGER,
    jun             INTEGER,
    jul             INTEGER,
    aug             INTEGER,
    sep             INTEGER,
    oct             INTEGER,
    nov             INTEGER,
    DEC             INTEGER
  )

or we can put it in an array, and it's so much cleaner to deal with:

  CREATE TABLE seasonality (
    id              uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    service         VARCHAR,
    factor_by_month INTEGER[12]
  )

The problem is that when dealing with Korma, it doesn't like the array datatype as it's coming back as a Jdbc4Array and not something clojure can easily compare to a standard array of integers.

So what's a guy to do? Well… it's a pretty simple conversion:

  (defn extract-factors
    [record]
    (-> record :factor_by_month .getArray seq))

and the output of this is easily comparable to a simple array of integers.

Not bad. I spent a little time on this, and StackOverflow helped, but clojure isn't too bad at this. A lot more like erlang.

UPDATE: a co-worker pointed out that I could really make this a lot nicer, and I have to agree. The first thing is to make the function just convert the Jdbc4Array to a sequence:

  (defn array->seq
    [array]
    (-> array .getArray seq))

then we can make use of Korma's transform method on the entity to transform the data on the select calls:

  (defentity seasonality
    (transform (fn [rec] (update-in rec [:factor_by_month] array->seq)))

Then, as soon as we see the data from the select, it's already converted to a sequence and we don't have to worry about it. Works great, and really makes things cleaner.

Added Seasonality to Clojure Project – Harder than it Looks

Tuesday, December 18th, 2012

Dark Magic Demand

I've spent many hours today getting the seasonality data into our clojure project. And I have to come away with the fact that it's a lot harder than it looks. A lot.

The problem for me was that the way this is being done in the clojure code is to have the entities defined in one file - but the tables created in another. Then all the insert/update as well as compare and load functions are in another file. There's a lot more to this than a simple set of INSERT statements. It's a lot more.

So I spent a good chunk of the afternoon trying to understand what I needed - based on the fact that I'd done a little of this with the demands and demand sets, but it's still a lot to do.

I hope to finish it tomorrow.

Timezones, Clojure, and PostgreSQL – What a Combination

Monday, December 17th, 2012

Clojure.jpg

I have spent far more time with this problem today than it deserves. One of the real annoyances with clojure is also a touted strength - the JVM. But that's where I'm having the problems today - in the ugly world of the java.util.Date and it's ilk. When I receive a JSON snippet like the following:

  {
    "division": "cleveland",
    "start_date": "2012-12-07",
    "end_date": "2013-01-07",
    "demand" : []
  }

and need to place it into a simple PostgreSQL table:

  (:require [clojure.java.jdbc :as sql])
 
  (sql/create_table "demand_sets"
                    [:id :uuid "PRIMARY KEY DEFAULT uuid_generate_v4()"]
                    [:division_id :uuid "references divisions (id)"]
                    [:source_id :uuid "references sources (id)"]
                    [:valid_from :timestamp]
                    [:valid_to :timestamp]
                    [:loaded_at :timestamp "DEFAULT now()"])

One would expect that we'd convert the string into a valid Java Timestamp with something like clj-time:

  (defn get-demand-set
    [division]
    (let [demand-set (client/extract division)
          start-date (:start_date demand_set)
          end-date (:end_date demand_set)
          demands (:demand demand_set)]
      {:valid_from (to-date-time start-date)
       :valid_to (to-date-time end-date)
       :demand (map transform demands)}))

and, in fact, we get nice Java Date objects that are the correct time in the UTC (Zulu) time zone. This all makes sense.

Where it gets squirrley is the saving and reloading from a PostgreSQL database. The database table that was created in the above statement has the timezone in the fields, and there's a defined timezone for each instance of PostgreSQL - right there in the config file. I had my local database set to my local box - US/Central, and that was part of the problem. I wanted to see the dates in my database as I was seeing them in the JSON, and that's not realistic. I needed to understand that I'd really be looking at "2012-12-06 18:00:00" for the start-date, as that's the US/Central time of 2012-12-07 00:00:00 UTC.

Then I had to hassle with getting them out and comparing them correctly. This is really where this whole problem started - comparing dates where they formatted correctly, but were just off by a timezone or offset of some kind.

To do that, it turns out I needed to have something that would get me a single demand_set record:

  (defn find-demand-set
    [demand-set-id]
    (select-1 demand-sets
              (where {:id [= demand-set-id]})))

and here, we're using korma for the entity mapping and selecting. With this we can then get a demand set and extract out the Timestamp fields, but I need to make sure that I convert them properly:

  (defn fetch-demand-set
    [demand-set-id]
    (let [demand-set-rec (db/find-demand-set demand-set-id)
          ;; …more here
         ]
      {:valid_from (from-date (:valid_from demand-set-red))
       :valid_to (from-date (:valid_to demand-set-rec))
       ;; …more here
      }))

And then we can safely do something like this:

  (defn demand-sets-equal?
    [ds1 ds2]
    (and (= (:valid_from ds1) (:valid_from ds2))
         (= (:valid_to ds1) (:valid_to ds2))
         ;; …more stuff here
         ))

and the timestamps will properly compare.

But don't expect them to look like they did in the JSON unless you have the PostgreSQL databases set up on UTC time. Thankfully, that's what we have in the datacenter, so I don't have to worry about that. The only place I have to worry is on my laptop, and that's something I can live with for now.

There is one alternative I could have chosen. That's to have used the TIMESTAMP WITHOUT TIME ZONE field in the PostgreSQL table. I think that would have worked as well, and it would have shown the same value in all databases, regardless of the individual time zone setting. But then we would loose an important point in the time, and I wasn't ready to do that yet.

Still, to have spent several hours on this was exhausting

Back in the SQL Saddle Again

Monday, December 17th, 2012

Dark Magic Demand

Dark Magic has a PostgreSQL back-end, and I have to say that it's very nice to be back in the land of SQL after having been fighting with CouchDB and it's problems recently. If Couch just didn't fall over on us and refuse connections, then I'd probably be a little more inclined to use it, but this is getting a little out of hand.

Anyway… I am back in PostgreSQL, and it's great. This morning I just whipped up a little view that didn't take 24 hours to generate, and gave me the latest uploads by division and source that have gone into the database:

  CREATE OR REPLACE VIEW latest_uploads AS
    SELECT d.permalink, s.name, ds.loaded_at
      FROM demand_sets ds, divisions d, SOURCE s,
           ( SELECT source_id, division_id, MAX(loaded_at) AS loaded_at
               FROM demand_sets
             GROUP BY source_id, demand_id ) latest
     WHERE ds.loaded_at = latest.loaded_at
       AND ds.source_id = latest.source_id
       AND ds.division_id = latest.division_id
       AND d.id = ds.division_id
       AND s.id = ds.source_id

After this, I can simply query this view and get the latest of each of the divisions and sources and the time it was loaded. Very nice debugging tool.

More importantly, it's SQL, and that's so much easier to deal with for stable schemas.