Archive for the ‘Coding’ Category

Added Checks for Database Version in Code

Friday, December 21st, 2012

PostgreSQL.jpg

In the clojure project I'm on, we have adopted the idea of migrations for the database. Basically, you don't build the current schema directly: you build the first version of the schema, and then migrate it through all the different changes until you get to the current, final version. The idea here is that any version of the database can be brought up to snuff with the same code.

I look at it as kinda silly if we have to go through too many of these, because it's unlikely that the databases will be out of sync for more than a day or two, and they'll most likely be updated at nearly the same time. But hey, what do I know? Right?

The issue with this migration scheme is that it's possible to have the code running against an old version of the schema. I suppose it's possible the other way around as well, but here it seems much more likely with all the changes that this migration strategy seems to encourage. So I needed to make something that would look at the database we're about to run against, then look at the migration path, and see if this database was up to date. If not, then stop right there. Why? Because it'll make sure that we don't run the code against a database schema that the code wasn't intended to run against.
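
Roughly, the check looks something like this - a minimal sketch, assuming the migration tooling records what it has applied in a schema_migrations table (the table name, column name, and version number here are all hypothetical, and db-spec is whatever connection spec the project uses):

  ;; A minimal sketch - assumes the migrations are recorded in a
  ;; schema_migrations table; the table, column, and version values
  ;; are hypothetical. Uses clojure.java.jdbc as elsewhere.
  (require '[clojure.java.jdbc :as sql])

  (def expected-version 20121221)

  (defn check-db-version!
    "Throw if the database hasn't been migrated up to the version
     this build of the code expects."
    [db-spec]
    (let [current (sql/with-connection db-spec
                    (sql/with-query-results rows
                      ["SELECT MAX(version) AS version FROM schema_migrations"]
                      (:version (first rows))))]
      (when (or (nil? current) (< current expected-version))
        (throw (IllegalStateException.
                 (str "Database is at migration " current
                      " but this code expects " expected-version))))))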

As an aside, this totally goes against the idea that the code should be more adaptive, but that seems to be not as well received in the group. They seem to want to know where the database schema is, and that it's where it should be for all the code - as opposed to using views and stored procedures to insulate the code from such schema changes. It's even possible to add another layer in the code to provide further insulation, but this works, and I'm certainly not going to change their minds on this. Pick your battles.

Now that we have this check, there's little chance of deploying code and running it with a mis-matched database underneath it. That's reassuring, however it's done.

Updated Demand Service to Time-Series Demand

Thursday, December 20th, 2012

Dark Magic Demand

Today I spent most of the day adding time-series demand to the existing system - server and client. This meant that I needed to migrate the database tables a bit - probably in a way that was a little more industrial-strength than I needed - but it was a nice way to get back into the swing of things. While I could have gotten away with renaming the old column in the table, adding the new, correctly structured column, migrating the data from the old column to the new, and then dropping the old column, I chose to rename the table, drop all the foreign keys, make a new table, migrate all the data into it, and then add back the keys as needed.
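
In rough terms, the sequence looked something like this - just a sketch with hypothetical table, column, and constraint names, using clojure.java.jdbc's do-commands (db-spec being whatever connection spec the project uses):

  ;; Sketch of the heavier migration path - rename, drop keys, rebuild,
  ;; copy, re-key. All names here are hypothetical.
  (sql/with-connection db-spec
    (sql/do-commands
      "ALTER TABLE demands RENAME TO demands_old"
      "ALTER TABLE allocations DROP CONSTRAINT allocations_demand_id_fkey"
      "CREATE TABLE demands (
         id            uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
         demand_set_id uuid REFERENCES demand_sets (id),
         demand        INTEGER[]
       )"
      "INSERT INTO demands (id, demand_set_id, demand)
           SELECT id, demand_set_id, ARRAY[demand]
             FROM demands_old"
      "ALTER TABLE allocations ADD CONSTRAINT allocations_demand_id_fkey
         FOREIGN KEY (demand_id) REFERENCES demands (id)"
      "DROP TABLE demands_old"))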

It took a little more time than the simpler approach would have, but it wasn't bad, and in the end I'm glad I did it, as it got me back into the swing of things with SQL and PostgreSQL. When it comes time to do a much grander migration where these steps will be required, I'll be ready.

Other than that, it was pretty straightforward. I needed to make sure that the old version of the API was unchanged, and that wasn't too hard, but then the new version had to be handled in the client (ruby) code, and that proved to be a little more challenging than I had thought.

The scheme we have is that if the demand is a time series of points, they will be in an array in the output, and the size of that array determines the interval of the points - but always spanning a year. Twelve points in the array means the first point is the demand for this month, the second point is for next month, etc. If there are 52 points in the array, then the first point is for this week, the next is 7 days out, etc.

Pretty simple, but then we needed to know the starting date for the series. After all, if this data is served up a week from now, how is the client to know which points to use? It makes sense to add a generated_at field in the API which is the starting point for the data in the time series. Once I had that in the ruby code, it was a matter of seeing what kind of data I was getting, its length (if it's an array), and then looking forward from the generated_at time to the point in time that I'm interested in.
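
The lookup itself is just index arithmetic - here's a sketch in clojure for illustration (the real client code is ruby), using clj-time and made-up names:

  ;; Illustration of the lookup, in clojure rather than the ruby client.
  ;; Assumes clj-time, a target date no earlier than generated-at, and no
  ;; bounds checking past the one-year span.
  (require '[clj-time.core :as t])

  (defn demand-at
    "Pick the demand point for target-date from a 12- or 52-point
     series that starts at generated-at."
    [generated-at points target-date]
    (let [idx (case (count points)
                12 (t/in-months (t/interval generated-at target-date))
                52 (t/in-weeks  (t/interval generated-at target-date)))]
      (nth points idx)))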

In all, not bad, and I'm glad I put this code into the main app now, as I want to pin this stuff down, and it's quite often the case that the guys in the group are kinda waffly about things like this. Get it in, get it done, and then their natural laziness keeps them from messing with it too much.

Simplifying SQL Arrays in Clojure

Wednesday, December 19th, 2012

Clojure.jpg

I've just spent several hours working with a co-worker on simplifying the way we are handling SQL Array values in the code. Previously, we had to build the INSERT statements and execute them directly - thus avoiding korma's generation of the PreparedStatement and the setting of its values. This was OK, but it'd be nicer to update korma to have it do all this properly.
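
For reference, the 'before' state looked roughly like this - a hypothetical sketch where the INSERT is built by hand with a PostgreSQL array literal and run through korma's exec-raw (the table and function names are illustrative):

  ;; Roughly what we were doing before: build the INSERT by hand so the
  ;; array column is written as a PostgreSQL array literal, bypassing
  ;; korma's normal statement generation. Names are illustrative.
  (require '[korma.core :refer [exec-raw]]
           '[clojure.string :as string])

  (defn insert-seasonality-raw
    [service factors]
    (exec-raw [(str "INSERT INTO seasonality (service, factor_by_month) "
                    "VALUES (?, '{" (string/join "," factors) "}')")
               [service]]))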

So we did. It was a little frustrating because we got off-track several times, but in the end, we have something that will make it very nice to use for the other times we need arrays in the code.

So time well spent.

Dealing with PostgreSQL Arrays in Clojure

Wednesday, December 19th, 2012

Clojure.jpg

One of the things I want to use a little more in PostgreSQL is the ARRAY datatype. It's pretty simple to imagine why it's a good thing -- face it, I can have the data flattened out:

  CREATE TABLE seasonality (
    id              uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    service         VARCHAR,
    jan             INTEGER,
    feb             INTEGER,
    mar             INTEGER,
    apr             INTEGER,
    may             INTEGER,
    jun             INTEGER,
    jul             INTEGER,
    aug             INTEGER,
    sep             INTEGER,
    oct             INTEGER,
    nov             INTEGER,
    dec             INTEGER
  )

or we can put it in an array, and it's so much cleaner to deal with:

  CREATE TABLE seasonality (
    id              uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    service         VARCHAR,
    factor_by_month INTEGER[12]
  )

The problem is that when dealing with Korma, it doesn't like the array datatype as it's coming back as a Jdbc4Array and not something clojure can easily compare to a standard array of integers.

So what's a guy to do? Well… it's a pretty simple conversion:

  (defn extract-factors
    [record]
    (-> record :factor_by_month .getArray seq))

and the output of this is easily comparable to a simple array of integers.
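
For example, in a quick check (the values are made up, and rec is assumed to be a row fetched through korma):

  ;; e.g. - assuming rec is a row fetched through korma, and the
  ;; twelve factors below are made-up values:
  (= (extract-factors rec) [7 7 8 10 12 14 15 14 12 10 8 7])
  ;; => true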

Not bad. I spent a little time on this, and StackOverflow helped, but clojure isn't too bad at this. It feels a lot like erlang.

UPDATE: a co-worker pointed out that I could really make this a lot nicer, and I have to agree. The first thing is to make the function just convert the Jdbc4Array to a sequence:

  (defn array->seq
    [array]
    (-> array .getArray seq))

then we can make use of Korma's transform method on the entity to transform the data on the select calls:

  (defentity seasonality
    (transform (fn [rec] (update-in rec [:factor_by_month] array->seq))))

Then, as soon as we see the data from the select, it's already converted to a sequence and we don't have to worry about it. Works great, and really makes things cleaner.

Google Chrome dev 25.0.1364.2 is Out

Wednesday, December 19th, 2012

This morning I noticed that the Google Chrome team released dev 25.0.1364.2 and once again released completely useless release notes. It's really amazing how little effort some people will put into their craft. But then again, I've worked with a lot of folks like this, and thankfully I'm still totally baffled by their attitude.

I hope I never understand them.

Added Seasonality to Clojure Project – Harder than it Looks

Tuesday, December 18th, 2012

Dark Magic Demand

I've spent many hours today getting the seasonality data into our clojure project. And I have to admit - it's a lot harder than it looks. A lot.

The problem for me was that the way this is being done in the clojure code is to have the entities defined in one file, but the tables created in another. Then all the insert/update functions, as well as the compare and load functions, are in yet another file. There's a lot more to this than a simple set of INSERT statements. A lot more.

So I spent a good chunk of the afternoon trying to understand what I needed - based on the fact that I'd done a little of this with the demands and demand sets, but it's still a lot to do.

I hope to finish it tomorrow.

Timezones, Clojure, and PostgreSQL – What a Combination

Monday, December 17th, 2012

Clojure.jpg

I have spent far more time with this problem today than it deserves. One of the real annoyances with clojure is also a touted strength - the JVM. But that's where I'm having the problems today - in the ugly world of java.util.Date and its ilk. When I receive a JSON snippet like the following:

  {
    "division": "cleveland",
    "start_date": "2012-12-07",
    "end_date": "2013-01-07",
    "demand" : []
  }

and need to place it into a simple PostgreSQL table:

  (:require [clojure.java.jdbc :as sql])
 
  (sql/create-table "demand_sets"
                    [:id :uuid "PRIMARY KEY DEFAULT uuid_generate_v4()"]
                    [:division_id :uuid "references divisions (id)"]
                    [:source_id :uuid "references sources (id)"]
                    [:valid_from :timestamp]
                    [:valid_to :timestamp]
                    [:loaded_at :timestamp "DEFAULT now()"])

One would expect that we'd convert the string into a valid Java Timestamp with something like clj-time:

  (defn get-demand-set
    [division]
    ;; to-date-time comes from clj-time.coerce
    (let [demand-set (client/extract division)
          start-date (:start_date demand-set)
          end-date (:end_date demand-set)
          demands (:demand demand-set)]
      {:valid_from (to-date-time start-date)
       :valid_to (to-date-time end-date)
       :demand (map transform demands)}))

and, in fact, we get nice date-time objects (Joda DateTimes, in clj-time's case) that represent the correct instant in the UTC (Zulu) time zone. This all makes sense.

Where it gets squirrelly is the saving and reloading from a PostgreSQL database. The database table that was created in the above statement has the timezone in the fields, and there's a defined timezone for each instance of PostgreSQL - right there in the config file. I had my local database set to my local box's time zone - US/Central - and that was part of the problem. I wanted to see the dates in my database as I was seeing them in the JSON, and that's not realistic. I needed to understand that I'd really be looking at "2012-12-06 18:00:00" for the start-date, as that's the US/Central time of 2012-12-07 00:00:00 UTC.

Then I had to hassle with getting them out and comparing them correctly. This is really where this whole problem started - comparing dates that were formatted correctly, but were just off by a timezone or offset of some kind.

To do that, it turns out I needed to have something that would get me a single demand_set record:

  (defn find-demand-set
    [demand-set-id]
    (select-1 demand-sets
              (where {:id [= demand-set-id]})))

and here, we're using korma for the entity mapping and selecting. With this we can then get a demand set and extract out the Timestamp fields, but I need to make sure that I convert them properly:

  (defn fetch-demand-set
    [demand-set-id]
    (let [demand-set-rec (db/find-demand-set demand-set-id)
          ;; …more here
         ]
      ;; from-date comes from clj-time.coerce
      {:valid_from (from-date (:valid_from demand-set-rec))
       :valid_to (from-date (:valid_to demand-set-rec))
       ;; …more here
      }))

And then we can safely do something like this:

  (defn demand-sets-equal?
    [ds1 ds2]
    (and (= (:valid_from ds1) (:valid_from ds2))
         (= (:valid_to ds1) (:valid_to ds2))
         ;; …more stuff here
         ))

and the timestamps will properly compare.

But don't expect them to look like they did in the JSON unless you have the PostgreSQL databases set up on UTC time. Thankfully, that's what we have in the datacenter, so I don't have to worry about that. The only place I have to worry is on my laptop, and that's something I can live with for now.

There is one alternative I could have chosen. That's to have used the TIMESTAMP WITHOUT TIME ZONE type for the fields in the PostgreSQL table. I think that would have worked as well, and it would have shown the same value in all databases, regardless of the individual time zone setting. But then we would lose an important part of the point in time - the time zone - and I wasn't ready to do that yet.

Still, to have spent several hours on this was exhausting.

Back in the SQL Saddle Again

Monday, December 17th, 2012

Dark Magic Demand

Dark Magic has a PostgreSQL back-end, and I have to say that it's very nice to be back in the land of SQL after having been fighting with CouchDB and its problems recently. If Couch just didn't fall over on us and refuse connections, then I'd probably be a little more inclined to use it, but this is getting a little out of hand.

Anyway… I am back in PostgreSQL, and it's great. This morning I just whipped up a little view that didn't take 24 hours to generate, and gave me the latest uploads by division and source that have gone into the database:

  CREATE OR REPLACE VIEW latest_uploads AS
    SELECT d.permalink, s.name, ds.loaded_at
      FROM demand_sets ds, divisions d, sources s,
           ( SELECT source_id, division_id, MAX(loaded_at) AS loaded_at
               FROM demand_sets
             GROUP BY source_id, division_id ) latest
     WHERE ds.loaded_at = latest.loaded_at
       AND ds.source_id = latest.source_id
       AND ds.division_id = latest.division_id
       AND d.id = ds.division_id
       AND s.id = ds.source_id

After this, I can simply query this view and get the latest of each of the divisions and sources and the time it was loaded. Very nice debugging tool.
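
For example, from the repl - a sketch assuming the same clojure.java.jdbc setup (and connection spec) used elsewhere in the project:

  ;; Quick look at the latest uploads from the repl - assumes the same
  ;; clojure.java.jdbc require and db-spec used elsewhere.
  (sql/with-connection db-spec
    (sql/with-query-results rows
      ["SELECT * FROM latest_uploads ORDER BY loaded_at DESC"]
      (doall rows)))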

More importantly, it's SQL, and that's so much easier to deal with for stable schemas.

Corrected Major Dark Magic Problem – Valid Dates

Sunday, December 16th, 2012

bug.gif

As I was trying to test the demand from Dark Magic in the Quantum Lead app, I noticed that I was getting a nasty parsing error - the dreaded UTF-8 encoding. Argh! I looked up what I might be able to do, and the solution seemed pretty simple. In the clojure code where we make the response from Dark Magic, we needed to make sure to set the charset in the headers:

  (defn return-json
    [ob]
    {:status 200
     :headers {"Content-Type" "application/json; charset=UTF-8"}
     :body (json/generate-string ob)})

But still, when I ran this code, I didn't get the JSON decoding error in Ruby, but I also didn't get any data. There must be something wrong! And sure enough, there was.

No dates.

The pipeline code expected to see a start_date and end_date in the demand set to know if the data is valid for the current running date/time. If not, then it doesn't use it. I looked at the code in Dark Magic, and sure enough - we were throwing away the dates we were getting from the original source. I guess my co-worker who wrote this didn't hear me say those were important. (Hint: he did, he's just not disciplined.)
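
The check itself is simple enough - sketched here in clojure for illustration (the pipeline is actually ruby), assuming clj-time and ISO date strings:

  ;; Sketch of the validity test the pipeline performs - written in
  ;; clojure for illustration, though the pipeline itself is ruby.
  ;; Assumes clj-time and ISO date strings for the two fields.
  (require '[clj-time.core :as t]
           '[clj-time.coerce :refer [to-date-time]])

  (defn demand-set-valid?
    "True when now falls between the demand set's start and end dates."
    [{:keys [start_date end_date]} now]
    (and start_date end_date
         (t/within? (t/interval (to-date-time start_date)
                                (to-date-time end_date))
                    now)))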

So today I figured out enough clojure to change the code and get all these things into the system. That means new fields in the database, new insert methods, extract methods, comparison changes. Lots of things had to change, but it was worth it, because when I was done, it was working with the encoding and all the dates.

It started out very frustrating, but ended up being a great learning experience.

Feeling the Pain of Developing a System in Clojure

Sunday, December 16th, 2012

Clojure.jpg

I want to like clojure - I really do. Erlang was interesting, and it taught me a lot about functional programming - not that I'm really all that good now, but in Erlang, there were a lot of things that I found very hard. One was state - that was all in the arguments. The other was error messages. Man, those error dumps could go on for pages, and finding the first one that really was the source of the problem was a trick. But once you knew what to look for, and found it, then you had a fighting chance to figure out what you did wrong.

I'm learning that with clojure, it's not always that easy. After all, clojure is just java byte code, right? And with all the functional components compiled into generated classes, it's easy to understand why the stack is full of classes whose names make no sense to a human reader.

OK, maybe to some human readers, but not to beginning clojure coders.

I came across this today and it's really pretty shocking.

Stack traces 50, 60, 100 lines long, and not a clue in the lot of them as to where the problem is. Now I'm not saying stack traces are hard for me to read - I've looked at more Java stack traces than I can count. I'm saying that every level in that stack trace was nothing from my code. Not a single thing. Certainly nothing recognizable through the name munging that clojure uses.

So I'm left just commenting out sections of code hoping to find the ones that are causing the problem. In functional code, that's not easy as it's all nested within itself in the source file, and commenting out a section is really quite a little adventure to make sure that you get all the parens in the right places.

This is a serious problem in my mind. How long do you have to work with clojure to understand the name munging? Six months? A year? If that's the case, then there's no way a new developer on a team can do any fire-fighting - it's too stressful - and the only people who can do it are the senior devs on the team.

That might not be bad, but our senior clojure dev is not a workaholic, so having him fire-fight anything before 9:00 am or after 5:00 pm is a dicey proposition. It makes me very nervous.

It makes me want to drop it and use something else.