Added Checks for Bad Dates from Source
This morning we had a significant production (and UAT) problem caused by the group that provides a critical source of data for the project I'm on, and they're not nearly as together as I'd like them to be. They are currently the source of demand for the system, and that demand is used to find matching merchants and assess the potential value of each matched merchant so that we can rank them for the sales reps. It's really one of the two key data sets we need to run.
Interestingly enough, when they created this data set, they had the foresight to include a start_date and an end_date so that we could tell when the data was generated and how long we were to consider it 'valid':
{ 'start_date': '2012-12-07', 'end_date': '2013-01-07' }
but as my example makes clear, today was one day past expiration! When the code ran, it read the data and accepted it, but it saw that the data had expired, so it didn't use it. Lovely.
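A minimal sketch of the kind of check that rejected the data, written against java.time rather than whatever the real system uses. The name `expired?` is hypothetical, and treating the end_date itself as still valid (expired only once today is strictly after it) is my assumption:

```clojure
(ns demo.expiry
  (:import (java.time LocalDate)))

;; Hypothetical check, not the production code: a data set counts as
;; expired once "today" is strictly after its end_date.
;; end_date arrives as an ISO-8601 string such as "2013-01-07".
(defn expired?
  [end-date-str ^LocalDate today]
  (.isAfter today (LocalDate/parse end-date-str)))

;; With this morning's dates:
(expired? "2013-01-07" (LocalDate/parse "2013-01-08")) ;; => true
(expired? "2013-01-07" (LocalDate/parse "2013-01-07")) ;; => false
```

Passing today in explicitly, instead of reading the clock inside the function, keeps the check easy to test against fixed dates.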
The solution was pretty simple. Since we have a new system that deals with the demand from this source and puts it into a nice PostgreSQL database, we could simply go to the psql console and say:
UPDATE demand_sets SET valid_to='2013-02-07' WHERE valid_to='2013-01-07';
and I bought them a month.
I could then re-run everything, and a mere couple of hours later, everything was fine. Once again proving that problems are solved by people who show up.
I thought I had it all figured out, but a little later in the day it hit me: when we go to reload the data from the "stale" source, we'll see that it's different, assume that "different" means "newer", and overwrite the data I just updated with something that was clearly going to fail again. Not good.
So I realized that I needed a real solution.
It occurred to me that there's no reason to make the insert code more complex. In the code that reads from the API endpoint, I can check whether I'm getting data that's clearly expired, and fix things up nicely right there. Then I realized that my original solution was a nice start; I just needed to formalize it in the code. So I started with a simple function in the app's util namespace:
(use 'clj-time.core)

(defn leap-frog-date
  "Looks at a date to see if it's in the past, if so, add a number of
  months to the date until it's in the future and return that."
  [d]
  (let [ts (now)]
    (if (after? ts d)
      ;; whole months from d to now, plus one more
      (plus d (months (inc (in-months (interval d ts)))))
      d)))
This gives me a nice way to fix up the end_date before it becomes the valid_to in the database.
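To see the arithmetic without pulling in clj-time, here is a dependency-free sketch using java.time, with "today" passed in explicitly so it's easy to test. The names `leap-frog-local-date` and `fix-end-date`, the explicit `today` parameter, and the keyword keys on the data-set map are all hypothetical. It follows the same idea as the clj-time function (whole months from the end date to today, plus one), but steps forward a month at a time from the original date so that month-end clamping (e.g. Jan 31 plus one month is Feb 28) can't leave the result sitting on today instead of after it:

```clojure
(ns demo.leap-frog
  (:import (java.time LocalDate)
           (java.time.temporal ChronoUnit)))

(defn leap-frog-local-date
  "If d is before today, move it forward by whole months until it is
  strictly after today; otherwise return d unchanged."
  [^LocalDate d ^LocalDate today]
  (if (.isAfter today d)
    ;; Start at (complete months between d and today) + 1, always adding
    ;; to the original d so month-end clamping can't accumulate.
    (let [start (inc (.between ChronoUnit/MONTHS d today))]
      (loop [n start]
        (let [candidate (.plusMonths d n)]
          (if (.isAfter candidate today)
            candidate
            (recur (inc n))))))
    d))

;; Hypothetical reader-side fix-up: rewrite :end_date (an ISO-8601 string)
;; before it is stored as valid_to.
(defn fix-end-date
  [data-set ^LocalDate today]
  (update data-set :end_date
          #(str (leap-frog-local-date (LocalDate/parse %) today))))
```

With this morning's data, `(fix-end-date {:start_date "2012-12-07" :end_date "2013-01-07"} (LocalDate/parse "2013-01-08"))` returns `{:start_date "2012-12-07", :end_date "2013-02-07"}`, the same month I bought them by hand in psql. If they miss several months, say today is 2013-03-20, the end date jumps straight to "2013-04-07".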
I also wanted to add a little logging, but that didn't belong in this general function. So in the importing namespace, I simply added a private function with a side effect:
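(use 'clojure.tools.logging)
(use '[clj-time.coerce :only [to-date]])

(defn- leap-frog-date!
  "Like leap-frog-date, but logs an error when the date had to be moved."
  [d]
  (let [d' (leap-frog-date d)]
    ;; errorf (not error) is the format-string variant in tools.logging
    (when-not (= d d')
      (errorf "Had to move expired dates: %tF to %tF" (to-date d) (to-date d')))
    d'))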
At this point I'm ready to go. Every month that they miss regenerating the data, I'll detect it, log an error (so it's easy to spot), and then update the expiration date so that we use the old data anyway.
It's not ideal, but this is such an important part of the system that we can't afford to just not run because they can't get their act together. At the same time, I was pretty happy with the clj-time and clojure.tools.logging libraries, as they made this a lot nicer than if I had to do it all myself.