Archive for the ‘Clojure Coding’ Category

Calculations are Flowing!

Friday, January 18th, 2013

Dark Magic Demand

Well, it's been a while getting here, but we finally have the chain reaction of calculations working in the demand service all the way up to, and including, the closed deal adjustments. I wrote quite a few tests on the closed deal adjustments because I'm working in a new language, and in order to make sure that it's working as I expect, I needed to test everything. I started with unit tests of the taxonomy and price checks, and of how to decompose the location data, then moved to the second-level functions like closed deal option decomposition, and finally to the top-level functions.

This is typically a ton more testing than I normally write, and in about six months, I won't be writing these tests, either. It's just that I'm really not at all sure about this clojure code, and in order to make myself feel more comfortable about it, I needed the tests. And the REPL.

But to see the logs emit the messages I expected, and the ones I didn't, was a joy to behold. This was a long time in coming, and it's been a rough and bumpy road, but it's getting a little smoother, and with this milestone, next week should prove to be a great step forward for the project.

I'm looking forward to it.

Pulled Additional Fields from Salesforce for Demand Adjustment

Thursday, January 17th, 2013

Salesforce.com

This afternoon I realized that we really need to have a few additional fields about the closed deals for the demand adjustment. Specifically, we have no idea of the start date for the deal, and while we have the close_date, that's not much use to us if it's empty, as many are until they have an idea of when they really want to shut down the deal. Additionally, one of the sales reps pointed out that there are projected sales figures on each 'option' in a deal, and rather than look at the total projected sales and divide it up, as we have in the past, we should be looking for those individual projected sales figures and using them - if no sales have been made.

Seems reasonable, so I added those to the Salesforce APEX class, and ran it to make sure it was all OK. There were no code changes to the ruby code because we had (smartly) left it as a hash, and so additional fields aren't going to mess things up… but in our code we can now take advantage of them.

Surprisingly time-consuming because I had to drop tables, and add properties and get things in line - but that's what happens when you add fields to a schema… you have to mess with it. Still, it's better than using a document database for this stuff. Relational with a simple structure beats document every time.

Clojure Training Class with Aaron

Thursday, January 17th, 2013

Today has been an all-day training session at The Shop with Aaron B. - the Security Lead at Groupon and one of the maintainers of Clojure for a while. Very interesting to see his take on things, and I have to say, it's far more refreshing than what I'm used to seeing from the clojure crew closer to me. For instance, Aaron sees a time and a place for OO code and functional code. He also sees that while multithreaded code is hard, there are lots of people that are very good at it - but most aren't.

His take on a lot of the things in the language was nice, as my current tutor is not really giving me a lot other than a very mathematical bent on the situation, and that's not mapping into my experience as nicely as I might like. It's really pretty bad, actually. But it's getting better, and the more I work with it, the better I'm getting, and that makes things a lot easier.

I'm guessing that in about six months, things will be pretty much settled out, and I'll be able to just hit up Aaron now and again for performance advice, or how things work under the hood. When that time comes, I'll be a lot happier using clojure in production systems, but for now, it's still pretty hard.

Gotta keep working at it.

Finally Finished Up Closed Deal Adjustments

Wednesday, January 16th, 2013

Dark Magic Demand

Today I was finally able to get the first good cut at the closed deal adjustment feature in the demand service we've been working on. The basic concept is that anything that's been closed by a sales rep since the delivery of the demand has to be subtracted from the demand as it can already be considered "fulfilled". This is already being done in the mainline ruby app, but the goal with this demand service is to get it out of the ruby app and into the demand service so that all the pulling and adjusting can be done there, and then the mainline ruby app doesn't have to waste the time doing it.

It's a good idea, and it'll save us between 20% and 30% in the runtime, and as we scale to the global markets, that's going to be very nice to have. But it's not easy taking mutable, referenced, ruby objects and making it all immutable and functional. But today I think I have it all done. Well… at least ready for testing. That's tomorrow...

Working Hard on Closed Deal Adjustment

Tuesday, January 15th, 2013

Dark Magic Demand

The large part of today has been spent trying to add in the closed deal adjustment to the demand to make Dark Magic on feature-parity with the code in Quantum Lead. The idea is that we needed to look at all the deals that have closed after the raw demand was generated, and then look at how much of them sold, or at least what their sales expectations are. This is then subtracted from the demand units requested as it represents fulfilled demand in the time span of the demand's life.
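The adjustment described above can be sketched in a few lines. This is only an illustration of the idea, not the code in Dark Magic or Quantum Lead: the names option-units and adjust-demand, the :sold and :projected keys, and the choice to clamp the result at zero are all assumptions made for the example.

```clojure
(defn option-units
  "Units one closed deal option accounts for: its actual sales when there
  are any, otherwise its projected sales. Key names are hypothetical."
  [{:keys [sold projected]}]
  (if (pos? (or sold 0))
    sold
    (or projected 0)))

(defn adjust-demand
  "Subtract the units covered by the closed options from the demand.
  Clamping at zero is an assumption, not something from the real system."
  [demand closed-options]
  (let [fulfilled (reduce + (map option-units closed-options))]
    (update-in demand [:units] #(max 0 (- % fulfilled)))))

(adjust-demand {:service "Skiing" :units 1000}
               [{:sold 120 :projected 300}
                {:sold 0 :projected 200}])
;; => {:service "Skiing", :units 680}
```

Nothing is mutated here: each call returns a new demand map, which is the shape the multi-pass ruby logic has to be bent into.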

The code was all written in ruby, and worked fine, but was heavily dependent on the fact that (j)ruby uses references and mutable objects as it had multiple passes of the adjustment based on the individual deal options that had closed. In order to refactor this into something that works in the immutable space of essential clojure (which is how I'm calling the design philosophy of my co-worker that's a clojure purist) I spent lots and lots of time trying to figure out how to put this into that primitive functional style.

I had learned that there were many things in clojure that could make this very simple. There's transactional memory for mutability, and references, so that - in theory, it could have been a very simple port of the code. But Socrates, as I'll call my co-worker with the purist attitudes, wouldn't have any of this. Since this is being done (against my better judgement) to make him happy, that's what we're doing.

I wonder if he has any idea why this is all really happening? Maybe he just thinks we all "see the light"? Who knows. I'm half hoping that this fails and we do it again in ruby, and half hoping it succeeds so I can have guaranteed job security as Socrates will be bored with all this long before we're done with the system we're building.

In the end, I was stuck on the one problem, and I had to sit and wait for about 20 mins as Socrates came back from his walkabout - wherever that took him, and have him show me how to refactor this ruby code into something that works in clojure.

Functional languages are nice. But I'm hating this experience because I feel I'm in the middle of an ocean, and my only life raft is a guy that doesn't care about jack-diddly, and comes and goes as he pleases. It's a very uncomfortable position to be in. Maybe in six months when I know clojure well enough not to have these blocks, it'll be OK… but now, it's just exceptionally frustrating to think that this company is making a business decision to use this new language, and then forcing me to be on this project because they want it to succeed.

Very frustrating.

Documented the API and Logic for the Demand Service

Monday, January 14th, 2013

Dark Magic Demand

I spent most of the day working on the docs for the API and logic for the clojure project I'm working on. This all fits in the README.md in the root of the github repo so that it's rendered nicely on the GitHub pages. It's nice to have something there, as it allows everyone to know what the goals are for the next few updates/releases and gives them talking points in case they want to ask some questions and don't want to start by asking the simple questions.

I used OmniGraphSketcher to make the graphs, as it seemed to me to have the best toolset for the job, and it worked out quite nicely. Really came through for me. I then threw in some JSON from the actual service to make sure it was the right format, and started typing.

The biggest part of all this is keeping things straight and not overusing the terms. There just aren't that many terms for 'demand' and 'inventory', so I had to be a little creative and careful about how I wrote it all up. It's certainly not perfect, but it's a far far cry better than nothing, and I think I'm pretty decent at documentation when I get going. So there.

Glad to get that all done so that we don't get hammered by questions from project managers.

Slugging through Clojure Learning Curve

Thursday, January 10th, 2013

I spent the entire day doing something I'd hoped would have only taken about an hour this morning. I wanted to get the seasonal adjustment code done, and then test and be finished! But that's not how it turns out, is it? I spent a lot of time fighting with clojure, and more time fighting with how it was being used by the architect of the project I'm on.

The biggest thing about clojure today was the use of the apply function. I hadn't really used it before, but I had a sequence of 12-element sequences and I wanted to make a new sequence - call it a transpose if you want to borrow from linear algebra. The result is a sequence of 12 elements, each of size 'n', where 'n' is the number of sequences in the original data.

I needed this for computing the maximum factor for each of the months based on a series of yearly factor sequences.

I found something that looked like it'd work:

  (def factors '((1 2 3) (4 5 6) (7 8 9)))
  (apply map vector factors)
  => ([1 4 7] [2 5 8] [3 6 9])

so I used it and was happy that I found it. But my happiness faded when I got a clojure error message saying that somewhere in my code, the map function was getting only one argument - and that was an error. I looked and looked, and could not find the problem.

I chatted with my teammate and he pointed out that if factors is empty, then map gets no collections at all - just the vector function - and that's the error. He justified this by saying "all languages" do this with varargs.

OK… let me get this right… if I have a non-nil, but empty factors, then this fails? Yup, it does. I was pretty pissed at clojure for this. First, for not handling edge conditions better than this, and secondly for the error message that could have been far far more useful than it was.
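One way to sidestep the empty case is to check for it before calling apply. This is just a sketch: transpose and monthly-max are hypothetical names, and returning an empty list for empty input is an assumption about what the caller wants.

```clojure
(defn transpose
  "Turn a sequence of rows into a sequence of columns. Returns an empty
  list when there are no rows, instead of calling map with no collections."
  [rows]
  (if (seq rows)
    (apply map vector rows)
    ()))

(defn monthly-max
  "Largest factor in each position across all the yearly sequences."
  [yearly-factors]
  (map #(apply max %) (transpose yearly-factors)))

(transpose '((1 2 3) (4 5 6) (7 8 9)))
;; => ([1 4 7] [2 5 8] [3 6 9])

(monthly-max '((1 2 3) (4 5 6) (7 8 9)))
;; => (7 8 9)

(monthly-max ())
;; => ()
```

The (seq rows) test is the usual clojure idiom for "not empty", and it also covers a nil argument.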

OK, to be fair, I can see his point. They don't want to have to make exceptions for things like:

  (apply + ())

and I was using something I'd never used before. My bad. I should have known better.

But the guy that's supposed to be the "clojure horse" of the group is a guy that barely works 40 hrs a week and is on a three-day trip to talk at a conference. Bully for him, but that doesn't leave me anything to do if I don't venture out on my own. And if I do, I'm going to get burned like this, and it's going to piss me off.

Period.

When I know a lot more about clojure, I'll be more comfortable and I won't mind his walkabouts to wherever. But while I'm trying to make progress, it's a real problem for me. But it's something I can't say anything about because I'm not his manager. Yippee.

The other big issue was the circular reference problem in clojure. I really can't believe there's no solution to that. I've gotta hit google for that tomorrow.

Working on Adding Seasonal Adjustment to Code (cont.)

Thursday, January 10th, 2013

Dark Magic Demand

This morning I finally finished up the problem of the demand time series time-shifting: a simple left-shift of the sequence:

  (defn shift-left
    "Do a simple left-shift for the sequence of 'n' items"
    [coll n]
    (seq (into (vec (drop n coll)) (vec (take n coll)))))

and this works wonderfully:

  (def a '(1 2 3 4 5 6))
  (shift-left a 3)
  => (4 5 6 1 2 3)

Nice little function to have. Interestingly, the way this works is entirely different than a very similar function:

  (defn shift-left-no-vec
    "Do a simple left-shift for the sequence of 'n' items"
    [coll n]
    (into (drop n coll) (take n coll)))

This guy doesn't convert the drop and take sequences to vectors, and the result is interestingly different:

  (def a '(1 2 3 4 5 6))
  (shift-left-no-vec a 3)
  => (3 2 1 4 5 6)

Seems the into function, when operating on a sequence, adds each item to the front and not the end, so the taken items land ahead of the dropped ones, and in reverse order. The drop and take looked good in the REPL, but there's something about these sequences that messes things up. So I used vectors and then made it back into a sequence at the end.
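For what it's worth, the difference comes from conj, which is what into uses to add each item: conj puts new items at the front of a list or sequence, but at the end of a vector. A small sketch shows this, along with an alternative that skips the vector round trip by using concat (shift-left-concat is a hypothetical name, not something from the project):

```clojure
;; conj adds where it is cheapest for the collection type
(conj '(4 5 6) 1)
;; => (1 4 5 6)

(conj [4 5 6] 1)
;; => [4 5 6 1]

(defn shift-left-concat
  "Left-shift the sequence by n items without converting to vectors."
  [coll n]
  (concat (drop n coll) (take n coll)))

(shift-left-concat '(1 2 3 4 5 6) 3)
;; => (4 5 6 1 2 3)
```

The concat version returns a lazy sequence, which may or may not matter depending on how the result is consumed.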

When I tried to test this in the larger system I realized that there were tons of problems in the code that had been written by my teammate. There were poorly thought out arity functions for updating the data - not hard to fix, but it took a little time. There were really poorly designed insert methods that wouldn't properly strip out the existing id values from a source data set for it to be inserted as a new data set without errors - again, not too hard to fix, but it was thoughtless.

Then there's the one that probably bugs me the most because it's brought about by clojure the language and the clojure guru that's really setting the design and namespaces for the project. Circular references. Clojure can't deal with them.

OK, if you're in a single namespace, you can say:

  (declare foo)

and then use it and later in the same namespace say:

  (defn foo
    [x]
    (* x x))

but if you need to have two methods in two namespaces, and there are require directives set up to have one include the other, then you're hosed. No way around it.

Now I'm no language designer - OK, I did it long ago, and it wasn't nearly this complex, but the point is this seems like a horrible oversight. Unless the assumption I'm working under is faulty. Maybe the point is that the designers didn't expect to have 50 namespaces like we have. Maybe they expected 5. In that case, this all goes away. If I were to collapse a few namespaces, that would fix things up.

But then my clojure guru would not approve.

This is what's really beginning to bug me - good languages like ruby spoiled by the rubyist culture. Ditto here. Don't be a math prude - don't try to design the namespaces before you know you need them. Arguably, in a C++ project, I'd lay out directories as I knew I'd need them - but only a minimal set. This is something that just goes to bad design, and since I'm a total newbie, this is on my guru. He didn't even see it coming.

I want to learn clojure, and ruby, and then use them as I'd use C++ - with the features I see I need, and how I'd use it. But I'm getting the feeling that that's not how this is going. I'm being shown a limited set of the features in clojure and that's all he wants to use. I get it. I'm not sure I'd want someone to be doing template meta-programming in their first C++ project, but it's still a little unsettling.

Working on Adding Seasonal Adjustment to Code

Wednesday, January 9th, 2013

Dark Magic Demand

For most of the day I've been working on taking the fixed demand from a source and making it a year-long demand time series. It's not done, but it's close, and I've been testing the code along the way. The idea is pretty simple: the source of demand we're forced to work with is a single value given to us over a specific valid time window. That's it:

  { 'start_date': '2012-12-07',
    'end_date': '2013-02-07',
    'units': 1000,
    'service': 'Skiing' }

of course there are locational components and taxonomy is a little more complicated than I've made it out to be, but for what we're doing, it doesn't matter.

On top of this, we have a source of Seasonality Factors that we have been given by the city planners and regional VPs. These are how they see the demand changing over the course of a year for a given service:

  { 'cleveland': { 'Skiing': [200 150 0 0 0 0 0 0 100 100 150 200],
                   'Ziplining': [0 0 100 100 100 100 150 150 150 150 100 50] } }

Here, we have two services for Cleveland - Skiing and Ziplining. The data shows that the factors (an array of integers representing the 12 months in the year) are high for Skiing in the cold months, and opposite that for Ziplining. Not a big surprise.

What I needed to get done today - and almost made it, was to take these factors and the raw demand and make a function that converted the initial demand data to something like this:

  { 'start_date': '2012-12-07',
    'end_date': '2013-02-07',
    'units': [2000 1500 0 0 0 0 0 0 1000 1000 1500 2000],
    'service': 'Skiing' }

where the fixed 1000 units are multiplied by the factors (given in percentage) and expanded into a nice array. Interesting to note - we've got to do one more thing: when we have calculated the data like this, we need to rotate the time series of units to make sure that the current month is in the first position and the remaining 11 are in "the future" representing the next 11 months of demand.
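The expansion and the rotation can be sketched as follows. This is an illustration only, not the project's code: the function names and the keyword keys are hypothetical, and it assumes the factors are listed January-first and that months are numbered 1 to 12.

```clojure
(defn expand-units
  "Turn one units value into 12 monthly values using percentage factors."
  [units factors]
  (mapv #(quot (* units %) 100) factors))

(defn rotate-to-month
  "Rotate a January-first series so that month (1-12) comes first."
  [series month]
  (let [n (dec month)]
    (vec (concat (drop n series) (take n series)))))

(defn seasonal-demand
  "Replace the fixed :units with a 12-month series starting at month."
  [demand factors month]
  (update-in demand [:units]
             #(rotate-to-month (expand-units % factors) month)))

(def skiing-factors [200 150 0 0 0 0 0 0 100 100 150 200])

(seasonal-demand {:service "Skiing" :units 1000} skiing-factors 1)
;; => {:service "Skiing", :units [2000 1500 0 0 0 0 0 0 1000 1000 1500 2000]}

(seasonal-demand {:service "Skiing" :units 1000} skiing-factors 3)
;; => {:service "Skiing", :units [0 0 0 0 0 0 1000 1000 1500 2000 2000 1500]}
```

With month 1 the series is unchanged; with month 3 the January and February values wrap around to the end as the two furthest-out months.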

It's this last part that got me today. I couldn't get it figured out. But I will.

Added Checks in for Bad Dates from Source

Tuesday, January 8th, 2013

This morning we had a significant production (and UAT) problem caused by a group that's not nearly as together as I'd like them to be for a critical source of data for the project I'm on. They are currently the source of demand for the system, and that's used to find matching merchants and assess the potential value of each matched merchant in order to enable us to rank them for the sales reps. It's really one of the two key data sets we need to run.

Interestingly enough, when they created this data set, they had the foresight to include a start_date and an end_date in the data set so that we could tell when the data was generated, and how long we were to consider it 'valid':

  { 'start_date': '2012-12-07',
    'end_date': '2013-01-07' }

but as it's clear from my example, today was one day past expiration! This means that when the code ran, it saw the data, and it accepted it, but it realized that it was expired, and so it didn't use it. Lovely.

The solution was pretty simple. Since we have a new system that deals with the demand from this source and puts it into a nice PostgreSQL database, we could simply go to the psql console and say:

  UPDATE demand_sets
     SET valid_to='2013-02-07'
   WHERE valid_to='2013-01-07';

and I bought them a month.

I could then re-run everything and a mere couple of hours later, everything was fine. Once again proving that Problems are solved by people that show up.

I thought I had it all figured out but then a little later in the day it hit me: When we go to reload the data from the "stale" source, we'll see that it's different, and assume that "different" means "newer" and we'd overwrite the data I just updated with something that was clearly going to fail again. Not good.

So I realized that I needed a real solution.

What I realized was that there's no reason to make the insert code more complex. I can look at the reader code from the API endpoint and see if I'm getting data that's clearly expired. It's right there that I can fix things up nicely. Then I started thinking that my original solution was a nice start, I just needed to formalize it in the code. So I started with a simple function in the app's util namespace:

  (use 'clj-time.core)
 
  (defn leap-frog-date
    "Looks at a date to see if it's in the past, if so, add a number of
    months to the date until it's in the future and return that."
    [d]
    (let [ts (now)]
      (cond
        (after? ts d) (plus d (months (inc (in-months (interval d ts)))))
        :else d)))

this will get me a nice way to filter the end_date before it becomes the valid_to in the database.

But I wanted to add in a little logging as well, but it didn't belong in this general function. So in the importing namespace, I simply had a private method with a side-effect:

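  (use 'clojure.tools.logging)
  (use '[clj-time.coerce :only [to-date]])
 
  (defn- leap-frog-date!
    "Same as leap-frog-date, but logs an error when the date had to move."
    [d]
    (let [d' (leap-frog-date d)]
      (when-not (= d d')
        ;; errorf applies the format string; plain error would not
        (errorf "Had to move expired dates: %tF to %tF" (to-date d) (to-date d')))
      d'))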

At this point I'm ready to go. Every month that they miss regenerating the data, I'll detect this, log it in my logs (for easy detection) and then update the expiration date so that we use the old data anyway.

It's not ideal, but it's such an important part of the system we can't afford to just not run because they can't get their act together. At the same time, I was pretty happy with the clojure tools in the clj-time and clojure.tools.logging packages as they really made this a lot nicer than if I had to do this all myself.