Archive for the ‘Cube Life’ Category

Fixing Replication on Postgres

Wednesday, October 15th, 2014

This morning I noticed that my replicated database wasn't synced to the master, which meant that something had caused the master to move too much data, or pause for too long, for the slave to keep up. Re-establishing the link isn't all that hard, but it takes time - so I turned off the process feeding data into the master database, and then, as the postgres user, I copied the files from the master to the slave:

  $ cd /var/groupon
  $ rsync -av --exclude postgresql.conf --exclude postmaster.pid \
          pgsql/ db2:/var/groupon/pgsql/

and then I simply need to restart the properly configured slave, and the log will report:

  LOG: streaming replication successfully connected to primary

Refactoring Analytics for Multi-Mode Design (cont.)

Wednesday, October 15th, 2014

Finch Experiments

This morning I finished up the deployment of the code to UAT and then set about updating the docs in the GitHub/E repo for all the changes. This wasn't all that hard as most of it was already there, but I needed to make sure that I had the docs match the code in the server.clj namespace, and then the big job of adding the docs for the different attribution schemes.

I've got two schemes - the original scheme that wasn't all that good, and the new one, which looks at the order of the experiment experiences per session and then attributes the weight of the deal to those in a time-decaying fashion. It's not great, but it's a massive improvement over the old scheme.
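
To make the time-decaying idea concrete, here is a minimal sketch. The post doesn't give the actual formula, so the exponential decay, the half-life parameter, and the names `attribute`, `:experiment`, and `:ts` are all assumptions for illustration, not the code in the repo:

```clojure
;; Hypothetical sketch: split a deal's value across the experiment
;; experiences in a session, so that experiences closer in time to the
;; purchase get more of the credit. The decay form is an assumption.
(defn attribute
  "Splits `value` across `experiences`, weighting each by
  0.5^(age / half-life), where age is purchase-ts minus the experience's :ts.
  Returns a map of experiment name to its share of the value."
  [value purchase-ts half-life experiences]
  (let [weight  (fn [{:keys [ts]}]
                  (Math/pow 0.5 (/ (double (- purchase-ts ts)) half-life)))
        weights (map weight experiences)
        total   (reduce + weights)]
    ;; merge-with + sums the shares if an experiment appears more than once
    (apply merge-with +
           (map (fn [e w] {(:experiment e) (* value (/ w total))})
                experiences weights))))

(def shares
  (attribute 30.0 10 10 [{:experiment "a" :ts 0}
                         {:experiment "b" :ts 10}]))
```

In this example the older experience "a" is one half-life away from the purchase, so it carries half the weight of "b", and the shares still add up to the full value of the deal.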

I wrote all this up, with examples, and it's checked in for the front-end guys to use.

How Useless a One-on-One Can Get

Tuesday, October 14th, 2014

I'm not an easy person to live with - nor one to work with. I'm demanding of myself, and in that, most people think that even if I'm not visibly demanding of them, I am, internally, very disappointed in them if they aren't achieving the same levels. That's not the case, and as much as I try to correct that misconception, it persists. Still, I try to be a Team Player - do the jobs that need to be done regardless of how I feel about them. But I have to confess that a useless One-on-One is something I'm about to opt-out of.

I have a manager that believes he's a good manager to all. He's not - at least not to me, but that doesn't factor into his thinking. He wants to get my feedback - which is always the same - this group is split along the lines of those that do, and those that primarily sit around. If I were in the latter group, I'd be happy - but I'm not, and the group I'm in has a population of 1.

What happens is that Management pushes down on me to deliver, and I do my best to meet their expectations. It's hard work because I'm delivering things for many different groups, and so when I look around and see people playing cards at lunch, or arriving at 9:00 am, it's hard not to feel like I'm being taken advantage of.

So I have a one-on-one with a guy that doesn't understand the first thing about what I'm doing. He doesn't even know who I'm working for - what they are asking, when they are asking it, and what those deadlines are. He is my Manager in name only. And yet he wants a one-on-one.

Silly. And I've had enough of silly.

Refactoring Analytics for Multi-Mode Design

Tuesday, October 14th, 2014

Finch Experiments

Today has been a lot of coding on a change that I was asked to do - and in truth, it's a nice feature to have in the experiment analytics. It's basically the request that all attribution schemes be active in the system at once, and then there is just a different URL for the different versions of the code and data.

This is not unlike what I've done in the past, and it's a good way to allow things to be isolated and roll-out new features without having to mess with a long and involved testing process. But the problem here is that we have a finite redis space, and there's only so much I can do at 50k msgs/sec, and while I would love to have all the versions running side by side, I've already had problems getting the first one working and fitting in redis.

I know this doesn't compute to the current Management, and it's sad that the guy doesn't really understand what's going on and won't just admit it. Sadly, that seems to be hard for a lot of managers - it makes them seem more human, but at the same time, forces them to expose their weaknesses.

Anyway... I've been re-writing this code all day, and I think I have it all code complete. In the morning I'll try it out in UAT and see how it goes.

Updating Message Forwarders

Tuesday, October 14th, 2014

Unified Click

Over the last few days I've been refactoring the code to publish not only to my attached kafka cluster, but to the shared kafka cluster in the data centers. This hasn't been the most reliable in the past, but I was getting a lot of pressure from Management to do this when the load on my cluster got too much to handle, and while I pushed back for Quality of Service (QoS) reasons, when the clients agreed that I would no longer be responsible for the latency once I handed it to the shared cluster, I relented.

At the same time, I've been responsible for the forwarding of messages from one data center to the other because the group that was generating the data would not make the move on its end, and a sensitive project needed this data. So once again - because I can, I have to.

The intersection of these events is what I was working on this morning - forwarding from one kafka cluster to another for the purpose of having the messages be seen in both clusters in all data centers. It's not hard work, and I have monitors and such set up to make sure it's working, but the idea that a product team is doing the message shuffling is just... well... crazy.

Amazingly Poor Management Style

Monday, October 13th, 2014

I'm no paragon of virtue about anything. I'm even willing to admit that I might not be everyone's cup of tea. But I do believe I know my limitations pretty well, and one of the things I know I'm not that good at is management. So when I see someone manage me in a way that I'd never manage a person, I know it has to be exceptionally bad management.

Such is the case today.

I released a new version of the experiment analytics to UAT, and since it had significant differences in the counting and attribution code, it made sense to me to have it run side-by-side with production, and the old models, for several weeks to let this new system prove itself out.

You'd have thought I took an Uzi to a class of third graders.

The uproar from my management was immediate and completely open-loop. The statement was that we had to support both forever. Now, I'm no Rocket Scientist, but even I know that given the problems we've had to date fitting just the one set of analytics in a real-time Storm cluster and on four 192GB Redis boxes, the idea of doubling that wasn't one I wanted to do.

Clearly, managers are so much smarter than I am, and they know exactly how much redis space it'll take, and exactly how many CPU cycles the second treatment will consume. How silly of me to not consider their genius. Yes, clearly, I have very little regard for this decision because it was made thinking that they were still in their little playground with Rails and 5000 machines in the data center.

I try not to be hard on those that don't get it - but this response did make me pretty steamed. So to the guy that insisted that I add support for all versions at the same time, I sent back an email saying "Hey, I have two ideas - which one do you like?" - just to get his feedback since he was so adamant about the inclusion of this in the product.

"That's not my job to decide - Doers Decide"

OK, so let's get this straight - You are insisting that this feature be added, but you have no opinion on it? I find that remarkably hard to believe. In fact, I think with near-certainty you're a liar. You either don't know why you're asking for this feature, or you really do have an opinion, and so in one case or the other, you're a liar.

It's things like this that make me realize that good people don't necessarily make good managers. And even good managers in some fields are horrible managers in others. Such is the case with this guy - he was my manager a year ago, in a different group, but here - he's so out of his league it's sad to see him do things like this.

Thankfully, there are recruiters.

Balancing Kafka Readers to Shared Kafka Cluster

Friday, October 10th, 2014

Kafka Broker

Today has been a bad day. We had just a little too much traffic, and one too many clients on our kafka cluster in production, and the consequence was that the kafka cluster started to generate back-pressure on the Storm topology bolts writing to the cluster, and that, in turn, caused back-pressure up the topology, and that caused way too much latency.

So I had to do something.

I could have asked for three more boxes, and I did, but I knew they weren't going to be available in time. Management said they wanted me to use the shared Kafka cluster operated by the Operations group, but their attitude about the data passing through them is that of a Common Carrier - "I don't care what it is", so they take no responsibility for it.

Still, it was the only option, and when I got the release from the clients that I could not be held responsible for the data once I'd handed it off to the shared kafka cluster, I added the code to be able to do just that - publish to two kafka clusters - mine and theirs.

The load I put on theirs threw them on their heels at first, and I hope they get it all cleared up soon, but I have a feeling that my weekend is going to be filled with monitoring the latency through the system.

It’s Hard to Beat Clojure for Complex Systems

Wednesday, October 8th, 2014

I'm doing a lot of performance work on the Deal Performance Service this morning - trying to handle these 6 million row files and imports into Postgres, and I am constantly struck by the fantastic smile I get on my face when working with clojure in this environment. It's simple, compact, expressive, and actually very readable to me. And I'm a huge fan of extensive comments.

Being able to take a serial process like:

  ;; parse each line, merge it with any existing record, and append a CSV row
  (doseq [l (take limit (line-seq rdr))
          :let [raw (line-to-ddo l stamp deals taxy)]
          :when raw
          :let [old (get eod (:ddo_key raw))
                ddl (if-not eod raw (merge-daily cfg stamp raw old))]
          :when ddl
          :let [row (gen-csv-row ddl all-fields)]]
    (spit csv-name row :append true))

and with almost no effort, turn it into a parallel one:

  (let [do-it (fn [l] (if-let [raw (line-to-ddo l stamp deals taxy)]
                        (let [old (get eod (:ddo_key raw))]
                          (if-let [ddo (if-not eod
                                         raw
                                         (merge-daily cfg stamp raw old))]
                            (gen-csv-row ddo all-fields)))))]
    (doseq [row (pmap do-it (take limit (line-seq rdr)))]
      (spit csv-name row :append true)))

I simply have to make the body of the doseq into a simple function and then use pmap as the source for the new doseq, and I'm in business. This has made the refactoring of the code so much simpler. It's easy to re-work the code over and over to get the optimal data flow.
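
The same move can be shown on a toy input. This is only a sketch: `line->row` and `lines` are hypothetical stand-ins for the real per-line work and the file being read, and the rows are collected into an atom instead of being written to a file:

```clojure
;; Hypothetical stand-in for the per-line work; returns nil for lines to skip.
(defn line->row
  [line]
  (when-not (empty? line)
    (str line "," (count line))))

(def lines ["alpha" "" "be" "gamma"])

;; Serial: the work happens in the doseq bindings, one line at a time.
(def serial-rows
  (let [out (atom [])]
    (doseq [l lines
            :let [row (line->row l)]
            :when row]
      (swap! out conj row))
    @out))

;; Parallel: the same work moved into a function, with pmap feeding the doseq.
;; pmap returns its results in input order, so the output order is unchanged.
(def parallel-rows
  (let [out (atom [])]
    (doseq [row (pmap line->row lines)
            :when row]
      (swap! out conj row))
    @out))
```

Because pmap preserves ordering, the two versions produce identical output here; only the per-line work runs concurrently. One practical note: pmap runs on Clojure's future thread pool, so a standalone script will typically want to call `(shutdown-agents)` at the end in order to exit promptly.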

And then there's the memoization... Sure, you can cache in any language, and it's not all that hard. But again, the ease with which it's added after the fact to a clojure function is really why it's so powerful. You can start out with nothing cached and see what needs to be cached after you start doing performance tests. This makes the refactoring so much easier than predicting whether caching is going to be needed at all, and then trying to make a system of functions or classes ready to move that way, should it be necessary.
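
As a small illustration of adding caching after the fact, here is a sketch using `memoize` from clojure.core. The `lookup-deal` function and the call counter are hypothetical, there only to show that the wrapped function really runs once per distinct argument:

```clojure
;; Counts how many times the underlying function actually runs.
(def call-count (atom 0))

;; Hypothetical stand-in for an expensive lookup.
(defn lookup-deal
  [deal-id]
  (swap! call-count inc)
  {:deal_id deal-id})

;; Caching added after the fact: wrap the existing function and leave
;; its definition and its callers' logic alone.
(def lookup-deal-cached (memoize lookup-deal))

(lookup-deal-cached 42)
(lookup-deal-cached 42) ; served from the cache, lookup-deal is not called again
```

The original function is untouched, so it's just as easy to drop the caching again if the performance tests say it isn't earning its keep. Keep in mind that `memoize` holds on to every result it has seen, so it suits lookups with a bounded set of arguments.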

I've done it in C++ - and it's not horrible, but it means that you have classes for looking everything up, and then within each class, the implementation is either with - or without - caching. It can all be done, but it complicates everything because now there's a class for loading everything.

I'm sure there are tons of other languages that folks like. Heck, I like Obj-C and C++, but I have to look at what clojure is capable of creating and facilitating, and have to marvel at its design. Really quite impressive.

Tinkering with the External Loader for DDO

Tuesday, October 7th, 2014

DealPerf Svc

Since I didn't have anything else to do this afternoon, I decided that I could get a lot done with the tests and checks on the external database loader I've been working on. The main box is still cranking through the back-log, and while it does that, I can get to work on copying over the source files, checking the deployment, running the script a few at a time, and then seeing that everything would be copied over - as planned.

Then it's just a matter of letting the back-log clear, and firing this guy up for good.

Having a Rough Couple of Weeks

Tuesday, October 7th, 2014

Bad Idea

I spent some time this morning going over my Git commit logs trying to find out what I've been doing in the last two weeks - and then converting those into decent posts - at least as decent as I could make them in hind-sight. The reason for all this is the re-org, and the resulting group that I've been thrust into. It's not a pretty sight - and I've seen this far too many times in my professional life to think it's going to end well.

I had a conversation with the larger group about why I see so many git force push messages in HipChat. I'm sure there's a reason, and it's part of some work-flow - I just haven't used that, and it seems pretty harsh to be used so often. Fact is, their reason was that it just made the commits "prettier". That they wanted to edit their commits like they edit their code.

I was silent because that was almost certainly one of the most insane things I've heard in a long time. Yes, by all means, let's risk the corruption of the git repo on the server because you didn't take time to think about what you were doing before you decided to commit it. Yeah, that's a good plan.

Better yet - let's make it part of the standard work-flow that you teach new developers. Yeah, that's a great idea. I just can't wait to start messing things up because I refuse to put forth the effort to think about what I'm doing prior to doing it.

In another meeting, one of the other developers felt that a Core Value of the new group should be that... and I quote: We all should have fun. Yeah... I don't even have to respond to this because, unlike this developer, I understand the point of a public company, and that they really aren't in existence to make their employees happy - or make sure they have fun - they are in the business of making money.

I tried to say that fun is a great thing to have, but holding it up as the reason we're doing - and not doing - things is kinda crazy. That we don't have to have fun to get work done. It would be nice - but it's not necessary. Flew right over his head.

And the odd thing is that this is not a bad guy - well... I've heard he's a good guy, so I want to believe that my assessment of him to date is way off, but when he goes on about how I couldn't be more wrong... well... it's hard to give him the benefit of the doubt - or write it off to a miscommunication or something.

Nope, I'm in a group that doesn't seem to share a single value I have. They think it's acceptable risk to have git force push in their workflows, and they think that they have to have fun or they don't have to work. Wacky.

It's been hard to get things done, and if I took their lead, I wouldn't have done a thing - but thankfully, I didn't, and did a good chunk of work each day. In the end, I need off this group. I like the work, I just can't stand the management and co-workers any longer.