Archive for October, 2014

Updating Message Forwarders

Tuesday, October 14th, 2014

Unified Click

Over the last few days I've been refactoring the code to publish not only to my attached kafka cluster, but to the shared kafka cluster in the data centers. This hasn't been the most reliable in the past, but I was getting a lot of pressure from Management to do this when the load on my cluster got too much to handle, and while I pushed back for Quality of Service (QoS) reasons, when the clients agreed that I would no longer be responsible for the latency once I handed it to the shared cluster, I relented.

At the same time, I've been responsible for the forwarding of messages from one data center to the other because the group that was generating the data would not make the move on its end, and a sensitive project needed this data. So once again - because I can, I have to.

The intersection of these events is what I was working on this morning - forwarding from one kafka cluster to another for the purpose of having the messages be seen in both clusters in all data centers. It's not hard work, and I have monitors and such set up to make sure it's working, but the idea that a product team is doing the message shuffling is just... well... crazy.

Amazingly Poor Management Style

Monday, October 13th, 2014

I'm no paragon of virtue about anything. I'm even willing to admit that I might not be everyone's cup of tea. But I do believe I know my limitations pretty well, and one of the things I know I'm not that good at is management. So when I see someone manage me in a way that I'd never manage a person, I know it has to be exceptionally bad management.

Such is the case today.

I released a new version of the experiment analytics to UAT, and since it had significant differences in the counting and attribution code, it made sense to me to have it run side-by-side with production, and the old models, for several weeks to let this new system prove itself out.

You'd have thought I took an Uzi to a class of third graders.

The uproar from my management was immediate and completely open-loop. The statement was that we had to support both forever. Now, I'm no Rocket Scientist, but even I know that given the problems we've had to date fitting just the one set of analytics in a real-time Storm cluster and on four 192GB Redis boxes, the idea of doubling that wasn't one I wanted to do.

Clearly, managers are so much smarter than I am, and they know exactly how much redis space it'll take, and exactly how many CPU cycles the second treatment will consume. How silly of me to not consider their genius. Yes, clearly, I have very little regard for this decision because it was made thinking that they were still in their little playground with Rails and 5000 machines in the data center.

I try not to be hard on those that don't get it - but this response did make me pretty steamed. So to the guy that insisted that I add support for all versions at the same time, I sent back an email saying "Hey, I have two ideas - which one do you like?" - just to get his feedback since he was so adamant about the inclusion of this in the product.

"That's not my job to decide - Doers Decide"

OK, so let's get this straight - You are insisting that this feature be added, but you have no opinion on it? I find that remarkably hard to believe. In fact, I think with near-certainty you're a liar. You either don't know why you're asking for this feature, or you really do have an opinion, and so in one case or the other, you're a liar.

It's things like this that make me realize that good people don't necessarily make good managers. And even good managers in some fields are horrible managers in others. Such is the case with this guy - he was my manager a year ago, in a different group, but here - he's so out of his league it's sad to see him do things like this.

Thankfully, there are recruiters.

Balancing Kafka Readers to Shared Kafka Cluster

Friday, October 10th, 2014

Kafka Broker

Today has been a bad day. We had just a little too much traffic, and one too many clients on our kafka cluster in production, and the consequence was that the kafka cluster started to generate back-pressure on the Storm topology bolts writing to the cluster, and that, in turn, caused back-pressure up the topology, and that caused way too much latency.

So I had to do something.

I could have asked for three more boxes, and I did, but I knew they weren't going to be available in time. Management said they wanted me to use the shared Kafka cluster operated by the Operations group, but their attitude about the data passing through them is that of a Common Carrier - "I don't care what it is", so they take no responsibility for it.

Still, it was the only option, and when I got the release from the clients that I could not be held responsible for the data once I'd handed it off to the shared kafka cluster, I added the code to be able to do just that - publish to two kafka clusters - mine and theirs.

The load I put on theirs threw them on their heels at first, and I hope they get it all cleared up soon, but I have a feeling that my weekend is going to be filled with monitoring the latency through the system.

Remembering Skitch

Friday, October 10th, 2014

I can remember seeing Skitch for the first time - while still in beta. I think I got the link from one of the weblogs I read, but I can remember seeing it, and what it could do, and thinking This is IT!. I signed up for the beta, and when it went commercial, I bought a copy.

Two, in fact.

And then they sold out to Evernote... and redesigned the experience... and all of a sudden it wasn't the lightweight but powerful little image editor and uploader for posting to this journal, it was something that integrated with Evernote, and they took away the image hosting - except for Evernote, etc.

Basically, they sold out and destroyed what I loved about the product.

Then came Monosnap and it looked to be exactly what I needed - a Skitch 1.0 replacement. So I got it, and I have been using it for a while, but the truth of the matter is that it's buggy. Some menu items don't show up all the time... the copy and paste of an image doesn't work... it's a nice try, and it worked on some version of Mac OS X, but they haven't kept up with it, and it shows.

Glui

So now I'm looking at Glui. It's coming with some pretty good recommendations from friends that also loved Skitch, but were looking for a replacement, and its only posting service is Dropbox, but I can live with that for now. Still... it's missing a lot of the nice features that were in Skitch - specifically, making it easy for me to get at the URL of the image in Dropbox.

Why do all these tools want to put their chrome on the image? Why not do what Skitch did, and give me a page where I can get at all the URLs, and pick what I want?

Yes... Skitch was perfect at this... and then they sold out.

It's pretty sad...

UPDATE: HA! I found that it's a preference item on Glui to copy the page link or the direct image link. That's much better. They also have the nice 'scaling' option for my Retina MacBook Pro. This might actually just be me needing to get into Glui's way of doing things. That would be very nice. 🙂

Google Chrome 40.0.2182.4 is Out

Friday, October 10th, 2014

Google Chrome

Well... I guess I can hit the major updates, and this morning dropped the move to 40.x.x.x for Google Chrome. It's pretty amazing to me that for as long as I've been using Chrome, it's gone from not good enough to be second-tier, to very good, to primary for work. It's amazing what a boatload of money will do when it comes to getting things done. Case in point... why did they add the name button to the tab-bar?

Odd Name Button

They used to have a little face up in the same level, or when you had only the one profile, they didn't show anything. Why not that now? I have only one profile on all my machines, but they insist on showing me this box with "me" in it.

Someone didn't put a lot of thought into this. Keep working, guys...

Looking at Cassandra for Fast SQL Storage

Thursday, October 9th, 2014

Cassandra

I've got a lot of streaming data in Storm, and I'm doing quite a bit of analytical processing on that data, but I can't store much of it because the cost of storage is so high - latency. So I end up using a lot of redis boxes and having boxes read out of that into more conventional storage - like Postgres. But I'm starting to hear good things about Cassandra and Storm working well together.

My concern is real speed. I have an average message rate of 40k to 50k msgs/sec, with peaks as high as four times that. I need to be able to handle those peaks. They are by no means the once-a-month peak levels, which run several times higher still - they're something I see on a regular basis, and we need to be able to take all this data from the topology and not slow it down.

If I can really do this on, say, eight machines, then I'll be able to have the kind of deep dive we've needed for looking into the analytics we're calculating. This would be a very big win.

I've reached out to a few groups that are doing something very similar to this, and their preliminary results say we can do it. That's really good news. We'll have to see what happens with the real hardware and software.

Redis Cluster 3.0.0 RC is Out!

Thursday, October 9th, 2014

Redis Database

I have done a lot of work with redis, and of late, the most work I've been doing with it was to shard (aka cluster) many redis instances on a single box - and on multiple boxes. This isn't bad, but let's face it, it'd be great if redis could do this all on its own - like it currently does replication.

Then I read:

Basically it is a roughly 4 years old project. This is about two thirds the whole history of the Redis project. Yet, it is only today, that I’m releasing a Release Candidate, the first one, of Redis 3.0.0, which is the first version with Cluster support.

Very nice! I'll be very interested in seeing how it works, and how it will scale with network load and if it's able to be configured for on-box vs. off-box connectivity. That could make a huge difference in the communication bandwidth.

Nice to see that it's almost here.
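For a sense of what the hand-rolled sharding looks like today, here's a minimal sketch. The shard list, host names, ports, and the shard-for function are all made up for illustration - this is not the production code, and it's not how Redis Cluster assigns keys. It just hashes the key and takes it modulo the number of instances:

```clojure
;; Hypothetical shard list: two redis instances on each of two boxes.
(def shards
  [{:host "box1" :port 6379}
   {:host "box1" :port 6380}
   {:host "box2" :port 6379}
   {:host "box2" :port 6380}])

(defn shard-for
  "Pick the shard for a key: hash it, then take it modulo the shard count.
   mod always returns a non-negative index for a positive divisor."
  [shards k]
  (nth shards (mod (hash k) (count shards))))

;; The same key always lands on the same instance.
(shard-for shards "user:1234")
```

The catch with this kind of scheme is that changing the number of shards moves most keys to a different instance, which is a big part of why having redis handle it would be so nice.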

It’s Hard to Beat Clojure for Complex Systems

Wednesday, October 8th, 2014

I'm doing a lot of performance work on the Deal Performance Service this morning - trying to handle these 6 million row files and imports into Postgres, and I am constantly struck by the fantastic smile I get on my face when working with clojure in this environment. It's simple, compact, expressive, and actually very readable to me. And I'm a huge fan of extensive comments.

Being able to take a serial process like:

  ;; serial: read, transform, and append one line at a time
  (doseq [l (take limit (line-seq rdr))
          :let [raw (line-to-ddo l stamp deals taxy)]
          :when raw
          :let [old (get eod (:ddo_key raw))
                ddl (if-not eod raw (merge-daily cfg stamp raw old))]
          :when ddl
          :let [row (gen-csv-row ddl all-fields)]]
    (spit csv-name row :append true))

and with almost no effort, turn it into a parallel one:

  ;; parallel: the transform runs in pmap, the writes stay in one doseq
  (let [do-it (fn [l] (if-let [raw (line-to-ddo l stamp deals taxy)]
                        (let [old (get eod (:ddo_key raw))]
                          (if-let [ddo (if-not eod
                                         raw
                                         (merge-daily cfg stamp raw old))]
                            (gen-csv-row ddo all-fields)))))]
    (doseq [row (pmap do-it (take limit (line-seq rdr)))]
      (spit csv-name row :append true)))

I simply have to make the body of the doseq into a simple function and then use pmap as the source for the new doseq, and I'm in business. This has made the refactoring of the code so much simpler. It's easy to re-work the code over and over to get the optimal data flow.
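One thing that makes this swap safe is that pmap hands back its results in the same order as the input, so the doseq doing the writing sees the rows in the same order as before. Here's a small self-contained sketch of that - slow-square is just a made-up stand-in for the per-line work:

```clojure
(defn slow-square
  "Hypothetical stand-in for an expensive per-item transform."
  [n]
  (Thread/sleep 20)
  (* n n))

;; Same function, same input - only map vs. pmap differs.
(def serial   (doall (map  slow-square (range 8))))
(def parallel (doall (pmap slow-square (range 8))))

;; pmap preserves input order, so the two results are equal.
(= serial parallel)
```

If this is run as a script rather than at the REPL, a call to (shutdown-agents) at the end lets the JVM exit promptly, since pmap runs its work on the future thread pool.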

And then there's the memoization... Sure, you can cache in any language, and it's not all that hard. But again, the ease with which it's added after the fact to a clojure function is really why it's so powerful. You can start out with nothing cached and see what needs to be cached after you start doing performance tests. This makes the refactoring so much easier than predicting if caching is going to be needed in any case, and then trying to make a system of functions or classes ready to move that way, should it be necessary.

I've done it in C++ - and it's not horrible, but it means that you have classes for looking everything up, and then within each class, the implementation is either with - or without - caching. It can all be done, but it complicates everything because now there's a class for loading everything.
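To make the clojure side of that concrete, here's a minimal sketch of caching bolted on after the fact with memoize from clojure.core. The lookup-deal function and the call counter are made up for illustration - they stand in for whatever slow lookup shows up in the performance tests:

```clojure
;; Counts how many times the real work actually runs.
(def call-count (atom 0))

(defn lookup-deal
  "Hypothetical slow lookup, standing in for a database or service call."
  [id]
  (swap! call-count inc)
  (Thread/sleep 50)
  {:id id :name (str "deal-" id)})

;; Caching added after the fact - lookup-deal itself is untouched.
(def lookup-deal-cached (memoize lookup-deal))

(lookup-deal-cached 42)   ; does the slow work
(lookup-deal-cached 42)   ; served from the cache
@call-count               ; => 1
```

The one thing to keep in mind is that memoize never evicts anything, so it fits best where the set of distinct arguments is bounded.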

I'm sure there are tons of other languages that folks like. Heck, I like Obj-C and C++, but I have to look at what clojure is capable of creating and facilitating, and have to marvel at its design. Really quite impressive.

Tinkering with the External Loader for DDO

Tuesday, October 7th, 2014

DealPerf Svc

Since I didn't have anything else to do this afternoon, I decided that I could get a lot done with the tests and checks on the external database loader I've been working on. The main box is still cranking through the back-log, and while it does that, I can get to work on copying over the source files, checking the deployment, running the script a few at a time, and then seeing that everything would be copied over - as planned.

Then it's just a matter of letting the back-log clear, and firing this guy up for good.

Having a Rough Couple of Weeks

Tuesday, October 7th, 2014

Bad Idea

I spent some time this morning going over my Git commit logs trying to find out what I've been doing in the last two weeks - and then converting those into decent posts - at least as decent as I could make them in hindsight. The reason for all this is the re-org, and the resulting group that I've been thrust into. It's not a pretty sight - and I've seen this far too many times in my professional life to think it's going to end well.

I had a conversation with the larger group about why I see so many git force push messages in HipChat. I'm sure there's a reason, and it's part of some work-flow - I just haven't used that, and it seems pretty harsh to be used so often. Fact is, their reason was that it just made the commits "prettier". That they wanted to edit their commits like they edit their code.

I was silent because that was almost certainly one of the most insane things I've heard in a long time. Yes, by all means, let's risk the corruption of the git repo on the server because you didn't take time to think about what you were doing before you decided to commit it. Yeah, that's a good plan.

Better yet - let's make it part of the standard work-flow that you teach new developers. Yeah, that's a great idea. I just can't wait to start messing things up because I refuse to put forth the effort to think about what I'm doing prior to doing it.

In another meeting, one of the other developers felt that a Core Value of the new group should be that... and I quote: We all should have fun. Yeah... I don't even have to respond to this because unlike this developer, I understand the point of a public company, and that they really aren't in existence to make their employees happy - or make sure they have fun - they are in the business of making money.

I tried to say that fun is a great thing to have, but holding it up as the reason we're doing - and not doing - things is kinda crazy. That we don't have to have fun to get work done. It would be nice - but it's not necessary. Flew right over his head.

And the odd thing is that this is not a bad guy - well... I've heard he's a good guy, so I want to believe that my assessment of him to date is way off, but when he goes on about how I couldn't be more wrong... well... it's hard to give him the benefit of the doubt - or write it off to a miscommunication or something.

Nope, I'm in a group that doesn't seem to share a single value I have. They think it's acceptable risk to have git force push in their workflows, and they think that they have to have fun or they don't have to work. Wacky.

It's been hard to get things done, and if I took their lead, I wouldn't have done a thing - but thankfully, I didn't, and did a good chunk of work each day. In the end, I need off this group. I like the work, I just can't stand the management and co-workers any longer.