Archive for the ‘Coding’ Category

Hammering Away at Integration Code

Tuesday, May 3rd, 2011

Today was spent in a few meetings and helping get a few developers on the right track, but when I had time to do a little coding, I worked on getting everything glued together for the greek engine. Specifically, the first part of the engine will be a Broker service that will run all the what-if scenarios for the users. This is something the folks have wanted for a long time, and it makes a lot of sense to deliver it first. The large-scale, center-stage, reliable-multicast greek engine will be put together from these same components once we get this guy up and running and we know how much hardware to throw at the problem.

This phase of the project is nice in that it appears to be making a lot of progress in a very short time, but in reality, it points out where you made mistakes in the design up to this point. If you have a good design, then it indeed goes together pretty quickly. But if you missed something, then you have to go back and fix or retro-fit that, and then put things together.

Interestingly enough, I had a little of both today. My general Source and Sink objects needed to have locks provided on their lists, but that's because I was starting to violate the assumptions made in their original design. I could go back and implement a lockless list, but that's a little overkill at this point, I think. If we need it, I know how to do it; I just need a little time to actually do it.
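The lock-on-the-list change can be sketched like this - a minimal Python stand-in, since the actual Source/Sink classes aren't shown here, and the class and method names are assumptions:

```python
import threading

class Sink:
    """Sketch of a Sink whose listener list is now guarded by a lock.
    The names here are illustrative, not the real Source/Sink API."""
    def __init__(self):
        self._listeners = []
        self._lock = threading.Lock()   # protects _listeners

    def add_listener(self, fn):
        with self._lock:
            self._listeners.append(fn)

    def deliver(self, msg):
        # Copy under the lock, then call listeners outside it so a slow
        # listener can't hold up registration on another thread.
        with self._lock:
            targets = list(self._listeners)
        for fn in targets:
            fn(msg)
```

The copy-then-call pattern is the cheap middle ground before reaching for a true lockless list.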

It's not done by any means, but it's getting there. Slowly, but surely.

Google Chrome dev 12.0.742.16 is Out – More of the Same

Tuesday, May 3rd, 2011

This morning the Google Chrome team released 12.0.742.16 with fixes for what it calls "UI and performance issues" - which appears to be the theme in recent weeks. It does appear that they are taking the time now to really clean things up - focusing less on new features and more on polishing the codebase and the app as much as possible. Good enough. Sounds like a plan.

Starting to Pull it All Together

Monday, May 2nd, 2011

This afternoon I've been helping the team get things straight for the next few days, and starting to pull things together in how we'll present this to the clients for their consumption. One way would be to have another ZeroMQ PUB/SUB reliable multicast system, but that's really overkill. What we really want is to have a point-to-point, one-on-one conversation with a client so that they can tell us what they want, and we can supply it for them.
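In ZeroMQ terms that one-on-one conversation is the REQ/REP pattern; here's a stand-in sketch using plain TCP sockets from the standard library so the request/reply shape is visible without any dependencies - the message content is made up:

```python
import socket
import threading

def serve_one(srv):
    """Answer exactly one client request - the supplier side of the conversation."""
    conn, _ = srv.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(("you asked for: " + request).encode())

# Point-to-point: the client tells us what it wants, and we supply it.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # ephemeral port
srv.listen(1)
t = threading.Thread(target=serve_one, args=(srv,))
t.start()

cli = socket.create_connection(srv.getsockname())
cli.sendall(b"greeks for AAPL")
reply = cli.recv(1024).decode()
cli.close()
t.join()
srv.close()
```

With ZeroMQ the bind/accept bookkeeping collapses to a REQ socket and a REP socket, but the conversational shape is the same.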

Things are starting to take shape in my mind, but it's always better if I noodle on them for a while.

Indexes on MongoDB are Really Important

Monday, May 2nd, 2011


After my experience with the utility of indexes in mongoDB, I was not surprised to find that this extended to pretty small doc sets as well. Specifically, I had a lot of configuration data stored in a mongoDB collection as some 300-ish documents. But they were binary, and when I added an index on the appropriate key field of the documents, the access time dropped like a rock. Once again, mongoDB wasn't storing the starting points of the docs; it was doing a complete table scan, and in this case it was very expensive because the table it was scanning had a few very large docs.

Amazing.

This has become such an issue that we put code into The Broker so that the first time someone accesses one of these document stores, it ensures there's an index on the key field. That makes it impossible to get bad performance because you forgot the index. Sure, it may be calling ensureIndex() more than needed, but it's only once per startup, and the benefit far outweighs the cost.

Just an amazing difference.

OPRA 48-line Distribution Going Live!

Monday, May 2nd, 2011

This morning OPRA, the Options Price Reporting Authority, is changing its distribution from a 24-line multicast feed to a 48-line feed to make it a little easier to balance the load and allow receivers like us to actually do something with the data we get. So when I got in this morning, I changed all the configuration data from the 24-line set-up to the new 48-line set-up. In truth, I'd had the configuration for a while; I just hadn't updated the config files because I didn't need them. Today is the first day I do, so they're live and ready for me.
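The config change itself is mechanical - essentially regenerating the line list. A hypothetical sketch; the multicast groups and ports below are placeholders, not real OPRA values:

```python
# Hypothetical regeneration of the feed-line config for the
# 24-line -> 48-line cutover. Groups and ports are placeholders.
def make_lines(count, base_port=11101):
    return [
        {"line": i + 1,
         "group": "233.43.202.%d" % (i + 1),   # placeholder multicast group
         "port": base_port + i}
        for i in range(count)
    ]

lines = make_lines(48)   # was make_lines(24) before the cutover
```

Generating the list from one template beats hand-editing 48 nearly identical config stanzas.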

Progress... there's no stopping it.

Nasty Debugging Problem with Simple Solution

Friday, April 29th, 2011


Today I spent a very long time on a problem that was really quite simple, but very hard to find. The problem was that many of my configuration service calls were timing out - but only after other calls had successfully been sent and received. This degradation of performance was very repeatable, but equally puzzling. It was clear early on that the problem was in the mongoDB - specifically, getting data out of Mongo. This wasn't exactly clear in all cases, but the hints were very much there.

For instance, if my configuration service hit a new single-server mongoDB, everything was acceptable. But if it hit the staging replica set, it timed out. All this was with reads, so there's no chance of writes coming into play. Very odd, then, that a replica set was slower.

We kept digging, and went so far as to turn off the replica set and turn it into a single server. This yielded the same times as the replica sets - which is to say "slow". Maybe it was the hardware? Nope, a new single-server instance on that hardware was fine.

Finally, after several hours, we got to the heart of the matter: my configuration service was hitting the authorization mongoDB for the auth token to make sure the user was allowed to hit the configuration data. Bingo! We had a 266,000-entry mongoDB table without an index!

All that was needed was to type in the mongo shell:

  db.token.ensureIndex({token:1});

and the queries sped up dramatically. This was the key - we didn't look at the data, just the hardware and the software. It was a long day, and while I'm glad we got this one out of the way, it didn't solve my problems 100%, as my larger queries are still timing out. David says he's going to look at the emongo driver this weekend for possible causes. He added replica set support to it today, as we needed to move away from erlmongo, which only uses one socket connection to the database. emongo allows for connection pools, which is going to help me a lot.
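To make the 266,000-row lesson concrete, here's a toy Python illustration of what the index buys: without one, every token lookup is a full scan; with one, it's a direct lookup. (Mongo's index is a B-tree rather than a Python dict, but the effect is the same, and the data here is fabricated.)

```python
# 266,000 fake token documents, like the auth table in the story.
docs = [{"token": "t%06d" % i, "user": "u%d" % i} for i in range(266000)]

def scan_lookup(token):
    """What mongoDB does without an index: a complete table scan."""
    for d in docs:
        if d["token"] == token:
            return d
    return None

# What ensureIndex({token:1}) buys: a lookup structure keyed on token.
index = {d["token"]: d for d in docs}

def indexed_lookup(token):
    return index.get(token)
```

One scan of a quarter-million rows per auth check, on every request, is exactly the kind of cost that only shows up once other calls have warmed the system - which is why it looked like mysterious degradation rather than a missing index.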

Google Chrome dev 12.0.742.12 is Out

Friday, April 29th, 2011

Well, the Google Chrome guys are still putting the UI polish on 12.x, as they just released 12.0.742.12 with the release notes saying it's just UI issues and a few sync issues. It really appears that they are going for a stabilized release of 12.x for beta as I've read that they believe 11.x is headed for release. Time marches on...

It’s Amazing to Me What’s Considered Necessary at Times

Thursday, April 28th, 2011

So I've been working on this little service for my greek engine - it's not a major component, but it's something that finds use in the Shop, so I was replicating its functionality in the new codebase. One of the things the legacy messages had was the OPRA Message Type for the trade messages. This is a one-character field that says what kind of trade this message describes. Is it a cancel? Is it electronic? There's a lot of metadata you could have about a trade, but typically you want to take it out of the exchange-specific realm and put it into bit flags, etc. Make it source-independent.

Which I had.
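That source-independent translation looks something like this sketch; the character codes and flag names here are placeholders, not the real OPRA message-type table:

```python
# Hypothetical normalization of an exchange-specific one-character
# trade type into source-independent bit flags. The codes below are
# placeholders, not actual OPRA values.
CANCEL     = 1 << 0
ELECTRONIC = 1 << 1
LATE       = 1 << 2

_opra_type_flags = {     # placeholder mapping, done once in the feed handler
    "C": CANCEL,
    "E": ELECTRONIC,
    "L": LATE | ELECTRONIC,
}

def normalize(opra_type):
    """Translate the exchange code at the edge so downstream apps
    never see exchange-specific values."""
    return _opra_type_flags.get(opra_type, 0)
```

Do the translation once at the feed handler, and no downstream app ever needs the exchange's code table.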

Then I came upon this legacy message and saw that it had this OPRA message type as a field. I asked around, and was told that I needed to have that in the message. That's odd. Very odd. This means that every app will have to have the same logic for what this "means" to the trade. This doesn't make a lot of sense at all. In fact, I think it's silly.

But it's a requirement, so it's in. Silly. Totally silly.

UPDATE: after another meeting, the consensus was that this wasn't such a hot idea, and that we should try to live without it. OK with me. Simple git revert.

MongoDB Replica Sets Issues

Thursday, April 28th, 2011


This morning I started to see some disturbing problems with the configuration service written in erlang for The Broker. It's all backed by a mongoDB that's currently configured as a replica set, and after a few apps were up, performance took such a hit that my requests started to time out. I wasn't sure what it was, so I took the advice I was given, downloaded the latest pre-built binaries, and ran a stand-alone install on one of my boxes.

It was really pretty amazingly easy. You unzip the tarball and just run it. I made a simple directory to put all the data in, and away it went. Very nice. I was able to reconfigure my Broker code to hit this database for the three Brokers I had in my little dev cluster.
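The stand-alone install is roughly the following; the tarball name and paths are placeholders for whatever binaries you pulled down:

```shell
# unpack the pre-built binaries and point mongod at an empty data directory
tar xzf mongodb-linux-x86_64-1.8.1.tgz
mkdir -p ~/mongo-data
./mongodb-linux-x86_64-1.8.1/bin/mongod --dbpath ~/mongo-data
```

That's the whole thing - no install step, no config file needed for a dev box.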

Then I ran the app.

Very nice response times. Very. I let it run for an hour or so, accumulating data and saving it to the new mongoDB. Then I stopped everything and restarted it. Rather than hanging, as the replica set did, it started up with a little slowness, but everything worked. So the problem isn't mongoDB itself, and it's not the way I was using it - at least not in single-server mode.

Someone did a little digging and found that the 1.8.1 release might have introduced a bug in the replica set negotiation. So we're going to get the "final release" source, put it on the boxes, and see if that doesn't help. But we need something. As it is now, replica sets are really not going to scale like we need.

There Really is No Substitute for Documentation

Wednesday, April 27th, 2011

This afternoon I'm onto another problem with The Broker, and this time it's really difficult to figure out because there have been a lot of changes made to the codebase, and none of them are documented in the least. The problems include the immediate unregistration of services right after they've been registered, as well as failing to accurately identify the services that aren't available to the client.

I think I was getting close to the answer, but the erlang code is just too functional, and it's hard to know where something is called from if you don't have a complete stack trace. In this case, I don't - or at least I don't recognize it if I do. I'm about a 5 or 6 out of 10 in erlang, and that's not enough to really dig all this out of the code without some documentation to tell me what role each module plays in the overall scheme of things.

In the end, I was able to document what I saw, what I think the problems were, and how I'd go about fixing them, and I sent that off to the guy who wrote all the code and is far, far better at erlang than I am. I'll have to wait and see tomorrow when he returns.