Archive for the ‘Coding’ Category

Indexes on MongoDB are Really Important

Monday, May 2nd, 2011


After my experience with the utility of indexes for mongoDB, I was not surprised to find that this extended to pretty small doc sets as well. Specifically, I had a lot of configuration data stored in a mongoDB collection as some 300-ish documents. But they were binary, and when I added an index on the appropriate key field of the documents, the access time dropped like a rock. Once again, without the index mongoDB wasn't jumping straight to the docs it needed - it was doing a complete table scan, and in this case that was very expensive because the collection it was scanning had a few very large docs.

Amazing.

This has become such an issue that we put code into The Broker so that the first time someone accesses one of these document stores, it ensures there's an index on the key field. That makes it impossible to get bad performance just because you forgot the index. Sure, it may be calling ensureIndex() more than needed, but it's only once per runtime startup, and the benefit far outweighs the cost.

Just an amazing difference.

OPRA 48-line Distribution Going Live!

Monday, May 2nd, 2011

This morning OPRA, the Options Price Reporting Authority, is changing their distribution from a 24-line multicast feed to a 48-line feed to make it a little easier to balance the load and allow for receivers like us to actually do something with the data we get. So this morning when I got in I changed all the configuration data from the 24-line set-up to the new 48-line set-up. In truth, I'd had the configuration for a while, I just didn't update the config files as I didn't need them. Today is the first day I need them, so they'll be live and ready for me.

Progress... there's no stopping it.

Nasty Debugging Problem with Simple Solution

Friday, April 29th, 2011


Today I spent a very long time on a problem that was really quite simple, but very hard to find. The problem was that many of my configuration service calls were timing out - but only after other calls had successfully been sent and received. This degradation of performance was very repeatable, but equally puzzling. It was clear early on that the problem was in the mongoDB - specifically, getting data out of Mongo. This wasn't exactly clear in all cases, but the hints were very much there.

For instance, if my configuration service hit a new single-server configuration mongoDB, everything was acceptable. But if it hit the staging replica set, it timed out. All this was with reads, so there's no chance of the writing coming into play. Very odd, then, that a replica set was slower.

We kept digging, and went so far as to turn off the replica set and turn it into a single server. This yielded the same times as the replica sets - which is to say "slow". Maybe it was the hardware? Nope, a new single-server instance on that hardware was fine.

Finally, after several hours, we got to the heart of the matter: my configuration service was hitting the authorization mongoDB for the auth token to make sure the user was allowed to hit the configuration data. Bingo! We had a 266,000 entry mongoDB table without an index!

All that was needed was to type in the mongo shell:

  db.token.ensureIndex({token:1});

and the times sped up dramatically. This was the key - we didn't look at the data, just the hardware and the software. It was a long day, and while I'm glad we got this one out of the way, it didn't solve my problems 100% as my larger queries are still timing out. David says he's going to look at the emongo driver this weekend for possible causes. He added the replica sets support to it today as we needed to move away from erlmongo as it only uses one socket connection to the database. emongo allows for connection pools, which is going to help me a lot.

Google Chrome dev 12.0.742.12 is Out

Friday, April 29th, 2011

Well, the Google Chrome guys are still putting the UI polish on 12.x, as they just released 12.0.742.12 with the release notes saying it's just UI issues and a few sync issues. It really appears that they are going for a stabilized release of 12.x for beta as I've read that they believe 11.x is headed for release. Time marches on...

It’s Amazing to Me What’s Considered Necessary at Times

Thursday, April 28th, 2011

So I've been working on this little service for my greek engine - it's not a major component, but it's something that finds use in the Shop, so I was replicating its functionality in the new codebase. One of the things that the legacy messages had was the OPRA Message Type for the trade messages. This is a one-character field that says what kind of trade this message describes. Is it a cancel? Is it electronic? There's a lot of meta-data you could have about a trade, but typically, you want to take it out of the exchange-specific realm and put it into bit flags, etc. Make it source-independent.
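The translation I mean looks roughly like this - a Python sketch where the flag values and type chars are made-up placeholders, not the real OPRA codes or The Shop's actual mapping:

```python
# Source-independent trade flags (bit values are illustrative only).
TRADE_CANCEL     = 0x01
TRADE_ELECTRONIC = 0x02
TRADE_LATE       = 0x04

# Exchange-specific decode: OPRA message-type char -> flags.
# The chars here are placeholders, not the real OPRA codes.
_OPRA_TYPE_FLAGS = {
    'C': TRADE_CANCEL,
    'I': TRADE_ELECTRONIC,
    'L': TRADE_LATE,
}

def flags_from_opra_type(type_char):
    """Do the exchange-specific interpretation once, at the feed handler."""
    return _OPRA_TYPE_FLAGS.get(type_char, 0)

# Downstream apps only ever test bits - no OPRA knowledge required.
f = flags_from_opra_type('C')
print(bool(f & TRADE_CANCEL))   # True
```

The point of the design is that the decode table lives in exactly one place; shipping the raw char instead forces every consumer to carry its own copy of it.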

Which I had.

Then I came upon this legacy message and saw that it had this OPRA message type as a field. I asked around, and was told that I needed to have that in the message. That's odd. Very odd. This means that every app will have to have the same logic for what this "means" to the trade. This doesn't make a lot of sense at all. In fact, I think it's silly.

But it's a requirement, so it's in. Silly. Totally silly.

UPDATE: after another meeting, it was the consensus that this wasn't such a hot idea, and that we should try to live without it. OK with me. Simple git revert.

MongoDB Replica Sets Issues

Thursday, April 28th, 2011


This morning I started to see some disturbing problems with the configuration service written in erlang for The Broker. It's all backed by a mongoDB that's currently configured as a replica set, and after a few apps were up, the speed took such a hit as to start to time out my requests. I wasn't sure what it was, so I took the advice I was given, downloaded the latest pre-built binaries and ran a stand-alone install on one of my boxes.

It was really pretty amazingly easy. You unzip the tarball and just run it. I made a simple directory to put all the data in, and away it went. Very nice. I was able to reconfigure my Broker code to hit this database for the three Brokers I had in my little dev cluster.
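For reference, the stand-alone setup really is about that simple - a sketch, with the tarball name and paths as placeholders for whatever you actually downloaded:

```shell
# hypothetical filenames/paths - substitute the tarball you actually grabbed
tar -xzf mongodb-linux-x86_64-1.8.1.tgz
mkdir -p ~/mongo-data
./mongodb-linux-x86_64-1.8.1/bin/mongod --dbpath ~/mongo-data
```

No config file, no install step - point it at an empty data directory and it's serving on the default port.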

Then I ran the app.

Very nice response times. Very. I let it run for an hour or so, accumulating data and saving it to the new mongoDB. Then I stopped everything and restarted it. Rather than hanging, as the replica set did, it started up with a little slowness, but everything worked. So the problem isn't mongoDB itself, and it isn't the way I was using it - at least not in single-server mode.

Someone did a little digging and found that the 1.8.1 release might have introduced a bug in the replica set negotiation. So we're going to get the "final release" source and put it on the boxes and see if that doesn't help. But we need something. As it is now, replica sets are really not going to scale like we need.

There Really is No Substitute for Documentation

Wednesday, April 27th, 2011

This afternoon I'm onto another problem with The Broker, and this time it's really difficult to figure out because there have been a lot of changes made to the codebase, and none of them are documented in the least. The problems include the immediate unregistration of services after they have been registered, as well as not accurately identifying those services that aren't available to the client.

I think I was getting close to the answer, but the erlang code is just too functional, and it's hard to know where something is called from if you don't have a complete stack trace. In this case, I don't - or at least don't recognize it if I do. I'm about a 5 or 6 out of 10 in erlang, and that's not enough to really be able to dig all this out of the code without some form of documentation to help me know what role each module plays in the overall scheme of things.

In the end, I was able to document what I saw, what I think the problems were, and how I'd go about fixing them, and sent that off to the guy who wrote all the code and is far, far better at erlang than I am. I'll have to wait for tomorrow when he returns.

Getting Going on Time and Sales Service (cont.)

Wednesday, April 27th, 2011

This morning I finished up on a nice simplification that I saw in the Time and Sales service: the temporary structure I was using to hold the data from the Option and the Print was really a new message - the OptionPrint message. This is exactly what we need to send out to the new clients, and a version of this can easily be made to send out to the legacy clients.

So I gutted all the code in the service and made the new message in the old messaging codebase, and then retrofitted it into the new service. In all, it took me about half a day - finishing up this morning. But it's worth it.

Now, all the pieces fit again - if we wire up a transmitter to the service, it'll automatically send the messages out the proper channel - be that legacy or new. It also makes it very memory-friendly as the same structure that's holding the temp data is the message we'll be sending out. That means there's no conversion to a message - we just ship what we have.

It takes no more to create, and it's far more efficient to use. Sounds like a win to me.

Got Nailed Again by Infrastructure Changes

Wednesday, April 27th, 2011

This morning for the second time in about as many weeks, another group in The Shop decided to update their Mongo database, and it's going to cost me most of the day to fix my code because of the change. They say their java client to mongo was not allowing them to use larger than 4MB documents, but I've been storing 16MB+ docs in the database with the erlang driver. But they didn't ask me before they did it. They just told me they were doing it.

In going from 1.6 to 1.8, it turns out that there's now a hard and fast rule about docs being less than 16MB - so I get messed over. I am going to have to look at how all my documents are created and make sure that no one gets too big because the failure will hang my process. It's ugly - really ugly.

I think when I'm done with this, I'll try to put something into the configuration service that errors out if you send it a payload that's more than 16MB. This way, it can't let you save something too big, and erroring out is better than locking up every time.
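A minimal sketch of that guard in Python - using the JSON encoding as a stand-in for the real BSON size, which is a rough approximation but good enough to fail loudly instead of hanging:

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024   # mongoDB 1.8's hard 16MB document limit

def check_payload_size(payload):
    """Reject a payload that would blow the 16MB doc limit before saving.

    The JSON byte count only approximates the BSON size, but the point is
    to error out on oversized payloads rather than lock up the process.
    """
    size = len(json.dumps(payload).encode('utf-8'))
    if size > MAX_DOC_BYTES:
        raise ValueError("payload is %d bytes; over the 16MB limit" % size)
    return size

small = {"key": "value"}
print(check_payload_size(small) < MAX_DOC_BYTES)   # True
```

The actual check would live in the erlang configuration service, but the shape is the same: measure before you save, and return an error the caller can handle.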

Still... it's these "detours" that are really getting annoying - and quite avoidable. They could put up a dev environment where we can test, and then work out solutions before doing it to the staging install. That's what they should be doing. I've no reason to believe they will, however.

Google Chrome dev 12.0.742.9 is Out – Release Candidate?

Tuesday, April 26th, 2011

Seems the Google Chrome guys are busy - today it's 12.0.742.9 with the release notes calling it a release candidate. Interesting. If they are planning on moving the 12.x branch to 'beta', then they'll soon be bumping the 'dev' channel to 13.x - which is an interesting number to say the least. I wonder if anyone is superstitious? In any case, they are trying to polish up this version for the release, and that's always good news as it means a better experience for everyone.