Coalescing Queues and the Myth of Real-Time Data

It's interesting working with developers and users when it comes to market data. They are often convinced that they need real-time data for prices, and for calculations driven off prices, and they don't really stop to ask why. I ask that a lot, and have come to the conclusion that most applications, and by 'most' I mean anything involving a human being in the loop, do not need real-time data. Period. Here's why.

First, there are the calculations. Most applications aren't simple tickers - those are the trading apps, and they need prices that are as close to the market as possible. But then again, if you're trying to watch a few hundred symbols, the odds that you have a powerful enough desktop machine to actually keep up with the ticks from a data source like Reuters are iffy at best. You need to be as close to the market as possible while not crushing your machine and making it virtually a single-use terminal for data.

So if you have calculations, like exposure or running P/L, and they're aggregated in any way, then there's very little chance that you have an efficient enough system to actually handle all the ticks that a real-time price feed can dish out. Getting backed up isn't the answer, because then you're behind the market and still have to play catch-up. Nope... you need to be intelligent about what you do.

Secondly, even if you could keep up with the flow, the human watching isn't going to be able to respond to all the ticks individually - heck, it takes us 0.7 sec to hit the brake in an accident situation, so there's no way someone is going to respond to a very liquid stock that ticks twice (or more) a second. No way. Automated trading systems are a different beast, but they don't have a human in the loop.

So... the reality of the situation is that for price data feeds you really need a good set of prices. Something that's very close to the market, say less than 3 sec behind, but not real-time, because that's too much. Problem is, developers want to have real-time systems because they sound neat. Yeah... I can see that, but it's not reasonable, and when you try to tell them this they aren't at all interested. Rather, they tell you that they can make it happen... and they've thought it all through and have all it takes to do it. This is most likely when the buzzwords come out, with newfangled messaging systems to boot.

So you have to back up and explain the realities of these feeds to them. It takes about 30 to 45 mins to get through to most decent developers, and then they start to see the real scale of the problem. Statements like "But I'm only registering for 400 symbols" turn into "Yes, but you're registering for the most liquid 400 symbols, and that is going to be a significant real-time load." "Oh... I didn't think of that." Yes... I know.

Once it's explained, you end up with the standard market data 'bet': "Let's try it my way, OK? It's already built, debugged, and ready to go. If this isn't good enough for you and your customers, then we'll do it the other way, OK?" In all my experience I've never had to come up with the 'other' way of doing things.

But today I thought 'Why even get into it? Make something that appears to be streaming, even if it's not?' and so I did. I started by creating a nifty set of coalescing queues (FIFO and LIFO) where the push() method takes a key and a value. The key is the primary identifier and the property on which the coalescing will take place. For prices, this is the name of the ticker, but for other things it could be the address, or primary key from a database. The idea is that the order of the queue will be preserved but if you push() a value onto the queue that's already there (as defined by the key) then the value will be replaced, but the order in the queue will be maintained. This means that if you're using this for market data, the prices will keep updating even when you're not servicing the queue, so that when you do service the queue, the order is maintained but the data is the most recent data possible.
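To make the coalescing behavior concrete, here is a minimal sketch in Python. It is not the code from my price server. The class name `CoalescingQueue`, the `lifo` flag, and the sample symbols and prices are all made up for illustration, and this version leaves out thread-safety to keep the idea visible.

```python
from collections import OrderedDict


class CoalescingQueue:
    """Keyed queue: pushing an existing key replaces its value in place.

    Hypothetical illustration only; not thread-safe.
    """

    def __init__(self, lifo=False):
        self._items = OrderedDict()
        self._lifo = lifo

    def push(self, key, value):
        # Assigning to a key that is already present keeps its position,
        # so the queue order is preserved while the value is refreshed.
        self._items[key] = value

    def pop(self):
        # FIFO hands back the oldest key, LIFO the newest.
        # Raises KeyError if the queue is empty.
        return self._items.popitem(last=self._lifo)

    def __len__(self):
        return len(self._items)


q = CoalescingQueue()
q.push("IBM", 101.25)
q.push("AAPL", 187.10)
q.push("IBM", 101.30)   # coalesces: IBM keeps its place at the front

first = q.pop()         # IBM with the latest price, not the stale one
second = q.pop()        # AAPL
```

Three pushes went in, but only two items come out, and the IBM entry carries the most recent price while still holding its original place in line.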

I put this into the client code for my price server and then made it possible for users to subscribe for prices and then 'turn on' the delivery of updates and simply "watch the queue" for updates. The queue has all the thread-safety and conditional code in it to make it very easy to simply ask for something from the queue and then as soon as something is ready, it's returned and you can process it and start at the top of the loop again. It's easy to put this in a simple service thread that does nothing but pick things off this queue as they arrive.
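The "watch the queue" loop might look something like the sketch below. Again, this is an assumption-laden illustration rather than the real client code: `BlockingCoalescingQueue`, `service`, `handle`, and the timeout values are hypothetical names and numbers, and a condition variable is just one reasonable way to get the block-until-ready behavior.

```python
import threading
from collections import OrderedDict


class BlockingCoalescingQueue:
    """Thread-safe FIFO coalescing queue with a blocking pop (hypothetical)."""

    def __init__(self):
        self._items = OrderedDict()
        self._ready = threading.Condition()

    def push(self, key, value):
        with self._ready:
            self._items[key] = value   # replaces in place if key exists
            self._ready.notify()       # wake one waiting consumer

    def pop(self, timeout=None):
        # Blocks until an item is available; returns None on timeout.
        with self._ready:
            if not self._ready.wait_for(lambda: len(self._items) > 0,
                                        timeout=timeout):
                return None
            return self._items.popitem(last=False)


def service(queue, handle, stop):
    # Simple service thread: pick things off the queue as they arrive.
    while not stop.is_set():
        item = queue.pop(timeout=0.1)
        if item is not None:
            handle(*item)


queue = BlockingCoalescingQueue()
queue.push("IBM", 101.25)
queue.push("IBM", 101.30)      # coalesced while nobody is servicing the queue
queue.push("AAPL", 187.10)

seen = []
done = threading.Event()
stop = threading.Event()


def handle(symbol, price):
    seen.append((symbol, price))
    if len(seen) == 2:
        done.set()


worker = threading.Thread(target=service, args=(queue, handle, stop))
worker.start()
done.wait(timeout=2.0)
stop.set()
worker.join()
```

The consumer never sees the stale IBM price. From its side of the queue it just looks like updates arriving, which is the whole point.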

It's the illusion of real-time without the headaches. I let the users think they are getting real-time prices and not polling when they really don't know the mechanism that's getting those prices into the queue in the first place. Additionally, they aren't having to deal with identical prices and filtering them out - I do that before I put the prices on the queue in the first place. What it does is really short-cut the argument a bit by saying "try this, and let me know" - and then not hearing from them ever again.
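The filtering of identical prices can be sketched as a small gate in front of the queue. This is only one way to do it, and the names here (`ChangeFilter`, `offer`, `sink`) are invented for the example; the `sink` stands in for whatever `push()` the real queue exposes.

```python
class ChangeFilter:
    """Forwards a (key, value) pair only when the value actually changed.

    Hypothetical illustration; `sink` is any callable taking (key, value).
    """

    def __init__(self, sink):
        self._sink = sink
        self._last = {}    # last value forwarded for each key

    def offer(self, key, value):
        # Drop the update if it repeats the last value sent for this key.
        if key in self._last and self._last[key] == value:
            return False
        self._last[key] = value
        self._sink(key, value)
        return True


pushed = []
gate = ChangeFilter(lambda key, value: pushed.append((key, value)))

results = [
    gate.offer("IBM", 101.25),   # first sighting: forwarded
    gate.offer("IBM", 101.25),   # identical price: dropped
    gate.offer("IBM", 101.30),   # changed price: forwarded
]
```

With this in front of the queue, the service loop on the other side only ever wakes up for a price that is actually different.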

Beautiful solution. I love it.