Finally Realizing One Size Never Fits All
I originally designed my ticker plants to fit a specific client: the systems feeding the human traders. Eyeballs. There was no need to have everything up-to-date every millisecond - the human eye can't tell, and the systems don't update faster than a few times a second. It's just a waste. But what they do care about is that when they see the change, it's the latest data available. This means don't queue it up! You have to remember the order the ticks came in, but allow for updates to the data to replace the old with the new. This is commonly called conflation. It's a good thing for systems delivering data to humans.
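Just to make that concrete, here's a minimal sketch of a conflating queue - invented names, not my production code. The idea is a FIFO that remembers the order symbols arrived in, plus a map holding only the latest tick per symbol, so a new update replaces the old data instead of queueing up behind it:

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical tick - just enough fields to show the idea.
struct Tick {
    std::string symbol;
    double      price;
    long        size;
};

// A conflating queue: preserves the order symbols first arrived in,
// but an update for a symbol already waiting replaces the old tick
// instead of queueing behind it. The consumer always pops the
// latest data available for each instrument.
class ConflatingQueue {
public:
    void push(const Tick& t) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mLatest.find(t.symbol);
        if (it == mLatest.end()) {
            // First time this symbol's been seen since the last pop -
            // remember its place in line.
            mOrder.push_back(t.symbol);
            mLatest.emplace(t.symbol, t);
        } else {
            // Already queued - conflate: new data replaces old.
            it->second = t;
        }
    }

    // Pop the oldest pending symbol, returning its freshest tick.
    std::optional<Tick> pop() {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mOrder.empty()) return std::nullopt;
        std::string sym = std::move(mOrder.front());
        mOrder.pop_front();
        auto it = mLatest.find(sym);
        Tick t = it->second;
        mLatest.erase(it);
        return t;
    }

private:
    std::mutex                            mMutex;
    std::deque<std::string>               mOrder;   // arrival order
    std::unordered_map<std::string, Tick> mLatest;  // newest tick per symbol
};
```

The queue never grows past the number of distinct symbols in flight, which is exactly why it's kind to a client that only repaints a few times a second.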
But automated trading systems don't want this. They want every tick. They want it all as fast as possible. It's understandable - if a machine can see everything, it has a much better chance of spotting an opportunity, and therefore making a profit. While I didn't design my ticker plants for these kinds of systems, several months ago I was asked to make them work for that crowd as well.
I've spent a lot of time trying to speed things up so that one system is capable of meeting the needs of both kinds of clients. It's been very difficult, and in a very real sense, what I've been doing is dumbing down my system - stripping out the conflation - so that the clients are forced to handle every tick. If I could have pulled it off, it would have been fantastic. But it really isn't possible. The compromises needed for one kind of client are just too far from those needed for the other.
So I finally had another little Ah! Ha! moment - stop trying to make one size fit all. Elementary, but true, and an important realization if you want to make something genuinely good for everyone.
If I kept my ticker plants the way I started them - built for the 'slow' trading - and then had the 'fast' trading use an embedded ticker plant, those that needed speed wouldn't even have to deal with a network hop. That's good. No serialization or deserialization. No worries about dropping packets between the server and the client. A lot of things just "go away" when you decode and use the data in the same process.
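Here's roughly what "embedded" means - a toy sketch with hypothetical names, not my actual API. The fast client links the ticker plant right into its own process and registers a callback, so every decoded tick is handed over as a plain in-memory object:

```cpp
#include <functional>
#include <string>
#include <vector>

struct Tick { std::string symbol; double price; long size; };

// Hypothetical embedded ticker plant: the feed decoder lives in the
// client's own process, and each decoded tick is delivered by a
// direct function call - no socket, no wire format, no lost packets.
class EmbeddedTickerPlant {
public:
    using Callback = std::function<void(const Tick&)>;

    void subscribe(Callback cb) { mCallbacks.push_back(std::move(cb)); }

    // Called by the (not shown) feed-decoding thread for every tick.
    void onDecodedTick(const Tick& t) {
        for (auto& cb : mCallbacks) cb(t);  // plain in-memory handoff
    }

private:
    std::vector<Callback> mCallbacks;
};

// A 'fast' trading client simply links this in and registers:
int main() {
    EmbeddedTickerPlant plant;
    plant.subscribe([](const Tick& t) {
        // Strategy code sees every tick, as fast as it can be decoded.
        (void)t;
    });
    plant.onDecodedTick({"AAPL", 187.25, 100});  // stand-in for feed data
    return 0;
}
```

The point is the handoff: it's a function call, not a protocol.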
I already do this in my NBBO server - I have n exchange feeds all going into one NBBOEngine, which then sends the NBBO out to the clients. I don't take in the feed in one process, massage it, and then ship it to another - that'd take too long. I process the feeds within the process space of the consuming application.
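In rough strokes - again a toy sketch with invented names, not my real NBBOEngine - each feed decoder calls straight into the engine in the same address space, and the engine folds the per-exchange quotes into the national best bid and offer:

```cpp
#include <limits>
#include <map>
#include <string>

struct Quote { double bid; double ask; };

// Toy NBBO engine: n exchange feeds, all decoded in this same process,
// call onQuote() directly. The engine keeps the latest quote per
// exchange per symbol and folds them into the national best bid/offer.
class NBBOEngine {
public:
    void onQuote(const std::string& symbol, int exchange, const Quote& q) {
        mBooks[symbol][exchange] = q;
    }

    Quote nbbo(const std::string& symbol) const {
        auto it = mBooks.find(symbol);
        if (it == mBooks.end()) return {0.0, 0.0};
        Quote best{0.0, std::numeric_limits<double>::max()};
        for (const auto& kv : it->second) {
            const Quote& q = kv.second;
            if (q.bid > best.bid) best.bid = q.bid;  // highest bid wins
            if (q.ask < best.ask) best.ask = q.ask;  // lowest ask wins
        }
        return best;
    }

private:
    // symbol -> (exchange id -> latest quote)
    std::map<std::string, std::map<int, Quote>> mBooks;
};
```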
The resources to do this aren't horrible: two threads, less than a core, and some memory. All of this can be dealt with very easily by adding a box or two, if necessary. Those boxes could even be the "servers" you turned off because you no longer need them. In any case, it's a very solvable problem.
In the end, those that need conflation get it, and those that don't want it get the data in-process, as fast as possible. It's really the best of both worlds, as it doesn't make compromises for either kind of client.