Cleaning Up Things and Creating a Streaming Iterator


Today I spent quite a bit of time tweaking the codebase to make sure there wasn't something I was doing that could be done more efficiently with a little work. I've been working so hard on getting things done in the ticker plant and its client that I haven't had a lot of time for the "make it faster" part of the "Make it work, then make it faster" mantra, so today I wanted to take some time to see what more I could get out of the system.

I also got a few requests from the new users, and I added those changes into the system, and I even actually like one of them. Based on the person that asked for it, I was surprised that I liked anything he had to say, but that's personal. Professionally, he's a dud too, but that's a different story, no?

After I was pretty much done with that, he came back to ask about something else, and I told him about the API I had implemented for him but hadn't finished yet. It was one of those details that I knew I'd have to come back to, and it just hadn't been the right time. Clearly, now was the right time.

What I found in digging into the source of the data I needed was that there's no way for me to handle it by conventional means. Its data stream is about 55 MB, and that needs to be deserialized into a map-of-maps representation of about 800,000 elements. Way too big. All I need is the name-value pairs in the data, and I don't need to blow it all into a map-of-maps and then walk the maps to get the pairs. I can do that in a more streaming manner.

So I started to think about that and came up with an absolutely wonderful idea - I'll make an iterator for the data. It'll look and act like a standard STL/boost iterator where you'll initialize it with the data stream, and then 'increment' your way to the end. Each 'step' will have all the data you need for that step, and then the trick will be that the processing of the data will be up to me, the caller.

This will work wonderfully, as I won't have to deserialize all the data at once, and I won't even have to spend the time deserializing data that never gets used. I can hold onto the data stream and deserialize only what I need. If the user breaks out of the iterating loop, I stop decoding. Simple.
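To make the idea concrete, here's a rough sketch of what such an iterator might look like. It is not the real header: the wire format is made up for illustration (plain "name=value" records separated by newlines), the names PairIterator and find_value are hypothetical, and it assumes C++17 for std::string_view. The point is only the shape: decode one record per increment, and decode nothing past the point where the caller stops.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical wire format, assumed purely for illustration:
// records of the form "name=value" separated by '\n'.
class PairIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::pair<std::string_view, std::string_view>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = const value_type&;

    // A default-constructed iterator acts as the end sentinel.
    PairIterator() = default;

    // Bind to the raw stream and decode only the first record.
    explicit PairIterator(std::string_view stream)
        : rest_(stream), done_(false) { advance(); }

    reference operator*() const { assert(!done_); return current_; }
    pointer operator->() const { assert(!done_); return &current_; }

    // Each increment decodes exactly one more record.
    PairIterator& operator++() { advance(); return *this; }
    PairIterator operator++(int) { PairIterator tmp = *this; advance(); return tmp; }

    friend bool operator==(const PairIterator& a, const PairIterator& b) {
        if (a.done_ || b.done_) return a.done_ == b.done_;
        return a.rest_.data() == b.rest_.data() && a.rest_.size() == b.rest_.size();
    }
    friend bool operator!=(const PairIterator& a, const PairIterator& b) {
        return !(a == b);
    }

private:
    // Decode the next record from the undecoded remainder, or mark the end.
    void advance() {
        if (rest_.empty()) { done_ = true; return; }
        const std::size_t eol = rest_.find('\n');
        const std::string_view line = rest_.substr(0, eol);
        rest_ = (eol == std::string_view::npos) ? std::string_view{}
                                                : rest_.substr(eol + 1);
        const std::size_t eq = line.find('=');
        if (eq == std::string_view::npos) {
            current_ = {line, std::string_view{}};  // name with no value
        } else {
            current_ = {line.substr(0, eq), line.substr(eq + 1)};
        }
    }

    std::string_view rest_;   // bytes not yet decoded
    value_type current_;      // the record this step exposes
    bool done_ = true;        // true once the stream is exhausted
};

// Example caller: stop at the first pair with the wanted name.
// Returns its value, or an empty string if the name never appears.
std::string find_value(std::string_view stream, std::string_view name) {
    for (PairIterator it(stream), end; it != end; ++it) {
        if (it->first == name) return std::string(it->second);  // early exit
    }
    return {};
}
```

Because the pairs are views into the original buffer, nothing is copied until the caller asks for it, and returning early from find_value leaves the rest of the stream untouched. The catch is that the buffer has to outlive the iterator.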

I spent the time to get most of the header file done, and tomorrow I'll need to implement it. But I'm very excited about this as it'll fit very nicely into the scheme of things and really help what I'm trying to do.

Cool stuff.