Interfacing to External Systems with Massive Datasets
Today was spent dealing with an external data service that's delivering massive data sets to my ticker plant for its use. It's basically a bunch of look-up tables - 400,000 rows in all. At that size there are going to be plenty of issues with loading and accessing the data, not to mention the time required to get it all in-memory and running.
Details. It's all in the details.
I've got it to the point that it all loads, but there are a few things I'm not really happy about. First and foremost is the asynchronous loading of the data. I had to do a few tricks to make sure that we didn't immediately reload the data after loading it once. With a simple lock on the loader, that's exactly what you can get: every caller that queued up behind the first load takes its turn with the lock and runs the whole load again. A back-up of loads. Not ideal.
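To make that concrete, here's a minimal sketch in Java of the kind of guard I'm talking about - the class and method names are made up for illustration, not the real loader. The point is the 'loaded' check inside the lock, so callers that were blocked behind the first load don't each re-pull all 400,000 rows:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TableLoader {
    private final Object lock = new Object();
    private final Map<String, String> tables = new ConcurrentHashMap<>();
    private boolean loaded = false;

    public void load() {
        synchronized (lock) {
            // Callers that queued up on the lock land here *after* the
            // first load finishes. Without this check, each of them
            // would immediately reload the same 400,000 rows.
            if (loaded) {
                return;
            }
            fetchFromExternalService(tables);
            loaded = true;
        }
    }

    private void fetchFromExternalService(Map<String, String> dest) {
        // stand-in for the real pull of the look-up tables
        dest.put("example-key", "example-value");
    }
}
```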
Then we're left with the opposite problem: the second call to the loader sees that someone else has taken care of the load and assumes it's done. But it's not really done, it's just being done, so the data that caller wants might not be there yet. It's a non-trivial problem, and I'm going to have to deal with it sooner or later.
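I haven't settled on the fix, but the shape of one answer is to hand every caller something it can wait on, so "done" and "being done" are distinguishable. A rough sketch, again with hypothetical names, using a FutureTask that one caller runs and everyone else blocks on:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicReference;

public class WaitableTableLoader {
    private final Map<String, String> tables = new ConcurrentHashMap<>();
    private final AtomicReference<FutureTask<Void>> loadTask = new AtomicReference<>();

    // Every caller either starts the load or waits on the one already
    // in flight - nobody proceeds while the data is "just being done".
    public void ensureLoaded() throws Exception {
        FutureTask<Void> task = loadTask.get();
        if (task == null) {
            FutureTask<Void> mine = new FutureTask<>(() -> {
                fetchFromExternalService(tables);
                return null;
            });
            // Only one caller wins the race to install the task...
            if (loadTask.compareAndSet(null, mine)) {
                mine.run();
                task = mine;
            } else {
                task = loadTask.get();
            }
        }
        // ...and everyone, winner or loser, blocks here until the
        // load has actually completed.
        task.get();
    }

    private void fetchFromExternalService(Map<String, String> dest) {
        // stand-in for the real pull of the look-up tables
        dest.put("example-key", "example-value");
    }
}
```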
Like I said... it's all in the details.