Refactoring Singletons Out of the Code
This morning I've spent a good deal of time factoring out the Singleton nature of the processing out application is doing. Rather than have singletons with thread-safe data structures, it's very simple to make a clear delineation of the loading, processing, and persistence phases of the app, and then apply thread pools where appropriate to work on the data in a parallel fashion without having to lock a single thing. It's a simple processing problem with a queue built into the thread pool doing the work.
Once all the work is done, we can then run the aggregation steps that cut across the data, in a single thread so as not to cause problems there as well. It's not rocket science, and I spent the vast majority of my time trying to unravel the code written for the atomic and immutable classes, but that's partly because of my unfamiliarity with the classes, and partly because it's a complete mess and looks nothing like the native ruby containers.
I'm glad it done - all in a branch and I've sent a pull request for debate. I think it's the right thing to do simply because of the simplification in the code and containers. The data flow is cleaner, and it'll be far clearer where to add in new functionality. It's really much better code.
UPDATE: I decided that I needed to have some sense of real measurement of the difference, so I added in a quick timing of the core of the code - the augmentation and matching. What I found was that the new, atomic and immutable-free way of doing things is significantly faster:
processed 4605 merchants in 30310.0 ms = 6.5820 ms/merchant
for the old, atomic and immutable data way, and:
processed 4605 merchants in 6198.0 ms = 1.3459 ms/merchant
with the cleaner, clearer, no-atomics and no-immutables way. So there's a real difference. True, the total runtime is hardly different, but that's because it's really being dictated by I/O, and we can't do a lot about that.