In the beginning of this project, we created a very simple bash script to run the jobs we needed run in a crontab. It's just a lot easier, I've found, to run things out of a simple bash script than try and put them in the crontab itself. The crontab just looks cleaner, and it's not a real shell, so it's just better all-around.
But as the project got more complex, it was clear that I was beginning to test the limits of what could be easily done in bash. The problem then, was the fact that a vocal contingent of the guys on this project don't really know bash - and have no desire to learn it. Interestingly enough, their argument for using complex and difficult things like meta-programming in ruby is that there's a "floor full of people that understand it". But when bash comes up, it's not even really checked against that same "floor full of people" to see if they know it as well.
It's Code Monkeys, what can you say?
Anyway, as things progressed, I needed to have a way to simply run many jobs at the same time, but ensure that all jobs of a single kind are done before moving on to the next phase of processing. The solution I came up with was pretty straightforward, but not exactly very efficient. The idea was to have a loop and start n background processes, and then wait for all them to finish before continuing the looping and starting more.
This is a pretty simple but it means that the speed with which we can process things is determined by the slowest (or longest running) job in the batch. Therefore, a relatively small number of well placed jobs in the queue can really spread things out.
While this has been OK for a few weeks, we really needed something cleaner, so I came up with this far simpler plan:
function worker {
for i in $list; do
if [ lock($i) ]; then
do_work($i)
fi
done
}
The idea being that if I launch n of these with a simple:
for (( i=0; i<n; i++ )); do
( worker ) &
done
wait
then we'll have these workers running through the list of things to do - each picking up the next available job, and doing it, but skipping those that have been locked by the other workers.
Really pretty slick. The trick was finding out that mkdir is atomic, so it's simple to use that to make the directory tagging the process, and if we are able to make it, then we have to do the work, and if we can't, then someone else is, or has, done the work.
This is super cool!
I was thinking I'd need a queue, or something, and all I really needed was a list and a filesystem. That's sweet. Really. That's one of the coolest things I've seen in a long time.
Interestingly enough, the code is now a lot simpler:
Could not embed GitHub Gist 4178656: Not Found
I still need to test it all, and that will be Monday, but there's no reason to think it won't work, and this way, we have n workers all doing the best they can until all the work that's needed to be done is done, and then everything stops and we can move on to the next phase.
Very cool!