Conducting Timing Tests
So this morning I'm doing some timing tests to see what effect we get from trying to run multiple jobs at the same time. Initially, we thought the boxes would be CPU-bound because of all the logic in the pinning and calculations. But that's not the case. I can use top to see that the vast majority of the time the CPUs are idle (there are 2 CPUs in this box).
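A quick way to see that (a sketch, assuming a Linux box with the stock procps top) is a one-shot batch-mode snapshot - the id field in the summary line is the idle percentage, and pressing 1 in interactive top breaks it out per CPU:

$ top -b -n 1 | grep -i 'cpu(s)'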
So what's taking so long?
The monitoring (NewRelic) says it's all I/O - which seems reasonable, but then can we cut the total time for all six runs by putting them all in the run queue at the same time? Maybe we're I/O-choked, and if that's the case, then no, it won't help. But if we're latency-bound, then it will - so long as the services on the other end are fast enough to handle multiple requests.
That's a lot of questions, which is why I'm running the tests.
Timing a particular run is easy - just use time. But if I want to run them in parallel and measure the effect the same way, I need to be a little more clever:
#!/bin/bash
philadelphia &
central-jersey &
baltimore &
wait
This is a great little script that runs all three in parallel, and then waits for them all to be done before returning. If I put this into a time call:
$ time ./parallel
then I'll get exactly what I want: the wall-clock (real) time for the whole batch, with the user and sys times accumulated across all the child processes.
Now it's just a matter of doing all the runs and gathering all the data.
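Gathering the individual baselines is a loop like this (a rough sketch - it assumes the division names are the same commands invoked in the parallel script above, and it drops each time report, along with anything the job writes to stderr, into its own log file):

#!/bin/bash
# time each division on its own, one log file per division
for div in philadelphia central-jersey baltimore cincinnati cleveland austin; do
    { time "$div" ; } 2> "timing-$div.log"
done

The per-division baselines: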
| Division | Total | real | user | sys |
|----------|-------|------|------|-----|
| philadelphia | 414.68 | 3:13.863 | 2:02.340 | 0:08.889 |
| central-jersey | 109.00 | 2:17.305 | 1:29.034 | 0:05.768 |
| baltimore | 214.04 | 4:03.828 | 1:53.211 | 0:07.500 |
| cincinnati | 121.80 | 3:09.080 | 1:33.894 | 0:06.716 |
| cleveland | 264.36 | 2:56.955 | 1:48.855 | 0:07.872 |
| austin | 194.46 | 4:45.738 | 2:06.456 | 0:08.581 |
OK, this is all a nice baseline, but what we're really interested in is how they work together. To test that, we ran successively larger parallel sets - starting at the top of the list and adding one more job each time - and timed each batch.
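The wrapper for those batches is just a generalization of the parallel script above (a sketch - same assumptions about the division commands, with the job count passed as the first argument; the script name below is just a placeholder):

#!/bin/bash
# run the first N divisions in parallel, then wait for all of them to finish
divisions=(philadelphia central-jersey baltimore cincinnati cleveland austin)
for div in "${divisions[@]:0:$1}"; do
    "$div" &
done
wait

Timing works the same way as before - time ./parallel-n 4 - and the numbers: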
| Job Count | real | user | sys |
|-----------|------|------|-----|
| 2 | 7:37.395 | 4:22.084 | 0:16.545 |
| 3 | 7:40.657 | 6:45.033 | 0:24.438 |
| 4 | 12:14.561 | 8:05.870 | 1:54.751 |
| 5 | 16:09.544 | 12:06.461 | 0:31.694 |
| 6 | 18:53.886 | 14:59.040 | 0:37.562 |
This looks puzzling, so I'm going to do similar tests - this time running the first 2, 3, 4, 5, and 6 jobs serially - and see how they compare.
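That's the same sketch with the backgrounding and the wait dropped, so each job runs to completion before the next one starts:

#!/bin/bash
# run the first N divisions one after another
divisions=(philadelphia central-jersey baltimore cincinnati cleveland austin)
for div in "${divisions[@]:0:$1}"; do
    "$div"
done

And the serial numbers: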
| Job Count | real | user | sys |
|-----------|------|------|-----|
| 2 | 9:00.463 | 4:10.892 | 0:16.337 |
| 3 | 12:21.413 | 5:59.158 | 0:23.101 |
| 4 | 15:49.010 | 7:32.160 | 0:27.686 |
| 5 | 20:13.591 | 9:39.248 | 0:37.150 |
| 6 | 26:44.176 | 11:47.096 | 0:44.891 |
Putting all this data side by side tells me that it doesn't really matter if we try to run these in parallel. Running all six in parallel takes 18:54 against 26:44 serially - a savings of under 30%, nowhere near the scaling we'd need. There's some other limiting factor in the system, and it's not the CPU. This isn't terribly surprising, but it means we really need to start looking at these external systems and see where we can limit our exposure to them.
UPDATE: it took 4:10 - just over four hours - to do all 40 of the divisions that we need to do nightly. I guess in that light it's OK, as there were plenty of retries in the run and we still had time to do the Top 40. The problem, I fear, is that we still have a global choke point that we need to work through, but we'll run into that soon enough.