Looking at Experimental Data
Friday, September 28th, 2007
Today I've spent a good deal of time working with a trader and a little app I wrote for their desk to pull some data from a service in a format that they can read into their applications. It's the kind of thing that I'll put a day or so into and they'll use it for a long time without modification simply because it's a data access component to them. No problem - in theory.
Yesterday, I worked on the socket communications for the clients of my market data server to change the way packets of data were read back from the server into the client's space. Rather than read a 2kB packet and then process it, I changed the code to read everything that was available at the socket, if anything was available. This meant that if we were receiving a 50kB message, we didn't read it in 25 chunks, we read it in one large chunk and then processed that. This made the processing of the data much faster because we didn't re-scan the first 2kB 25 times - we scanned it once. Now a 50kB packet is one thing, but some packets will be 1MB or more. Now we're talking significant savings in time.
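To get a feel for why the re-scanning hurts so much as messages grow, here is a minimal sketch in Python. It is not the actual client code: it assumes the old approach re-scanned the whole accumulated buffer each time another 2kB chunk arrived, and the function names are made up for illustration.

```python
KB = 1024

def bytes_scanned_chunked(message_size, chunk_size):
    """Bytes examined if the whole buffer is re-scanned after every chunk.

    Assumption: each new chunk triggers a scan of everything received so far.
    """
    scanned = 0
    received = 0
    while received < message_size:
        received = min(received + chunk_size, message_size)
        scanned += received  # scan the entire buffer again
    return scanned

def bytes_scanned_single(message_size):
    """Bytes examined if the whole message is read first and scanned once."""
    return message_size

for size in (50 * KB, 1024 * KB):
    ratio = bytes_scanned_chunked(size, 2 * KB) / bytes_scanned_single(size)
    print(f"{size // KB}kB message: {ratio:.1f}x more bytes scanned")
# prints:
# 50kB message: 13.0x more bytes scanned
# 1024kB message: 256.5x more bytes scanned
```

Under that assumption the extra work grows with the square of the message size, which is why the 1MB packets are where the change really pays off.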
Well... today I had to deal with someone who was not convinced that this was faster. In fact, they were convinced that it was slower. Given that they don't know the code, and only see that something has changed, I can understand their need for some kind of assurance that things have changed for the better. So what I did was to run two sets of trials: old versus new, five runs each, same data set, to see what we'd get. Ideally, this would be a large enough sample set to factor out the small (or large) variations in access speed, network traffic, and the other variables that you run into in large networked applications.
What I found out was that the time to gather the data from the source was somewhat variable, but the time it took to process the data once I had it was pretty controllable. The old way had times for this experiment from 43.2 to 43.7 sec - a pretty nice grouping, and the new way had times from 6.5 to 7.0 sec - again, a nice grouping. While the access times to get the data were much more variable, I made sure to include the access time as well as the processing time so that we could easily see where the time was spent.
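As a quick check on what those processing-time ranges imply, here is a small Python sketch that computes the most pessimistic and most optimistic speedup from the two ranges above. The function name is hypothetical, and it uses only the reported minimum and maximum times, not the individual runs.

```python
def speedup_bounds(old_range, new_range):
    """Return (worst_case, best_case) speedup from two (min, max) time ranges.

    Worst case pairs the fastest old run with the slowest new run;
    best case pairs the slowest old run with the fastest new run.
    """
    old_lo, old_hi = old_range
    new_lo, new_hi = new_range
    return old_lo / new_hi, old_hi / new_lo

# Processing times in seconds, taken from the ranges reported above.
low, high = speedup_bounds((43.2, 43.7), (6.5, 7.0))
print(f"processing speedup: {low:.2f}x to {high:.2f}x")
# prints: processing speedup: 6.17x to 6.72x
```

Even in the least favorable pairing, the new processing path comes out roughly six times faster, and the two ranges do not come close to overlapping.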
Having spent more than a little time with experimental evidence myself, I thought this was a nice sample size, and the breakout of the data made it clear where the variation was, and wasn't. Unfortunately, for this person, the data didn't say the same thing. They didn't see anything like that in the data. They saw the variation in the total time and said "See, the new is slower than the old here, so your changes hurt the system." I tried to explain that if you looked at the breakout of the times, it was clear that the difference was in the time spent getting the data from the source (not in my control), while the processing time was nearly constant. But there was no convincing them.
For several hours I fought through this - more trials, asking what would convince them, etc. All the while, I'm thinking that these folks are extremely arrogant. It was only after I stepped back for a bit and looked at what they were saying that it hit me - they aren't arrogant, they're just horrible scientists.
Since their background is advanced degrees in science, I made the poor assumption that they actually were decent at reading experimental data and figuring out what the data is saying to them. After all, the job they have deals with experimental data every single day - it's called prices. Stochastic processes abound in this field, and it should have been second nature to these folks to read experimental data, but it's not. So my sense of frustration with their arrogance quickly turned into sadness about their lack of fundamental skills in this area of their work. Sad but true, I can't imagine trusting them with a dime of my money if they can't read experimental data like this.
As is so often the case, we carry expectations into relationships that are sometimes far below the mark, and sometimes far above. It's all about really getting to know where the other folks are coming from and where their skills and weaknesses are. Now that I've properly calibrated myself to these folks, it'll be much easier to deal with them in the future.