Struggling With Incomplete or Bad Unit Tests

Unit Testing

This afternoon has been tough, as Jeff and I have been doing some pair work on one component of the system we're building at The Shop. It's a small Ruby app, and the Ruby Way is to have lots of tests, but the problem is that complete test coverage is impossible, and bad tests are a nightmare. For example, we're trying to make the code functional in nature - immutable objects, queues and threads to balance the load, and that kind of stuff. All good goals, to be sure, and worthy of the effort. But as I've always said, there's no such thing as a complete test suite, and the ones I've seen are just complex enough to make the addition of a single, simple feature so daunting that it almost defies inclusion.

There was a Java Swing app, for instance, where every time I added a new attribute to the system I spent more than 5x the time of adding the feature just updating the tests. That's not bad if it only takes 10 sec to add a feature, but when it's an hour to add a feature and 5 hours of work to update the tests, it gets out of hand. And since there's no system that analyzes the code and generates the test code, the tests are just as fallible as the code itself.

After all, who's writing the tests?

Do we need tests on the tests? Meta-Tests?

It can get out of hand very quickly. If it's just a simple set of sanity checks, that's one thing, but when the suite includes end-to-end tests and complex integration tests, it's another matter entirely.

Such is the case, I fear, for the project I'm on now.

Today we were trying to figure out why the continuous-integration server was failing on the tests. All the tests ran just fine on our dev machines, but on the CI server, they failed. All the time. Why?

Jeff and I started digging into this and found that the CI server was running the RSpec tests in a specific order - as opposed to letting RSpec run them as it saw fit out of the complete directory. We suspected the order of the tests was the problem, and sure enough it was. This is clearly hysteresis - test pollution - at work. Something in one of these tests was setting state that wasn't getting cleared properly, and then another test would come in and fail. Reverse the order and both tests passed just fine.
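For anyone wanting to reproduce this kind of thing locally, RSpec lets you force the ordering from the command line - a quick sketch of the relevant flags (assuming a reasonably recent RSpec 3):

```shell
# Run the specs in defined (file/declaration) order - what the CI server was doing:
rspec --order defined

# Run in random order, with a seed you can replay to reproduce a failure:
rspec --order random --seed 1234

# Newer RSpec versions can even hunt down the minimal polluting pair for you:
rspec --bisect
```

Replaying the CI server's ordering on a dev box is usually the fastest way to confirm you're looking at test pollution rather than an environment difference.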

So what was it?

We spent about an hour at this - getting it down to about 10 lines of code split across two methods. This is the flip-side of Ruby… 10 lines is a method, but 10 lines can be an hour-long mystery. Finally, I realized that one test was capturing stdout and the other was using it directly, and if the non-capturing test went first, the singleton got set up against the real stdout, and the capturing test failed. Reverse them, and all was well.
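The shape of the bug is easy to sketch. This is a hypothetical reconstruction - `AppLogger` and `reset!` are made-up names, not our actual code - of a singleton that memoizes whichever `$stdout` exists the first time it's touched:

```ruby
require 'stringio'

# Hypothetical singleton with the same flaw we hit: it latches onto
# $stdout at first use and never looks at it again.
class AppLogger
  def self.instance
    @instance ||= new($stdout)   # captures the current $stdout once
  end

  def self.reset!
    @instance = nil              # clears the memoized instance
  end

  def initialize(out)
    @out = out
  end

  def log(msg)
    @out.puts(msg)
  end
end

# "Test" A runs first without capturing, so the singleton grabs the real $stdout.
AppLogger.instance.log("plain run")

# "Test" B swaps $stdout for a StringIO, expecting to capture the output...
captured = StringIO.new
original, $stdout = $stdout, captured
AppLogger.instance.log("captured run")  # ...but the singleton still holds the old stream
$stdout = original

captured.string  # => "" - nothing captured, so B fails whenever A runs first
```

Run B before A and the singleton latches onto the `StringIO` instead, and both "tests" behave. Same ten lines, two outcomes - exactly the order-dependence we spent the hour on.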

Singletons. Not great for functional coding, because of just this kind of stuff. Also, RSpec should have some way of resetting the complete environment for each spec (test) file. That would be far preferable, but I can see why they do it this way: write good code and you don't have this problem, and it allows you to have tests that "build" on one another. Makes sense.
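To be fair, RSpec does give you per-example hooks you could hang a reset on. A sketch - assuming the singleton exposes a reset method like the hypothetical `AppLogger.reset!` - would go in the spec helper:

```ruby
# spec_helper.rb - a sketch, not our actual configuration.
RSpec.configure do |config|
  config.before(:each) do
    AppLogger.reset!   # hypothetical hook that clears the memoized instance
  end
end
```

That only works if every stateful singleton offers such a hook, of course, which is part of why singletons and test isolation fight each other in the first place.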

So the tests - not the code itself - were the reason for a half day of work. This is nasty. Very nasty in my book. Maybe I'm still decompressing from the world of finance, but that's a long time to be stuck on something that adds no value to the project. Still… I have to be patient and learn why they do things this way, as there's bound to be some reason. Gotta be.