Michael Barton at Bioinformatics Zen recently posted an interesting article entitled “why write good software”. He makes an important point, but I think he underplays the importance of testing somewhat.
I used to code (as a lot of bioinformatics people do) like this: write a huge pile of code, run it, see if it works. If it crashes, spend hours debugging. If not, see if the results look ‘kinda right’. Now obviously unit testing reduces (almost eliminates) those tedious hours of debugging, but more importantly, it also gives you some assurance that the results you’ve submitted for publication are more likely to be correct.
I’ve suffered the embarrassment of publishing a paper with incorrect scores in it (thankfully only by half a percent), due to a bug not even in my algorithm itself but in the scoring routines I’d written to benchmark it. If I’d grokked unit testing at that point, this probably wouldn’t have happened. Who knows how many published papers do have incorrect scores in them because of undiscovered bugs in the code: little things like off-by-one errors or rounding errors that you wouldn’t even notice when inspecting the output of an algorithm by eye.
Michael says that “statistical tests for significance outweigh software testing for reliability” in the eyes of bioinformaticians, but there are important parallels between the two: coding bugs can subtly shaft your results just as surely as arithmetic errors or picking the wrong statistical test. Fortunately in software development, there are automated ways to catch such bugs before you publish, and to make sure that any changes you make later don’t introduce new ones. Test, test, test some more.
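To make that concrete, here is a minimal sketch in Python of what such automated checks might look like for a scoring routine. The function `percent_identity` and the test cases are hypothetical illustrations, not the scoring code from the paper mentioned above; the idea is simply to compare the routine against values worked out by hand, including cases that would expose an off-by-one or a rounding slip.

```python
import unittest


def percent_identity(predicted, reference):
    """Percentage of positions at which two equal-length sequences agree.

    Hypothetical scoring routine, used here only to illustrate the tests.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    # Use float arithmetic and do not round here; round only when reporting.
    return 100.0 * matches / len(reference)


class PercentIdentityTest(unittest.TestCase):
    def test_hand_computed_value(self):
        # 2 of 3 positions agree, so the score is 66.66...%,
        # which a premature rounding step would distort.
        self.assertAlmostEqual(percent_identity("ACG", "ACT"), 200.0 / 3.0)

    def test_last_position_is_counted(self):
        # Only the final position differs; an off-by-one that skipped
        # the last position would wrongly report 100%.
        self.assertEqual(percent_identity("AAAT", "AAAA"), 75.0)

    def test_length_mismatch_is_rejected(self):
        with self.assertRaises(ValueError):
            percent_identity("AC", "ACG")
```

Saved in a file, tests like these can be run with `python -m unittest`, and rerun after every change to check that nothing has quietly broken.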
Andrew.
2 Comments
Hi Andrew
Was the mistake in the paper noticed by yourself or someone else? I ask because unit testing is not rewarded in science. The lack of unit testing, or related measures, is penalised only if mistakes are found. How often do bugs significantly affect scientific results? And when they do, how often are these mistakes spotted? If mistakes resulting from code errors are common but rarely found, then the “publication” advantage lies with producing results in the method you describe above.
I’m in agreement with your point on proper unit testing in scientific software, but thorough software development takes time. Time which other researchers can spend on getting results published and moving on to the next research project. I think unit testing needs some kind of recognition in developing sound scientific software, in the same way that positive and negative controls are used in the wet lab.
I’d be interested in your thoughts related to this as I spend a fair amount of time unit testing my own software.
Mike
It was noticed by a reader! All the numbers in a table ended in .0 or .5 and someone emailed and said “rounding error maybe?” Which indeed it was.
You’re right, testing isn’t rewarded directly, but then doing the maths/stats right (for example) isn’t either. And you can rush that and get the results out slightly quicker and end up with dodgy numbers too. But people don’t (or shouldn’t ;-) ) because there’s more of a ‘culture of rigour’ in that regard.
I suppose unit testing’s only (fairly) recently caught on in software development generally, with the advent of tools that make it quicker. And it takes time to percolate through to bioinformatics; actually not just time, but the efforts of people like you and me too…