biotext.org.uk

Rants

On the importance of testing in research software

by Andrew on Feb.04, 2009, under Rants

Michael Barton at Bioinformatics Zen recently posted an interesting article entitled “why write good software”. He makes an important point, but I think he somewhat underplays the importance of testing.

I used to code (as a lot of bioinformatics people do) like this: write a huge pile of code, run it, see if it works. If it crashes, spend hours debugging. If not, see if the results look ‘kinda right’. Unit testing obviously reduces (and almost eliminates) those tedious hours of debugging, but more importantly, it also gives you some assurance that the results you’ve submitted for publication are correct.

I’ve suffered the embarrassment of publishing a paper containing incorrect scores (thankfully off by only half a percent), due to a bug that was not even in my algorithm itself but in the scoring routines I’d written to benchmark it. If I’d grokked unit testing at that point, this probably wouldn’t have happened. Who knows how many published papers have incorrect scores in them because of undiscovered bugs in the code: little things like off-by-one errors or rounding errors that you wouldn’t even notice when inspecting the output of an algorithm by eye?
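To make that concrete, here is a minimal sketch of the kind of check I mean. The class AccuracyScore and its fractionCorrect method are hypothetical names invented for illustration, not the scoring routine from the paper, and the checks are hand-rolled rather than written against any particular test framework so that the example stands alone.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical benchmark scoring helper, invented for illustration. */
final class AccuracyScore {

    /** Fraction of positions where the predicted and expected labels agree. */
    static double fractionCorrect(List<String> predicted, List<String> expected) {
        if (predicted.size() != expected.size()) {
            throw new IllegalArgumentException("label lists differ in length");
        }
        if (expected.isEmpty()) {
            throw new IllegalArgumentException("no labels to score");
        }
        int correct = 0;
        // Visit every index. Writing "i < expected.size() - 1" here would be
        // the classic off-by-one that silently skips the last item.
        for (int i = 0; i < expected.size(); i++) {
            if (predicted.get(i).equals(expected.get(i))) {
                correct++;
            }
        }
        return (double) correct / expected.size();
    }

    /** Fails loudly if a computed score differs from the hand-computed one. */
    private static void check(double actual, double expected) {
        if (Math.abs(actual - expected) > 1e-12) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        List<String> truth = Arrays.asList("A", "B", "C", "D");
        // Cases small enough to verify on paper.
        check(fractionCorrect(Arrays.asList("A", "B", "C", "D"), truth), 1.0);
        // The only mismatch is in the last position, so a loop that stops
        // one item early would report 1.0 here instead of 0.75.
        check(fractionCorrect(Arrays.asList("A", "B", "C", "X"), truth), 0.75);
        System.out.println("all scoring checks passed");
    }
}
```

The second case is the interesting one: it puts the error exactly where an off-by-one would hide it, which is the sort of thing you never think to look for by eye.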

Michael says that “statistical tests for significance outweigh software testing for reliability” in the eyes of bioinformaticians, but there are important parallels, one of which is that coding bugs can subtly shaft your results just as surely as arithmetic errors or picking the wrong statistical test. Fortunately in software development, there are automated ways to catch such bugs before you publish, and to make sure that any changes you make later don’t introduce new ones. Test, test, test some more.

Andrew.

What’s wrong with peer review

by Andrew on Jan.09, 2009, under Rants

Michael Nielsen has posted a longish article entitled “Three myths about scientific peer review”. It’s thought-provoking reading and will strike a chord with most researchers. He uses various examples from 20th-century science and before to question our assumptions about how the system works (and how well). There’s apparently a follow-up about the future of peer review, and a book, on the way.

Interestingly, many scientists will be happy to use the rigorous nature of peer review to defend science against its critics, or to demarcate ‘real’ science from its fringe elements and impersonators, yet almost all will also have stories of the peer review system letting them down, or not being all it’s cracked up to be.¹ Perhaps it’s like adversarial party democracy: the least bad of all the current alternatives.

Andrew.

¹ N.B. I have no figures whatsoever to back up these claims.

SOAP vs REST — a common misconception?

by Andrew on Jan.03, 2009, under Rants

Update (1-Nov-09): Since writing this post nearly a year ago, I’ve come to realise that I was labouring under quite a few misconceptions myself, about REST. Spotting them all is left as an exercise for the reader. But I’d rather leave it here with this caveat, than remove it and pretend I knew better all along…

Michael Little at Fliquid Studios recently posted an interesting comparison of SOAP vs REST which may have accidentally perpetuated a misconception about SOAP that I believe is fairly common. This post is a response to that and an expanded version of a comment I left on Michael’s site.

The confusion arises from the difference between the communication layer (SOAP) and the databinding layer (aka marshalling/unmarshalling) which is responsible for mediating between language-native data structures and XML. Michael is writing from a PHP point of view, and I’m not sure what PHP SOAP toolkits are like in this regard, but fortunately in the Java world the distinction between the two is made clear, and we have various options for each — although JAX-WS for SOAP and JAXB for databinding are the industry standard specifications.

(continue reading…)

What’s wrong with Hibernate, #4

by Andrew on Dec.19, 2008, under Rants

Hibernate is supposed to allow you to write queries and manipulate data in the normal Java idiom. Which is true up to a point, and that point is almost five years in the past, when Java introduced generics.

Generics are absolutely standard practice in Java these days, and have been for two (nearly three) versions. But Hibernate still doesn’t support them, despite lead developer Gavin King saying at the time:

We will also need to support Java generics, which basically boils down to allowing typesafe collections (which is very trivial).

Nearly five years later, this still hasn’t happened, and you still need to manually cast the results of queries and liberally add @SuppressWarnings("unchecked") to demonstrate that you’re aware you’re doing things a bit wrong.
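One way to avoid scattering casts and suppressions around is to funnel every raw result through a single checked helper. The sketch below assumes only what is described above, namely a query API that hands back a raw java.util.List. TypedResults, checkedList and legacyQuery are hypothetical names made up for illustration (legacyQuery stands in for the real query call so the example runs without Hibernate on the classpath), and none of this is part of Hibernate itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that turns raw query results into typed lists. */
final class TypedResults {

    /** Copies a raw result list into a List<T>, failing fast on a wrong element type. */
    static <T> List<T> checkedList(Class<T> type, List<?> raw) {
        List<T> typed = new ArrayList<T>(raw.size());
        for (Object item : raw) {
            // Class.cast is a checked cast: it throws ClassCastException
            // immediately instead of deferring the failure to a later read.
            typed.add(type.cast(item));
        }
        return typed;
    }

    /** Stand-in for a pre-generics query API that returns a raw List. */
    @SuppressWarnings("rawtypes")
    static List legacyQuery() {
        return Arrays.asList("alpha", "beta");
    }

    public static void main(String[] args) {
        List<?> raw = legacyQuery();
        List<String> names = checkedList(String.class, raw);
        System.out.println(names.get(0).toUpperCase()); // prints ALPHA
    }
}
```

This costs a copy of the list, but the cast is verified element by element, so a query that returns the wrong type fails at the call site rather than somewhere far away.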

Andrew.

There are several other posts in this series. Please read the disclaimer before you write an angry reply.

What’s wrong with Hibernate, #3

by Andrew on Dec.19, 2008, under Rants

Unfortunately, open-source projects above a certain size seem to become victims of their own success.

Many other excellent OSS products like Guice or CXF have user-centred mailing lists that the developers also read. These developers are generally very willing to help out with problems, and the users — having been treated kindly when they started out — are equally keen to help with the areas that they have experience of.

Hibernate has no user mailing list, just a large forum site that’s easy to ignore. Mailing lists encourage people to respond because queries drop in their inboxes; forums tend to be visited in times of need, and people are probably less likely to drop by for philanthropic reasons.

As an example, I’ve posted three queries (plus one repost and various followups) in the Hibernate forums over the course of a month, politely and with plenty of information. Despite hundreds of views, and the fact that two of them highlight features which don’t work as described, not a single reply from anyone else has been forthcoming. Not even a one-liner saying “bad question, RTFM” or “working as designed” or “known bug”.

Maybe this is sour grapes, but the amount of community feedback from other OSS projects puts this to shame.

Andrew.

There are several other posts in this series. Please read the disclaimer before you write an angry reply.

What’s wrong with Hibernate, #2

by Andrew on Dec.18, 2008, under Rants

On the Hibernate website (and elsewhere), one of the touted advantages of Hibernate over roll-your-own SQL is:

Hibernate Core for Java generates SQL for you, relieves you from manual JDBC result set handling and object conversion, and keeps your application portable to all SQL databases.

Well, not exactly. Many functions in HQL (Hibernate Query Language) are simply passed through verbatim to the underlying database engine, without any modification.

This is normally fine, but on my first ever HQL query I tried to use a natural-logarithm function, which is log() in HSQLDB (my testing server) and ln() in PostgreSQL (my production server). That means queries written to target the production environment fail with a charming NullPointerException in the test environment. Half a day’s debugging, right there.

More worryingly, imagine if I had innocently written the query using log(). All my tests would have passed. Then I would have deployed the application to the production environment, and it would still have worked, but all the queries would have happily returned the wrong answers — because log() in PostgreSQL means base-10 logarithms.
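To see how quietly the two meanings diverge, here is a small sketch in plain Java rather than SQL. Java's Math.log (base e) and Math.log10 (base 10) stand in for the two databases' log() functions. LogDialectCheck and looksLikeNaturalLog are hypothetical names, and probing at e is just one assumed way of telling the two apart.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical probe for which logarithm a function called "log" really computes. */
final class LogDialectCheck {

    /** True if f behaves like a natural logarithm at a known point: f(e) should be 1. */
    static boolean looksLikeNaturalLog(DoubleUnaryOperator f) {
        return Math.abs(f.applyAsDouble(Math.E) - 1.0) < 1e-9;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator naturalLog = Math::log;  // base e, like log() in HSQLDB
        DoubleUnaryOperator base10Log = Math::log10; // base 10, like log() in PostgreSQL

        // Same input, same function name in SQL, very different answers:
        System.out.println(naturalLog.applyAsDouble(100.0)); // roughly 4.605
        System.out.println(base10Log.applyAsDouble(100.0));  // 2.0

        // A one-line probe tells the two apart.
        System.out.println(looksLikeNaturalLog(naturalLog)); // true
        System.out.println(looksLikeNaturalLog(base10Log));  // false
    }
}
```

Neither result raises an error, which is the whole problem. The equivalent sanity check, run as a query against each database you target, would at least turn a silent wrong answer into a failing test.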

Hibernate not only fails to insulate you from dialect differences like these, it also introduces a false sense of safety by pretending that it does.

Andrew.

There are several other posts in this series. Please read the disclaimer before you write an angry reply.

What’s wrong with Hibernate, #1

by Andrew on Dec.10, 2008, under Rants

The first in a series of missives on the subject of Hibernate.

Disclaimer: I’m not out to bash Hibernate, nor do I fundamentally dislike it. In fact I think it’s an impressive piece of work, which is why I’m bothering to highlight some of its pitfalls, in the hope that this saves other people the hassles I went through, and perhaps gives its maintainers something to think about. If I didn’t think it was worth investing the time to do this, I’d just drop it and move on.

When there’s a problem with an HQL query, all the exception says is:

“Errors in named queries:” and then the query name.

Gee, thanks. Even ‘1970s technology’ RDBMSs give more specific error reports than that. By quite a long way.

Actually, you only get this error if you’re lucky — if you’re unlucky you get something even less helpful.

Oh, and reason #1a:

An error in a query will often cause all your tests to fail. Not just ones that use that query.

Andrew.
