The Saunders principle

The Saunders principle reads as follows:

The first step in any collaboration is to reformat the data sent by your collaborators.

What’s he mean by that?

Here are the sequences that you asked for. They are in fasta format, except that I’ve marked the acetylation sites with a “*” and after that, a score in square brackets.

Gee thanks - oh, it’s a Word file too, better and better.
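For the curious, here is a minimal sketch of what that reformatting step might look like. It rests on several guesses about the collaborator's ad hoc format: that the file has already been saved out of Word as plain text, that the "*" comes immediately after the acetylated residue, and that the bracketed score is a plain number. The names `parse_annotated_fasta`, `_split_annotations` and `SITE` are invented for this illustration.

```python
import re

# Assumed annotation: residue, then "*", then "[score]", e.g. "MK*[0.91]TAY".
SITE = re.compile(r"\*\[([^\]]*)\]")


def _split_annotations(raw):
    """Strip the markers from one sequence and collect (position, score) pairs.

    Positions are 1-based and refer to the residue just before the "*".
    """
    pieces, sites, pos, length = [], [], 0, 0
    for match in SITE.finditer(raw):
        piece = raw[pos:match.start()]
        pieces.append(piece)
        length += len(piece)
        sites.append((length, float(match.group(1))))  # assumes a numeric score
        pos = match.end()
    pieces.append(raw[pos:])
    return "".join(pieces), sites


def parse_annotated_fasta(text):
    """Return a list of (header, clean_sequence, sites) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, *_split_annotations("".join(chunks))))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, *_split_annotations("".join(chunks))))
    return records
```

The sequence lines are joined before the markers are stripped, so an annotation that happens to be wrapped across two lines is still caught. The clean sequences can then be written back out as ordinary FASTA, with the sites kept in a separate table.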

90% of labs I’ve come across would benefit from someone with bioinformatics skills, and maybe 10% of them have that role filled. I see labs with the results of 1000 experiments, each saved in a separate Excel file, with different column orders in each file. Then they want to mine the data and can’t figure out why it should be so hard. I see labs that are still running hundreds of BLAST searches by hand, one at a time, through a GUI web interface. I see labs with 5 years’ worth of data stored on an ancient computer with NO BACKUPS.
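To give a sense of how little code the spreadsheet problem needs once someone sits down to do it, here is a rough sketch that merges per-experiment files whose columns come in different orders. It assumes each spreadsheet has been exported to CSV with a header row (the standard library cannot read Excel files directly), and the function name `merge_csvs` and the added `source_file` column are my own inventions for the example.

```python
import csv


def merge_csvs(paths, out_path):
    """Combine CSV files into one table, matching columns by header name.

    Column order in the inputs does not matter. Columns missing from a
    given file are left blank. Returns the merged column list.
    """
    rows = []
    columns = ["source_file"]  # hypothetical extra column recording provenance
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = str(path)
                rows.append(row)
                for name in row:
                    if name not in columns:
                        columns.append(name)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return columns
```

This only lines up columns whose headers match exactly. Files that call the same measurement by different names would still need a renaming table, which is exactly the kind of cleanup that agreeing on a format up front avoids.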

Do you see why I cringe?

Until biologists get serious about their data, and become willing to devote resources to IT infrastructure and hiring knowledgeable bioinformaticists, the field is going to continue to be a giant mess. You can only ignore things like good software design and standard file formats for so long before these things begin to hinder your ability to produce quality research.
