Big Data

Nice to see bioinformatics and computational infrastructure getting some love in today’s issue of Nature:

Above all, data on today’s scales require scientific and computational intelligence. Google may now have its critics, but no one can deny its impact, which ultimately stems from the cleverness of its informatics. The future of science depends in part on such cleverness again being applied to data for their own sake, complementing scientific hypotheses as a basis for exploring today’s information cornucopia.

Hopefully, funding agencies and universities will take note and begin funding infrastructure projects, and the scientific community will begin recognizing the value they add. A good computational project can enable thousands of discoveries, and the biological community needs to give appropriate credit (and pay) to bioinformaticians.

There are several other good articles in this issue, including one about biocuration. Link (free access for two weeks, as I understand it)

Bioinformatics Salaries

Very quick and dirty look at bioinformatics salaries in industry vs. academia:

* Academic Salary Mean/Median: $36,520 / $33,712
* Industry Salary Mean/Median: $66,239 / $64,235

Salary per year of experience:

* Academic Mean/Median: $10,970 / $8,333
* Industry Mean/Median: $17,410 / $12,000

And they wonder why PhDs are leaving academia…

All data via the Bioinformatics Career Survey. More analyses are posted at the wiki page.

The Saunders principle

The Saunders principle reads thusly:

The first step in any collaboration is to reformat the data sent by your collaborators.

What’s he mean by that?

Here are the sequences that you asked for. They are in fasta format, except that I’ve marked the acetylation sites with a “*” and after that, a score in square brackets.

Gee thanks – oh, it’s a Word file too, better and better.
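For what it's worth, here is roughly what that reformatting step might look like once the text is out of Word. This is only a sketch under assumptions: I'm guessing the "*" sits directly after the modified residue and the score is a plain number in the brackets, and the function names (`clean_record`, `reformat_fasta`) are made up for illustration.

```python
import re

# Assumed annotation style: a "*" directly after the modified residue,
# followed by a numeric score in square brackets, e.g. "MK*[0.91]TAYK".
SITE = re.compile(r"\*\[([^\]]+)\]")


def clean_record(annotated):
    """Strip annotations; return (plain_sequence, [(position, score), ...]).

    Positions are 1-based and refer to the residue just before each "*".
    """
    pieces = []
    sites = []
    cursor = 0
    for match in SITE.finditer(annotated):
        pieces.append(annotated[cursor:match.start()])
        # Length of the clean sequence so far == 1-based index of the marked residue.
        position = sum(len(p) for p in pieces)
        sites.append((position, float(match.group(1))))
        cursor = match.end()
    pieces.append(annotated[cursor:])
    return "".join(pieces), sites


def reformat_fasta(text):
    """Parse annotated FASTA text into (header, plain_sequence, sites) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, *clean_record("".join(chunks))))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, *clean_record("".join(chunks))))
    return records
```

Sequence lines are joined before the annotations are stripped, so a score that got wrapped across two lines still parses. The clean sequences can then be written back out as ordinary FASTA, with the sites kept in a separate table.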

90% of labs I’ve come across would benefit from someone with bioinformatics skills, and maybe 10% of them have that role filled. I see labs with the results of 1000 experiments, each saved in a separate Excel file, with different column orders in each file. Then they want to mine the data and can’t figure out why it should be so hard. I see labs that are still running hundreds of BLAST searches by hand, one at a time, through a web interface. I see labs with 5 years’ worth of data stored on an ancient computer with NO BACKUPS.

Do you see why I cringe?
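The shuffled-columns problem, at least, is cheap to fix once someone decides to fix it. A minimal sketch, assuming each spreadsheet has been exported to CSV with a header row and that the files share column names (both are my assumptions, and `merge_tables` is a hypothetical helper, not anyone's actual pipeline):

```python
import csv
import io


def merge_tables(sources, columns):
    """Combine CSV tables into rows with one fixed column order.

    sources: iterable of (name, file-like object) pairs.
    columns: the column names wanted in the output, in order.
    Columns are looked up by header name, so their order in each file is irrelevant.
    """
    merged = []
    for name, handle in sources:
        reader = csv.DictReader(handle)
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"{name}: missing columns {missing}")
        for row in reader:
            # Record the source name so every row can be traced to its file.
            merged.append([name] + [row[c] for c in columns])
    return merged


# Two made-up experiment files with the same columns in different orders.
exp_a = io.StringIO("gene,score,sample\nTP53,0.9,s1\n")
exp_b = io.StringIO("sample,gene,score\ns2,BRCA1,0.4\n")
rows = merge_tables([("a.csv", exp_a), ("b.csv", exp_b)],
                    ["sample", "gene", "score"])
```

Reading by header name rather than by position is the whole trick; a file that lacks an expected column fails loudly instead of silently shifting values into the wrong field.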

Until biologists get serious about their data, and become willing to devote resources to IT infrastructure and hiring knowledgeable bioinformaticists, the field is going to continue to be a giant mess. You can only ignore things like good software design and standard file formats for so long before these things begin to hinder your ability to produce quality research.
