Re: perl's long term place in bioinformatics? by MadraghRua (Vicar)
on Jan 19, 2010 at 20:49 UTC
I've been working in bioinformatics for quite a while. My work moves between scripted solutions and desktop/server-based commercial software. In my hands Perl is good at producing fast analyses, and I've been using BioPerl in one form or another since the 90s. With more care you can use or develop codebases like BioPerl, BioRuby, BioJava and so on, and build really useful application suites. This is a really cool, fast and practical way to learn how to program, so I would encourage you to pursue these projects.
The biggest issue I see with biologists learning how to program is usually working with large data sets, e.g. both strands of a bacterial genome or larger. Biologists in general seem to have trouble understanding the memory requirements of large data sets. We also don't always have a good understanding of what makes a good computational algorithm, or of how to take advantage of work the computational community has already done. Next-gen sequence analysis is an example: better algorithms came from understanding things like suffix trees and the Burrows-Wheeler transform, how to work within memory constraints, and how to build really good indices for the target genomes you map against. This is really C/C++ work. So at some point you either need to pick this up or start working with someone who can help you understand these approaches.
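To make the Burrows-Wheeler idea a little more concrete, here is a toy sketch in Perl. It is my own illustration, not code from any real aligner: it builds every rotation of the input, sorts them, and keeps the last character of each. It assumes the input already ends with a unique `$` sentinel that sorts before A, C, G and T. Because it stores all n rotations it needs O(n^2) memory, which is exactly the kind of thing that falls over on a whole genome; real tools derive the transform from a suffix array instead.

```perl
use strict;
use warnings;

# Naive Burrows-Wheeler transform (illustration only).
# Assumption: $text ends with a unique '$' sentinel, which sorts
# before the nucleotide letters under Perl's default string sort.
sub bwt {
    my ($text) = @_;
    my $n = length $text;

    # Every cyclic rotation of the input: O(n^2) memory in total.
    my @rotations = map { substr($text, $_) . substr($text, 0, $_) } 0 .. $n - 1;

    # Sort the rotations lexically and keep the last character of each.
    return join '', map { substr($_, -1) } sort @rotations;
}

print bwt('GATTACA$'), "\n";
```

On `banana$` it returns `annb$aa`. Notice how the transform tends to group identical characters together; that property is what the compressed FM-index used by read mappers builds on.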
Another big issue is the statistical analysis of large data sets. Yes, we can work with R from Perl, but I find that folk struggle with knowing which analysis to use, which statistical approaches suit a given data set, and how to reliably identify what is significant and what is not. I think the best thing here is to read a lot, and to learn to question what was done in a published study and whether it really was the most appropriate way to interpret its results.
A final issue concerns displaying data so that it tells a story, whether by showing things aligned against each other, drawing Venn diagrams or whatever. Graphical, event-driven programming demands yet another mindset, different again from how you would tackle things in a typical script. If you go in this direction you'll need another set of mental tools to deal with this type of programming.
Ultimately, as biologists we're using computers to help us tell stories based upon the data from our experiments. I find that being rooted both in the bench side of things and in the computational side produces bioinformaticists who can tell better, more balanced stories. So make sure you keep doing experiments! As you progress in learning how to program, keep an open mind, find good tutors who will help you, and find programs you like and understand how they work. Languages are simply tools: you use the tool you need for the job rather than trying to fit every job to one tool.
Hope that helps.