http://www.perlmonks.org?node_id=818292


in reply to perl's long term place in bioinformatics?

Hi tritan

I've been working in bioinformatics for quite a while. My work moves between scripted solutions and desktop/server-based commercial software. In my hands Perl is good for quick analyses, and I've been using BioPerl in one form or another since the 90s. With more care you can use or develop codebases like BioPerl, BioRuby, BioJava and so on to build really useful application suites. This is a really cool, fast and practical way to learn how to program, so I would encourage you to pursue it.

The biggest issue I see when we biologists learn how to program is understanding how to work with large data sets, e.g. both strands of a bacterial genome or anything larger. Biologists in general seem to have trouble with the memory requirements of large data sets. We also don't always have a good understanding of what makes a good computational algorithm, or of how to take advantage of work the computational community has already done. An example is the progress in next-gen sequence analysis, where better algorithms come from understanding things like suffix trees or the Burrows-Wheeler transform, how to work within memory constraints, and how to build really good indices for the target genomes you map against. This is really C/C++ work. So at some point you either need to pick this up or start working with someone who can help you understand these types of approaches.
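To make the Burrows-Wheeler idea a little more concrete, here is a minimal sketch, written in Python only for brevity. The function names `bwt` and `inverse_bwt` and the `$` end marker are my own choices for illustration, not the API of any aligner. It sorts every rotation of the input, so it is only suitable for toy strings; real read mappers get the same result from suffix arrays and compressed indices, which is exactly the memory-conscious work described above.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via naive rotation sorting (toy inputs only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # the sentinel marks the end and sorts before A, C, G, T
    # Build every rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending the column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row that ends with the sentinel is the original string.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(bwt("banana"))                  # annb$aa
    print(inverse_bwt(bwt("GATTACA")))    # GATTACA
```

Note how the transform tends to group identical characters together, which is what makes the indexed genome compressible and searchable. The sorted rotation table here needs memory on the order of the square of the input length, so this version would never cope with even a bacterial genome.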

Another big issue is the statistical analysis of large data sets. Yes, we can work with R from Perl, but I find that folk struggle with knowing which analysis to use, which statistical approaches suit a given data set, and how to positively identify what is significant and what is not. I think the best thing here is to read a lot and learn to question what was done in a published study, and whether it really was the most appropriate way to understand that study's results.

A final issue concerns displaying data so that it tells a story, whether by showing things aligned against each other, by drawing Venn diagrams, or whatever. Graphical, event-driven programming requires yet another mindset, different again from how you would tackle things in a typical script. If you go in this direction you'll need another set of mental tools to deal with this type of programming.

Ultimately, as biologists we're using computers to help us tell stories based upon the data from our experiments. I find that being rooted in both the bench side and the computational side of things produces bioinformaticists who tell better, more balanced stories. So make sure you keep doing experiments! As you progress in learning how to program, keep an open mind, find good tutors who will help you, and find programs you like and understand how they work. Languages are simply tools: you use the tool you need for the job, you don't try to fit every job to one tool.

Hope that helps

MadraghRua
yet another biologist hacking perl....

Re^2: perl's long term place in bioinformatics?
by Anonymous Monk on Mar 15, 2010 at 09:20 UTC
    Hi MadraghRua, I disagree with you. "Use the right tool for the right job" is a good phrase in theory, but not so practical. How many languages can you master properly? Two or three at most. If you know more than that, either you are a genius or you are misjudging your mental capacity. With any one language you can touch the altar; sometimes a second language is needed to fill in the deficiencies of the first one you learn. That's it. Don't start telling people to learn more and more things, because then they can't write efficient code and also can't master different application areas (e.g. DBMS, networking and so on). Hi Tritan, this is for you. Believe my words and you will be successful. There are two languages to learn for bioinformatics:
    1) Perl
    2) C
    With these we can
    a) exploit R with Perl
    b) develop graphics with Perl (check the OpenGL + Perl combo)
    c) do parallel processing
    d) develop web apps (using the Catalyst framework with mod_perl)
    e) do anything you can imagine.

    Use C in between for efficiency. Over time we will have a much faster Perl interpreter and an unparalleled number of free libraries added to CPAN.
    Ignore other people's comments about Perl being a write-only language. Just write clean Perl code following some good rules.
    That's it. I have already used Perl for heavy graphics, parallel processing, microarray analysis and so on, so my experience is first hand.
    Java is good too, but I don't like its weight. Choose Moose for OOP in Perl.
    Cheers, ur man
Re^2: perl's long term place in bioinformatics?
by Anonymous Monk on Mar 15, 2010 at 09:26 UTC
    Hi all,
    Sorry, Anonymous Monk here again. In my previous post I forgot to put a break before the part addressed to Mr. Tritan. So readers, please note that I addressed Mr. Tritan as well as MadraghRua.
    Cheers,
    urman (Your man)