
perl's long term place in bioinformatics?

by tritan (Sexton)
on Jan 12, 2010 at 22:26 UTC

Hi there all,

Newly initiated perlmonk, both irl and online. I've just started a bioinformatics research job in Boston after graduating from college. I've started to learn perl and get involved with it, but I wanted to ask some experts about their experiences.

My main question is whether perl's usefulness in bioinformatics, or computational biology as a whole, has been increasing or decreasing. While perl is the language I've chosen to learn and get better at, I'd like some input from a professional standpoint on whether it was a good decision. And what better place to ask than people who really know perl : ) So? Is perl a good choice for bioinformatics in the long run? Or will it eventually be replaced by another language, for whatever reason?

Also, are there any other languages that complement perl well? Or is perl strong enough to stand alone as a primary language of choice, specifically for bioinformatics and genetics/genomics work?



Re: perl's long term place in bioinformatics?
by bobf (Monsignor) on Jan 13, 2010 at 06:20 UTC

    Before I comment on which tool you should use, what are you trying to do with it?

    Bioinformatics is a huge field of study that includes an extremely wide range of topics in both scientific domains and computer science/IT. There are as many definitions for "bioinformatics" as there are bioinformaticians, and based on my experience nearly every one of them has an educational and professional background that, combined with their current work, gives them a unique view into the field.

    For example, my educational background is in biochemistry and pharmacogenomics. I started to learn to program using shell and VB scripting while in grad school. It wasn't long before I needed a more powerful tool, so on the recommendation of my colleagues I started to learn Perl. After several years in bioinformatics I transitioned to medical informatics. I have been out of the wet lab for 10 years now. Perl is the only language that I code in and I can count on one hand the number of times I needed[1] to learn another language[2].

    Perl has a great library of modules to support bioinformatics/genomics work, and it can be integrated with other languages such as R and C (both of which are common in bioinformatics). I found it to be very intuitive and quite powerful. If you have a limited programming background I would definitely recommend starting with Perl. The recommendation of other languages depends on what you anticipate doing (e.g., statistical analyses, heavy computation, simulations, web apps, data warehousing, system integration/interoperability, etc).

    IMO, you're off to a good start. Good luck, and have fun! :-)

    [1] Due to the preferences or abilities of collaborators, or to the need to work with existing code written in another language.

    [2] Based on conversations with those formally trained in CS, learning several different types of languages can be quite valuable. I am not discouraging this. The statement was made to lend support to the utility of Perl in this field.

      It's funny you should mention the broadness of the bioinformatics field. That's something I've been struggling with. It's been difficult to define exactly what I'm interested in when bioinformatics as a term is so enigmatic. Biology, computers, maybe statistics? I wish it were a bit more rigid, just for the benefit of having more of a guidepost ; )

      You mentioned you went from bioinformatics to medical informatics. Do you see the two fields as distinct, or as a subset of one another?

        Yes, it is difficult to draw lines. That is probably as much a reflection of the underlying science as anything, though. Biomedical research is messy. There is a lot of overlap between fields, yet there is tremendous depth as well.

        I make a somewhat arbitrary distinction between "bioinformatics" and "medical informatics". I see the former as relating more directly to wet lab support: munging data files from lab equipment and public databases, performing basic statistical analyses, etc. Lower-level stuff. Medical informatics (in my mind) focuses more on translational medicine. It tends to concentrate on higher-level concepts and their relationship to medical knowledge. Whereas someone in bioinformatics might work on highly specialized projects that support tightly-scoped research (e.g., specific genes, pathways, regulation, biomolecules), those in medical informatics tend to look at how low-level research from many sources can be integrated and translated into something that can more directly impact the standard of medical care (e.g., systems integration, knowledge management, data mining and inference, all from the level of a gene to a multi-center clinical trial).

        Just to muddy the waters even further, I view "clinical informatics" as a field that supports and focuses on things related directly to patient care. That sub-specialty tends to be much more dependent on IT knowledge than biological knowledge (e.g., electronic health records, portals for patient access, billing, etc).

        These definitions and distinctions are arbitrary and, in many ways, artificial. Ultimately it is experience that will define your professional career and get you your next job, not the words you choose to put on your resume.

Re: perl's long term place in bioinformatics?
by BioLion (Curate) on Jan 13, 2010 at 11:49 UTC

    In addition to what other people have already said, maybe it is worth pointing out a few things. Just by way of explanation, I have worked in bioinformatics for a few years now, specifically in genomics, so my take may be different from other people's.

  • Perl is undoubtedly the munging King (/Queen). As other people have said, there are a lot of parsers already out there, and the speed of development makes rolling your own much easier for whatever extra source of data you find. For me this is a great boost, because a lot of modern-day bioinformatics involves integrating often quite disparate datasets. As such, Perl will probably be the go-to language on a day-to-day basis.
  • Perl facilitates data fetching, for example through APIs for BioMart. Perl also can interact with a wide variety of databases ( DBI etc...) which further facilitates gaining access to and integrating data.
  • Done well, Perl is capable of some fairly heavy lifting. Here you'd have to think about the implications of development time, maintainability, running time etc... which could influence your choice of language for a given project/customer.
  • There are other technical things, but one thing I think is worth considering is *Jobs*... Perl is probably the most widespread language in the community, and this can work both for and against you: it is sort of expected that you'll know at least some, to work with existing code, but having another string to your bow can be a boost for job hunting.
  • Finally, and this is specific to genomics I guess, and this is sort of a negative: the sheer scale of the datasets means that Perl is often not the best choice, either in terms of running time or memory use (I find this especially applies to BioPerl objects). However, like Ruby objects, Perl objects can be very lightweight, if you don't mind writing your own. Ruby also has a growing BioRuby community, though it is not as large or well funded as the BioPerl community.

    My second caveat is graphics and GUIs - users generally want to see and interact with their data, and I personally find Java the best option here, though Python is good too.

    For a first language and for picking up the basics, look no further than Perl. There have been discussions here and elsewhere before (Super Search and Google will find them) about the choice of a first language, so I won't go on... Then, once you are more experienced and involved in bigger projects, you may well want to start thinking about Ruby or Java. But Perl is, and will remain for some time, the industry standard.
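    To make the munging point concrete, here is a minimal sketch of a hand-rolled FASTA reader with a GC-content calculation. It is written in Python purely for illustration (BioPerl's Bio::SeqIO already handles FASTA, and a hand-rolled Perl version is about as short). The function names `read_fasta` and `gc_content` are hypothetical, and the sketch assumes plain, uncompressed FASTA and does no validation of residue characters.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    Hypothetical helper: assumes plain FASTA, no residue validation.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()  # header text without the '>'
            chunks = []
        elif header is not None:
            # Sequence lines may wrap; lines before the first header are ignored.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C residues in an upper-case sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Toy input for illustration only, not real data.
    example = ">seq1 sample\nACGT\nggcc\n\n>seq2\nATAT\n"
    for name, seq in read_fasta(io.StringIO(example)):
        print(name, len(seq), round(gc_content(seq), 2))
```

    Because the reader is a generator, only one record's sequence is held in memory at a time, which matters for the large genomic files mentioned in the last bullet.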

    Just a something something...
      I agree with everything you say, but when I worked in bioinformatics (caveat, >5 years ago), we did a lot of graphical stuff in Perl/Tk and it was simple and worked well. Large scale things were indeed done in Java, C++ or Delphi, but for quick and dirty, Perl was (and is) King. I still find it amazing what it can do in a few lines.

      Helgi Briem
      hbriem AT f-prot DOT com
Re: perl's long term place in bioinformatics?
by eyepopslikeamosquito (Bishop) on Jan 13, 2010 at 04:14 UTC
Re: perl's long term place in bioinformatics?
by educated_foo (Vicar) on Jan 13, 2010 at 03:04 UTC
    Yes, it's a fine choice for text munging and gluing programs together. And since many bioinformatics programs are distributed as command-line executables rather than libraries for some language, a good glue language is important. I suggest also learning R for statistics, and C to make your own algorithms run fast.
      Hi there! Do you have an opinion on C over C++, or the other way around? And why do you necessarily need to know one or the other? Is algorithm efficiency, speed-wise, that important? I have to admit, I don't mind a job taking an extra day to run, since it gives me time to work on other stuff! But then again, I'm probably in a more laid-back situation. I'm just a research tech, not even in graduate school yet, and I can only imagine where that attitude would get you as a professor ; ) But for now, it works.
        I actually prefer C++, since I know it and it's as fast as C when you use it as C, and sometimes faster when you understand its compilation model. But the extension APIs for both R and Perl (and many other languages) are written in C, not C++, and C is a much simpler language.
Re: perl's long term place in bioinformatics?
by llancet (Friar) on Jan 13, 2010 at 04:42 UTC

    I've been using perl in bioinformatics for 2 years. It's a good language for bioinformatics because it has a well-supported biology suite (BioPerl) and convenient text parsing. It's good at gluing different biological programs together.

    Compared with other languages, it seems that C/C++ don't have a widely accepted "standard" biology suite like BioPerl, and you probably don't want to cope with segfaults in your daily work. So don't use C/C++ unless run-time efficiency is critical. Java is another good choice, because its object system is much stronger than perl's, and it has BioJava. However, when working with strings, I think perl is more convenient. So if you are going to write a large program, you may consider Java.

    Python and Ruby may also be good choices, but I don't know much about those languages.
Re: perl's long term place in bioinformatics?
by zentara (Archbishop) on Jan 13, 2010 at 11:35 UTC
    Besides its usefulness for genome sequencing work through text processing.... remember the easy database access it also gives you. There is a lot more than number crunching going on in research.... you have to be able to store, retrieve and display data easily. Perl is the best compromise I've found as a general-purpose glue language.
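    As an illustration of the store/retrieve side, here is a small sketch using Python's built-in sqlite3 module with an in-memory database; in Perl the same pattern goes through DBI with a driver such as DBD::SQLite. The table layout, the function name `mean_level_per_gene` and the gene/sample values are hypothetical toy examples, not real data.

```python
import sqlite3
from typing import Iterable, List, Tuple


def mean_level_per_gene(
    rows: Iterable[Tuple[str, str, float]]
) -> List[Tuple[str, float]]:
    """Store hypothetical (gene, sample, level) rows, then return the mean level per gene."""
    conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
    try:
        conn.execute(
            "CREATE TABLE expression (gene TEXT, sample TEXT, level REAL)"
        )
        # Placeholders (?) let the driver handle quoting safely.
        conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT gene, AVG(level) FROM expression GROUP BY gene ORDER BY gene"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Toy values for illustration only.
    toy_rows = [("geneA", "s1", 2.0), ("geneA", "s2", 4.0), ("geneB", "s1", 1.5)]
    print(mean_level_per_gene(toy_rows))
```

    Using bind placeholders rather than pasting values into the SQL string is the same habit DBI encourages with its `?` placeholders.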

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku
Re: perl's long term place in bioinformatics?
by andreas1234567 (Vicar) on Jan 16, 2010 at 23:10 UTC
    BioPerl was recently featured on FLOSS Weekly.
    No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
Re: perl's long term place in bioinformatics?
by MadraghRua (Vicar) on Jan 19, 2010 at 20:49 UTC
    Hi tritan

    I've been working in bioinformatics for quite a while. My work alternates between scripting solutions and desktop/server-based commercial software. In my hands, Perl is good at providing fast analyses, and I've been using BioPerl in one form or another since the 90s. With more care you can use or develop codebases like BioPerl/BioRuby/BioJava and build really useful application suites. This is a really cool, fast and practical way to learn how to program, so I would encourage you to pursue them.

    The biggest issues I see with biologists learning how to program are typically around understanding how to work with large data sets, e.g. both strands of a bacterial genome or greater. Biologists in general seem to have problems understanding how to deal with the memory requirements of large data sets. We also don't always have a good understanding of what makes a good computational algorithm and how to take advantage of work that has already been done by the computational community. An example of this is the progress in next-gen sequence analysis, where better algorithms are constructed through understanding things like suffix trees or the Burrows-Wheeler transform, how to optimize under memory constraints, and how to build really good indices of target genomes to map against. This is really C/C++ work. So at some point you either need to pick this up or start working with someone who can help you understand these types of approaches.
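    To illustrate the kind of genome index mentioned above, here is a minimal sketch of a Burrows-Wheeler transform with FM-index style backward search for counting exact matches. It is a teaching sketch in Python, not production code: it builds the BWT by naively sorting all rotations (roughly O(n² log n)) and counts character occurrences by rescanning the BWT string each time, whereas BWT-based aligners such as Bowtie and BWA build the transform from a suffix array and keep precomputed occurrence tables, written in C/C++. The function names and the `$` end-of-text sentinel are assumptions of this sketch.

```python
from typing import Dict


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, small inputs only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel  # the default '$' sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)  # last column of the sorted matrix


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact matches of pattern in the original text by backward search."""
    # first[c]: number of characters in the text that sort before c
    first: Dict[str, int] = {}
    for i, ch in enumerate(sorted(bwt_str)):
        first.setdefault(ch, i)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real indexes precompute this.
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)  # half-open range of sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACA")
    print(index)
    for p in ("A", "TA", "ATT", "GG"):
        print(p, count_occurrences(index, p))
```

    Each backward step narrows the range of sorted rotations (equivalently, suffixes) that begin with a growing suffix of the pattern, so the width of the final range is the number of matches. Only the BWT string and a small table are consulted, which is why this family of indexes is so memory-friendly for whole genomes.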

    Another big issue is around statistical analysis of large data sets. Yes, we can work with R from Perl, but I find that folk struggle with knowing which analysis is best to use, what types of statistical approaches suit a given data set and how to positively identify what is significant and what is not. I think the best thing here is to read a lot, and to learn to question what was done in a published study and whether it really was the most appropriate way to understand its results.

    A final issue concerns the display of data so that it can tell a story, whether by showing things aligned against each other, by drawing Venn diagrams or whatever. Graphical, event-driven programming requires yet another mindset, different again from how you would tackle things in a typical script. If you go in this direction you'll need another set of mental tools to deal with this type of programming.

    Ultimately, as biologists, we're using computers to help us tell stories based upon the data from our experiments. I find that being rooted both in the bench side of things and the computational side of things produces bioinformaticists who can produce better, more balanced stories. So make sure you keep doing experiments! As you progress in learning how to program, keep an open mind, find good tutors who will help you, and find programs you like and understand how they work. Languages are simply tools: you use the tool you need for the job; you don't try to fit every job to that tool.

    Hope that helps

    yet another biologist hacking perl....

      Hi Madragh Rua, I disagree with you. "Use the right tool for the right job" is a good phrase in theory but not so practical. How many languages can u master properly? Max 2 or 3. If u know more than that then u must be a genius or u can't judge ur mental capacity. With any one language u can touch the altar, but sometimes a second language is needed to fill in the deficiencies of the first language u learn. That's it. Don't start telling people to learn more and more things. Then they can't write efficient code and also can't master different application areas (e.g. DBMS, networking etc.). Hi Tritan, this is for u. Believe my words and u will be successful. Two languages to learn for bioinformatics: 1) Perl
      2) C
      We can
      a) exploit R with Perl
      b) develop graphics with Perl (check out the OpenGL + Perl combo)
      c) do parallel processing
      d) develop web apps (using the Catalyst framework, mod_perl combo)
      e) do anything u can imagine.

      Use C in between for efficiency. Over time, we will have a much faster Perl interpreter and also an unparalleled amount of free libraries added to CPAN.
      Ignore others' comments on Perl as a write-only language. Just write clean Perl code following some good rules.
      That's it. I have already used Perl for heavy graphics, parallel processing, microarray analysis etc., so my experience is first hand.
      Java is good too but I don't like its weight. Choose Moose for OOP in Perl.
      Cheers, ur man
      Hi all,
      Sorry, Anonymous Monk again here. In my previous post, I forgot to put a break before I addressed Mr. Tritan. So readers, please note that I addressed Mr. Tritan as well as MadraghRua.
      urman (Your man)

Node Type: perlmeditation [id://817073]
Approved by Old_Gray_Bear
Front-paged by Old_Gray_Bear