Re^3: RFC: Fuzzy Clustering with Perl

by MadraghRua (Vicar)
on Nov 10, 2006 at 21:06 UTC (#583411=note)

in reply to Re^2: RFC: Fuzzy Clustering with Perl
in thread RFC: Fuzzy Clustering with Perl

Unfortunately I don't have a reference to our program. It was expression analysis software called Xpression, from a company called InforMax, now part of Invitrogen Corp. We built several software systems to handle biological data for arrays, sequence analysis and pathway analysis. If you're a student or work in an academic or government lab, you can get Vector NTI for free, and it has an API that you might find interesting to play with. (Shameless product placement :-). We no longer distribute Xpression, but we do use it for internal analysis and development.

So the usual workflow in expression analysis software is to normalize the data, filter and sort it based on various criteria, and finally analyse it with a variety of techniques, including clustering algorithms, neural networks, population-based statistical approaches and so on. What we were doing was very similar to Rosetta or any of the other commercial packages that still exist, so check them out. You should also have a gander at BioConductor: it's a collection of algorithms for analyzing different types of biological data, all written in R. You might find it amusing to learn R and then port an algorithm to Perl. Or simply learn to send data to a BioConductor app and get the results back.
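As a rough illustration of the normalize, filter and sort steps, here is a minimal Perl sketch. It is not how Xpression or any commercial package does it. The gene names and values are made up, and mean-centering plus a population standard-deviation cutoff (the hypothetical `$min_sd`) are simply assumed as the normalization and filter criteria.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical expression data: gene => array ref of values across samples.
my %expr = (
    geneA => [ 2, 4, 6, 8 ],
    geneB => [ 5, 5, 5, 5 ],
    geneC => [ 1, 9, 1, 9 ],
);

# Mean of the values behind an array ref.
sub mean { my ($v) = @_; return sum(@$v) / @$v }

# Population standard deviation of the values behind an array ref.
sub sd {
    my ($v) = @_;
    my $m = mean($v);
    return sqrt( sum( map { ( $_ - $m )**2 } @$v ) / @$v );
}

# 1. Normalize: mean-center each gene across its samples (assumed method).
my %centered;
for my $gene ( keys %expr ) {
    my $m = mean( $expr{$gene} );
    $centered{$gene} = [ map { $_ - $m } @{ $expr{$gene} } ];
}

# 2. Filter: keep genes whose spread meets an assumed cutoff.
my $min_sd = 1;
my @kept = grep { sd( $centered{$_} ) >= $min_sd } keys %centered;

# 3. Sort: rank the kept genes by spread, most variable first.
my @ranked = sort { sd( $centered{$b} ) <=> sd( $centered{$a} ) } @kept;

print "@ranked\n";    # geneC geneA
```

The flat gene (geneB) drops out at the filter step; the centered rows for the genes in `@ranked` are what you would then hand to a clustering routine.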

If you're interested in references for expression analysis, try Microarray Bioinformatics by Dov Stekel and follow the references in there. For a more mathematical approach, try The Analysis of Gene Expression Data: Methods and Software by Giovanni Parmigiani et al. (Springer). I would also look through the Quantitative Applications in the Social Sciences series from Sage Publications. They take a mathematical or statistical approach and frame it simply for us biology types, and they have a nice little pamphlet on clustering.

For scientific applications in Perl, I would try Mastering Algorithms with Perl by Orwant et al. from O'Reilly, or Advanced Perl Programming (the first edition, not the second) by Srinivasan, also from O'Reilly. Both will teach you good programming practices. For instance, you could make your code more efficient by passing references to your arrays and dereferencing them inside your subroutines, rather than copying whole arrays back and forth. Something to think about in future.
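To illustrate the reference-passing tip, here is a minimal Perl sketch; the subroutine names and data are hypothetical, for illustration only. Passing `@expression` directly flattens it into `@_` and the copying version builds and returns a new list, while passing `\@expression` hands over a single scalar that the subroutine dereferences to modify the caller's data in place.

```perl
use strict;
use warnings;

# Copying style (hypothetical helper): the array is flattened into @_,
# copied into @values, and a new list is built and copied back to the caller.
sub scale_copy {
    my ( $factor, @values ) = @_;
    return map { $_ * $factor } @values;
}

# Reference style (hypothetical helper): only one scalar, the array ref,
# is passed. Looping over @$values aliases $_ to the caller's elements,
# so they are updated in place.
sub scale_in_place {
    my ( $values, $factor ) = @_;
    $_ *= $factor for @$values;
    return;
}

my @expression = ( 1.5, 2.0, 0.25 );

my @scaled = scale_copy( 2, @expression );    # new array; @expression unchanged
scale_in_place( \@expression, 2 );            # no copy; @expression is modified

print "@scaled\n";        # 3 4 0.5
print "@expression\n";    # 3 4 0.5
```

With three values the difference is negligible; it starts to matter when each array holds thousands of measurements, or when you pass whole matrices stored as arrays of array references.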

It's also worth looking at Pavel Pevzner's books: he's a very well-respected bioinformaticist, and his books explain the process of algorithm development at a very accessible level. I usually buy each of my employees a copy at Christmas when he puts out a new one.

yet another biologist hacking perl....

Re^4: RFC: Fuzzy Clustering with Perl
by lin0 (Curate) on Nov 11, 2006 at 02:14 UTC

    Hi MadraghRua,

    Thank you very much for all the references. The only one I already have is "Mastering Algorithms with Perl". I will start reviewing the others in the near future.

    Thanks again
