PerlMonks  

Re: RFC: Fuzzy Clustering with Perl

by MadraghRua (Vicar)
on Nov 07, 2006 at 19:47 UTC


in reply to RFC: Fuzzy Clustering with Perl

I see that you have gone with a distance measurement. When we did this a while back, we included correlations as well. So you could look at the data either through a metric (distance) or through a statistic (the correlation coefficient for parametric analysis and the Spearman rank correlation for non-parametric analysis). Perhaps at some point you could think about allowing the algorithm to use either metric or statistical methods, to increase the module's utility.
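
To make the statistical option concrete, here is a minimal sketch in plain Perl of the two correlations mentioned above, each turned into a dissimilarity as 1 - r. The function names (pearson, ranks, spearman, correlation_dissimilarity) and the 1 - r convention are assumptions for illustration only; they are not taken from the module under review or from Xpression.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson correlation coefficient of two equal-length numeric lists
# (passed as array references).
sub pearson {
    my ($x, $y) = @_;
    my $n = @$x;
    die "lists must have equal, non-zero length\n" unless $n && $n == @$y;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    die "correlation is undefined for constant data\n"
        unless $sxx > 0 && $syy > 0;
    return $sxy / sqrt($sxx * $syy);
}

# 1-based ranks of a list; tied values share their average rank.
sub ranks {
    my ($v) = @_;
    my @order = sort { $v->[$a] <=> $v->[$b] } 0 .. $#$v;
    my @rank;
    my $i = 0;
    while ($i < @order) {
        my $j = $i;
        $j++ while $j < $#order && $v->[ $order[$j + 1] ] == $v->[ $order[$i] ];
        my $average = ($i + $j) / 2 + 1;
        $rank[ $order[$_] ] = $average for $i .. $j;
        $i = $j + 1;
    }
    return \@rank;
}

# Spearman rank correlation: Pearson correlation of the ranks.
sub spearman {
    my ($x, $y) = @_;
    return pearson(ranks($x), ranks($y));
}

# One common convention (an assumption here): dissimilarity = 1 - r,
# so perfectly correlated profiles are at "distance" 0.
sub correlation_dissimilarity {
    my ($x, $y, $method) = @_;
    my $r = ($method && $method eq 'spearman') ? spearman($x, $y) : pearson($x, $y);
    return 1 - $r;
}

my @profile_a = (1, 2, 3, 4, 5);
my @profile_b = (1, 8, 27, 64, 200);   # monotone in @profile_a, but not linear
printf "pearson dissimilarity:  %.4f\n", correlation_dissimilarity(\@profile_a, \@profile_b);
printf "spearman dissimilarity: %.4f\n", correlation_dissimilarity(\@profile_a, \@profile_b, 'spearman');
```

A clustering routine could then accept either a distance function or one of these dissimilarities as a code reference and leave the choice to the user.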

If you look on CPAN you'll find Statistics::RankCorrelation. If you would like to calculate your distances outside of your module, there is Math::NumberCruncher. Mind you, I could find nothing on Tschebyschev on CPAN.
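
For what it is worth, the Tschebyschev (Chebyshev) distance is simple enough to write by hand: it is the largest absolute difference between corresponding coordinates. A minimal sketch follows; the function name chebyshev_distance is made up for illustration and is not an existing CPAN API.

```perl
use strict;
use warnings;
use List::Util qw(max);   # core module

# Chebyshev distance between two points given as array references:
# the maximum absolute difference over all coordinates.
sub chebyshev_distance {
    my ($p, $q) = @_;
    die "points must have the same non-zero dimension\n"
        unless @$p && @$p == @$q;
    return max(map { abs($p->[$_] - $q->[$_]) } 0 .. $#$p);
}

# Coordinate differences are 3, 4 and 0, so the distance is 4.
print chebyshev_distance([1, 5, 3], [4, 1, 3]), "\n";
```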

MadraghRua
yet another biologist hacking perl....


Re^2: RFC: Fuzzy Clustering with Perl
by lin0 (Curate) on Nov 08, 2006 at 13:56 UTC

    Hi MadraghRua,

    Thank you very much for the comments. I did not know about the Math::NumberCruncher module. I will see how I can use it.

    About the part where you said

    I see that you have gone with a distance measurement. When we did this a while back, we included correlations as well. So you could look at the data either through a metric (distance) or through a statistic (the correlation coefficient for parametric analysis and the Spearman rank correlation for non-parametric analysis). Perhaps at some point you could think about allowing the algorithm to use either metric or statistical methods, to increase the module's utility.

    Could you please give me some references to your work, so that I can have a look at it and learn from it? I am new to Perl (only three months into this beautiful language) but I am really eager to learn as fast as I can. In particular, I am interested in pointers to the use of Perl in scientific applications (something like BioPerl, but in other areas too).

    Cheers!

    lin0
      Unfortunately I don't have a reference to our program. It was an expression analysis package called Xpression, by a company called InforMax, now a part of Invitrogen Corp. We built a few software systems to handle biological data for arrays, sequence analysis and pathway analysis. If you're a student or in an academic or government lab you can get Vector NTI for free, and it has an API that you might find interesting to play with. (Shameless product placement :-). We no longer distribute Xpression but we do use it for internal analysis and development.

      So the usual trend in expression analysis software is to normalize the data, filter and sort it based on various criteria, and finally analyse it with a variety of techniques, including clustering algorithms, neural networks, population-based statistical approaches and so on. What we were doing was very similar to Rosetta or any of the other commercial packages that still exist. Check them out. You should also have a gander at BioConductor: it's a site of algorithms for analyzing different types of biological data, and they are all written in R. You might find it amusing to learn R and then port an algorithm to Perl. Or simply learn to send data to a BioConductor app and get it back.
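
      As a rough illustration of the normalize, filter and sort steps described above, here is a toy sketch in plain Perl. Everything in it is an assumption chosen for illustration: the log2 transform as the normalization, the max-minus-min spread as the filter criterion, the gene names, and the function names. It is not how Xpression or any commercial package does it.

```perl
use strict;
use warnings;
use List::Util qw(min max);

sub log2 { my ($v) = @_; return log($v) / log(2); }

# Normalize: log2-transform every measurement (values must be positive).
# Takes and returns a hash reference: gene name => array ref of values.
sub normalize {
    my ($raw) = @_;
    my %norm;
    for my $gene (keys %$raw) {
        $norm{$gene} = [ map { log2($_) } @{ $raw->{$gene} } ];
    }
    return \%norm;
}

# Spread of one profile: largest value minus smallest value.
sub spread {
    my ($values) = @_;
    return max(@$values) - min(@$values);
}

# Filter out flat profiles, then sort the rest by decreasing spread.
sub filter_and_sort {
    my ($norm, $min_spread) = @_;
    my @kept   = grep { spread($norm->{$_}) >= $min_spread } keys %$norm;
    my @sorted = sort { spread($norm->{$b}) <=> spread($norm->{$a}) } @kept;
    return @sorted;
}

# Hypothetical raw intensities for three genes across three samples.
my %raw = (
    geneA => [ 8, 16, 64 ],
    geneB => [ 32, 32, 32 ],   # flat profile, removed by the filter
    geneC => [ 4,  8,  8 ],
);

my @candidates = filter_and_sort(normalize(\%raw), 0.5);
print "genes passed on to clustering: @candidates\n";
```

      The surviving profiles would then be handed to whichever analysis step comes next, for example a clustering routine.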

      If you're interested in references for expression analysis, try Microarray Bioinformatics by Dov Stekel and follow the references in there. Or, for a more mathematical approach, try Giovanni Parmigiani et al. in The Analysis of Gene Expression Data: Methods and Software, from Springer. I would also have a look through the Quantitative Applications in the Social Sciences series by Sage Publications. They have a nice way of taking a mathematical or statistical approach and framing it simply for us biology types, and they have a nice little pamphlet on clustering.

      For scientific applications in Perl, I would try Mastering Algorithms with Perl by Orwant et al. from O'Reilly, or Advanced Perl Programming (first edition, not second) by Srinivasan, also from O'Reilly. Both will teach you good programming practices. For instance, you could gain efficiencies in your code by passing references to your arrays and dereferencing them elsewhere, rather than passing the arrays themselves back and forth. Something to think about in future.
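
      A minimal sketch of the reference-passing idea, with made-up function names (dot_product, scale_in_place) that have nothing to do with your module. Passing \@array hands the sub a single scalar instead of copying every element into @_, and it also keeps two arrays from being flattened into one list.

```perl
use strict;
use warnings;

# Takes two array references; only the two references are passed,
# not copies of the elements.
sub dot_product {
    my ($x, $y) = @_;
    die "vectors must have the same length\n" unless @$x == @$y;
    my $sum = 0;
    $sum += $x->[$_] * $y->[$_] for 0 .. $#$x;
    return $sum;
}

# Modifies the caller's array in place through the reference,
# so nothing has to be returned and copied back.
sub scale_in_place {
    my ($vector, $factor) = @_;
    $_ *= $factor for @$vector;
    return;
}

my @a = (1, 2, 3);
my @b = (4, 5, 6);

print dot_product(\@a, \@b), "\n";   # 1*4 + 2*5 + 3*6 = 32

scale_in_place(\@a, 2);
print "@a\n";                        # 2 4 6
```

      Had the call been dot_product(@a, @b), the sub would have received one flat six-element list and could not tell where @a ends and @b begins.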

      It's also good to look at Pavel Pevzner's books. He's a very well respected bioinformaticist and his books explain the process of algorithm development at a very accessible level. I usually buy one of his books for each of my employees at Christmas when he puts out a new one.

      MadraghRua
      yet another biologist hacking perl....

        Hi MadraghRua,

        Thank you very much for all the references. The only one that I already have is "Mastering Algorithms with Perl". I will start reviewing all the other references in the near future.

        Thanks again

        lin0
