Hi MadraghRua,
Thank you very much for the comments. I did not know about the Math::NumberCruncher module. I will see how I can use it.
About the part where you said
*“I see that you have gone with a distance measurement. When doing this a while back, we included correlations as well. So you could look at the data either as a function of a measurement (distance) or a statistical (correlation coeficient for parametric and Spearman Rank Correlation for non parpametric analysis). Perhaps you could think about allowing the algorithm to use either metric or statistical methods to increase the module's utility at some point.”*
Could you please give me some references to your work? So, I could have a look at it and learn from it. I am new to Perl (only three months into this beautiful language) but I am really eager to learn as fast as I can. Specially, I am interested in pointers to the use of Perl in Scientific applications (sort of BioPerl but in other areas too)
Cheers!
lin0
| [reply] |

Unfortunately I don't have a refernece to our program - it was an expression analysis software called Xpression by a company called InforMax - now a part of Invitrogen Corp. We built a few software systems to handle biological data for arrays, sequence analysis and pathway analysis. If you're a student or in an academic or govenment lab you can get Vector NTI for free and it has an API that you might find interesting to play with. (Shameless product placement :-). We no longer distribute Xpression but we do use it for internal analysis and development.
So the usual trend in expression analysis softwares is to normalize the data, filter and sort the data based on various criteria and finally analyse these by a variety of techniques, including clustering algorithms, neural networks, population based statistical approaches and so on. What we were doing was very similar to Rosetta or any of the other commercial softwares that still exist. Check them out. You should also have a gander at BioConductor - its a site of alorithms for analyzing different types of biological data and they are all written in R. You might find it amusing to learn that and then port the algorithm to Perl. Or simply learn to send data to a BioConductor app and get it back.
If you're interested in the references for expression analysis, try out Microarray Bioinformatics by Dov Stekel and follow the references in there. Or for a more mathematical approach, try Giovanni Parmigiani et al in The analysis of gene expression data: methods and software by Springer. I would also have a look through the Quantiative Applications in the Social Sciences series by Sage Publications. They have a nice way of taking a mathematical or statistical approach and framing it simply for us biology types and they have a nice little pamphlet on clustering.
For scientific applications in Perl, I would try out Mastering Algorithms with Perl by Orwant et al from O'Reilly or Advanced Perl Programming (first edition not second) by Srinivasan also in O'Reilly. Both those will teah you good programming practices - for instance you could gain efficiencies in your code by passing references to your arrays and dereferencing them elsewhere, rather than swapping your arrays back and forth. Something to think about in future.
Its also good to look at Pavel Pevzner's books - he's a very well respected bioinformaticist and his books explain the process of algorithm development at a very accessible level. I usually buy one of his books for each of my employees at Christmas when he puts out a new one.
MadraghRua yet another biologist hacking perl....
| [reply] |

| [reply] |

Comment onRe: RFC: Fuzzy Clustering with Perl