PerlMonks
Clustering/classifying recommendations
by f77coder (Beadle)
on Aug 19, 2014 at 16:30 UTC ( [id://1097999]=perlquestion )
f77coder has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm interested in recommendations for clustering that favor speed over being lightweight/small. So I'd prefer loops over one-liners if the loop executes faster.

I'm looking through the large list of various CPAN archives (AI, Bayes, Cluster, etc.) and would like to narrow down the search. I don't mind getting the source code and hacking it if it doesn't quite match what I need, rather than expecting something to work as is.

The input data is a mixture of integers and strings, all categorical. I'd like to treat each data line as an array and do vector processing. Think of it as a 1D image processing problem: how many pixels are different?

For example:

line1 => cat1=123, cat2=92, cat3=5, cat4='0xffa411', cat5='0x221133', cat6='0xa291f1'
line2 => cat1=3, cat2=92, cat3=5, cat4='0xaf1401', cat5='0xaaffcc', cat6='0xa23af1'

I'd like to create a distance measurement based only on the number of categories that are different. In this case, the distance map would be (cat2, cat3, 4): cat2 and cat3 match, and the other four categories differ. There will probably be a weighting function applied to this metric as well.

Once the training is complete, I'd like to make a prediction for a new line with the classifier/clusterer.

Thanks
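
To make the distance described above concrete, here is a minimal sketch in plain Perl with no CPAN dependencies. It counts mismatching categories between two records, with an optional per-category weight. The names mismatch_distance, matching_cats, predict, the all-ones weights, and the labels 'A' and 'B' are hypothetical, introduced only for illustration. The nearest-neighbour prediction step is an assumed choice, since the question does not say which classifier is intended.

```perl
use strict;
use warnings;

# Categories in a fixed order so every record becomes a plain array.
my @cats = qw(cat1 cat2 cat3 cat4 cat5 cat6);

# The two example records from the question, as arrays of categorical values.
my @line1 = (123, 92, 5, '0xffa411', '0x221133', '0xa291f1');
my @line2 = (3,   92, 5, '0xaf1401', '0xaaffcc', '0xa23af1');

# Hypothetical per-category weights; all 1 gives a plain mismatch count.
my @weights = (1) x @cats;

# Weighted mismatch (Hamming-style) distance. Values are compared as
# strings with 'ne', so integers and strings are treated alike.
sub mismatch_distance {
    my ($x, $y, $w) = @_;
    my $d = 0;
    for my $i (0 .. $#$x) {
        $d += $w->[$i] if $x->[$i] ne $y->[$i];
    }
    return $d;
}

# Names of the categories on which two records agree.
sub matching_cats {
    my ($x, $y) = @_;
    return map { $cats[$_] } grep { $x->[$_] eq $y->[$_] } 0 .. $#$x;
}

# Nearest-neighbour prediction (an assumption, not from the question):
# return the label of the training row closest to $query.
sub predict {
    my ($query, $train, $w) = @_;   # $train: list of [ \@row, $label ]
    my ($best_label, $best_d);
    for my $t (@$train) {
        my $d = mismatch_distance($query, $t->[0], $w);
        ($best_label, $best_d) = ($t->[1], $d)
            if !defined $best_d || $d < $best_d;
    }
    return $best_label;
}

my $dist = mismatch_distance(\@line1, \@line2, \@weights);
my @same = matching_cats(\@line1, \@line2);
print "same: @same; distance: $dist\n";

# Made-up labels 'A' and 'B', purely for illustration.
my @train = ([ \@line1, 'A' ], [ \@line2, 'B' ]);
my @new   = (3, 92, 5, '0xaf1401', '0x221133', '0xa23af1');
my $label = predict(\@new, \@train, \@weights);
print "predicted label: $label\n";
```

For the two example lines this reports cat2 and cat3 as matching and a distance of 4. The explicit loop in mismatch_distance keeps the inner comparison cheap, which seems to fit the stated preference for speed over brevity. Changing the entries of @weights is one way the weighting function could be plugged in.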