segmentation and grouping

vkkan has asked for the wisdom of the Perl Monks concerning the following question:

Re: segmentation and grouping by CountZero (Bishop) on Dec 18, 2012 at 07:38 UTC
I think you are on the wrong track. I doubt that "cluster analysis" will help you. Do you still have to analyse the type of services rendered to the client and decide which group they belong in? Or is the "service" field already filled in with the group they belong to? If that is the case, then a simple SQL query will be enough.

Perhaps you can show a few sample lines of your data so we can better understand what you want to do and what data is available.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
Re^2: segmentation and grouping by vkkan (Initiate) on Dec 18, 2012 at 07:56 UTC
Given below is the sample data set:

    custid  Name     service               Price   Posted Date
    31      John     Consultation Charges  100     4/1/2012 10:39
    805     Kennedy  Consultation Charges  150     4/1/2012 11:17
    805     Kennedy  C-Reactive Protein    170     4/1/2012 11:56
    805     Kennedy  Complete Blood Count  150     4/1/2012 11:56
    805     Kennedy  Malarial              175     4/1/2012 11:56
    805     Kennedy  Mantoux Test          100     4/1/2012 11:56
    805     Kennedy  AZIBACT 1 MG SYP      28      4/1/2012 13:27
    805     Kennedy  FALCINILLE DRY SYP    105.15  4/1/2012 13:27
    891     Ruth     Consultation Charges  150     4/1/2012 12:05
    891     Ruth     C-Reactive Protein    170     4/1/2012 12:47
    891     Ruth     Complete Blood Count  150     4/1/2012 12:47
    891     Ruth     Mantoux Test          100     4/1/2012 12:47
    891     Ruth     X-Ray Chest           150     4/1/2012 12:47

The service field is not filled with a group name; it is just the service rendered. So, from the above sample, all three people can go to the consultation group, Kennedy can go to the malarial group, etc.

I hope I have provided the needed information. Thanks for your time, CountZero.
Re^3: segmentation and grouping by BrowserUk (Patriarch) on Dec 18, 2012 at 17:13 UTC
Maybe I'm misunderstanding you, but is this what you want?:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my %categs;
    push @{ $categs{ $_->[2] } }, $_->[1] while @{ $_ = [ split ' ', <DATA> ] };
    pp \%categs;

    __DATA__
    31 John Consultation Charges 100 4/1/2012 10:39
    805 Kennedy Consultation Charges 150 4/1/2012 11:17
    805 Kennedy C-Reactive Protein 170 4/1/2012 11:56
    805 Kennedy Complete Blood Count 150 4/1/2012 11:56
    805 Kennedy Malarial 175 4/1/2012 11:56
    805 Kennedy Mantoux Test 100 4/1/2012 11:56
    805 Kennedy AZIBACT 1 MG SYP 28 4/1/2012 13:27
    805 Kennedy FALCINILLE DRY SYP 105.15 4/1/2012 13:27
    891 Ruth Consultation Charges 150 4/1/2012 12:05
    891 Ruth C-Reactive Protein 170 4/1/2012 12:47
    891 Ruth Complete Blood Count 150 4/1/2012 12:47
    891 Ruth Mantoux Test 100 4/1/2012 12:47
    891 Ruth X-Ray Chest 150 4/1/2012 12:47

Producing:

    C:\test>junk59
    {
      AZIBACT => ["Kennedy"],
      "C-Reactive" => ["Kennedy", "Ruth"],
      Complete => ["Kennedy", "Ruth"],
      Consultation => ["John", "Kennedy", "Ruth"],
      FALCINILLE => ["Kennedy"],
      Malarial => ["Kennedy"],
      Mantoux => ["Kennedy", "Ruth"],
      "X-Ray" => ["Ruth"],
    }

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP Neil Armstrong
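Because `split ' '` breaks on every run of whitespace, the listing above keys on only the first word of each service ("Consultation", "C-Reactive"). If the full service name is wanted as the key, one possible sketch follows. It assumes (this is not stated in the thread) that the name is always a single word and that price, date and time are always the last three fields; `parse_line` and `group_by_service` are hypothetical helper names.

```perl
use strict;
use warnings;

# Assumed layout: custid, one-word name, service (any number of words),
# then price, date and time as the last three whitespace-separated fields.
sub parse_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return if @f < 6;                       # too short to hold all fields

    my ( $price, $date, $time ) = splice @f, -3;    # peel off the tail
    my ( $custid, $name )       = splice @f, 0, 2;  # peel off the head

    return {
        custid  => $custid,
        name    => $name,
        service => join( ' ', @f ),         # whatever is left is the service
        price   => $price,
        posted  => "$date $time",
    };
}

# Map each full service name to the list of clients who received it.
sub group_by_service {
    my (@lines) = @_;
    my %clients_for;
    for my $line (@lines) {
        my $rec = parse_line($line) or next;    # skip malformed lines
        push @{ $clients_for{ $rec->{service} } }, $rec->{name};
    }
    return \%clients_for;
}

my $groups = group_by_service(
    '31 John Consultation Charges 100 4/1/2012 10:39',
    '805 Kennedy Consultation Charges 150 4/1/2012 11:17',
    '805 Kennedy AZIBACT 1 MG SYP 28 4/1/2012 13:27',
    '891 Ruth X-Ray Chest 150 4/1/2012 12:47',
);

print "$_: @{ $groups->{$_} }\n" for sort keys %$groups;
```

Peeling fields off both ends keeps service names that contain digits, such as "AZIBACT 1 MG SYP", in one piece. A client name with more than one word would break the head assumption and would need a delimited export (tab or CSV) instead.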
Re^3: segmentation and grouping by CountZero (Bishop) on Dec 18, 2012 at 16:53 UTC
I see. Your file is just a list of services rendered and you must "cluster" these into different categories. It is possible to do so, but it will take some work.

Do you have some kind of "dictionary" which tells you into which category or categories each type of service belongs? If so, then you just have to read each service and check it against the dictionary to find out into which category or categories each service belongs. Once you have done that, you check the number and type of categories for each client and put that info in some kind of "scoring" formula to find the most appropriate category.

If you do not have a "services-to-categories" dictionary then things become much more difficult and I really do not have a good and simple solution. I once applied Bayesian statistics on a similar problem (though only with a few broad categories to put the records in) and it worked "somewhat". I got about 80% correct categorizations (and thus 20% totally wrong), but it was enough for my purpose. If I trained the algorithm a bit more I might have gotten better results. Modules such as Algorithm::NaiveBayes or AI::Categorizer::Learner::NaiveBayes are worth taking a look at.

CountZero
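The dictionary-and-scoring approach described above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the category names in `%category_of`, the helper names `score_clients` and `best_category`, and the scoring rule itself (count services per category, take the highest count, break ties alphabetically). A real dictionary and scoring formula would have to come from whoever knows the services.

```perl
use strict;
use warnings;

# Hypothetical services-to-categories dictionary. The category labels are
# invented for this sketch; a service may belong to several categories.
my %category_of = (
    'Consultation Charges' => ['Consultation'],
    'C-Reactive Protein'   => ['Lab test'],
    'Complete Blood Count' => ['Lab test'],
    'Malarial'             => ['Lab test'],
    'Mantoux Test'         => ['Lab test'],
    'X-Ray Chest'          => ['Imaging'],
    'AZIBACT 1 MG SYP'     => ['Pharmacy'],
    'FALCINILLE DRY SYP'   => ['Pharmacy'],
);

# (client, service) pairs taken from the sample data in the thread.
my @records = (
    [ 'John',    'Consultation Charges' ],
    [ 'Kennedy', 'Consultation Charges' ],
    [ 'Kennedy', 'C-Reactive Protein' ],
    [ 'Kennedy', 'Complete Blood Count' ],
    [ 'Kennedy', 'Malarial' ],
    [ 'Kennedy', 'Mantoux Test' ],
    [ 'Kennedy', 'AZIBACT 1 MG SYP' ],
    [ 'Kennedy', 'FALCINILLE DRY SYP' ],
    [ 'Ruth',    'Consultation Charges' ],
    [ 'Ruth',    'C-Reactive Protein' ],
    [ 'Ruth',    'Complete Blood Count' ],
    [ 'Ruth',    'Mantoux Test' ],
    [ 'Ruth',    'X-Ray Chest' ],
);

# Count, per client, how many services fall into each category.
# Services missing from the dictionary are counted under 'Unknown'.
sub score_clients {
    my ( $records, $dict ) = @_;
    my %score;
    for my $rec (@$records) {
        my ( $name, $service ) = @$rec;
        my $cats = $dict->{$service} || ['Unknown'];
        $score{$name}{$_}++ for @$cats;
    }
    return \%score;
}

# Simplest possible "scoring formula": highest count wins,
# ties are broken alphabetically by category name.
sub best_category {
    my ($counts) = @_;
    my ($best) = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
        keys %$counts;
    return $best;
}

my $score = score_clients( \@records, \%category_of );
for my $name ( sort keys %$score ) {
    printf "%-8s => %s\n", $name, best_category( $score->{$name} );
}
```

A plain count is only a starting point. Weighting by price, or ignoring categories that every client has (such as consultation), would be natural refinements of the scoring step.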

