PerlMonks  

Re^2: segmentation and grouping

by vkkan (Initiate)
on Dec 18, 2012 at 07:56 UTC (#1009302)


in reply to Re: segmentation and grouping
in thread segmentation and grouping

Given below is the sample data set:
custid  Name     Service               Price   Posted Date
31      John     Consultation Charges  100     4/1/2012 10:39
805     Kennedy  Consultation Charges  150     4/1/2012 11:17
805     Kennedy  C-Reactive Protein    170     4/1/2012 11:56
805     Kennedy  Complete Blood Count  150     4/1/2012 11:56
805     Kennedy  Malarial              175     4/1/2012 11:56
805     Kennedy  Mantoux Test          100     4/1/2012 11:56
805     Kennedy  AZIBACT 1 MG SYP      28      4/1/2012 13:27
805     Kennedy  FALCINILLE DRY SYP    105.15  4/1/2012 13:27
891     Ruth     Consultation Charges  150     4/1/2012 12:05
891     Ruth     C-Reactive Protein    170     4/1/2012 12:47
891     Ruth     Complete Blood Count  150     4/1/2012 12:47
891     Ruth     Mantoux Test          100     4/1/2012 12:47
891     Ruth     X-Ray Chest           150     4/1/2012 12:47


The service field is not filled with a group name; it is just the service that was rendered. So from the sample above, all three people can go into the consultation group, Kennedy can go into the malarial group, and so on. I hope I have provided the needed information. Thanks for your time, CountZero.


Re^3: segmentation and grouping
by CountZero (Bishop) on Dec 18, 2012 at 16:53 UTC
    I see. Your file is just a list of services rendered and you must "cluster" these into different categories. It is possible to do so, but it will take some work.

    Do you have some kind of "dictionary" which tells you into which category or categories each type of service belongs? If so, then you just have to read each service and check it against the dictionary to find out into which category or categories each service belongs. Once you have done that, you check the number and type of categories for each client and put that info in some kind of "scoring" formula to find the most appropriate category.
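    As a rough illustration of the dictionary approach, here is a minimal sketch. The %categories_for dictionary, its category names, and the "most frequent category wins" scoring rule are all assumptions made up for this example; the real dictionary and scoring formula would have to come from your own domain knowledge.

```perl
use strict;
use warnings;

# Hypothetical services-to-categories dictionary (category names are invented).
my %categories_for = (
    'Consultation Charges' => ['consultation'],
    'C-Reactive Protein'   => ['lab'],
    'Complete Blood Count' => ['lab'],
    'Malarial'             => ['lab', 'malarial'],
    'Mantoux Test'         => ['lab'],
    'X-Ray Chest'          => ['imaging'],
    'AZIBACT 1 MG SYP'     => ['pharmacy'],
    'FALCINILLE DRY SYP'   => ['pharmacy'],
);

# Sample records from the thread: custid, name, service, price, date, time.
my @records = (
    '31 John Consultation Charges 100 4/1/2012 10:39',
    '805 Kennedy Consultation Charges 150 4/1/2012 11:17',
    '805 Kennedy C-Reactive Protein 170 4/1/2012 11:56',
    '805 Kennedy Complete Blood Count 150 4/1/2012 11:56',
    '805 Kennedy Malarial 175 4/1/2012 11:56',
    '805 Kennedy Mantoux Test 100 4/1/2012 11:56',
    '805 Kennedy AZIBACT 1 MG SYP 28 4/1/2012 13:27',
    '805 Kennedy FALCINILLE DRY SYP 105.15 4/1/2012 13:27',
    '891 Ruth Consultation Charges 150 4/1/2012 12:05',
    '891 Ruth C-Reactive Protein 170 4/1/2012 12:47',
    '891 Ruth Complete Blood Count 150 4/1/2012 12:47',
    '891 Ruth Mantoux Test 100 4/1/2012 12:47',
    '891 Ruth X-Ray Chest 150 4/1/2012 12:47',
);

my ( %name_of, %count );    # $count{custid}{category} = number of services
for my $line (@records) {
    # The service name may contain spaces and digits, so anchor on the
    # trailing price, date and time fields and take what is in between.
    my ( $id, $name, $service ) =
        $line =~ m{^(\d+)\s+(\S+)\s+(.+?)\s+[\d.]+\s+\d+/\d+/\d+\s+\d+:\d+\z}
        or next;
    $name_of{$id} = $name;
    # Services missing from the dictionary are counted as 'unknown'.
    $count{$id}{$_}++ for @{ $categories_for{$service} || ['unknown'] };
}

# Assumed scoring formula: the most frequent category wins,
# ties are broken alphabetically.
sub best_category {
    my ($id) = @_;
    my $c = $count{$id} or return;
    my ($best) = sort { $c->{$b} <=> $c->{$a} || $a cmp $b } keys %$c;
    return $best;
}

for my $id ( sort { $a <=> $b } keys %count ) {
    printf "%-4s %-8s %s\n", $id, $name_of{$id}, best_category($id);
}
```

    With this made-up dictionary, John ends up in "consultation" and both Kennedy and Ruth end up in "lab". A more realistic formula might weight each service by its price or treat some categories as more significant than others.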

    If you do not have a "services-to-categories" dictionary then things become much more difficult and I really do not have a good and simple solution. I once applied Bayesian statistics on a similar problem (though only with a few broad categories to put the records in) and it worked "somewhat". I got about 80% correct categorizations (and thus 20% totally wrong), but it was enough for my purpose. If I trained the algorithm a bit more I might have gotten better results. Modules such as Algorithm::NaiveBayes or AI::Categorizer::Learner::NaiveBayes are worth taking a look at.
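    To give an idea of what the Bayesian route involves, here is a hand-rolled sketch of a multinomial naive Bayes classifier with add-one smoothing. It deliberately does not use either module mentioned above, and the training labels are invented for illustration; in practice you would label a sample of your own services by hand and train on that.

```perl
use strict;
use warnings;

# Hypothetical hand-labelled training set: service text => category.
my @training = (
    [ 'Consultation Charges' => 'consultation' ],
    [ 'Complete Blood Count' => 'lab' ],
    [ 'C-Reactive Protein'   => 'lab' ],
    [ 'Mantoux Test'         => 'lab' ],
    [ 'X-Ray Chest'          => 'imaging' ],
    [ 'AZIBACT 1 MG SYP'     => 'pharmacy' ],
    [ 'FALCINILLE DRY SYP'   => 'pharmacy' ],
);

# Lower-case the text and split it into alphanumeric words.
sub tokens { return grep { length } split /[^a-z0-9]+/, lc shift }

# Training: count documents per label and word occurrences per label.
my ( %docs, %words, %total, %vocab );
for my $pair (@training) {
    my ( $text, $label ) = @$pair;
    $docs{$label}++;
    for my $w ( tokens($text) ) {
        $words{$label}{$w}++;
        $total{$label}++;
        $vocab{$w} = 1;
    }
}

# Classification: pick the label with the highest log-probability,
# log P(label) + sum of log P(word | label), using add-one smoothing
# so that unseen words do not zero out a label.
sub classify {
    my ($text)     = @_;
    my $n_docs     = @training;
    my $vocab_size = keys %vocab;
    my %score;
    for my $label ( keys %docs ) {
        my $s = log( $docs{$label} / $n_docs );
        for my $w ( tokens($text) ) {
            my $seen = $words{$label}{$w} || 0;
            $s += log( ( $seen + 1 ) / ( $total{$label} + $vocab_size ) );
        }
        $score{$label} = $s;
    }
    my ($best) = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    return $best;
}

print "$_ => ", classify($_), "\n" for 'PARACETAMOL SYP', 'Blood Sugar Test';
```

    Neither of the two test strings appears in the training data; they are classified purely on the words they share with the labelled examples ("SYP", "Blood", "Test"). With so few training records the results are fragile, which matches the "somewhat" above.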

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re^3: segmentation and grouping
by BrowserUk (Pope) on Dec 18, 2012 at 17:13 UTC

    Maybe I'm misunderstanding you, but is this what you want?

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my %categs;
    push @{ $categs{ $_->[2] } }, $_->[1] while @{ $_ = [ split ' ', <DATA> ] };
    pp \%categs;

    __DATA__
    31 John Consultation Charges 100 4/1/2012 10:39
    805 Kennedy Consultation Charges 150 4/1/2012 11:17
    805 Kennedy C-Reactive Protein 170 4/1/2012 11:56
    805 Kennedy Complete Blood Count 150 4/1/2012 11:56
    805 Kennedy Malarial 175 4/1/2012 11:56
    805 Kennedy Mantoux Test 100 4/1/2012 11:56
    805 Kennedy AZIBACT 1 MG SYP 28 4/1/2012 13:27
    805 Kennedy FALCINILLE DRY SYP 105.15 4/1/2012 13:27
    891 Ruth Consultation Charges 150 4/1/2012 12:05
    891 Ruth C-Reactive Protein 170 4/1/2012 12:47
    891 Ruth Complete Blood Count 150 4/1/2012 12:47
    891 Ruth Mantoux Test 100 4/1/2012 12:47
    891 Ruth X-Ray Chest 150 4/1/2012 12:47

    Producing:

    C:\test>junk59
    {
      AZIBACT => ["Kennedy"],
      "C-Reactive" => ["Kennedy", "Ruth"],
      Complete => ["Kennedy", "Ruth"],
      Consultation => ["John", "Kennedy", "Ruth"],
      FALCINILLE => ["Kennedy"],
      Malarial => ["Kennedy"],
      Mantoux => ["Kennedy", "Ruth"],
      "X-Ray" => ["Ruth"],
    }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong
