Re^3: how to use Algorithm::NaiveBayes module

by tangent (Parson)
on Apr 24, 2014 at 11:31 UTC (id://1083566)


in reply to Re^2: how to use Algorithm::NaiveBayes module
in thread how to use Algorithm::NaiveBayes module

Say you have three files: positive, negative, and the sentences to test. They are already prepared and are in this format:
wordA wordB wordC wordD wordA wordE wordF
To train you would feed the first two files in:
use strict;
use warnings;
use Algorithm::NaiveBayes;

my $pos_file = '/path/to/positive.txt';
my $neg_file = '/path/to/negative.txt';

my $categorizer = Algorithm::NaiveBayes->new;

my $fh;

# Each line of the positive file becomes one training instance.
open($fh,"<",$pos_file) or die "Could not open $pos_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %positive;
    $positive{$_}++ for @words;    # word => number of occurrences
    $categorizer->add_instance(
        attributes => \%positive,
        label      => 'positive');
}
close($fh);

# Same again for the negative file.
open($fh,"<",$neg_file) or die "Could not open $neg_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %negative;
    $negative{$_}++ for @words;
    $categorizer->add_instance(
        attributes => \%negative,
        label      => 'negative');
}
close($fh);

$categorizer->train;
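The two training loops differ only in the file and the label, so they could be folded into one helper. This is only a sketch: `word_counts` and `add_file` are made-up names, and `add_file` assumes nothing about `$categorizer` beyond it having the `add_instance` method used above.

```perl
use strict;
use warnings;

# Turn one whitespace-separated sentence into a word => count hash reference.
sub word_counts {
    my ($sentence) = @_;
    my %count;
    $count{$_}++ for split ' ', $sentence;
    return \%count;
}

# Add every line of $file to $categorizer under $label.
# $categorizer is any object with an add_instance method,
# e.g. an Algorithm::NaiveBayes instance (hypothetical helper).
sub add_file {
    my ($categorizer, $file, $label) = @_;
    open(my $fh, '<', $file) or die "Could not open $file: $!";
    while (my $sentence = <$fh>) {
        chomp $sentence;
        $categorizer->add_instance(
            attributes => word_counts($sentence),
            label      => $label,
        );
    }
    close($fh);
    return;
}
```

Training would then be add_file($categorizer, $pos_file, 'positive'); add_file($categorizer, $neg_file, 'negative'); followed by $categorizer->train;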
You can then feed the third file in:
my $sentence_file = '/path/to/sentence.txt';

open($fh,"<",$sentence_file) or die "Could not open $sentence_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %test;
    $test{$_}++ for @words;
    my $probability = $categorizer->predict(attributes => \%test);
    # ...
    # do what you need with $probability
}
close($fh);
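As far as I read the module's documentation, predict returns a hash reference keyed by label, with a score for each label. If that holds, picking the most likely label for a sentence could look like the sketch below. `best_label` is a made-up helper and the scores in the example are invented for illustration.

```perl
use strict;
use warnings;

# Return the label with the highest score from a label => score hash
# reference, such as the one predict() is assumed to return.
# Ties are broken alphabetically by label.
sub best_label {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} or $a cmp $b }
                 keys %$scores;
    return $best;
}

# Invented scores, for illustration only.
my %example = (positive => 0.71, negative => 0.29);
print best_label(\%example), "\n";
```

Inside the loop above you would call best_label($probability) once per sentence.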

Re^4: how to use Algorithm::NaiveBayes module
by agnes (Novice) on Apr 24, 2014 at 14:28 UTC
    I got it, thank you for the detailed response. But here is one problem: I have divided these sentences into different categories, such as revenue, cost, profit and so on, because the same word has a different tone in a different context. Take the word "increase". If it appears in a sentence about revenue, it is positive. However, if it appears in a sentence about cost, it is negative. How can I modify the code you just provided to handle this? Thanks again!
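    One possible way to handle this, offered only as a sketch and not something suggested elsewhere in the thread, is to prefix every word with the sentence's category before counting, so that "increase" in a revenue sentence and "increase" in a cost sentence become two different attributes. `category_attributes` is a made-up helper, and the approach assumes the category of each test sentence is known too.

```perl
use strict;
use warnings;

# Build a word => count hash reference in which every word is prefixed
# with its category, e.g. "cost:increase" vs "revenue:increase"
# (hypothetical helper; the "category:word" key format is an assumption).
sub category_attributes {
    my ($category, $sentence) = @_;
    my %count;
    $count{"${category}:$_"}++ for split ' ', $sentence;
    return \%count;
}
```

    The returned hash reference would replace \%positive, \%negative and \%test in the add_instance and predict calls. An alternative with the same effect would be to train one categorizer per category.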
Re^4: how to use Algorithm::NaiveBayes module
by agnes (Novice) on Apr 24, 2014 at 18:41 UTC
    Hi~ here is my code, in which I ignore the categories I mentioned above (revenue, cost, ...):

    #!/usr/bin/perl
    use warnings;
    use Algorithm::NaiveBayes;

    my $pos_file = '/Users/Agnes/Documents/positive.TXT';
    my $neg_file = '/Users/Agnes/Documents/negative.txt';
    my $neu_file = '/Users/Agnes/Documents/neutral.txt';

    my $categorizer = Algorithm::NaiveBayes->new;

    my $fh;

    open($fh,"<",$pos_file) or die "Could not open $pos_file: $!";
    while (my $sentence = <$fh>) {
        chomp $sentence;
        my @words = split(' ',$sentence);
        my %positive;
        $positive{$_}++ for @words;
        $categorizer->add_instance(
            attributes => \%positive,
            label      => 'positive');
    }
    close($fh);

    open($fh,"<",$neg_file) or die "Could not open $neg_file: $!";
    while (my $sentence = <$fh>) {
        chomp $sentence;
        my @words = split(' ',$sentence);
        my %negative;
        $negative{$_}++ for @words;
        $categorizer->add_instance(
            attributes => \%negative,
            label      => 'negative');
    }
    close($fh);

    open($fh,"<",$neu_file) or die "Could not open $neg_file: $!";
    while (my $sentence = <$fh>) {
        chomp $sentence;
        my @words = split(' ',$sentence);
        my %neutral;
        $neutral{$_}++ for @words;
        $categorizer->add_instance(
            attributes => \%neutral,
            label      => 'neutral');
    }
    close($fh);

    $categorizer->train;

    my $sentence_file = '/Users/Agnes/Documents/process_sentence.txt';
    open($fh,"<",$sentence_file) or die "Could not open $sentence_file: $!";
    while (my $sentence = <$fh>) {
        chomp $sentence;
        my @words = split(' ',$sentence);
        my %test;
        $test{$_}++ for @words;
        my $probability = $categorizer->predict(attributes => \%test);

        if ( $probs->{positive} > 0.33 ) {
            print "%positive\n";
        }
        if ( $probs->{negative} > 0.33 ) {
            print "%negative\n";
        }
        if ( $probs->{neutral} > 0.33 ) {
            print "%neutral\n";
        }
    }
    close($fh);

    My positive.txt looks like this:

    we believ exist cash cash equiv short-term investments, togeth fund generat operations, suffici meet oper requirements, regular quarter dividends, debt.

    expen will reduc cut travel expenditures, reduc spend vendor cont staff, reduc market spending, scale back capit.

    revenu relat window vista no subject similar deferr no signif undeliv elements.

    But when I run this program, these warnings show up:

    Use of uninitialized value in numeric gt (>) at calculation.pl line 60, <$fh> line 1.

    Use of uninitialized value in numeric gt (>) at calculation.pl line 63, <$fh> line 1.

    Use of uninitialized value in numeric gt (>) at calculation.pl line 66, <$fh> line 1.

      $probs->{positive} should be $probability->{positive}

      Also, if you have empty lines in your files then add next unless $sentence; after each chomp, and you need to remove the commas from each sentence too.
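      A minimal sketch of that cleanup, assuming commas are the only punctuation you want gone. `clean_sentence` is a made-up name.

```perl
use strict;
use warnings;

# Strip the newline, all commas, and surrounding whitespace from one line.
# Returns an empty string for blank lines so the caller can skip them.
sub clean_sentence {
    my ($sentence) = @_;
    chomp $sentence;
    $sentence =~ tr/,//d;            # delete every comma
    $sentence =~ s/^\s+|\s+$//g;     # trim both ends
    return $sentence;
}
```

      In each while loop you would then write $sentence = clean_sentence($sentence); next unless length $sentence; before the split.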

        Hi~ Thank you so much for your patience. I modified the code according to your suggestions and it worked! However, the result is the tone of the whole text I test on, whereas I want the tone of each sentence in the text. What do I need to change in the code? Thank you again for your time!
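        The loop above already calls predict once per line, so one guess is that the test file holds running text rather than one sentence per line. Under that assumption, a naive splitter like the sketch below could break the text into sentences first. `split_sentences` is a made-up helper, and it will be confused by abbreviations such as "e.g. " that end in a period followed by a space.

```perl
use strict;
use warnings;

# Naively split a block of text into sentences at '.', '!' or '?'
# followed by whitespace. The terminating punctuation is kept.
sub split_sentences {
    my ($text) = @_;
    my @sentences;
    for my $piece (split /(?<=[.!?])\s+/, $text) {
        $piece =~ s/^\s+|\s+$//g;                  # trim both ends
        push @sentences, $piece if length $piece;  # drop empty pieces
    }
    return @sentences;
}
```

        Each returned sentence can then be turned into a word-count hash and passed to predict on its own, printing one result per sentence.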
