Re^3: how to use Algorithm::NaiveBayes module

Say you have three files: positive, negative, and the sentences to test. They are already prepared, with one sentence per line and the words separated by spaces, like this:

wordA wordB wordC
wordD wordA wordE wordF

To train you would feed the first two files in:

use strict;
use warnings;
use Algorithm::NaiveBayes;

my $pos_file = '/path/to/positive.txt';
my $neg_file = '/path/to/negative.txt';

my $categorizer = Algorithm::NaiveBayes->new;
my $fh;

open($fh,"<",$pos_file) or die "Could not open $pos_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %positive;
    $positive{$_}++ for @words;
    $categorizer->add_instance(
        attributes => \%positive,
        label => 'positive');
}
close($fh);

open($fh,"<",$neg_file) or die "Could not open $neg_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %negative;
    $negative{$_}++ for @words;
    $categorizer->add_instance(
        attributes => \%negative,
        label => 'negative');
}
close($fh);

$categorizer->train;

You can then feed the third file in:

my $sentence_file = '/path/to/sentence.txt';

open($fh,"<",$sentence_file) or die "Could not open $sentence_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %test;
    $test{$_}++ for @words;
    # predict() returns a hash reference of label => score
    my $probability = $categorizer->predict(attributes => \%test);
    # ...
    # do what you need with $probability, e.g. $probability->{positive}
}
close($fh);
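
As a rough sketch of what "do what you need with $probability" could look like: predict returns a hash reference of label => score, so the most likely label is the key with the highest value. The helper name best_label and the example scores are made up for illustration; they are not part of Algorithm::NaiveBayes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the label with the highest score from a
# hash reference of label => score (the shape predict() returns).
# Ties are broken alphabetically so the result is deterministic.
sub best_label {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                 keys %$scores;
    return $best;    # undef if the hash is empty
}

# Made-up scores standing in for a real predict() result.
my $example = { positive => 0.81, negative => 0.19 };
print best_label($example), "\n";
```

Inside the loop above this would be called as best_label($probability), once per sentence.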

Re^4: how to use Algorithm::NaiveBayes module by agnes (Novice) on Apr 24, 2014 at 14:28 UTC
I got it, thank you for the detailed response. But here is one problem: I have divided these sentences into different categories, such as revenue, cost, profit and so on, because the same word can have a different tone in a different context. Take the word "increase": in a sentence about revenue it is positive, but in a sentence about cost it is negative. How can I modify the code you provided to handle this? Thanks again!
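
One possible way to approach the category question, offered only as a sketch: prefix every word with the category of its sentence before counting, so that "increase" in a revenue sentence and "increase" in a cost sentence become two separate attributes. The helper name category_attributes and the "category:word" key format are assumptions made for this example, and the category of each sentence is assumed to be known in advance.

```perl
use strict;
use warnings;

# Hypothetical helper: count the words of one sentence, prefixing each
# word with its category (e.g. "revenue:increas" vs "cost:increas").
sub category_attributes {
    my ($category, $sentence) = @_;
    my %count;
    $count{"$category:$_"}++ for split ' ', $sentence;
    return \%count;
}

# Made-up example sentence in the "revenue" category.
my $attrs = category_attributes('revenue', 'revenu increas increas');
print join(', ', map { "$_=$attrs->{$_}" } sort keys %$attrs), "\n";
```

The returned hash reference could be passed as attributes to add_instance and predict in place of the plain word counts. Training a separate categorizer per category would be another option.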
Re^4: how to use Algorithm::NaiveBayes module by agnes (Novice) on Apr 24, 2014 at 18:41 UTC
Hi, here is my code, which ignores the categories I mentioned above (revenue, cost, ...):

#!/usr/bin/perl
use warnings;
use Algorithm::NaiveBayes;

my $pos_file = '/Users/Agnes/Documents/positive.TXT';
my $neg_file = '/Users/Agnes/Documents/negative.txt';
my $neu_file = '/Users/Agnes/Documents/neutral.txt';

my $categorizer = Algorithm::NaiveBayes->new;
my $fh;

open($fh,"<",$pos_file) or die "Could not open $pos_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %positive;
    $positive{$_}++ for @words;
    $categorizer->add_instance(
        attributes => \%positive,
        label => 'positive');
}
close($fh);

open($fh,"<",$neg_file) or die "Could not open $neg_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %negative;
    $negative{$_}++ for @words;
    $categorizer->add_instance(
        attributes => \%negative,
        label => 'negative');
}
close($fh);

open($fh,"<",$neu_file) or die "Could not open $neg_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %neutral;
    $neutral{$_}++ for @words;
    $categorizer->add_instance(
        attributes => \%neutral,
        label => 'neutral');
}
close($fh);

$categorizer->train;

my $sentence_file = '/Users/Agnes/Documents/process_sentence.txt';

open($fh,"<",$sentence_file) or die "Could not open $sentence_file: $!";
while (my $sentence = <$fh>) {
    chomp $sentence;
    my @words = split(' ',$sentence);
    my %test;
    $test{$_}++ for @words;
    my $probability = $categorizer->predict(attributes => \%test);
    if ( $probs->{positive} > 0.33 ) {
        print "%positive\n";
    }
    if ( $probs->{negative} > 0.33 ) {
        print "%negative\n";
    }
    if ( $probs->{neutral} > 0.33 ) {
        print "%neutral\n";
    }
}
close($fh);

My positive.txt looks like this:

we believ exist cash cash equiv short-term investments, togeth fund generat operations, suffici meet oper requirements, regular quarter dividends, debt.
expen will reduc cut travel expenditures, reduc spend vendor cont staff, reduc market spending, scale back capit. revenu relat window vista no subject similar deferr no signif undeliv elements.

But when I run this program, I get these warnings:

Use of uninitialized value in numeric gt (>) at calculation.pl line 60, <$fh> line 1.
Use of uninitialized value in numeric gt (>) at calculation.pl line 63, <$fh> line 1.
Use of uninitialized value in numeric gt (>) at calculation.pl line 66, <$fh> line 1.
Re^5: how to use Algorithm::NaiveBayes module by tangent (Parson) on Apr 24, 2014 at 19:45 UTC
$probs->{positive} should be $probability->{positive} (and likewise for negative and neutral). Also, if you have empty lines in your files, add `next unless $sentence;` after each chomp. You need to remove the commas from each sentence too.
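
The two clean-up steps suggested here (skip empty lines, strip commas) could be folded into one small routine, sketched below. The name sentence_to_attributes is made up for this example, and treating whitespace-only lines as empty is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one line of input into a word-count hash
# reference, or return nothing for a blank line so the caller can skip it.
sub sentence_to_attributes {
    my ($sentence) = @_;
    chomp $sentence;
    $sentence =~ tr/,//d;               # remove the commas
    return unless $sentence =~ /\S/;    # skip empty or whitespace-only lines
    my %count;
    $count{$_}++ for split ' ', $sentence;
    return \%count;
}

# Example with a made-up line in the same style as positive.txt.
my $attrs = sentence_to_attributes("cash cash equiv, debt\n");
print join(' ', map { "$_=$attrs->{$_}" } sort keys %$attrs), "\n";
```

In each while loop this would replace the chomp, split, and counting lines, for example: my $attrs = sentence_to_attributes($sentence) or next;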
Re^6: how to use Algorithm::NaiveBayes module by agnes (Novice) on Apr 25, 2014 at 04:56 UTC
Hi, thank you so much for your patience. I modified the code according to your suggestions and it worked! However, the result is the tone of the whole text I tested, and I want the tone of each sentence in the text. What do I need to change in the code? Thank you again for your time!
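
If the test file holds running text rather than one sentence per line (an assumption; the thread does not say), one option is to split the text into sentences first and classify each one separately. The sketch below shows only the splitting step. The name split_sentences is made up, and the rule used (split after ., ! or ? followed by whitespace) is deliberately naive and will also split after abbreviations.

```perl
use strict;
use warnings;

# Hypothetical helper: split a block of text into sentences at ., ! or ?
# followed by whitespace, dropping any empty pieces.
sub split_sentences {
    my ($text) = @_;
    my @sentences = grep { /\S/ } split /(?<=[.!?])\s+/, $text;
    return @sentences;
}

# Made-up example text.
my @sentences = split_sentences("revenu increas. cost increas. profit flat.");
print "$_\n" for @sentences;
```

Each returned sentence could then be turned into a word-count hash and passed to predict on its own, giving one result per sentence.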
Re^7: how to use Algorithm::NaiveBayes module by tangent (Parson) on Apr 25, 2014 at 18:05 UTC
Re^8: how to use Algorithm::NaiveBayes module by agnes (Novice) on Apr 26, 2014 at 01:13 UTC