Hello,
I would like to count how often certain keywords occur in a text file, sample.txt.
For example, I set "Steve Jobs" and "Executive" as the main words, and I would like to count how many times "stock option" and "package" occur within 10 words of either main word in the sample text below. The result I expect is 4.
Sample text)
Stock option is the most popular compensation policy in the world these days. Steve Jobs also received huge amount of stock options, and the stock option was exercised before the fiscal year.
Different from his compensation package, the other executives received less amount of stock options.
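One possible reading of the counting rule described above, sketched in Perl: split the text into lowercase words, locate every occurrence of each phrase by word index, and count a search phrase when at most N words lie between it and some main phrase. The helper names phrase_positions and count_near are hypothetical, and two choices are assumptions rather than requirements stated in this post: the last word of a phrase matches as a prefix (so "stock option" also finds "stock options"), and each search occurrence is counted once even if it is near several main words.

```perl
use strict;
use warnings;

# Return [start, end] word-index pairs where $phrase occurs in @$words.
# Assumption: the last word of the phrase matches as a prefix, so that
# "stock option" also finds "stock options".
sub phrase_positions {
    my ($words, $phrase) = @_;
    my @p = split ' ', lc $phrase;
    return unless @p;
    my @hits;
    START: for my $i (0 .. $#$words - $#p) {
        for my $k (0 .. $#p) {
            my $w = $words->[$i + $k];
            if ($k < $#p) { next START unless $w eq $p[$k]; }
            else          { next START unless index($w, $p[$k]) == 0; }
        }
        push @hits, [$i, $i + $#p];
    }
    return @hits;
}

# Count occurrences of the search phrases that have at most $distance
# words between them and some occurrence of a main phrase.
sub count_near {
    my ($text, $main, $distance, $search) = @_;
    my @words     = lc($text) =~ /\w+/g;
    my @main_hits = map { phrase_positions(\@words, $_) } @$main;
    my $count = 0;
    for my $phrase (@$search) {
        for my $s (phrase_positions(\@words, $phrase)) {
            for my $m (@main_hits) {
                # Number of words strictly between the two phrases.
                my $gap = $s->[1] < $m->[0] ? $m->[0] - $s->[1] - 1
                        : $m->[1] < $s->[0] ? $s->[0] - $m->[1] - 1
                        :                     -1;   # overlapping: ignore
                if ($gap >= 0 && $gap <= $distance) { $count++; last; }
            }
        }
    }
    return $count;
}

# Tiny illustration: "stock options" is two words away from "Jobs".
print count_near("Steve Jobs got many stock options",
                 ["Steve Jobs"], 10, ["stock option"]), "\n";
```

Under these assumptions the sample text gives 4: the opening "Stock option" is excluded because 11 words separate it from "Steve Jobs", while the two later "stock option(s)" near "Steve Jobs", plus "package" and the final "stock options" near "executives", are counted.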
To get the result, I used the code below and ran it with the command: perl code.pl sample.txt "Steve Jobs" "Executive" 10 "stock option" "package"
However, an error message appears: "Use of uninitialized value $distance in numeric le (<=) at line..."
Could you please give me some advice on how to get the result I want? I am attaching the sample text and the code that I used. The sample text contains three different articles, separated by "Document ", so I expect one result for each of the three articles. I look forward to your responses. I hope you all have a great weekend! Thank you in advance.
PERL code)
use strict;
use warnings;
my ($filename, @mainword, $distance, @search) = @ARGV;
my $content;
open my $fh, '<', $filename or die $!;
local $/ = undef;
$content = <$fh>;
close $fh;
my @docs = split 'Document ', $content;
foreach my $doc ( @docs ) {
    my $count = 0;
    my $mainword = '(' . (join '|', map { "\Q$_\E" } @mainword) . ')';
    my $search = '(' . (join '|', map { "\Q$_\E" } @search) . ')';
    for (my $dist = 0; $dist <= $distance; $dist++) {
        while ( $doc =~ /
                (?:^|\W)
                $search
                (?=
                    (?:\W++\w++){$dist}
                    \W++\Q$mainword\E
                )
            /ixsg
        )
        {
            print " found [$1] at ", $-[1], "\n";
            $count++;
        }
        while ( $doc =~ /
                (?:^|\W)
                \Q$mainword\E
                (?=
                    (?:\W++\w++){$dist}
                    \W++$search
                )
            /ixsg
        )
        {
            print "-found [$1] at ", $-[1], "\n";
            $count++;
        }
    }
    print "match: $count\n";
}
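
A likely cause of the warning: in the list assignment my ($filename, @mainword, $distance, @search) = @ARGV; the array @mainword absorbs every remaining argument, so $distance and @search receive nothing. A minimal sketch of one way around this, assuming the first all-digit argument is the distance and everything after it is a search phrase (parse_args is a hypothetical helper, and the usage message is only illustrative). It also builds the alternation with quotemeta once, because wrapping the already assembled $mainword in \Q...\E escapes its parentheses and | so they match literally.

```perl
use strict;
use warnings;

# Hypothetical helper: split the arguments at the first all-digit value.
# Assumption: no main word consists solely of digits.
sub parse_args {
    my @args = @_;
    my $filename = shift @args;
    my (@mainword, $distance);
    while (@args) {
        my $arg = shift @args;
        if ($arg =~ /^\d+$/) { $distance = $arg; last; }
        push @mainword, $arg;
    }
    die "Usage: perl code.pl FILE MAINWORD... DISTANCE SEARCH...\n"
        unless defined $filename && @mainword && defined $distance && @args;
    return ($filename, \@mainword, $distance, [@args]);
}

# Hard-coded here for illustration; a script would pass @ARGV instead.
my ($filename, $mainword, $distance, $search) =
    parse_args('sample.txt', 'Steve Jobs', 'Executive', 10,
               'stock option', 'package');

# Escape each phrase once, then join; do not apply \Q...\E again later.
my $main_alt = join '|', map { quotemeta($_) } @$mainword;
my $main_re  = qr/(?:$main_alt)/i;

print "distance: $distance\n";
print "main words: @$mainword\n";
print "search words: @$search\n";
print "regex matches 'steve jobs'\n" if 'steve jobs' =~ $main_re;
```

With the arguments separated this way, $distance is defined before the loop compares against it, and the interpolated alternation keeps its grouping.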