I couldn't find a way to count the tags using the Tagger module alone, but since the tagged text is XML-like, you can wrap it in a root element and count the elements with an XML parser. This example uses XML::LibXML. I initially tried Text::Balanced but gave up because I couldn't work out how to extract multiple different tags. You will need a full list of the tags, which you can build from the Tagger README.
use strict;
use warnings;
use Data::Dumper;
use XML::LibXML;
use Lingua::EN::Tagger; # add_tags is called as a method below, so no import list is needed
my $text = <<'EOT';
The set of POS tags used here is a modified version of the
Penn Treebank tagset. Tags with non-letter characters have been
redefined to work better in our data structures.
EOT
my $tagger = Lingua::EN::Tagger->new;
my $tagged = $tagger->add_tags($text);
# Wrap the tagged text in a single root element so it parses as one document
my $tree = XML::LibXML->load_xml(string => "<doc>$tagged</doc>");
my @tags = qw(CC DET NN NNP RB VBZ); # add the rest
my %count;
for my $tag (@tags) {
    my $lctag = lc($tag); # the tagger emits lowercase element names
    my @nodes = $tree->findnodes("//$lctag");
    $count{$tag} = scalar @nodes;
}
print Dumper(\%count);
for my $tag (sort keys %count) {
    print "$tag:\t$count{$tag}\n";
}
Output:
$VAR1 = {
          'CC' => 0,
          'VBZ' => 1,
          'RB' => 2,
          'NNP' => 3,
          'NN' => 4,
          'DET' => 3
        };
CC: 0
DET: 3
NN: 4
NNP: 3
RB: 2
VBZ: 1
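
If building the full tag list by hand is a nuisance, one alternative is to collect whatever tags actually appear in the tagged text. The following is only a minimal sketch using a plain regex instead of an XML parser. It assumes the tagger's output is flat and uses lowercase, letters-only tag names such as <nn>word</nn>; the count_tags helper and the sample string are made up for illustration and are not part of Lingua::EN::Tagger.

```perl
use strict;
use warnings;

# Hypothetical helper: count every tag found in a tagged string,
# without needing a predefined list of tag names.
# Assumes flat, lowercase, letters-only tags such as <nn>word</nn>.
sub count_tags {
    my ($tagged) = @_;
    my %count;
    # Match opening tags only; closing tags start with "</" and are skipped.
    $count{ uc $1 }++ while $tagged =~ /<([a-z]+)>/g;
    return %count;
}

# Hand-written sample standing in for the output of add_tags (an assumption).
my $sample = '<det>The</det> <nn>set</nn> <in>of</in> <nn>tags</nn>';
my %count  = count_tags($sample);

for my $tag (sort keys %count) {
    print "$tag:\t$count{$tag}\n";
}
```

In the script above you would pass $tagged to count_tags instead of the sample string. Tags that never occur simply won't appear in the hash, so you lose the zero counts (like CC above) unless you merge the result with your own list. The same idea should also work with XML::LibXML by selecting every child of the root element with findnodes('/doc/*') and counting on each node's nodeName.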