Thanks to your kind assistance I now have a working statistics tool :)
But when I apply the script listed below to another file, I get the following warning, which really puzzles me:
Use of uninitialized value in concatenation (.) or string at whitespace-stat.pl line 47, <$in> line 1 (#1)
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
#my personal data left out!
print "Generate statistics: Whitespace in context\n";
my $infile = $ARGV[0];
#define regexes as search target (in the array @regexes)
my @regexes = (qr/§\s*[0-9]/, qr/Art\.\s*[0-9IVX]/, qr/Artikel\s*[0-9IVX]/,
    qr/Artikels\s*[0-9IVX]/, qr/Artikeln\s*[0-9IVX]/);
open my $in, '<', $infile or die "Cannot open $infile for reading: $!";
#read the whole input file into the variable $xml
my $xml;
{
    local $/ = undef;
    $xml = <$in>;
}
#define array for frequency values
my @tally;
#count routine for each regex
for my $i (0 .. $#regexes) {
    my $regex = $regexes[$i];
    ++$tally[$i] while $xml =~ /$regex/g;
}
#define output file
open my $out, '>', 'stats.txt' or die $!;
#output statistics
print {$out} "Statistics: Whitespace in context\n\ninput file: ";
print {$out} "$infile";
print {$out} "\n========================================================================\n\n";
for my $i (0 .. $#regexes) {
    my $regex = $regexes[$i];
    $regex =~ s/^\(\?\^://;
    $regex =~ s/\)$//;
    print {$out} "$regex:\t\t$tally[$i]\n";
}
close $in;
close $out;
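
One possible explanation, offered only as a guess: if one of the patterns never matches in the other file, its $tally[$i] is never incremented and stays undefined, so interpolating it into the final print would produce exactly this kind of warning. The minimal sketch below starts every counter at 0 to avoid that. The helper name count_matches and the sample text are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count how often each
# pattern matches in $text. Every counter starts at 0, so a pattern that
# never matches still yields a defined value.
sub count_matches {
    my ($text, @patterns) = @_;
    my @counts = (0) x @patterns;    # one defined zero per pattern
    for my $i (0 .. $#patterns) {
        my $re = $patterns[$i];
        ++$counts[$i] while $text =~ /$re/g;
    }
    return @counts;
}

# Made-up sample input: the second pattern never matches here.
my @counts = count_matches(
    "Art. 5 und Art. 12",
    qr/Art\.\s*[0-9IVX]/,
    qr/Artikels\s*[0-9IVX]/,
);
print "$_\n" for @counts;    # every element is defined, so no warning
```

In the script above, the equivalent change would be to declare the array as my @tally = (0) x @regexes; instead of a bare my @tally;, assuming the undefined value really is a tally entry.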