Re^4: Entity statistics

by LexPl (Sexton)
on Nov 12, 2024 at 13:15 UTC


in reply to Re^3: Entity statistics
in thread Entity statistics

First of all, many thanks for the helpful assistance and good advice from @choroba and @hippo!

I have taken up your input and built the following script:

#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;

my $infile = $ARGV[0];

my @regexes = (qr/&sect;\s*[0-9]/,
               qr/Art\.\s*[0-9IVX]/,
               qr/Artikel\s*[0-9IVX]/,
               qr/Artikels\s*[0-9IVX]/,
               qr/Artikeln\s*[0-9IVX]/);

open my $in, '<', $infile or die "Cannot open $infile for reading: $!";

my $xml;
{
    local $/ = undef;
    $xml = <$in>;
}
my $tally;
for my $i (0 .. $#regexes) {
    my $regex = $regexes[$i];
    ++$tally[$i] while $xml =~ /$regex/g;
}

for my $i (0 .. $#regexes) {
    print "$regexes[$i]:\t$tally[$i]\n";
}

close $in;

With use strict; I get the following error message:

Global symbol "@tally" requires explicit package name (did you forget to declare "my @tally"?) at monk2.pl line 24.
Global symbol "@tally" requires explicit package name (did you forget to declare "my @tally"?) at monk2.pl line 28.
Execution of monk2.pl aborted due to compilation errors (#1)
    (F) You've said "use strict" or "use strict vars", which indicates
    that all variables must either be lexically scoped (using "my" or "state"),
    declared beforehand using "our", or explicitly qualified to say
    which package the global variable is in (using "::").

Uncaught exception from user code:
    Global symbol "@tally" requires explicit package name (did you forget to declare "my @tally"?) at monk2.pl line 24.
    Global symbol "@tally" requires explicit package name (did you forget to declare "my @tally"?) at monk2.pl line 28.
    Execution of monk2.pl aborted due to compilation errors.

As the variable $tally is defined beforehand and preceded by the keyword "my", I don't understand what is wrong. How could I fix this?

If I run the same script without use strict;, the output looks like this:

(?^:&sect;\s*[0-9]): 3
(?^:Art\.\s*[0-9IVX]): 2
(?^:Artikel\s*[0-9IVX]): 2
(?^:Artikels\s*[0-9IVX]): 2
(?^:Artikeln\s*[0-9IVX]): 2

How could I get rid of "(?^:" and ")"? Would it be possible to save this output to a file?

Have a nice afternoon!

Re^5: Entity statistics
by choroba (Cardinal) on Nov 12, 2024 at 13:23 UTC
    > As the variable $tally is defined beforehand and preceded by the keyword "my", I don't understand what is wrong. How could I fix this?

    The scalar variable $tally and the array variable @tally are two different variables. A single element of an array is accessed with a dollar sign and an index in square brackets, e.g. $tally[$i], but it is still an element of the array @tally. So, you need to declare the array:

    my @tally;
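
A minimal sketch of the difference, not taken from the script above. The values are made up for illustration; the point is only that $tally and @tally can exist side by side and that $tally[1] belongs to the array:

```perl
use strict;
use warnings;

my $tally = 'I am a scalar';   # scalar variable $tally
my @tally = (0, 0, 0);         # array variable @tally, a separate variable

++$tally[1];                   # element access: $ sigil plus [index], element of @tally
$tally[2] += 5;                # same array, different element

print "scalar: $tally\n";      # the scalar is untouched by the element updates
print "array:  @tally\n";      # prints the three elements separated by spaces
```

Under strict, using $tally[$i] without a declared @tally is exactly what triggers the "Global symbol" error.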

    > How could I get rid of "(?^:" and ")"?

    One possibility is to use a regex:

    for my $i (0 .. $#regexes) {
        my $regex = $regexes[$i];
        $regex =~ s/^\(\?\^://;
        $regex =~ s/\)$//;
        print "$regex:\t$tally[$i]\n";
    }
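
As an alternative to stripping the wrapper with substitutions, the core re module offers regexp_pattern(), which returns the pattern and its modifiers separately. The following is only a sketch: the /i flag on the second regex and the @labels array are assumptions added for illustration and are not part of the original script.

```perl
use strict;
use warnings;
use re 'regexp_pattern';   # core module; regexp_pattern() takes a qr// object apart

# The /i on the second regex is hypothetical, added to show a non-empty modifier.
my @regexes = (qr/&sect;\s*[0-9]/, qr/Art\.\s*[0-9IVX]/i);

my @labels;   # hypothetical helper array holding the bare patterns
for my $regex (@regexes) {
    # In list context regexp_pattern returns (pattern, modifiers).
    my ($pattern, $modifiers) = regexp_pattern($regex);
    push @labels, $pattern;
    print "$pattern\t(modifiers: '$modifiers')\n";
}
```

This also copes with regexes that carry flags, where the stringified form starts with something like "(?^i:" and the first substitution above would not match.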

    > Would it be possible to save this output to a file?

    The easiest way is to use redirection in your shell. It should work even in MSWin.

    perl script.pl > output.txt

    If you want to write to a file from within Perl, open a file for writing and print to it:

    open my $out, '>', 'output.txt' or die $!;
    for my $i (0 .. $#regexes) {
        my $regex = $regexes[$i];
        $regex =~ s/^\(\?\^://;
        $regex =~ s/\)$//;
        print {$out} "$regex:\t$tally[$i]\n";
    }
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Thanks to your kind assistance I could get a working statistics tool :)

      But when I apply the script listed below to another file, I get the following error which really puzzles me:

      Use of uninitialized value in concatenation (.) or string at whitespace-stat.pl line 47, <$in> line 1 (#1)
      (W uninitialized) An undefined value was used as if it were already defined. It was interpreted as a "" or a 0, but maybe it was a mistake. To suppress this warning assign a defined value to your variables.

      To help you figure out what was undefined, perl will try to tell you the name of the variable (if any) that was undefined. In some cases it cannot do this, so it also tells you what operation you used the undefined value in. Note, however, that perl optimizes your program and the operation displayed in the warning may not necessarily appear literally in your program. For example, "that $foo" is usually optimized into "that " . $foo, and the warning will refer to the concatenation (.) operator, even though there is no . in your program.

      #!/usr/bin/perl
      use warnings;
      use strict;
      use diagnostics;

      #my personal data left out!

      print "Generate statistics: Whitespace in context\n";

      my $infile = $ARGV[0];

      #define regexes as search target (in the array @regexes)
      my @regexes = (qr/&sect;\s*[0-9]/,
                     qr/Art\.\s*[0-9IVX]/,
                     qr/Artikel\s*[0-9IVX]/,
                     qr/Artikels\s*[0-9IVX]/,
                     qr/Artikeln\s*[0-9IVX]/);

      open my $in, '<', $infile or die "Cannot open $infile for reading: $!";

      #read input file in variable $xml
      my $xml;
      {
          local $/ = undef;
          $xml = <$in>;
      }

      #define array for frequency values
      my @tally;

      #count routine for each regex
      for my $i (0 .. $#regexes) {
          my $regex = $regexes[$i];
          ++$tally[$i] while $xml =~ /$regex/g;
      }

      #define output file
      open my $out, '>', 'stats.txt' or die $!;

      #output statistics
      print {$out} "Statistics: Whitespace in context\n\ninput file: ";
      print {$out} "$infile";
      print {$out} "\n========================================================================\n\n";

      for my $i (0 .. $#regexes) {
          my $regex = $regexes[$i];
          $regex =~ s/^\(\?\^://;
          $regex =~ s/\)$//;
          print {$out} "$regex:\t\t$tally[$i]\n";
      }

      close $in;
      close $out;

        > I get the following error

        It's not an error, it's a warning. You can easily tell it from the W in the diagnostics output: "(W uninitialized)".

        The most probable reason is that some of the regexes didn't match anything, so their corresponding elements in the array are undefined. You can print 0 instead of an undefined value using the defined-or operator //:

        print {$out} "$regex:\t\t", $tally[$i] // 0, "\n";

        Or you can prepopulate the array with zeros:

        my @tally = (0) x @regexes;
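
To see the effect in isolation, here is a small self-contained sketch. The sample text in $xml is invented and the regex list is shortened; the real script slurps $xml from the input file. The third pattern deliberately matches nothing:

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the slurped input file.
my $xml = 'Art. 5 und Art. IX, siehe &sect; 3';

my @regexes = (qr/&sect;\s*[0-9]/, qr/Art\.\s*[0-9IVX]/, qr/Artikel\s*[0-9IVX]/);

# Start every counter at 0 so patterns without a match never print undef.
my @tally = (0) x @regexes;

for my $i (0 .. $#regexes) {
    my $regex = $regexes[$i];
    ++$tally[$i] while $xml =~ /$regex/g;
}

# No "uninitialized" warning here, even for the pattern that never matched.
print "$regexes[$_]:\t$tally[$_]\n" for 0 .. $#regexes;
```

With the plain "my @tally;" declaration the last element would stay undefined and the print would warn.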

Re^5: Entity statistics
by hippo (Archbishop) on Nov 12, 2024 at 13:34 UTC
    > As the variable $tally is defined beforehand and preceded by the keyword "my", I don't understand what is wrong. How could I fix this?

    You have declared $tally, which is a scalar, but the errors are telling you about @tally, which is an array. Since your loops refer to the array and not the scalar, that is what you need to declare instead. See the tutorial "the basic datatypes, three" for more about the basic data types in Perl and how the sigils relate to them.

    > How could I get rid of "(?^:" and ")"?

    You could process the string which you actually output to achieve this, but in this particular case you can avoid that by using quotes to delimit each regex in the first place instead of using the qr// operator. You can use single quotes 'foo' or q/foo/ for non-interpolated strings, i.e.:

    my @regexes = (q/&sect;\s*[0-9]/,
                   q/Art\.\s*[0-9IVX]/,
                   q/Artikel\s*[0-9IVX]/,
                   q/Artikels\s*[0-9IVX]/,
                   q/Artikeln\s*[0-9IVX]/);

    Bear in mind that these are now just simple strings, so you need to take care to explicitly use them in a regex context. But as that is what the rest of your code does anyway, there is no further change required here.
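
A short sketch of what "regex context" means here, assuming an invented sample text in $xml and a hypothetical %count hash in place of the @tally array from the script:

```perl
use strict;
use warnings;

# Hypothetical sample text; the real script reads this from a file.
my $xml = 'Artikel 1, Artikel II und Art. 7';

# Plain strings: printing them shows the pattern without the (?^:...) wrapper.
my @patterns = (q/Art\.\s*[0-9IVX]/, q/Artikel\s*[0-9IVX]/);

my %count;   # hypothetical: keyed by pattern string
for my $pattern (@patterns) {
    # Interpolating the string inside /.../ compiles it as a regex.
    # Assigning the list of matches to () in scalar context yields their number.
    $count{$pattern} = () = $xml =~ /$pattern/g;
    print "$pattern:\t$count{$pattern}\n";
}
```

The string only becomes a regex at the point where it is interpolated between the slashes; a comparison such as eq against the same string would compare text, not match a pattern.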

    > Would it be possible to save this output to a file?

    Of course. See e.g. "Re: How do I write to a file?"

    Do have a browse through the Tutorials section here and the Getting Started with Perl section in particular. These should help you achieve some of these simple tasks while you become more familiar with the language.


    🦛
