Maire has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I'm a Perl beginner trying to do something very basic and encountering an issue. I want to find how many times each item in an array appears in a separate .txt document - when an item doesn't appear an error message should be printed. I have cobbled together the code below from various sources and it seemed to work at first glance, but the script only counts one instance of each item.

@ids = ("a", "b", "c", "d", "e", "f", "g",); my %counts; foreach my $id (@ids) { open(FILE, "list.txt" ) || die "couldn't open 2\n"; $/ = undef; while (<FILE>) { if (/$id/g) { $counts{$id}++; } else { print "$id not found\n"; } } close FILE; } use Data::Dumper; print Dumper \%counts;
Sorry I can't work out how to include the contents of list.txt into the above code - I've included it below instead
a b b c c d d d f f f

This script returns:

e not found g not found $VAR1 = { 'f' => 1, 'c' => 1, 'b' => 1, 'd' => 1, 'a' => 1 };
When I'm expecting:
e not found g not found $VAR1 = { 'f' => 3, 'c' => 2, 'b' => 2, 'd' => 3, 'a' => 1 };

My uneducated guess is that it is the way I am reading in list.txt that is 'wiping' the hash, but unfortuantely I have no idea how to fix it. Any tips would be very much appreciated. Thank you in advance!

Replies are listed 'Best First'.
Re: Perl beginner's issue with hash
by choroba (Archbishop) on Apr 23, 2020 at 15:48 UTC
    The reason of the weird result is the setting of this inconspicuous variable:
    $/ = undef;

    It enables the "slurp mode", i.e. <FILE> reads the whole file into $_. But you then check

    if (/$i/g)
    which only runs once.

    You can fix it in several ways:

    1. keep the $/ unchanged. It will read the file line by line and update the variable appropriately.
    2. replace the if by while, it will run the increment for each match
    3. create a regex from all the ids and match all of them at once:
      my $regex = join '|', map quotemeta, @ids; while (<FILE>) { $counts{$1}++ if /($regex)/; }

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks so much for this! Just as a quick follow-up, do you have any thoughts why I might be receiving an error when I change the 'if' to a 'while' statement? All I've changed is the 'if' for a 'while', but then the following appears when I try to run:
      syntax error at step_two_redux.pl line 19, near "else" syntax error step_two_redux.pl line 28, near "}" Execution of step_two_redux.pl aborted due to compilation errors.
        All I've changed is the 'if' for a 'while' ...

        If all you've done is change
            if (CONDITION) { ... } else { ... }
        to
            while (CONDITION) { ... } else { ... }
        then yes, this is a syntax error because there is no  while ... else ... statement block structure in Perl. (The only  if statement | statement block I can see in the OPed code is  if (/$id/g) { ... } — is this the one you're referring to?)

        I'm not sure what you were trying to accomplish with this change, so I'm not sure what to advise as the proper change.


        Give a man a fish:  <%-{-{-{-<

Re: Perl beginner's issue with hash
by GrandFather (Saint) on Apr 23, 2020 at 20:26 UTC

    Reading a file multiple times is usually a "code smell" (a bad coding practice). Very often, as in your case, it's because you want to compare all the items in a file against another group of items. In most cases that gives two options:

    1. Get the first group into some data structure (most often a hash or array) then process the second group (usually the lines in a file) one at a time and process them using the preloaded first group values
    2. The first option with group one and two swapped

    Usually the expected size of the two groups determines which option to choose. Store the smaller of the two groups of data in memory. Commonly the data structure will be an array or a hash, although in your case a regular expression as suggested by Haukex is probably the best option.

    The reason for avoiding rereading a file is that doing so is slow. Reading data from a file is likely to be thousands of times slower than "reading" the same data from memory. With modern computers reading even large files into memory is practical.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Perl beginner's issue with hash
by haukex (Bishop) on Apr 23, 2020 at 19:47 UTC

    You might be interested in Building Regex Alternations Dynamically so that you don't have to loop over your IDs.

    use warnings; use strict; use Data::Dump; my @ids = ("a", "b", "c", "d", "e", "f", "g",); my ($regex) = map {qr/$_/} join '|', map {quotemeta} sort { length $b <=> length $a or $a cmp $b } @ids; my %counts = map { $_ => 0 } @ids; while (<DATA>) { if ( my ($id) = /($regex)/ ) { $counts{$id}++; } } dd \%counts; # outputs "{ a => 1, b => 2, c => 2, d => 3, e => 0, f => 3, g => 0 }" __DATA__ a b b NO ID HERE c c d d d f f f
Re: Perl beginner's issue with hash
by BillKSmith (Prior) on Apr 24, 2020 at 03:04 UTC
    Read the entire file into an array rather than opening and reading it inside the loop. Use grep in scalar context to do the counting.
    use strict; use warnings; use Data::Dumper; my $file = \"a\nb\nb\nc\nc\nd\nd\nd\nf\nf\nf"; # In memory 'file' my @ids = qw( a b c d e f g ); open my $FILE, '<', $file or die "cannot open input file"; my @lines = <$FILE>; # Read entire file into an array close $FILE; my %counts; ID: foreach my $id (@ids) { my $count = grep { /$id/ } @lines; if ($count) { $counts{$id} = $count; next ID; } warn "$id not found\n"; } print Dumper \%counts;
    OUTPUT: e not found g not found $VAR1 = { 'd' => 3, 'a' => 1, 'c' => 2, 'f' => 3, 'b' => 2 };
    Bill