http://www.perlmonks.org?node_id=11142055

almsdealer has asked for the wisdom of the Perl Monks concerning the following question:

Greetings. I am baffled by a problem in my code below. I am trying to remove duplicate entries from @fileTags on line 29, but some non-duplicate items are also being dropped.

#!/usr/bin/perl use strict; use warnings; sub myTrim { my ($str) = @_; my ($trimmed) = ($str =~ /\s*(.*)\s*/); return $trimmed; } my @allTags = (); foreach my $file (<[0-9][0-9][0-9][0-9].txt>) { my @fileTags = (); open(my $handle, '<', $file) or die 'couldn\'t open file'; while (my $line = <$handle>) { if ($line =~ /^tags/) { my ($tagsString) = ($line =~ /^tags\s+(.*)/); @fileTags = (@fileTags, split(/,/, $tagsString)); } } close($handle); # some of the items in @fileTags disappear (not just duplicate ite +ms) my %uniqueTags = map { myTrim $_, 1 } @fileTags; @allTags = (@allTags, keys %uniqueTags); } print "@allTags\n";

Replies are listed 'Best First'.
Re: Mapping list to hash is dropping list items
by haukex (Archbishop) on Mar 12, 2022 at 21:44 UTC

    I can't test right now but try map { myTrim($_), 1 }, because as you have written it, your code is being interpreted as map { myTrim($_, 1) }.

      That was it it! Thank you!

Re: Mapping list to hash is dropping list items
by eyepopslikeamosquito (Archbishop) on Mar 13, 2022 at 11:18 UTC

    G'day almsdealer,

    Welcome to the Monastery.

    Given you're very new to Perl, you've made a great start! Though you're problem has already been solved (by the redoubtable haukex) I'd like to offer a couple of tips to help you on your Perl journey:

    • Always use parens when calling user-defined subroutines (see here for why).
    • Use Perl's core B::Deparse module to see how Perl parses your script.

    For example, with your script saved as almsdealer.pl, running:

    perl -MO=Deparse,-p almsdealer.pl >deparse.pl
    allows you to see how Perl parses your script, in your case deparse.pl contains:
    use strict; use warnings; sub myTrim { (my($str) = @_); (my($trimmed) = ($str =~ /\s*(.*)\s*/)); (return $trimmed); } (my(@allTags) = ()); foreach my $file (glob('[0-9][0-9][0-9][0-9].txt')) { use File::Glob (); (my(@fileTags) = ()); (open(my $handle, '<', $file) or die(q[couldn't open file])); while (defined((my $line = readline($handle)))) { do { if (($line =~ /^tags/)) { (my($tagsString) = ($line =~ /^tags\s+(.*)/)); (@fileTags = (@fileTags, split(/,/, $tagsString, 0))); } }; } close($handle); (my(%uniqueTags) = map({myTrim($_, 1);} @fileTags)); (@allTags = (@allTags, keys(%uniqueTags))); } print(("@allTags\n"));

    Hopefully, seeing myTrim($_, 1) will set alarm bells ringing because your myTrim function takes one argument, not two.

    As a matter of personal style, I would write your:

    my ($str) = @_;
    as:
    my $str = shift; # the string to be trimmed
    because I like to put a comment next to each parameter describing what it does, plus this style scales nicely for subroutines that take more than one argument (a random example of this style can be found here).

      Wow, thank you. What a wealth of information I've gained just from all the replies in this thread!
Re: Mapping list to hash is dropping list items
by jwkrahn (Abbot) on Mar 12, 2022 at 23:27 UTC

    This may work better (UNTESTED):

    #!/usr/bin/perl use strict; use warnings; my @allTags; { local @ARGV = <[0-9][0-9][0-9][0-9].txt>; my %uniqueTags; while ( my $line = <> ) { next unless $line =~ s/^tags\s+//; $line =~ s/\s+\z//; $uniqueTags{ $_ } = 1 for split /\s+,\s+/, $line; } @allTags = keys %uniqueTags; } print "@allTags\n";
      If I understand correctly, <> will read from @ARGV unless it is empty in which case it will read from STDIN?

      Also, why did you wrap everything between my @alTags and print in a block?

        When used like this the diamond operator is actually glob which generates a list of filenames; those filenames are used to populate @ARGV and (yes) then the plain <> readline version will implicitly read from those files. The block was (my guess) probably intended to scope the local change to @ARGV to just that section of code.

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Re: Mapping list to hash is dropping list items
by Marshall (Canon) on Mar 12, 2022 at 22:28 UTC
    perhaps - untested -:
    #!/usr/bin/perl use strict; use warnings; use List::Util qw (uniq); # I think this is the fastest way for the trim functionality... # 2 statements benchmark faster than one statement sub myTrim { my $str = shift; #shift faster than @_ for one paramater $str =~ s/^\s*//; #no space at front $str =~ s/\s*$//; #no space at rear return $str; } my @fileTags = map{myTrim($_)}@fileTags. @fileTags = uniq @fileTags;
    Update: I did run this:
    Your Trim function does not trim the trailing spaces because the match (.*) is greedy...
    #!/usr/bin/perl use strict; use warnings; use List::Util qw (uniq); sub myTrim { my ($str) = @_; my ($trimmed) = ($str =~ /\s*(.*)\s*/); return $trimmed; } my $x = "x "; $x = myTrim($x); print "$x!\n"; #prints "x !"
    Update: The single statement version of your MyTrim() would be str =~ s/^\s*|\s*$//g; I saw one benchmark where that appeared to be faster than the "standard" 2 statements, but I was never able to re-create that result myself. Perl regex performance can vary significantly between releases - in general things get faster - but hiccups do happen.