Mapping list to hash is dropping list items

almsdealer has asked for the wisdom of the Perl Monks concerning the following question:

Greetings. I am baffled by a problem in my code below. I am trying to remove duplicate entries from @fileTags on line 29, but some non-duplicate items are also being dropped.

#!/usr/bin/perl

use strict;
use warnings;

sub myTrim {
    my ($str) = @_;
    my ($trimmed) = ($str =~ /\s*(.*)\s*/);
    return $trimmed;
}

my @allTags = ();

foreach my $file (<[0-9][0-9][0-9][0-9].txt>) {

    my @fileTags = ();

    open(my $handle, '<', $file) or die 'couldn\'t open file';

    while (my $line = <$handle>) {
        if ($line =~ /^tags/) {
            my ($tagsString) = ($line =~ /^tags\s+(.*)/);
            @fileTags = (@fileTags, split(/,/, $tagsString));
        } 
    }

    close($handle);

    # some of the items in @fileTags disappear (not just duplicate ite
+ms)
    my %uniqueTags = map { myTrim $_, 1 } @fileTags;
    @allTags = (@allTags, keys %uniqueTags);
}

print "@allTags\n";
[download]

Comment on Mapping list to hash is dropping list items Download Code

Replies are listed 'Best First'.
Re: Mapping list to hash is dropping list items by haukex (Archbishop) on Mar 12, 2022 at 21:44 UTC
I can't test right now but try `map { myTrim($_), 1 }`, because as you have written it, your code is being interpreted as `map { myTrim($_, 1) }`.	[reply] [d/l] [select]
Re^2: Mapping list to hash is dropping list items by almsdealer (Acolyte) on Mar 12, 2022 at 22:08 UTC
That was it it! Thank you!	[reply]
Re: Mapping list to hash is dropping list items by eyepopslikeamosquito (Archbishop) on Mar 13, 2022 at 11:18 UTC
G'day almsdealer, Welcome to the Monastery. Given you're very new to Perl, you've made a great start! Though you're problem has already been solved (by the redoubtable haukex) I'd like to offer a couple of tips to help you on your Perl journey: Always use parens when calling user-defined subroutines (see here for why). Use Perl's core B::Deparse module to see how Perl parses your script. For example, with your script saved as `almsdealer.pl`, running: `perl -MO=Deparse,-p almsdealer.pl >deparse.pl` [download] allows you to see how Perl parses your script, in your case `deparse.pl` contains: use strict; use warnings; sub myTrim { (my($str) = @_); (my($trimmed) = ($str =~ /\s(.)\s/)); (return $trimmed); } (my(@allTags) = ()); foreach my $file (glob('[0-9][0-9][0-9][0-9].txt')) { use File::Glob (); (my(@fileTags) = ()); (open(my $handle, '<', $file) or die(q[couldn't open file])); while (defined((my $line = readline($handle)))) { do { if (($line =~ /^tags/)) { (my($tagsString) = ($line =~ /^tags\s+(.)/)); (@fileTags = (@fileTags, split(/,/, $tagsString, 0))); } }; } close($handle); (my(%uniqueTags) = map({myTrim($_, 1);} @fileTags)); (@allTags = (@allTags, keys(%uniqueTags))); } print(("@allTags\n")); [download] Hopefully, seeing `myTrim($_, 1)` will set alarm bells ringing because your `myTrim` function takes one argument, not two. As a matter of personal style, I would write your: `my ($str) = @_;` [download] as: `my $str = shift; # the string to be trimmed` [download] because I like to put a comment next to each parameter describing what it does, plus this style scales nicely for subroutines that take more than one argument (a random example of this style can be found here).	[reply] [d/l] [select]
Re^2: Mapping list to hash is dropping list items by almsdealer (Acolyte) on Mar 13, 2022 at 13:52 UTC
Wow, thank you. What a wealth of information I've gained just from all the replies in this thread!	[reply]
Re: Mapping list to hash is dropping list items by jwkrahn (Abbot) on Mar 12, 2022 at 23:27 UTC
This may work better (UNTESTED): `#!/usr/bin/perl use strict; use warnings; my @allTags; { local @ARGV = <[0-9][0-9][0-9][0-9].txt>; my %uniqueTags; while ( my $line = <> ) { next unless $line =~ s/^tags\s+//; $line =~ s/\s+\z//; $uniqueTags{ $_ } = 1 for split /\s+,\s+/, $line; } @allTags = keys %uniqueTags; } print "@allTags\n";` [download]	[reply] [d/l]
Re^2: Mapping list to hash is dropping list items by almsdealer (Acolyte) on Mar 16, 2022 at 16:28 UTC
If I understand correctly, `<>` will read from `@ARGV` unless it is empty in which case it will read from `STDIN`? Also, why did you wrap everything between `my @alTags` and `print` in a block?	[reply] [d/l] [select]
Re^3: Mapping list to hash is dropping list items by Fletch (Bishop) on Mar 16, 2022 at 17:14 UTC
When used like this the diamond operator is actually glob which generates a list of filenames; those filenames are used to populate `@ARGV` and (yes) then the plain `<>` readline version will implicitly read from those files. The block was (my guess) probably intended to scope the local change to `@ARGV` to just that section of code. The cake is a lie. The cake is a lie. The cake is a lie.	[reply] [d/l] [select]
Re: Mapping list to hash is dropping list items by Marshall (Canon) on Mar 12, 2022 at 22:28 UTC
perhaps - untested -: `#!/usr/bin/perl use strict; use warnings; use List::Util qw (uniq); # I think this is the fastest way for the trim functionality... # 2 statements benchmark faster than one statement sub myTrim { my $str = shift; #shift faster than @_ for one paramater $str =~ s/^\s//; #no space at front $str =~ s/\s$//; #no space at rear return $str; } my @fileTags = map{myTrim($_)}@fileTags. @fileTags = uniq @fileTags;` [download] Update: I did run this: Your Trim function does not trim the trailing spaces because the match (.) is greedy... `#!/usr/bin/perl use strict; use warnings; use List::Util qw (uniq); sub myTrim { my ($str) = @_; my ($trimmed) = ($str =~ /\s(.)\s/); return $trimmed; } my $x = "x "; $x = myTrim($x); print "$x!\n"; #prints "x !"` [download] Update: The single statement version of your MyTrim() would be `str =~ s/^\s\|\s$//g;` I saw one benchmark where that appeared to be faster than the "standard" 2 statements, but I was never able to re-create that result myself. Perl regex performance can vary significantly between releases - in general things get faster - but hiccups do happen.	[reply] [d/l] [select]
Re^2: Mapping list to hash is dropping list items (trim whitespace) by eyepopslikeamosquito (Archbishop) on Mar 13, 2022 at 12:06 UTC
`sub myTrim { my $str = shift; #shift faster than @_ for one paramater $str =~ s/^\s//; #no space at front $str =~ s/\s$//; #no space at rear return $str; }` [download] Trimming leading and trailing whitespace from a string is a FAQ. Like you, I think it's simplest and clearest to do it in two simple statements. To nickpick your solution, it's better to replace `*` with `+` like so: `$str =~ s/^\s+//; # no whitespace at front $str =~ s/\s+$//; # no whitespace at rear` [download] See also: trim leading & trailing whitespace (Perl Monks) Perl Cookbook Perl Maven Re: Is there a way to make these two regex lines cleaner? (new builtin trim function for perl 5.34.1) Code review: function to trim whitespace from all items in a list	[reply] [d/l] [select]

Back to Seekers of Perl Wisdom