PerlMonks  

How can I keep the first occurrence from duplicated strings?

by Anonymous Monk
on Aug 29, 2023 at 21:01 UTC ( [id://11154123]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
if you have a file with lines like the following:
nick 5
nick 10
nick 20
john 78
erik 9
erik 12

Can you tell me how I can keep only the first occurrence of each name? If I use a hash, for instance, I will keep nick 20, but I want to keep nick 5 instead. How do I do that?

Replies are listed 'Best First'.
Re: How can I keep the first occurrence from duplicated strings?
by choroba (Cardinal) on Aug 29, 2023 at 22:03 UTC
    You can still use a hash, but you have to check whether the key already exists before setting the value.
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %seen;
    while (<DATA>) {
        my ($name, $count) = split;
        $seen{$name} = $count unless exists $seen{$name};
    }
    print "$_ $seen{$_}\n" for keys %seen;

    __DATA__
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12

    If you want to store the values in an array, use

    push @keep, $_ unless $seen{$name}++;

    Or, you can print the lines directly:

    print unless $seen{$name}++;
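
    As a rough sketch, the push variant can be written out as a complete program. The sub name first_per_name and the in-memory sample list are assumptions made for illustration, not part of the reply above. Because the kept lines go into an array, they come back in input order, which printing keys %seen does not guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only the lines whose first
# whitespace-separated field has not been seen before.
sub first_per_name {
    my @lines = @_;
    my (%seen, @keep);
    for my $line (@lines) {
        my ($name) = split ' ', $line;
        next unless defined $name;               # skip blank lines
        push @keep, $line unless $seen{$name}++; # true only on first sight
    }
    return @keep;
}

# Sample data copied from the question.
my @input = ("nick 5\n", "nick 10\n", "nick 20\n",
             "john 78\n", "erik 9\n", "erik 12\n");
print first_per_name(@input);
```

    The post-increment returns the old count, so the test is false only the first time a name appears.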

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: How can I keep the first occurrence from duplicated strings?
by GrandFather (Saint) on Aug 29, 2023 at 22:18 UTC

    The defined-or assignment operator (//=) assigns the right-hand value only if the value on the left is undefined, so you can use it to store just the first value found for each key in a hash:

    use strict;
    use warnings;

    my $input = <<IN;
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12
    IN

    my %hits;

    open my $fin, '<', \$input;

    while (my $line = <$fin>) {
        next if $line !~ /^(\w+)\s+(\d+)/;
        $hits{$1} //= $2;
    }

    print "$_: $hits{$_}\n" for sort keys %hits;

    Prints:

    erik: 9
    john: 78
    nick: 5
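
    A small sketch of why //= rather than ||= matters here. The records are made up so that the first count for nick is 0; the variable names are illustrative only. Note that //= needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Made-up sample: the first value recorded for "nick" is 0 (false but defined).
my @records = ( [ nick => 0 ], [ nick => 10 ], [ john => 78 ] );

my (%with_or, %with_defined_or);
for my $rec (@records) {
    my ($name, $count) = @$rec;
    $with_or{$name}         ||= $count;   # replaces a false first value
    $with_defined_or{$name} //= $count;   # keeps any defined first value
}

print "||= keeps nick $with_or{nick}\n";
print "//= keeps nick $with_defined_or{nick}\n";
```

    With ||= the 0 is treated as "not set yet" and replaced by 10, while //= keeps it. If the stored value could itself be undef, an explicit exists check on the key would be the safer test.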
    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: How can I keep the first occurrence from duplicated strings?
by tybalt89 (Monsignor) on Aug 29, 2023 at 21:18 UTC
    #!/usr/bin/perl

    use strict; # https://www.perlmonks.org/?node_id=11154123
    use warnings;

    use List::AllUtils qw( uniq_by );

    open my $fh, '<', \<<END or die;
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12
    END

    my @firstofeachname = uniq_by { (split)[0] } <$fh>;
    print @firstofeachname;

    Outputs:

    nick 5
    john 78
    erik 9
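
    List::AllUtils is a CPAN module rather than part of core Perl. Where installing it is not an option, roughly the same effect can be sketched with grep and a %seen hash. This is a swapped-in technique, not how uniq_by is implemented, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample data copied from the question, held in memory for the sketch.
my @lines = ("nick 5\n", "nick 10\n", "nick 20\n",
             "john 78\n", "erik 9\n", "erik 12\n");

my %seen;
# (split)[0] is the first whitespace-separated field of $_;
# the post-increment is false only the first time a name is met.
my @first_of_each_name = grep { !$seen{ (split)[0] }++ } @lines;

print @first_of_each_name;
```

    Like uniq_by, grep keeps the surviving lines in their original order.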
Re: How can I keep the first occurrence from duplicated strings?
by tybalt89 (Monsignor) on Aug 29, 2023 at 22:22 UTC

    Using hash with //=

    #!/usr/bin/perl

    use strict; # https://www.perlmonks.org/?node_id=11154123
    use warnings;

    open my $fh, '<', \<<END or die;
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12
    END

    my %hash;
    $hash{ (split)[0] } //= $_ while <$fh>;
    print values %hash;

    Outputs (the order varies from run to run because of Perl's hash randomization):

    nick 5
    erik 9
    john 78
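
    If the original order matters, one possible sketch is to store the input line number ($.) next to each first line and sort on it at the end. The hash layout and variable names below are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample data copied from the question, read through an in-memory filehandle.
my $data = "nick 5\nnick 10\nnick 20\njohn 78\nerik 9\nerik 12\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my %first;    # name => [ line number, line ]
while (my $line = <$fh>) {
    my ($name) = split ' ', $line;
    next unless defined $name;            # skip blank lines
    $first{$name} //= [ $., $line ];      # only the first occurrence is stored
}
close $fh;

# Sort the stored entries by the line number they were first seen on.
my @ordered = map { $_->[1] } sort { $a->[0] <=> $b->[0] } values %first;
print @ordered;
```

    This still reads one line at a time and keeps only one entry per name in memory.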
Re: How can I keep the first occurrence from duplicated strings?
by eyepopslikeamosquito (Archbishop) on Aug 29, 2023 at 23:38 UTC

    For fun, a more verbose version that you might find easier to understand or maintain:

    # print-first.pl. See Perl Monks [id://11154123].
    use strict;
    use warnings;

    my $fname = shift or die "usage: $0 file\n";
    open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
    my %hash;
    my $line;
    while ( defined($line = <$fh>) ) {
        chomp $line;
        $line =~ s/^\s+//;                        # remove leading
        $line =~ s/\s+$//;                        # and trailing whitespace
        next unless length $line;                 # ignore empty lines
        next if $line =~ /^#/;                    # ignore comment lines
        next unless $line =~ /^(\w+)\s+(\d+)$/;   # ignore lines that do not match
        my $key = $1;
        my $val = $2;
        next if exists $hash{$key};               # ignore if already seen
        ++$hash{$key};
        print "$key $val\n";
    }
    close $fh;

    Example run:

    $ cat t1.txt
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12

    $ perl print-first.pl
    usage: print-first.pl file
    $ perl print-first.pl t1.txt
    nick 5
    john 78
    erik 9

Re: How can I keep the first occurrence from duplicated strings?
by Bod (Parson) on Aug 29, 2023 at 23:19 UTC

    You could just reverse the order of the lines in the file...

    use strict;
    use warnings;

    my @lines = reverse(<DATA>);
    my %test;

    foreach (@lines) {
        my ($name, $number) = split / /;
        $test{$name} = $number;
    }

    print %test;

    __DATA__
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12

    This gives the result...

    nick5
    john78
    erik9

      You could just reverse the order of the lines in the file...

      use strict;
      use warnings;

      my @lines = reverse(<DATA>);

      This will read the entire file into RAM. No problem for 100 kBytes, big trouble for big files (larger than free RAM). The solutions from choroba, GrandFather, eyepopslikeamosquito, and the second solution from tybalt89 do not suffer from that problem, because they all read only one line at a time.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        See also File::ReadBackwards (however, its documentation doesn't mention file encodings, so it might blow up on UTF-8).

        This will read the entire file into RAM. No problem for 100 kBytes, big trouble for big files

        Yes of course that's true.

        But given the apparent nature of the data, I feel it's safe to assume that the file size will be small relative to available RAM and paging files. If it were more than a few lines, then processing it using Perl is almost certainly the wrong approach. For a file large enough to be a problem, Perl should read one line at a time and load it into a database, at which point getting the first occurrence becomes trivial.

        So for a big huge file, the question needs asking on SQLMonks (wishful thinking...)

Re: How can I keep the first occurrence from duplicated strings? -- oneliner
by Discipulus (Canon) on Aug 30, 2023 at 07:58 UTC
    Hello,

    There are many variations here, but at a glance it seems to me that this one is missing: use a hash and put the values into an array (choroba suggested this).

    This being an opportunity to produce a oneliner, I cannot resist :)

    cat uniqfirst.txt
    nick 5
    nick 10
    nick 20
    john 78
    erik 9
    erik 12

    perl -lanE "push@{$r{$F[0]}},$F[1]}{say join' ',$_,$r{$_}[0]for keys%r" uniqfirst.txt
    erik 9
    nick 5
    john 78

    perl -MO=Deparse -lanE "push@{$r{$F[0]}},$F[1]}{say join' ',$_,$r{$_}[0]for keys%r"
    BEGIN { $/ = "\n"; $\ = "\n"; }
    use feature 'current_sub', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
    LINE: while (defined($_ = readline ARGV)) {
        chomp $_;
        our @F = split(' ', $_, 0);
        push @{$r{$F[0]};}, $F[1];
    }
    {
        say join(' ', $_, $r{$_}[0]) foreach (keys %r);
    }
    -e syntax OK

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Node Type: perlquestion [id://11154123]
Approved by GrandFather
Front-paged by Corion