PerlMonks  

What does 'next if $hash{$elem}++;' mean?

by Win (Novice)
on Feb 17, 2006 at 15:21 UTC
Win has asked for the wisdom of the Perl Monks concerning the following question:

Please could somebody put the following line into English.
next if $seen{ $elem }++;
in the code:
    my %hash = map { $_, 1 } @array;
    # or a hash slice: @hash{ @array } = ();
    # or a foreach: $hash{$_} = 1 foreach ( @array );
    my @unique = keys %hash;

    my @unique = ();
    my %seen = ();
    foreach my $elem ( @array ) {
        next if $seen{ $elem }++;
        push @unique, $elem;
    }
My attempt is:

Insert $elem into the hash 'seen' and skip to the next loop event if this $elem is already present in the hash 'seen'.

I realise that my attempt is not quite there.

Re: What does 'next if $hash{$elem}++;' mean?
by dragonchild (Archbishop) on Feb 17, 2006 at 15:32 UTC
    1. If a key doesn't exist in a hash when you modify it (as ++ does here), it's created for you automatically. This is often called autovivification.
    2. The postfix increment operator (++) will increment the value of the thing it's after, then return the old value.
    3. next will skip to the next iteration of the loop.

    So, this code will skip to the next iteration of the loop if it's already seen that element. Just as if you'd read the code out loud. :-)
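    A minimal sketch of points 1 and 2, assuming a made-up three-element list. The @log array is only there for illustration; it records the value that the post-increment hands back each time.

```perl
use strict;
use warnings;

my %seen;    # starts empty; keys are created on first increment
my @log;     # illustration only: what ++ returned for each element

for my $elem (qw(a b a)) {
    # Postfix ++ returns the old value (0 when the key did not exist yet)
    # and then stores old value + 1 in the hash.
    my $old = $seen{$elem}++;
    push @log, "$elem:$old";
}

print "@log\n";    # a:0 b:0 a:1
```

    The first time an element turns up the returned value is 0 (false), so "next if" does nothing; on later sightings it is 1 or more (true), so the loop skips ahead.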


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      The reason I ask the question is because the following subroutine does not work.
      sub Remove_duplicate_lines {
          my ($processed_file, $out_file) = @_;
          open (PROCESSED_FILE, "<$processed_file");
          open (OUTFILE, "+>$out_file");
          my @array;
          my @unique;
          while (<PROCESSED_FILE>) {
              chomp;
              push (@array, $_);
          }
          my %hash = map { $_, 1 } @array;
          # or a hash slice: @hash{ @array } = ();
          # or a foreach: $hash{$_} = 1 foreach ( @array );
          my @unique = keys %hash;

          my @unique = ();
          my %seen = ();
          foreach my $elem ( @array ) {
              next if $seen{ $elem }++;
              push @unique, $elem;
          }
          foreach (@unique){
              print OUTFILE "$_";
          }
      }

        You'll need to define "doesn't work" (aside from the obvious problems of not checking the return value from your opens, or redefining @unique, or not printing newlines in the last foreach, or that you seem to be doing redundant work with both the map { $_, 1 } @array and the subsequent foreach).

        It isn't necessary to do all of the intermediate processing. You can read a line and check it all at the same time.

        The line is used as a hash key. Its value is tested before being incremented, and the line is added to the array only if that value was false (i.e. the line has not been seen before).

        my %seen;
        while (<PROCESSED_FILE>) {
            push @unique, $_ unless $seen{$_}++;
        }

        This is OK if the hash doesn't grow too big. When it might, keying the hash on an MD5 digest of each line rather than on the line itself is a useful technique.
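        A rough sketch of the MD5 idea, assuming the core Digest::MD5 module and lines that are plain byte strings (md5 croaks on wide characters). The helper name unique_lines_by_digest is made up for this example. Each key is a fixed 16-byte digest, so this only saves memory when the lines are longer than that, and it accepts the (tiny) chance of two different lines sharing a digest.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper: returns its arguments with duplicates removed,
# keeping the first occurrence of each and preserving input order.
sub unique_lines_by_digest {
    my %seen;    # key: 16-byte binary MD5 digest of a line
    return grep { !$seen{ md5($_) }++ } @_;
}

my @unique = unique_lines_by_digest("a\n", "b\n", "a\n", "c\n", "b\n");
print @unique;    # prints a, b and c, one per line
```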

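        use List::MoreUtils qw( uniq );

        sub Remove_duplicate_lines {
            my ($processed_file, $out_file) = @_;

            open (PROCESSED_FILE, "<$processed_file") or die "Can't read $processed_file: $!";
            chomp( my @array = <PROCESSED_FILE> );
            close PROCESSED_FILE;

            # uniq keeps the first occurrence of each line, in input order
            my @unique = uniq @array;

            open (OUTFILE, ">$out_file") or die "Can't write $out_file: $!";
            # the lines were chomped, so put the newlines back on output
            print OUTFILE map { "$_\n" } @unique;
            close OUTFILE;
        }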

        # Simple enough?!?
        sub remove_duplicate_lines {
            my ($infile, $outfile)=@_;
            open my $in, '<', $infile or die "$infile: $!\n";
            open my $out, '>', $outfile or die "$outfile: $!\n";
            my $oldout=select $out;
            my %saw;
            $saw{$_}++ or print while <$in>;
            select $oldout;
        }

        Chaps,

        I had a go benchmarking various ways of pulling unique values out of a list which I had seen in various text books. It looks like a hash slice is the quickest on my ancient hardware. Your mileage may vary.

        #!/usr/local/bin/perl
        #
        use warnings;
        use strict;

        use Benchmark;

        our @data = <DATA>;
        chomp @data;

        our $rcHash = sub {
            my %seen = ();
            $seen{$_} ++ for @data;
            return keys %seen;
        };

        our $rcHashGrep = sub {
            my %seen = ();
            return grep {! $seen{$_} ++} @data;
        };

        our $rcHashSlice = sub {
            my %uniq;
            @uniq{@data} = ();
            return keys %uniq;
        };

        our $rcListHash = sub {
            my %seen = ();
            my @uniq = ();
            foreach my $item (@data) {
                push @uniq, $item unless $seen{$item} ++;
            }
            return @uniq;
        };

        our $rcMapHash = sub {
            return keys %{{map {$_ => 1} @data}};
        };

        timethese(5000, {
            Hash      => $rcHash,
            HashGrep  => $rcHashGrep,
            HashSlice => $rcHashSlice,
            ListHash  => $rcListHash,
            MapHash   => $rcMapHash});

        __END__
        red
        blue
        yellow
        green
        black
        white
        purple
        mauve
        pink
        grey
        violet
        black
        white
        blue
        green
        red
        mauve
        violet
        black
        red
        blue
        yellow
        green
        black
        white
        purple
        mauve
        pink
        grey
        violet
        black
        white
        blue
        green
        red
        mauve
        violet
        black
        mauve
        violet
        black
        red
        blue
        yellow
        green
        black
        violet
        black
        red
        blue
        yellow
        mauve
        pink
        grey
        violet
        black
        white
        blue
        green
        yellow
        green
        black
        iolet
        black
        red
        green
        black
        white
        purple
        mauve
        pink
        yellow
        green
        black
        violet
        black
        red
        blue
        yellow
        mauve
        pink
        grey
        violet
        black
        white
        blue
        green

        Produces the following metrics.

        Benchmark: timing 5000 iterations of Hash, HashGrep, HashSlice, ListHash, MapHash...
        Hash: 4 wallclock secs ( 3.48 usr + 0.00 sys = 3.48 CPU) @ 1436.78/s (n=5000)
        HashGrep: 4 wallclock secs ( 3.29 usr + 0.00 sys = 3.29 CPU) @ 1519.76/s (n=5000)
        HashSlice: 1 wallclock secs ( 1.16 usr + 0.00 sys = 1.16 CPU) @ 4310.34/s (n=5000)
        ListHash: 5 wallclock secs ( 5.03 usr + 0.00 sys = 5.03 CPU) @ 994.04/s (n=5000)
        MapHash: 6 wallclock secs ( 5.89 usr + 0.00 sys = 5.89 CPU) @ 848.90/s (n=5000)

        Cheers,

        JohnGG
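        One thing the timings alone don't show: the variants are not interchangeable when order matters. The sketch below uses a made-up five-element list and assumes only core Perl. The grep and push forms return items in first-seen order, while anything that ends in keys returns them in whatever order the hash happens to store them.

```perl
use strict;
use warnings;

my @data = qw(red blue red green blue);    # made-up sample data

# grep keeps the first occurrence of each item, in input order.
my %seen;
my @ordered = grep { !$seen{$_}++ } @data;

# A hash slice only records membership; keys returns the items
# in no particular order, so sort them before comparing or printing.
my %uniq;
@uniq{@data} = ();
my @unordered = keys %uniq;

print "@ordered\n";                        # red blue green
print join(' ', sort @unordered), "\n";    # blue green red
```

        So the hash slice may well be the quickest, but it is only a drop-in replacement when the order of the unique items doesn't matter.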

Re: What does 'next if $hash{$elem}++;' mean?
by blazar (Canon) on Feb 17, 2006 at 17:52 UTC
    Please could somebody put the following line into English.
    next if $seen{ $elem }++;

    [snip]

    My attempt is:
    Insert $elem into the hash 'seen' and skip to the next loop event if this $elem is already present in the hash 'seen'.

    How 'bout exactly the way you read it, i.e. "skip to the next iteration of the loop if you have already seen $elem"? That is, "if the counter associated with $elem (the value of the hash %seen at the key $elem), which you're also (post-)incrementing (++) at this iteration, is greater than zero".

    Was there anything particularly difficult to understand in this reply?

Re: What does 'next if $hash{$elem}++;' mean?
by radiantmatrix (Parson) on Feb 17, 2006 at 19:24 UTC

    next if $seen{ $elem }++;

    "Skip to the next iteration of the loop if $seen{ $elem } is true, otherwise make $seen{ $elem } true and continue."

    Check out the segment on the unary '++' operator in perlop. When the operator follows a variable, it returns the variable's old value and then increments it. In other words, in the above code, if tests the value $seen{ $elem } had before the increment. The increment itself happens every time the statement runs, whether or not "next" is taken, so $seen{ $elem } ends up counting how many times $elem has appeared. Here's longer code that skips the same elements (it only marks the element once instead of counting it):

    if ( $seen{ $elem } ) {
        next;    # skip to the next loop iteration
    }
    else {
        $seen{ $elem } += 1;
    }

    This is a common pattern that checks if you've encountered $elem before. If you have, it will not process it a second time; otherwise, it will mark $elem so that you won't process it again.

    This could be used to get all the unique lines from a file, for example.

    use strict;
    use warnings;

    open my $IN,  '<', $ARGV[0] or die("Can't read $ARGV[0]: $!");
    open my $OUT, '>', $ARGV[1] or die("Can't write $ARGV[1]: $!");

    my %seen;
    while (<$IN>) {
        next if $seen{$_}++;    # check/mark line as seen
        print $OUT $_;
    }

    close $IN;
    close $OUT;

    Or, as an even shorter while loop:

    while (<$IN>) { $seen{$_}++ || print $OUT $_ }
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet

Node Type: perlquestion [id://530966]
Approved by friedo