http://www.perlmonks.org?node_id=323540


in reply to duplicate lines in array

There are of course many other ways to do this. Here is one.

#!/usr/local/bin/perl5.6.0 -w
use strict;

# array of records
my @contact_type = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

my %count_hash;    # used to store matching lines

# build [ label, record ] pairs and group the records by their H(nn) label
for ( map { [ /(H\(\d+\))/, $_ ] } @contact_type ) {
    push( @{ $count_hash{ $_->[0] } }, $_->[1] );
}

# print every record whose label occurs more than once
for ( keys %count_hash ) {
    if ( scalar( @{ $count_hash{$_} } ) > 1 ) {
        print qq{$_\n} for ( @{ $count_hash{$_} } );
    }
}

Output:
:!./test.pl
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
Wonko
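One detail in the `map` above deserves a closer look: a failed match in list context returns an empty list, so a record without an H(nn) label becomes a one-element pair. Its text is then used as the hash key and undef is stored as the "line". The sketch below assumes a made-up record with no label to show that behaviour, plus a variant that simply drops such records and sorts the labels so the output order is the same on every run. It is an illustration, not a drop-in replacement.

```perl
use strict;
use warnings;

my @records = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'no hydrogen label here',    # hypothetical record with no H(nn) label
);

# A failed match in list context returns (), so for the last record
# [ /(H\(\d+\))/, $_ ] builds the one-element pair [ $_ ], not [ undef, $_ ].
my @naive = map { [ /(H\(\d+\))/, $_ ] } @records;

# Safer: emit a [ label, record ] pair only when the match succeeds,
# and an empty list (which map simply drops) otherwise.
my @pairs = map { /(H\(\d+\))/ ? [ $1, $_ ] : () } @records;

my %count_hash;
push @{ $count_hash{ $_->[0] } }, $_->[1] for @pairs;

# sort the labels so the output order is stable from run to run
for my $label (sort keys %count_hash) {
    next unless @{ $count_hash{$label} } > 1;
    print "$_\n" for @{ $count_hash{$label} };
}
```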

Re: Re: duplicate lines in array
by Roger (Parson) on Jan 23, 2004 at 14:06 UTC
    A bit of golfing. :-)

    use strict;
    use warnings;

    my @rec = (
        'N(8) -- H(15) .. O(9)',
        'N(8) -- H(15) .. N(8)',
        'N(8) -- H(16) .. O(9)',
    );

    my %count;
    for (@rec) { /(H\(\d+\))/ && do { push @{$count{$1}}, $_ } }
    for (keys %count) {
        if ($#{$count{$_}}) { print "$_\n" for @{$count{$_}} }
    }

    __END__
    # another go...
    $,="\n";
    for (keys %count) { print @{$count{$_}} if $#{$count{$_}} }


      If you want to golf you may increase your handicap by shortening
      for (@rec) { /(H\(\d+\))/ && do { push @{$count{$1}}, $_ } } for (keys %count) { if ($#{$count{$_}}) { print "$_\n" for @{$count{$_}} } }
      to just
      /(H\(\d+\))/ && push(@{$count{$1}},$_."\n") for (@rec); print map @{$count{$_}},grep $#{$count{$_}},keys %count;
      Anyone volunteering to explain my code?
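For readers following along, here is an ungolfed sketch of that two-liner, with each step spelled out and commented. It is meant to behave the same on the sample data; the intermediate names `@repeated_labels` and `@repeated` are introduced only for illustration.

```perl
use strict;
use warnings;

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

my %count;

# "/(H\(\d+\))/ && push(...) for (@rec);" unrolled: try the match on each
# record; only if it succeeds is $1 (the captured label, e.g. "H(15)") used
# as the hash key, and the record plus a newline goes into that key's array.
for my $line (@rec) {
    if ($line =~ /(H\(\d+\))/) {
        push @{ $count{$1} }, $line . "\n";
    }
}

# "grep $#{$count{$_}}, keys %count" unrolled: $#{...} is the last index of
# the array, so it is 0 (false) for one element and >= 1 (true) for more.
my @repeated_labels = grep { $#{ $count{$_} } } keys %count;

# "print map @{$count{$_}}, ..." unrolled: flatten the arrays of the
# repeated labels into one list of lines and print it in a single call.
my @repeated = map { @{ $count{$_} } } @repeated_labels;
print @repeated;
```

Note that `$#{...}` means "more than one element" here only because every array was created by `push` and is never empty; for an empty array `$#` is -1, which is true. Writing `@{...} > 1` states the intent more explicitly.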
        Another go... ;-)

        /(H\(\d+\))/,push@{$count{$1}},"$_\n"for@rec; print map{$#{$count{$_}}?@{$count{$_}}:''}keys%count;

        Bah, you call that golfing? :) You used multi-letter variable names and more than one space.

        /(H\(\d+\))/&&push@{$c{$1}},"$_\n"for@r; print map@{$c{$_}},grep$#{$c{$_}},keys%c;
Re: Re: duplicate lines in array
by mcogan1966 (Monk) on Jan 23, 2004 at 13:24 UTC
    Hashes are fast here because inserting or looking up a key takes roughly constant time on average, so grouping lines by key needs only a single pass over the data. Using a hash to find unique (or repeated) values in an array like this works nicely and stays fast even on larger files.

    I recently had to do something similar for a project, and the hash solution was blazing fast compared to my original 'nested looping' solution, which compared every line against every other line.
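To illustrate the hash approach described above, here is a minimal sketch that counts each H(nn) label in one pass and then keeps the lines whose label was seen more than once. A nested-loop version would instead compare every pair of lines, n(n-1)/2 comparisons for n lines. Unlike grouping into arrays, this variant also keeps the lines in their original order. The names `%times` and `@dups` are illustrative only.

```perl
use strict;
use warnings;

my @lines = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

# Pass 1: count how often each H(nn) label occurs. Each hash update is an
# average constant-time operation, so this pass is linear in the line count.
my %times;
for my $line (@lines) {
    $times{$1}++ if $line =~ /(H\(\d+\))/;
}

# Pass 2: keep the lines whose label was seen more than once,
# in the same order as they appear in @lines.
my @dups = grep { /(H\(\d+\))/ && $times{$1} > 1 } @lines;

print "$_\n" for @dups;
```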

Re: Re: duplicate lines in array
by harry34 (Sexton) on Jan 27, 2004 at 10:44 UTC
    Thanks, this is working great, but only for 2 files. I think it has something to do with the line push( @{$count_hash{$_->[0]}}, $_->[1] ); Is it possible to change this so it works on all the files present?
    cheers harry

      try this:
      #!/usr/bin/perl
      use strict;
      use warnings;

      my (%count, $occ);
      my @files = ("file1", "file2", "file3");

      # read all files sequentially
      foreach my $file (@files) {
          open(my $in, '<', $file) or die "could not open $file: $!\n";
          while (<$in>) {
              chomp;
              # if the line contains the specified string, add it to the
              # array specific to this file and this found string
              /(H\(\d+\))/ && do { push @{$count{$file}{$1}}, $_ };
          }
          close($in);
      }

      # loop over files (sorted, so the report order is predictable)
      foreach my $file (sort keys %count) {
          print "$file\n";
          # loop over found strings
          foreach my $found (keys %{$count{$file}}) {
              # count occurrences
              $occ = scalar(@{$count{$file}{$found}});
              # print the found lines if the string occurred more than once
              if ($occ > 1) {
                  foreach (@{$count{$file}{$found}}) {
                      print "$_\n";
                  }
              }
          }
          print "\n";
      }

      I had 3 files to test with:
      file1:
      N(8) -- H(15) .. O(9)
      N(8) -- H(16) .. N(8)
      N(8) -- H(16) .. O(9)

      file2:
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)
      N(8) -- H(16) .. O(9)

      file3:
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)
      N(8) -- H(16) .. O(9)

      and it looks like:
      file1
      N(8) -- H(16) .. N(8)
      N(8) -- H(16) .. O(9)

      file2
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)

      file3
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)

      Imre
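For the "all files present" question, one option is to let Perl's `<>` operator do the opening: it reads each file named in `@ARGV` in turn and sets `$ARGV` to the name of the file currently being read. The sketch below assumes that approach and uses hypothetical helper names `group_lines_by_file` and `print_repeated`; it aims to print the same kind of per-file report as the script above, for every file given on the command line. Because `<>` falls back to STDIN when `@ARGV` is empty, the main call is guarded.

```perl
use strict;
use warnings;

# Hypothetical helper: group matching lines by file name and H(nn) label.
# The magic <> handle opens each name in @ARGV in turn; $ARGV holds the
# name of the file currently being read.
sub group_lines_by_file {
    my @files = @_;
    my %count;
    local @ARGV = @files;    # hand the list to <> without clobbering the caller's @ARGV
    while (my $line = <>) {
        chomp $line;
        push @{ $count{$ARGV}{$1} }, $line if $line =~ /(H\(\d+\))/;
    }
    return \%count;
}

# Hypothetical helper: per file, print every line whose label occurs
# more than once in that file, followed by a blank line.
sub print_repeated {
    my ($count) = @_;
    for my $file (sort keys %$count) {
        print "$file\n";
        for my $label (sort keys %{ $count->{$file} }) {
            my $lines = $count->{$file}{$label};
            next unless @$lines > 1;    # only labels seen more than once
            print "$_\n" for @$lines;
        }
        print "\n";
    }
}

# Process every file named on the command line, e.g.
#   perl dups.pl file1 file2 file3
print_repeated(group_lines_by_file(@ARGV)) if @ARGV;
```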
        That is exactly what I'm trying to achieve, although I already have all the lines of data stored in @contact_type.
        All I need to do is iterate over @contact_type, check each line for the defined pattern and print the lines if any are repeated.
        Can the code you have provided be changed to do that?
        cheers harry