PerlMonks  

Re: duplicate lines in array

by Wonko the sane (Deacon)
on Jan 23, 2004 at 12:51 UTC (#323540)


in reply to duplicate lines in array

There are of course many other ways to do this. Here is one.

#!/usr/local/bin/perl5.6.0 -w
use strict;

# array of records
my @contact_type = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

my %count_hash; # used to store matching lines

for ( map { [ /(H\(\d+\))/, $_ ] } @contact_type ) {
    push( @{$count_hash{$_->[0]}}, $_->[1] );
}

for ( keys %count_hash ) {
    if ( scalar( @{$count_hash{$_}} ) > 1 ) {
        print qq{$_\n} for ( @{$count_hash{$_}} );
    }
}

Output:
:!./test.pl
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)

Wonko


Re: Re: duplicate lines in array
by Roger (Parson) on Jan 23, 2004 at 14:06 UTC
    A bit of golfing. :-)

    use strict;
    use warnings;

    my @rec = (
        'N(8) -- H(15) .. O(9)',
        'N(8) -- H(15) .. N(8)',
        'N(8) -- H(16) .. O(9)',
    );

    my %count;
    for (@rec) {
        /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
    }
    for (keys %count) {
        if ($#{$count{$_}}) {
            print "$_\n" for @{$count{$_}}
        }
    }

    __END__

    # another go...
    $,="\n";
    for (keys %count) {
        print @{$count{$_}} if $#{$count{$_}}
    }


      If you want to golf, you may increase your handicap by shortening

      for (@rec) {
          /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
      }
      for (keys %count) {
          if ($#{$count{$_}}) {
              print "$_\n" for @{$count{$_}}
          }
      }

      to just

      /(H\(\d+\))/ && push(@{$count{$1}},$_."\n") for (@rec);
      print map @{$count{$_}},grep $#{$count{$_}},keys %count;

      Anyone volunteering to explain my code?
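
      One possible reading of the two-line version above, unrolled into named steps. This is only a sketch, not the poster's own explanation; the sub name repeated_lines and the intermediate @dup_keys are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose H(n) key occurs more than once.
sub repeated_lines {
    my @rec = @_;
    my %count;

    # Statement modifier: the expression runs once per element of @rec,
    # with $_ set to the current line. The push only runs when the match
    # succeeds; $1 then holds the captured key, e.g. 'H(15)'.
    /(H\(\d+\))/ && push( @{ $count{$1} }, $_ . "\n" ) for @rec;

    # $#{...} is the last index of the array: 0 (false) for a single line,
    # 1 or more (true) for repeats, so grep keeps only the repeated keys.
    my @dup_keys = grep { $#{ $count{$_} } } keys %count;

    # map flattens the kept arrays into one list of lines.
    return map { @{ $count{$_} } } @dup_keys;
}

print repeated_lines(
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
```

      Because keys %count comes back in no particular order, the groups may print in a different order from run to run when more than one key is repeated.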
        Another go... ;-)

        /(H\(\d+\))/,push@{$count{$1}},"$_\n"for@rec;
        print map{$#{$count{$_}}?@{$count{$_}}:''}keys%count;

        Bah, you call that golfing? :) You used multi-letter variable names and more than one space.

        /(H\(\d+\))/&&push@{$c{$1}},"$_\n"for@r;
        print map@{$c{$_}},grep$#{$c{$_}},keys%c;

Re: Re: duplicate lines in array
by mcogan1966 (Monk) on Jan 23, 2004 at 13:24 UTC
    Hash lookups and inserts are very fast (roughly constant time on average). Using a hash to find the unique or repeated values of an array like this works nicely, and stays fast even over larger files.

    I recently had to do something similar for a project, and the hash solution worked nicely, and the speed was blazing in comparison to my original 'nested looping' solution.
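
    To make that comparison concrete, here is a rough sketch of both approaches on the sample records from this thread. The sub names key_of, dups_nested and dups_hash are invented for illustration. The nested version compares each line against every other line, while the hash version counts keys in one pass and reports in a second.

```perl
use strict;
use warnings;

# Extract the H(n) key from a line, or undef when there is none.
sub key_of {
    my ($line) = @_;
    return $line =~ /(H\(\d+\))/ ? $1 : undef;
}

# Nested loops: every line is checked against every other line,
# so the work grows with the square of the number of lines.
sub dups_nested {
    my @lines = @_;
    my @out;
    for my $i ( 0 .. $#lines ) {
        my $k = key_of( $lines[$i] );
        next unless defined $k;
        for my $j ( 0 .. $#lines ) {
            next if $i == $j;
            my $other = key_of( $lines[$j] );
            if ( defined $other && $other eq $k ) {
                push @out, $lines[$i];
                last;
            }
        }
    }
    return @out;
}

# Hash: one pass to count each key, one pass to pick the repeated lines.
sub dups_hash {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        my $k = key_of($line);
        $seen{$k}++ if defined $k;
    }
    return grep { my $k = key_of($_); defined $k && $seen{$k} > 1 } @lines;
}

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
print "$_\n" for dups_hash(@rec);
```

    Both subs return the repeated lines in their original input order, so their results can be compared directly.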

Re: Re: duplicate lines in array
by harry34 (Sexton) on Jan 27, 2004 at 10:44 UTC
    Thanks, this is working great, but only for 2 files. I think it has something to do with the line push( @{$count_hash{$_->[0]}}, $_->[1] ); Is it possible to change this so it handles all the files present?
    cheers harry

      Try this:

      #!/usr/bin/perl
      use strict;

      my (%count, $occ);
      my @files = ("file1", "file2", "file3");

      # read all files sequentially
      foreach my $file (@files) {
          open (IN, "<$file") || die "could not open $file\n";
          while (<IN>) {
              chomp;
              # if line contains specified string add it to 'file' and 'found string' specific array
              /(H\(\d+\))/ && do { push @{$count{$file}{$1}}, $_ }
          }
          close (IN);
      }

      # loop over files
      foreach my $file (keys %count) {
          print "$file\n";
          # loop over found strings
          foreach my $found (keys %{$count{$file}}) {
              # count occurrences
              $occ = scalar (@{$count{$file}{$found}});
              # print found lines if they occurred more than once
              if ($occ > 1) {
                  foreach (@{$count{$file}{$found}}) {
                      print "$_\n";
                  }
              }
          }
          print "\n";
      }

      I had 3 files to test with:
      file1:
      N(8) -- H(15) .. O(9)
      N(8) -- H(16) .. N(8)
      N(8) -- H(16) .. O(9)

      file2:
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)
      N(8) -- H(16) .. O(9)

      file3:
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)
      N(8) -- H(16) .. O(9)

      and it looks like:
      file1
      N(8) -- H(16) .. N(8)
      N(8) -- H(16) .. O(9)

      file2
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)

      file3
      N(8) -- H(15) .. O(9)
      N(8) -- H(15) .. N(8)

      Imre
        That is exactly what I'm trying to achieve, although I already have all the lines of data stored in @contact_type.
        All I need to do is iterate over @contact_type, check for the defined pattern, and print any lines where it is repeated.
        Can the code you have provided be changed to do that?
        cheers harry
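
        For lines already held in @contact_type, a minimal sketch along the lines of the code earlier in this thread might look like the following. It assumes the pattern of interest is still H(n); the names repeated_by_key, %lines_for and @order are illustrative only. Unlike a loop over keys %hash, it remembers the order in which each key first appears, so the output order is predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by their H(n) key and return the
# groups that contain more than one line, in first-seen key order.
sub repeated_by_key {
    my @contact_type = @_;
    my ( %lines_for, @order );

    for my $line (@contact_type) {
        next unless $line =~ /(H\(\d+\))/;    # skip lines without the pattern
        push @order, $1 unless exists $lines_for{$1};
        push @{ $lines_for{$1} }, $line;
    }

    return map { @{ $lines_for{$_} } }
           grep { scalar( @{ $lines_for{$_} } ) > 1 } @order;
}

my @contact_type = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
print "$_\n" for repeated_by_key(@contact_type);
```

        No files are read here; the array is passed straight in, so the same sub works however @contact_type was filled.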
