Re: duplicate lines in array

There are of course many other ways to do this. Here is one.

#!/usr/local/bin/perl5.6.0 -w
use strict;

# array of records
my @contact_type =
(
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);


my %count_hash;  # used to store matching lines

for (  map { [ /(H\(\d+\))/, $_ ] } @contact_type  )
{
    push( @{$count_hash{$_->[0]}}, $_->[1] );
}


for ( keys %count_hash )
{
    if (  scalar( @{$count_hash{$_}} ) > 1  )
    {
        print qq{$_\n} for ( @{$count_hash{$_}} );
    }
}

Output:

:!./test.pl
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)

Wonko


Replies are listed 'Best First'.
Re: Re: duplicate lines in array by Roger (Parson) on Jan 23, 2004 at 14:06 UTC
A bit of golfing. :-)

use strict;
use warnings;

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

my %count;
for (@rec) {
    /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
}

for (keys %count) {
    if ($#{$count{$_}}) {
        print "$_\n" for @{$count{$_}}
    }
}

__END__

# another go...
$,="\n";
for (keys %count) {
    print @{$count{$_}} if $#{$count{$_}}
}
Re: Re: Re: duplicate lines in array by Skeeve (Parson) on Jan 23, 2004 at 14:30 UTC
If you want to golf you may increase your handicap by shortening

for (@rec) {
    /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
}
for (keys %count) {
    if ($#{$count{$_}}) {
        print "$_\n" for @{$count{$_}}
    }
}

to just

/(H\(\d+\))/ && push(@{$count{$1}},$_."\n") for (@rec);
print map @{$count{$_}},grep $#{$count{$_}},keys %count;

Anyone volunteering to explain my code?
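One possible reading of the shortened version, spelled out step by step. This is a sketch, not Skeeve's own explanation: the sub name `repeated_lines` is made up for illustration, and the `sort` is an addition so the output order is predictable (the one-liner leaves key order to the hash).

```perl
use strict;
use warnings;

# Sample records from the thread.
my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

# Hypothetical helper name, not from the thread.
sub repeated_lines {
    my @records = @_;
    my %count;

    # Step 1: the one-liner uses a statement-modifier "for" over $_;
    # here the loop is written out. When the regex matches, $1 holds
    # the H(n) key and the line (with a newline appended) is filed under it.
    for my $line (@records) {
        if ( $line =~ /(H\(\d+\))/ ) {
            push @{ $count{$1} }, "$line\n";
        }
    }

    # Step 2: $#{...} is the last index of the group, so it is 0 (false)
    # for a one-element group and true for anything longer.
    # The sort is an addition for deterministic output.
    my @repeated_keys = grep { $#{ $count{$_} } } sort keys %count;

    # Step 3: map flattens the surviving groups into one list of lines.
    return map { @{ $count{$_} } } @repeated_keys;
}

print repeated_lines(@rec);
```

With the sample records, only the H(15) group has more than one member, so the two H(15) lines are printed.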
Re: Re: Re: Re: duplicate lines in array by Roger (Parson) on Jan 23, 2004 at 14:43 UTC
Another go... ;-)

/(H\(\d+\))/,push@{$count{$1}},"$_\n"for@rec;
print map{$#{$count{$_}}?@{$count{$_}}:''}keys%count;
Re: Re: Re: Re: duplicate lines in array by Fletch (Bishop) on Jan 23, 2004 at 15:01 UTC
Bah, you call that golfing? :) You used multi-letter variable names and more than one space.

/(H\(\d+\))/&&push@{$c{$1}},"$_\n"for@r;
print map@{$c{$_}},grep$#{$c{$_}},keys%c;
Re: Re: Re: Re: Re: duplicate lines in array by Skeeve (Parson) on Jan 23, 2004 at 15:15 UTC
Re: Re: duplicate lines in array by mcogan1966 (Monk) on Jan 23, 2004 at 13:24 UTC
Hash lookups are very fast. Using a hash to find repeated or unique values in an array like this works nicely and stays fast even on larger files. I recently had to do something similar for a project, and the hash solution was blazing fast compared to my original 'nested looping' solution.
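A rough sketch of the contrast described above, using the thread's sample data. Both sub names are hypothetical; neither is mcogan1966's actual project code. The nested version compares every line's key against every other line's key, while the hash version counts keys in one pass and selects in a second.

```perl
use strict;
use warnings;

# Extract the H(n) key from each line (undef when there is no match).
sub h_keys {
    return map { my ($k) = /(H\(\d+\))/; $k } @_;
}

# Nested-loop approach: roughly n*n key comparisons for n lines.
sub repeated_nested {
    my @lines = @_;
    my @keys  = h_keys(@lines);
    my @out;
    for my $i ( 0 .. $#lines ) {
        next unless defined $keys[$i];
        for my $j ( 0 .. $#lines ) {
            next if $i == $j or !defined $keys[$j];
            if ( $keys[$i] eq $keys[$j] ) {
                push @out, $lines[$i];
                last;    # one partner is enough to call the line repeated
            }
        }
    }
    return @out;
}

# Hash approach: one pass to count keys, one pass to pick the lines.
sub repeated_hash {
    my @lines = @_;
    my @keys  = h_keys(@lines);
    my %seen;
    for my $k (@keys) {
        $seen{$k}++ if defined $k;
    }
    return map { $lines[$_] }
           grep { defined $keys[$_] and $seen{ $keys[$_] } > 1 }
           0 .. $#lines;
}

my @contact_type = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

print "$_\n" for repeated_hash(@contact_type);
```

Both subs return the repeated lines in input order, so their results can be compared directly on larger inputs.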
Re: Re: duplicate lines in array by harry34 (Sexton) on Jan 27, 2004 at 10:44 UTC
Thanks, this is working great, but only for 2 files. I think it has something to do with the line

push( @{$count_hash{$_->[0]}}, $_->[1] );

Is it possible to change this so it does all files present?

cheers
harry
Re: Re: Re: duplicate lines in array by pelagic (Priest) on Jan 27, 2004 at 11:50 UTC
try this:

#!/usr/bin/perl
use strict;

my (%count, $occ);
my @files = ("file1", "file2", "file3");

# read all files sequentially
foreach my $file (@files) {
    open (IN, "<$file") || die "could not open $file\n";
    while (<IN>) {
        chomp;
        # if line contains specified string add it to 'file' and 'found string' specific array
        /(H\(\d+\))/ && do { push @{$count{$file}{$1}}, $_ }
    }
    close (IN);
}

# loop over files
foreach my $file (keys %count) {
    print "$file\n";
    # loop over found strings
    foreach my $found (keys %{$count{$file}}) {
        # count occurrences
        $occ = scalar (@{$count{$file}{$found}});
        # print found lines if occurred more than 1 time
        if ($occ > 1) {
            foreach (@{$count{$file}{$found}}) {
                print "$_\n";
            }
        }
    }
    print "\n";
}

I had 3 files to test with:

file1:
N(8) -- H(15) .. O(9)
N(8) -- H(16) .. N(8)
N(8) -- H(16) .. O(9)

file2:
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
N(8) -- H(16) .. O(9)

file3:
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
N(8) -- H(16) .. O(9)

and it looks like:

file1
N(8) -- H(16) .. N(8)
N(8) -- H(16) .. O(9)

file2
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)

file3
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)

Imre
Re: Re: Re: Re: duplicate lines in array by harry34 (Sexton) on Jan 27, 2004 at 12:39 UTC
That is exactly what I'm trying to achieve, although I already have all the lines of data stored in @contact_type. All I need to do is iterate over @contact_type, check for the defined pattern, and print any lines where it is repeated. Can the code you have provided be changed to do that?

cheers
harry
(Re:)* duplicate lines in array by pelagic (Priest) on Jan 27, 2004 at 13:00 UTC
Re: (Re:)* duplicate lines in array by harry34 (Sexton) on Jan 27, 2004 at 13:45 UTC

In Section Seekers of Perl Wisdom