Re: duplicate lines in array
by Abigail-II (Bishop) on Jan 23, 2004 at 11:39 UTC
Start with perldoc -q duplicate and adjust.
Abigail
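One possible way to "adjust" that FAQ answer, as a minimal sketch: the FAQ entry is built around a %seen hash, and here the hash is keyed on the H(n) token instead of the whole line. The sub name lines_with_repeated_h and the sample records are assumptions for illustration, not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose H(n) token occurs
# on more than one line, in their original order.
sub lines_with_repeated_h {
    my @lines = @_;
    my %seen;
    # First pass: count how often each H(n) token appears.
    for my $line (@lines) {
        $seen{$1}++ if $line =~ /(H\(\d+\))/;
    }
    # Second pass: keep lines whose token was counted more than once.
    return grep { /(H\(\d+\))/ && $seen{$1} > 1 } @lines;
}

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
print "$_\n" for lines_with_repeated_h(@rec);
```

With the sample records this prints the two H(15) lines. Counting first and filtering second keeps the lines in their input order, which a loop over hash keys does not guarantee.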
Re: duplicate lines in array
by Wonko the sane (Deacon) on Jan 23, 2004 at 12:51 UTC
There are of course many other ways to do this. Here is one.
#!/usr/local/bin/perl5.6.0 -w
use strict;
# array of records
my @contact_type =
(
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
my %count_hash; # used to store matching lines
for ( map { [ /(H\(\d+\))/, $_ ] } @contact_type )
{
    push( @{$count_hash{$_->[0]}}, $_->[1] );
}
for ( keys %count_hash )
{
    if ( scalar( @{$count_hash{$_}} ) > 1 )
    {
        print qq{$_\n} for ( @{$count_hash{$_}} );
    }
}
Output:
:!./test.pl
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
Wonko
|
use strict;
use warnings;
my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
my %count;
for (@rec) {
    /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
}
for (keys %count) {
    if ($#{$count{$_}}) {
        print "$_\n" for @{$count{$_}}
    }
}
__END__
# another go...
$,="\n";
for (keys %count) {
    print @{$count{$_}} if $#{$count{$_}}
}
|
If you want to golf, you may increase your handicap by shortening
for (@rec) {
    /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
}
for (keys %count) {
    if ($#{$count{$_}}) {
        print "$_\n" for @{$count{$_}}
    }
}
to just
/(H\(\d+\))/ && push(@{$count{$1}},$_."\n") for (@rec);
print map @{$count{$_}},grep $#{$count{$_}},keys %count;
Anyone volunteering to explain my code?
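Taking up the invitation, here is one reading of the two-liner, unrolled into separate steps. This is a sketch of how I understand it, not the poster's own explanation; the names @repeated and @out and the sample records are mine.

```perl
use strict;
use warnings;

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
my %count;

# Statement 1: the postfix "for" aliases $_ to each record in turn.
# When the regex matches, $1 holds the H(n) token and the record,
# with a newline appended, is pushed onto the array stored under
# that token. The array springs into existence on first use.
for (@rec) {
    if (/(H\(\d+\))/) {
        push @{ $count{$1} }, $_ . "\n";
    }
}

# Statement 2, read from right to left.
# grep keeps the tokens whose array has a last index other than 0,
# that is, tokens that collected more than one record.
my @repeated = grep { $#{ $count{$_} } } keys %count;

# map flattens the arrays of the surviving tokens into one list.
my @out = map { @{ $count{$_} } } @repeated;

# print writes the list as is; the newlines were added in statement 1.
print @out;
```

The trick that saves the most characters is `$#{...}` as a truth test: every array in %count holds at least one record, so a last index of 0 (false) means exactly one record and anything else means a repeat.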
|
Thanks, this is working great, but only for 2 files. I think it has something to do with the line push( @{$count_hash{$_->[0]}}, $_->[1] );
Is it possible to change this so that it handles all the files present?
cheers harry
|
#!/usr/bin/perl
use strict;
my (%count, $occ);
my @files = ("file1", "file2", "file3");
# read all files sequentially
foreach my $file (@files) {
    open (my $in, '<', $file) or die "could not open $file: $!\n";
    while (<$in>) {
        chomp;
        # if line contains specified string add it to 'file' and 'found string' specific array
        /(H\(\d+\))/ && do { push @{$count{$file}{$1}}, $_ }
    }
    close ($in);
}
# loop over files (sorted, so the report order is predictable)
foreach my $file (sort keys %count) {
    print "$file\n";
    # loop over found strings
    foreach my $found (keys %{$count{$file}}) {
        # count occurrences
        $occ = scalar (@{$count{$file}{$found}});
        # print found lines if occurred more than 1 time
        if ($occ > 1) {
            foreach (@{$count{$file}{$found}}) {
                print "$_\n";
            }
        }
    }
    print "\n";
}
I had 3 files to test with:
file1:
N(8) -- H(15) .. O(9)
N(8) -- H(16) .. N(8)
N(8) -- H(16) .. O(9)
file2:
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
N(8) -- H(16) .. O(9)
file3:
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
N(8) -- H(16) .. O(9)
and the output looks like:
file1
N(8) -- H(16) .. N(8)
N(8) -- H(16) .. O(9)
file2
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
file3
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
Imre
Re: duplicate lines in array
by pelagic (Priest) on Jan 23, 2004 at 12:28 UTC
Question:
do you put the lines identifying the files:
file 2:
also into your array?
Imre
|
No, the array contains data only. The data from each file is separated by a new line.
cheers harry
Re: duplicate lines in array
by hmerrill (Friar) on Jan 23, 2004 at 12:38 UTC
Your question needs to be a little clearer - would you also want H(16) to be printed, since it occurs both in file 1 and file 2?
|
I wish to compare data within each separate file only.
I do not want to compare one file with another.
So in file 2 only the lines containing H(15) will be displayed, i.e. 2 lines.
If a file contained H(16) or H(20) twice or more, then those lines should also be printed out.
It has to be written generally, since the numbers can change.
cheers Harry
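A minimal sketch of those requirements, assuming the single array described earlier in the thread: data lines only, with an empty (or whitespace-only) element marking the boundary between files. The sub name repeated_per_group and the sample data are hypothetical. Each group is counted separately, so a token repeated in one file never affects another.

```perl
use strict;
use warnings;

# Hypothetical helper: split the lines into groups at blank elements
# and return, for each group, an array ref holding the lines whose
# H(n) token occurs more than once within that group.
sub repeated_per_group {
    my @lines  = @_;
    my @groups = ([]);
    for my $line (@lines) {
        if ($line =~ /^\s*$/) {
            # A blank element closes the current group, if it has data.
            push @groups, [] if @{ $groups[-1] };
            next;
        }
        push @{ $groups[-1] }, $line;
    }
    pop @groups if !@{ $groups[-1] };    # drop a trailing empty group

    my @result;
    for my $group (@groups) {
        my %count;                       # fresh counts for every group
        for my $line (@$group) {
            $count{$1}++ if $line =~ /(H\(\d+\))/;
        }
        push @result, [ grep { /(H\(\d+\))/ && $count{$1} > 1 } @$group ];
    }
    return @result;
}

# Assumed sample data: two files' worth of lines in one array.
my @data = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(16) .. N(8)',
    'N(8) -- H(16) .. O(9)',
    '',
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

my $n = 0;
for my $dups (repeated_per_group(@data)) {
    $n++;
    print "group $n\n";
    print "$_\n" for @$dups;
}
```

The regex matches any number inside H(...), so nothing is hard-coded to H(15) or H(16). If the boundary between files is marked differently in the real array, only the blank-element test needs to change.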