PerlMonks  

duplicate lines in array

by harry34 (Sexton)
on Jan 23, 2004 at 11:31 UTC ( [id://323513]=perlquestion )

harry34 has asked for the wisdom of the Perl Monks concerning the following question:

I have 30 separate files which are opened individually, and information is extracted from each file,
e.g. for files 1 and 2:

file 1:
N(8) -- H(15) .. O(9)
N(8) -- H(16) .. O(9)

file 2:
N(8) -- H(15) .. O(9)
N(8) -- H(15) .. N(8)
N(8) -- H(16) .. O(9)

Note all the information from all 30 files is stored in an array called @contact_type.
What I want to do is go through @contact_type and list the lines whose match for the pattern /H\(\d+\)/ occurs more than once.
So iterating over the information from file 1 should not output any lines, but from file 2 both the H(15) lines should be displayed, as H(15) appears in both.
Note I wish to write this as generally as possible, as letters and numbers can change.

Cheers Harry

Replies are listed 'Best First'.
Re: duplicate lines in array
by Abigail-II (Bishop) on Jan 23, 2004 at 11:39 UTC
    Start with perldoc -q duplicate and adjust.

    Abigail
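
For reference, the FAQ entry that `perldoc -q duplicate` brings up is built around a `%seen` hash. A minimal sketch of how that idiom might be adjusted to this problem follows. The sub name `lines_with_repeated_h` is made up for illustration, and the sketch assumes each line holds at most one H(n) token of interest.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): return the lines whose
# H(n) token occurs more than once in the given list, in original order.
sub lines_with_repeated_h {
    my @lines = @_;

    # First pass: count how often each H(n) token appears.
    my %seen;
    for my $line (@lines) {
        $seen{$1}++ if $line =~ /(H\(\d+\))/;
    }

    # Second pass: keep only lines whose token was seen more than once.
    return grep { /(H\(\d+\))/ && $seen{$1} > 1 } @lines;
}

my @contact_type = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

print "$_\n" for lines_with_repeated_h(@contact_type);
```

Counting first and filtering second keeps the lines in their original order, which iterating over `keys` of a hash does not guarantee.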

Re: duplicate lines in array
by Wonko the sane (Deacon) on Jan 23, 2004 at 12:51 UTC
    There are of course many other ways to do this. Here is one.

    #!/usr/local/bin/perl5.6.0 -w
    use strict;

    # array of records
    my @contact_type = (
        'N(8) -- H(15) .. O(9)',
        'N(8) -- H(15) .. N(8)',
        'N(8) -- H(16) .. O(9)',
    );

    my %count_hash; # used to store matching lines

    for ( map { [ /(H\(\d+\))/, $_ ] } @contact_type ) {
        push( @{$count_hash{$_->[0]}}, $_->[1] );
    }

    for ( keys %count_hash ) {
        if ( scalar( @{$count_hash{$_}} ) > 1 ) {
            print qq{$_\n} for ( @{$count_hash{$_}} );
        }
    }

    Output:
    :!./test.pl
    N(8) -- H(15) .. O(9)
    N(8) -- H(15) .. N(8)
    Wonko
      A bit of golfing. :-)

      use strict;
      use warnings;

      my @rec = (
          'N(8) -- H(15) .. O(9)',
          'N(8) -- H(15) .. N(8)',
          'N(8) -- H(16) .. O(9)',
      );

      my %count;

      for (@rec) {
          /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
      }

      for (keys %count) {
          if ($#{$count{$_}}) {
              print "$_\n" for @{$count{$_}}
          }
      }

      __END__

      # another go...
      $,="\n";
      for (keys %count) {
          print @{$count{$_}} if $#{$count{$_}}
      }


        If you want to golf, you may increase your handicap by shortening
            for (@rec) {
                /(H\(\d+\))/ && do { push @{$count{$1}}, $_ }
            }
            for (keys %count) {
                if ($#{$count{$_}}) {
                    print "$_\n" for @{$count{$_}}
                }
            }
        to just
            /(H\(\d+\))/ && push(@{$count{$1}},$_."\n") for (@rec);
            print map @{$count{$_}},grep $#{$count{$_}},keys %count;
        Anyone volunteering to explain my code?
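
Taking up that invitation: one way to read the two-statement version is to unroll it, as sketched below. This is an equivalent-logic reading rather than the poster's own explanation, and the names `@repeated_keys` and `@output` are introduced here purely for illustration.

```perl
use strict;
use warnings;

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);
my %count;

# Statement 1: "EXPR for (@rec)" runs EXPR once per element.
# When the line matches, $1 holds the H(n) token, and the line (plus a
# newline) is pushed onto the array stored under that token. The array
# reference is created on first use (autovivification).
for my $line (@rec) {
    if ($line =~ /(H\(\d+\))/) {
        push @{ $count{$1} }, $line . "\n";
    }
}

# Statement 2, read right to left:
#   keys %count               every H(n) token that was seen
#   grep $#{$count{$_}}, ...  keep tokens whose last index is non-zero,
#                             i.e. whose array holds two or more lines
#   map @{$count{$_}}, ...    flatten the surviving arrays into one list
my @repeated_keys = grep { $#{ $count{$_} } > 0 } keys %count;
my @output        = map  { @{ $count{$_} } } @repeated_keys;

print @output;
```

The `$#{...}` test works as a truth value because an array with exactly one element has a last index of 0, which Perl treats as false.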
      Hash lookups are very fast. Using a hash to find repeated values in an array like this works nicely, and stays fast even over larger files.

      I recently had to do something similar for a project, and the hash solution worked nicely, and the speed was blazing in comparison to my original 'nested looping' solution.
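
To make the comparison with nested loops concrete, here is a rough sketch of both approaches. All three subs (`h_token`, `repeats_nested`, `repeats_hash`) are hypothetical illustrations, not the poster's project code. The nested version compares each line against every other line, so its work grows roughly with the square of the line count, while the hash version makes two passes over the data.

```perl
use strict;
use warnings;

# Extract the H(n) token from a line, or undef if there is none.
sub h_token {
    my ($line) = @_;
    return $line =~ /(H\(\d+\))/ ? $1 : undef;
}

# Nested loops: for each line, scan all other lines for the same token.
sub repeats_nested {
    my @lines = @_;
    my @out;
    for my $i (0 .. $#lines) {
        my $tok = h_token($lines[$i]);
        next unless defined $tok;
        for my $j (0 .. $#lines) {
            next if $i == $j;
            my $other = h_token($lines[$j]);
            if (defined $other && $other eq $tok) {
                push @out, $lines[$i];
                last;    # one partner is enough, stop scanning
            }
        }
    }
    return @out;
}

# Hash: one pass to count tokens, one pass to select lines.
sub repeats_hash {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my $tok = h_token($line);
        $count{$tok}++ if defined $tok;
    }
    return grep { my $t = h_token($_); defined $t && $count{$t} > 1 } @lines;
}

my @rec = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

print "nested:\n", map { "$_\n" } repeats_nested(@rec);
print "hash:\n",   map { "$_\n" } repeats_hash(@rec);
```

Both subs return the same lines in the same order; only the amount of work differs.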

      Thanks, this is working great, but only for 2 files. I think it has something to do with the line
          push( @{$count_hash{$_->[0]}}, $_->[1] );
      Is it possible to change this so it handles all the files present?
      cheers harry

        try this:
        #!/usr/bin/perl
        use strict;

        my (%count, $occ);
        my @files = ("file1", "file2", "file3");

        # read all files sequentially
        foreach my $file (@files) {
            open (IN, "<$file") || die "could not open $file\n";
            while (<IN>) {
                chomp;
                # if line contains specified string add it to 'file' and 'found string' specific array
                /(H\(\d+\))/ && do { push @{$count{$file}{$1}}, $_ }
            }
            close (IN);
        }

        # loop over files (sorted, since hash key order is not guaranteed)
        foreach my $file (sort keys %count) {
            print "$file\n";
            # loop over found strings
            foreach my $found (keys %{$count{$file}}) {
                # count occurrences
                $occ = scalar (@{$count{$file}{$found}});
                # print found lines if occurred more than 1 time
                if ($occ > 1) {
                    foreach (@{$count{$file}{$found}}) {
                        print "$_\n";
                    }
                }
            }
            print "\n";
        }

        I had 3 files to test with:
        file1:
        N(8) -- H(15) .. O(9)
        N(8) -- H(16) .. N(8)
        N(8) -- H(16) .. O(9)

        file2:
        N(8) -- H(15) .. O(9)
        N(8) -- H(15) .. N(8)
        N(8) -- H(16) .. O(9)

        file3:
        N(8) -- H(15) .. O(9)
        N(8) -- H(15) .. N(8)
        N(8) -- H(16) .. O(9)

        and it looks like:
        file1
        N(8) -- H(16) .. N(8)
        N(8) -- H(16) .. O(9)

        file2
        N(8) -- H(15) .. O(9)
        N(8) -- H(15) .. N(8)

        file3
        N(8) -- H(15) .. O(9)
        N(8) -- H(15) .. N(8)

        Imre
Re: duplicate lines in array
by pelagic (Priest) on Jan 23, 2004 at 12:28 UTC
    Question: do you put the lines identifying the files:
    file 2:
    also into your array?
    Imre
      No, the array contains data only. The data from each file is separated by a new line.

      cheers harry
Re: duplicate lines in array
by hmerrill (Friar) on Jan 23, 2004 at 12:38 UTC
    Your question needs to be a little clearer - would you also want H(16) to be printed, since it occurs both in file 1 and file 2?
      I wish to compare data within each separate file only.
      I do not want to compare one file with another.
      So in file 2 only the lines containing H(15) will be displayed, i.e. 2 lines.
      If a file contained H(16) or H(20) twice or more, then those lines should also be printed out.
      It has to be written generally, since the numbers can change.

      cheers Harry
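
Putting the clarified requirements together, one possible sketch works directly on the flat @contact_type array. It rests on a loud assumption: that "separated by a new line" means an empty (or whitespace-only) element sits between the data of consecutive files. The sub name `repeats_per_file` and its pattern argument are hypothetical; the pattern is passed in so that a different letter can be matched without touching the sub.

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: data from different files is separated
# in the array by an empty or whitespace-only element.
# $pattern must contain one capture group for the token to compare.
# Returns one array reference per file, holding that file's repeated lines.
sub repeats_per_file {
    my ($pattern, @contact_type) = @_;

    # Split the flat array into one group of lines per file.
    my @groups = ([]);
    for my $line (@contact_type) {
        if ($line =~ /^\s*$/) {
            push @groups, [] if @{ $groups[-1] };   # start the next file
        }
        else {
            push @{ $groups[-1] }, $line;
        }
    }
    pop @groups if @groups > 1 && !@{ $groups[-1] };  # trailing separator

    # Within each group, keep lines whose token occurs more than once.
    my @results;
    for my $group (@groups) {
        my %count;
        for my $line (@$group) {
            $count{$1}++ if $line =~ $pattern;
        }
        push @results, [ grep { $_ =~ $pattern && $count{$1} > 1 } @$group ];
    }
    return @results;
}

my @contact_type = (
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(16) .. O(9)',
    '',
    'N(8) -- H(15) .. O(9)',
    'N(8) -- H(15) .. N(8)',
    'N(8) -- H(16) .. O(9)',
);

my @per_file = repeats_per_file(qr/(H\(\d+\))/, @contact_type);
for my $i (0 .. $#per_file) {
    print "file ", $i + 1, ":\n";
    print "$_\n" for @{ $per_file[$i] };
}
```

Because the counts are reset for every group, H(16) appearing once in each file is never reported, while the two H(15) lines in the second group are.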
