Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: Matching elements in two arrays and printing the element next to the match.

by spazm (Monk)
on Mar 22, 2010 at 15:34 UTC ( #830101=note: print w/ replies, xml ) Need Help??


in reply to Matching elements in two arrays and printing the element next to the match.

Caveats:

I am not certain of your file formats. Here are my assumptions:
  1. For FILE A, it is unclear if you have multiple ID strings on one line, or one per line over multiple lines. I'll make this solution work with either.
  2. For FILE B, are your lines "key <whitespace> letters" or "key <newline> letters <newline>", ie key and values on separate lines?

Solution:

Perl hashes are very robust and often a great solution for simple to medium complexity problems. For this solution I'll read all the entries from the first file, parse out the IDs, and insert each ID into a hash. Then we will parse each entry in fileB and check if that ID is in the hash we built while walking fileA. In the case of a match, we print the ID and LETTERS joined with a < tab %gt; character.

With this solution, we look at each line of fileA and fileB exactly once, and we use a hash lookup on IDs which is fast. This reduces our complexity from O(n^2)+ from the previous solution to something closer to O(n log n), possibly close to O(n) if we're lucky with our ID hashing.

The Code

#!/usr/bin/perl use warnings; use strict; # open filea and parse all id strings. # Add id strings as keys to %wanted array. my %wanted; { open my $file, '<', "filea" || die "failed to open filea : $!"; while( <$file>) { chomp; @ids = split( /\s+/, $_); $wanted{ $_ }++ for @ids; } close $file; } #read fileb, parse lines of the form "id <whitespace> letters" #and print lines that match the id strings from filea. { open my $file, '<', 'fileb' || die "failed to open fileb : $!"; while (<$file>) { chomp; my ($id, $letters) = split( /\s+/, $_); print "$id\t$letters\n" if $wanted{$id}; } } #OR #read fileb, parse lines of the form "id <newline> letters" #and print lines that match the id strings from filea. { open my $file, '<', 'fileb' || die "failed to open fileb : $!"; while (<$file>) { my $id = $_; my $letters = <$file>; chomp($id); chomp($letters); print "$id\t$letters\n" if $wanted{$id}; } } __END__ FileA: 1DWK 2RFK 4ERH FileB: 1DWK HRSDKKDAHJKLSDLDLLJDGHDFJJE 4ERH DFSKFHADFSBVHFWIHFWJBFS 2RFK DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB FileB (alternate): 1DWK HRSDKKDAHJKLSDLDLLJDGHDFJJE 4ERH DFSKFHADFSBVHFWIHFWJBFS 2RFK DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB


Comment on Re: Matching elements in two arrays and printing the element next to the match.
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://830101]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2014-09-23 04:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (210 votes), past polls