From what I can tell, you are checking whether everything after the first field matches between the two files. The best way to do this is to put the rows of the first file into a hash, keyed on everything after the first field, and then look up each row of the second file to see if the hash entry exists.
The following code is not guaranteed to run (I had a long night last night!) but should show the general idea:
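#!/usr/bin/perl
use strict;
use warnings;

# Usage: script.pl gene_file coding_annotated.var
open(my $in_a, '<', $ARGV[0]) or die "cannot open gene file: $!";
open(my $in_b, '<', $ARGV[1]) or die "cannot open coding_annotated.var file: $!";

# Index the first file: key is everything after the first field, value is the id.
my %hash1;
while (my $line = <$in_a>) {
    chomp $line;
    my ($id, $rest) = split /\t/, $line, 2;
    next unless defined $rest;    # skip lines without a tab
    $hash1{$rest} = $id;
}

# Print each line of the second file whose remainder was seen in the first.
while (my $line = <$in_b>) {
    chomp $line;
    my ($id, $rest) = split /\t/, $line, 2;
    next unless defined $rest;
    print "Match: $line\n" if exists $hash1{$rest};
}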
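To try the idea without any input files, here is a minimal sketch of the same lookup on in-memory rows. The helper name matching_lines and the sample rows are made up for illustration; I am assuming your data is tab-separated with the id in the first field.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of @$second whose text after the
# first tab also appears after the first tab of some line in @$first.
sub matching_lines {
    my ($first, $second) = @_;

    # Build the lookup from the first set of rows.
    my %seen;
    for my $line (@$first) {
        my (undef, $rest) = split /\t/, $line, 2;
        $seen{$rest} = 1 if defined $rest;
    }

    # Keep only the rows of the second set whose remainder is in the lookup.
    return grep {
        my (undef, $rest) = split /\t/, $_, 2;
        defined $rest && exists $seen{$rest};
    } @$second;
}

# Made-up sample rows: an id, then two more tab-separated fields.
my @first  = ("g1\tchr1\t100", "g2\tchr2\t200");
my @second = ("v1\tchr2\t200", "v2\tchr3\t300");

print "Match: $_\n" for matching_lines(\@first, \@second);
```

With these rows only the v1 line is printed, because its fields after the id (chr2, 200) also occur in the first list. Note that if several rows in the first file share the same remainder, a plain hash keeps only one of them, which is fine when all you need is a yes/no match.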