<?xml version="1.0" encoding="windows-1252"?>
<node id="1004981" title="Re: Regular expression help" created="2012-11-21 13:15:50" updated="2012-11-21 13:15:50">
<type id="11">
note</type>
<author id="880879">
space_monk</author>
<data>
<field name="doctext">
&lt;p&gt;From what I can tell, you want to check whether everything after the first field matches. The best way to do this is to put the rows of the first file into a hash, keyed on everything after the first field, and then use the rows of the second file to look up whether a matching hash entry exists.&lt;/p&gt;


&lt;p&gt;The following code is not guaranteed to run (I had a long night last night!), but it should show the general idea.&lt;/p&gt;

&lt;code&gt;
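#!/usr/bin/perl
use strict;
use warnings;

# $ARGV[0]: gene file, $ARGV[1]: coding_annotated.var file
open(my $ina, '&lt;', $ARGV[0]) or die "cannot open gene file: $!";
open(my $inb, '&lt;', $ARGV[1]) or die "cannot open coding_annotated.var file: $!";

my @sample1 = &lt;$ina&gt;;
my @sample2 = &lt;$inb&gt;;

# Key each row of the first file on everything after its first
# tab-separated field (use map for this maybe?)
my %hash1;
foreach my $line (@sample1) {
    chomp $line;
    my ($id, $rest) = split /\t/, $line, 2;
    next unless defined $rest;    # skip rows with no tab
    $hash1{$rest} = $id;
}

# Report rows of the second file whose trailing fields were seen above
foreach my $line (@sample2) {
    chomp $line;
    my ($id, $rest) = split /\t/, $line, 2;
    next unless defined $rest;
    if (exists $hash1{$rest}) {
        print "Match: $line\n";
    }
}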

&lt;/code&gt;
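
&lt;p&gt;If you want to try the map idea from the comment in the code, here is a minimal sketch. It assumes the same tab-separated layout as above and wraps the lookup in a hypothetical sub, find_matches (not part of anything above), that takes references to the two arrays of lines so it can be tried without any files. Lines without a tab are skipped, and if the same trailing fields occur twice in the first file the later id simply overwrites the earlier one, which doesn't matter for an existence check.&lt;/p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper: build the lookup hash with map, keyed on everything
# after the first tab-separated field, then return the (chomped) lines of
# the second list whose trailing fields appear in that hash.
# e.g. print "Match: $_\n" for find_matches(\@sample1, \@sample2);
sub find_matches {
    my ($lines_a, $lines_b) = @_;    # array refs of raw lines

    # map emits (key, value) pairs; rows without a tab emit nothing.
    my %seen = map {
        my $line = $_;               # copy: $_ is aliased to the array element
        chomp $line;
        my ($id, $rest) = split /\t/, $line, 2;
        defined $rest ? ($rest, $id) : ();
    } @$lines_a;

    my @matches;
    foreach my $line (@$lines_b) {
        my $copy = $line;
        chomp $copy;
        my (undef, $rest) = split /\t/, $copy, 2;
        push @matches, $copy if defined $rest and exists $seen{$rest};
    }
    return @matches;
}
```

&lt;p&gt;The map block copies $_ before chomping it because $_ is aliased to the array elements, so chomping it directly would modify the caller's arrays.&lt;/p&gt;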

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-880879"&gt;
A Monk aims to give answers to those who have none, and to learn from those who know more.
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
1004901</field>
<field name="parent_node">
1004901</field>
</data>
</node>
