It looks like both files use column 5 as a "key" of sorts to connect the two files. I would approach this by reading all of the first file (the one you open as MYFILE), collecting the values from the last column along the way. Since you only need to collect one value from each line, I save them in an array, indexed by the column 5 value, as I read the file. This will work fine even for fairly large files. Once the first file is processed, read the second file (the one you open as NEWF) line by line, do the substitutions, and write the output as you go.
#!/usr/bin/env perl
use strict;
use warnings;

my $file1 = "pm-890461-in1.txt";
my $file2 = "pm-890461-in2.txt";

open( MYFILE, '<', $file1 ) or die "cannot open $file1: $!";
open( NEWF, '<', $file2 ) or die "cannot open $file2: $!";

# First pass: remember the last field of each line, indexed by field 5.
my @in_values;
while ( <MYFILE> ) {
    chomp;
    my( $index, $value ) = ( split /\s+/ )[4, -1];
    # above line does same thing as next three
    # my @fields = ( split /\s+/ );
    # my $index = $fields[4];
    # my $value = $fields[-1];
    $in_values[ $index ] = $value;
}
close MYFILE;

# Second pass: replace the last field of each line with the saved value.
while ( <NEWF> ) {
    chomp;
    my @fields = ( split /\s+/ );
    my $index = $fields[4];
    $fields[-1] = $in_values[ $index ];
    my $output = join "\t", @fields;
    print "$output\n";
}
close NEWF;
Note that I use split (not substr) to get the fields of interest from each line (same approach for both files). For the output, I join the fields with a tab character. You should change that to something else (e.g., a fixed number of space characters) if you need the output formatted differently. And of course this writes to STDOUT, so you will need to redirect the output on the command line or add to this code to open an output file and print to that.
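If you would rather write to a file than redirect STDOUT, something along these lines might do. This is only a sketch: the output file name and the write_fields helper are my own inventions for illustration, not part of your code, and the a/b/c fields stand in for whatever @fields holds inside the NEWF loop.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: join the fields with the chosen separator and
# print them, followed by a newline, to an already-open filehandle.
sub write_fields {
    my ( $fh, $separator, @fields ) = @_;
    print {$fh} join( $separator, @fields ), "\n";
}

my $outfile = "pm-890461-out.txt";    # assumed name, pick your own
open( my $out, '>', $outfile ) or die "cannot open $outfile: $!";

# Inside the NEWF loop, a call like these would replace print "$output\n".
write_fields( $out, "\t",  qw( a b c ) );    # tab between fields
write_fields( $out, "   ", qw( a b c ) );    # three spaces between fields

close $out or die "cannot close $outfile: $!";
```

Passing the separator as an argument keeps the formatting decision in one place, so switching from tabs to spaces is a one-character change at the call site.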
When you are more comfortable with Perl, you will find that some of this is actually on the "verbose" side. Using Perl idioms would make some of my code more compact, but also a bit harder to follow until you have more experience.
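For the curious, here is roughly what a more compact version could look like. Treat it as a sketch rather than a drop-in replacement: it swaps the array for a hash (so the key need not be a small non-negative number), it keeps the original last field when a key is missing from the first file (my code above would leave it undefined), it needs Perl 5.10 or later for the // operator, and it assumes every line has at least five fields. The merge_last_column name and the sample lines are made up for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of both files as array references
# and returns the rewritten lines of the second file (without newlines).
sub merge_last_column {
    my ( $first, $second ) = @_;

    # Each line contributes a (key, value) pair: field 5 and the last field.
    # split ' ' splits $_ on whitespace and ignores the trailing newline.
    my %value_for = map { ( split ' ' )[ 4, -1 ] } @$first;

    return map {
        my @fields = split ' ';
        # Fall back to the existing value if the key was never seen.
        $fields[-1] = $value_for{ $fields[4] } // $fields[-1];
        join "\t", @fields;
    } @$second;
}

# Made-up sample lines, six fields each, only to show the call.
my @first  = ( "a b c d 1 X\n", "a b c d 2 Y\n" );
my @second = ( "p q r s 2 old\n", "p q r s 1 old\n", "p q r s 9 keep\n" );

print "$_\n" for merge_last_column( \@first, \@second );
```

With real files you would open both with lexical filehandles and pass [ <$fh1> ] and [ <$fh2> ] instead of the sample arrays. That reads each file into memory in one go, which is fine for modest sizes but less frugal than the line-by-line loop above.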