PerlMonks  

Matching columns between files

by gggg (Initiate)
on Jun 25, 2011 at 20:38 UTC

gggg has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have two input files. I need to match columns 1,2,3 of the first file against columns 2,3,4 of the second file, and for each matched line print the value in column 4 of the second file. I have made a separate array for each input file. Here is the script:

use strict;
use warnings;
my $infile1 = $ARGV[0];
my $infile2 = $ARGV[1];
my $outfile = $ARGV[2];
unless (open( INFILE1, $infile1)) {
    die "Cannot open $infile1\n";
}
my @array;
my @slice;
my @array1;
my @slice1;
my $element;
while(<INFILE1>) {
    chomp;
    my @array = split ('\t', $_);
    my @slice = @array[0,1,2];
}
open (INFILE2, "<", $infile2) || die "cannot open $infile2";
open (OFILE, ">", $outfile) || die "cannot open $outfile";
my $i = 0;
while (<INFILE2>) {
    chomp;
    my @array1 = split ('\t', $_);
    my @slice1 = @array1[1,2,3,4];
    $slice1[0] = s/chr//g;
    for ($i=0; $i < $#slice1; $i++) {
        foreach my $element (@slice) {
            print OFILE $slice1[3] ."\n";
        }
        else {
            print OFILE "NA\n";
        }
    }
}
}

Any help would be appreciated. Thanks

Re: Matching columns between files
by toolic (Bishop) on Jun 25, 2011 at 20:50 UTC
    Welcome to the Monastery.

    See: How do I compose an effective node title?

    The code you posted has compile errors. Did you mean to have an "if" to match your "else"?

    You have a scoping problem. Those are two different @slice arrays.

    my @slice;
    ...
    while(<INFILE1>) {
        chomp;
        my @array = split ('\t', $_);
        my @slice = @array[0,1,2];
    }

    You probably want to get rid of the my inside your while loop, and maybe you need to push. My guess is that your foreach loop never iterates over your @slice array because it is empty.
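    A small sketch of the scoping problem, using two hypothetical in-memory lines in place of INFILE1. The first loop repeats the question's pattern; the second shows one possible fix, pushing an array reference for each row's key columns onto an array declared outside the loop.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines standing in for INFILE1.
my @lines = ( "a\tb\tc\td", "e\tf\tg\th" );

# Buggy: "my @slice" inside the loop declares a new, inner @slice that
# disappears at the end of each iteration, so the outer @slice stays empty.
my @slice;
for my $line (@lines) {
    my @array = split /\t/, $line;
    my @slice = @array[ 0, 1, 2 ];
}
print "buggy: ", scalar(@slice), " rows\n";    # prints "buggy: 0 rows"

# Fixed: push a reference to each row's first three columns onto an
# array declared outside the loop, so every row is kept.
my @rows;
for my $line (@lines) {
    my @array = split /\t/, $line;
    push @rows, [ @array[ 0, 1, 2 ] ];
}
print "fixed: ", scalar(@rows), " rows\n";     # prints "fixed: 2 rows"
```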

      Hey, thanks for the information. I am trying to match the three columns 0,1,2, which I sliced from the first array, with the three columns 1,2,3 of the other input file, and to print column 4 of the second file for the matching lines. I am not sure what in the code is wrong with respect to this. Thanks
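    One way to express the comparison described here is to join each slice into a single string and compare the strings (comparing two arrays with == only compares their lengths). A minimal sketch for one pair of lines, assuming tab-separated data and 0-based indexes as in the description; the sample lines are hypothetical.

```perl
use strict;
use warnings;

# One hypothetical line from each file (tab-separated).
my @cols_a = split /\t/, "1\t100\tA";
my @cols_b = split /\t/, "x\t1\t100\tA\t0.5";

# Columns 0,1,2 of the first file line up with columns 1,2,3 of the
# second; joining each slice yields one comparable string per line.
my $key_a = join "\t", @cols_a[ 0, 1, 2 ];
my $key_b = join "\t", @cols_b[ 1, 2, 3 ];

# On a match, print column 4 of the second line; otherwise print NA.
print $key_a eq $key_b ? "$cols_b[4]\n" : "NA\n";    # prints "0.5"
```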
Re: Matching columns between files
by graff (Chancellor) on Jun 26, 2011 at 03:43 UTC
    Here's a script I posted a while ago, which generally does the sort of thing you're trying to do: cmpcol.

    And here's what the usage might look like to have that script output the combined contents of lines from two files, where columns 1,2,3 of fileA match columns 2,3,4 of fileB (first column is #1):

    cmpcol -i -lb tab fileA:1,2,3 fileB:2,3,4

    Now, as a first try, the output will be more than you actually want: it prints the full content of both matching lines. (And if key columns are not unique within one of the files -- e.g. fileA has multiple lines with the same combination of values in columns 1,2,3 and these match a row in fileB -- you'll get multiple lines from that file that match a single line from the other file.)

    But you can easily filter the output to trim out the unwanted columns. And maybe the "cmpcol" script itself will give you some ideas for how to write a script that does exactly what you want.
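    For comparison, here is a minimal sketch of the common hash-based approach (not the cmpcol script itself): load the key columns of the first file into a hash, then stream the second file and look up each line's key. It assumes tab-separated input and 0-based indexes (0,1,2 in the first file against 1,2,3 in the second), and prints column 4 of the second file or NA. The subroutine names and sample data are hypothetical.

```perl
use strict;
use warnings;

# Read the first file and record each line's key (columns 0,1,2).
sub load_keys {
    my ($fh) = @_;
    my %keys;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @f = split /\t/, $line, -1;
        next if @f < 3;    # skip short or blank lines
        $keys{ join "\t", @f[ 0, 1, 2 ] } = 1;
    }
    return \%keys;
}

# Stream the second file: compare columns 1,2,3 against the stored keys
# and collect column 4 on a match, or "NA" otherwise.
sub match_lines {
    my ( $keys, $fh ) = @_;
    my @out;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @f = split /\t/, $line, -1;
        next if @f < 5;    # skip short or blank lines
        # The question's script tried to strip "chr" from this column
        # ($slice1[0] = s/chr//g); drop this line if your data doesn't need it.
        $f[1] =~ s/^chr//;
        my $key = join "\t", @f[ 1, 2, 3 ];
        push @out, exists $keys->{$key} ? $f[4] : 'NA';
    }
    return @out;
}

# Hypothetical sample data standing in for the two input files.
my $file_a = "1\t100\tA\n2\t200\tC\n";
my $file_b = "x\tchr1\t100\tA\t0.5\ny\tchr3\t300\tG\t0.9\n";
open my $fa, '<', \$file_a or die "Cannot open file A: $!";
open my $fb, '<', \$file_b or die "Cannot open file B: $!";
print "$_\n" for match_lines( load_keys($fa), $fb );    # prints 0.5, then NA
```

    To run it on real files, open them by name instead, e.g. `open my $fa, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";`. Only the first file's keys are held in memory while the second file is streamed. Because the hash records only that a key exists, duplicate keys in the first file collapse into one entry, so each line of the second file yields exactly one output line.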

Re: Matching columns between files
by bluescreen (Friar) on Jun 25, 2011 at 23:04 UTC

    Have you considered doing a line-by-line comparison? If you have large files, you might end up consuming too much memory storing their contents in an array. Here's an example:

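    use strict;
    use warnings;
    open my $fh1, '<', 'file1' or die "Cannot open file1: $!";
    open my $fh2, '<', 'file2' or die "Cannot open file2: $!";
    # Read both files in lockstep: line N of file1 is compared with line N of file2.
    while ( defined( my $line1 = <$fh1> ) and defined( my $line2 = <$fh2> ) ) {
        chomp($line1);
        chomp($line2);
        # Assumes comma-separated input; split on "\t" for tab-separated files.
        my @values1 = split( ',', $line1 );
        my @values2 = split( ',', $line2 );
        print "$values2[4]\n"
          if ( $values1[0] eq $values2[1]
            and $values1[1] eq $values2[2]
            and $values1[2] eq $values2[3] );
    }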
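    Note that this loop compares line N of file1 only with line N of file2, so it finds matches only when the two files are already aligned row by row. If that holds for your data, here is a sketch of the same lockstep idea adapted to the tab-separated files from the question, returning NA for non-matching rows (as the question's else branch seems to intend). The subroutine name and sample data are hypothetical.

```perl
use strict;
use warnings;

# Compare two row-aligned, tab-separated inputs in lockstep: line N of
# the first is checked only against line N of the second. For each pair,
# return column 4 (0-based) of the second line on a match, else "NA".
sub lockstep_match {
    my ( $fh1, $fh2 ) = @_;
    my @out;
    while ( defined( my $l1 = <$fh1> ) and defined( my $l2 = <$fh2> ) ) {
        chomp( $l1, $l2 );
        my @v1 = split /\t/, $l1, -1;
        my @v2 = split /\t/, $l2, -1;
        # Column $i of the first input pairs with column $i+1 of the second;
        # missing fields are treated as empty strings.
        my $mismatch = grep { ( $v1[$_] // '' ) ne ( $v2[ $_ + 1 ] // '' ) } 0 .. 2;
        push @out, $mismatch ? 'NA' : ( $v2[4] // 'NA' );
    }
    return @out;
}

# Hypothetical sample data; the loop stops at the end of the shorter input.
my $in1 = "1\t100\tA\n2\t200\tC\n";
my $in2 = "x\t1\t100\tA\t0.5\ny\t2\t999\tC\t0.7\n";
open my $h1, '<', \$in1 or die "Cannot open input 1: $!";
open my $h2, '<', \$in2 or die "Cannot open input 2: $!";
print "$_\n" for lockstep_match( $h1, $h2 );    # prints 0.5, then NA
```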
      Thanks for the reply. This one runs, but it gives me an empty output file.

        What output file? It prints to STDOUT.
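    If you want the results in a file, the simplest option is shell redirection, e.g. `perl yourscript.pl > out.txt` (the script name is a placeholder). Alternatively, here is a small sketch that mirrors the question's third command-line argument ($outfile = $ARGV[2]): it opens that file for writing when given and otherwise falls back to STDOUT.

```perl
use strict;
use warnings;

# Hypothetical: write to the file named by the third command-line
# argument when present, otherwise fall back to STDOUT.
my $out_fh;
if ( defined $ARGV[2] ) {
    open $out_fh, '>', $ARGV[2] or die "Cannot open $ARGV[2]: $!";
}
else {
    $out_fh = \*STDOUT;
}

# Print through the chosen handle; the block form {$out_fh} is needed
# when the filehandle is stored in a variable expression.
print {$out_fh} "matched value goes here\n";
```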

Node Type: perlquestion [id://911404]
Approved by toolic