PerlMonks  

merging content based on common columns of 2 files using grep

by heidi (Sexton)
on May 06, 2009 at 11:01 UTC ( [id://762219]=perlquestion )

heidi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have 2 files, and I want to merge them based on the column that is common to both. Here are my input files:
file1:
TST_01 sp|123|fts
TST_02 sp|3438|rvs
TST_03 sp|2744|rtp

file2:
sp|123|fts checked_proved
sp|3438|rvs proven_right
sp|2744|rtp un_proved

desired result file:
TST_01 sp|123|fts checked_proved
TST_02 sp|3438|rvs proven_right
TST_03 sp|2744|rtp un_proved

#!/usr/bin/perl
open(FH1,file1.txt);
open(FH2,file2.txt);
@array=<FH2>;
while($var=<FH1>){
    ($first,$second)=split("\t",$var);
    @grepped= grep ("$second",@array);
    print "$first\t@grepped\n";
}

Can anyone help? Thank you.

Replies are listed 'Best First'.
Re: merging content based on common columns of 2 files using grep
by citromatik (Curate) on May 06, 2009 at 11:49 UTC

    First of all, you should use strict; and use warnings; if you want to learn Perl the easy way.

    There are some errors in your code. Some of them:

    • file1.txt and file2.txt are literal strings, so you should put them in quotation marks
    • After reading from a file, you should chomp the line(s) read
    • The way you are calling grep is not correct: grep ("$second",@array) only tests whether the string "$second" is true, so every line passes. You should be using grep BLOCK LIST with a pattern match
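
    To illustrate the last point, here is a minimal sketch comparing the two grep forms. The sample lines and the names @lines, $wanted, @all and @hits are made up for the illustration and are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data modelled on file2 from the question.
my @lines = (
    "sp|123|fts checked_proved",
    "sp|3438|rvs proven_right",
    "sp|2744|rtp un_proved",
);
my $wanted = "sp|3438|rvs";

# grep EXPR, LIST: the expression is only a non-empty string, which is
# always true, so no line is filtered out.
my @all = grep("$wanted", @lines);

# grep BLOCK LIST: the block tests each element ($_) against the key.
# \Q...\E quotes the '|' characters so they are matched literally.
my @hits = grep { /\Q$wanted\E/ } @lines;

print scalar(@all),  " line(s) kept by the string form\n";
print scalar(@hits), " line(s) kept by the block form: @hits\n";
```

    The string form keeps the whole list, which would explain why the original script printed every line of file2 after each key.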

    Here is a working version of your program:

    use strict;
    use warnings;

    open(FH1,"file1.txt");
    open(FH2,"file2.txt");

    my @array=<FH2>;
    chomp @array;

    while(my $var=<FH1>){
        chomp $var;
        my ($first,$second)=split(/\s+/,$var);
        my @grepped= grep {/\Q$second\E/} @array;
        print "$first\t$grepped[0]\n";
    }

    You should also consider using hashes to solve your problem

    Another alternative, if the script is only going to join the two files and output the result (and you are on a Unix/Linux system, as seems to be the case), is to use the join utility, which expects both files to be sorted on the join field:

    $ join -1 2 -2 1 file1.txt file2.txt

    citromatik

      Thank you :)
Re: merging content based on common columns of 2 files using grep
by targetsmart (Curate) on May 06, 2009 at 11:07 UTC
    What output did you get? What problem did you face?
    UPDATE
    From the code you have shown above, it won't work as you expect, because for every line (first column) in file1 you are printing only the second-column match from file2.
    grep returns only the matching elements of the list in this case.
    Use a hash instead.
    Iterate through the second file and load each split line into a hash, with the first column as key and the second column as value (provided your second file is not huge).
    Then, while iterating over the first file, check whether the key exists in the hash; if it does, append the hash value as the third column when printing.
    Hope this helps.
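
    The steps above could be sketched roughly as follows. This is only one possible reading: merge_by_key is a hypothetical helper name, the input is passed as in-memory arrays of lines rather than read from file1.txt and file2.txt, and lines of the first file that have no match are assumed to be skipped.

```perl
use strict;
use warnings;

# merge_by_key is a hypothetical helper, not part of the thread.
# $file1_lines: array ref of "ID KEY" lines
# $file2_lines: array ref of "KEY STATUS" lines
sub merge_by_key {
    my ($file1_lines, $file2_lines) = @_;

    # Load the second file into a hash: first column => second column.
    my %status_for;
    for my $line (@$file2_lines) {
        my ($key, $status) = split ' ', $line;
        next unless defined $status;    # skip blank or malformed lines
        $status_for{$key} = $status;
    }

    # Walk the first file and append the looked-up value as a third column.
    # Assumption: lines without a match in the hash are left out.
    my @merged;
    for my $line (@$file1_lines) {
        my ($id, $key) = split ' ', $line;
        next unless defined $key && exists $status_for{$key};
        push @merged, join("\t", $id, $key, $status_for{$key});
    }
    return @merged;
}

# Sample data taken from the question.
my @file1 = ("TST_01 sp|123|fts", "TST_02 sp|3438|rvs", "TST_03 sp|2744|rtp");
my @file2 = ("sp|123|fts checked_proved", "sp|3438|rvs proven_right",
             "sp|2744|rtp un_proved");

print "$_\n" for merge_by_key(\@file1, \@file2);
```

    Each key is looked up directly in the hash, so the second file is read only once instead of being scanned again for every line of the first file.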

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
      The output was giving only the second column of the first file as results :(
        Copy+Paste is your friend, friend :)
Re: merging content based on common columns of 2 files using grep
by planetscape (Chancellor) on May 07, 2009 at 03:29 UTC
Re: merging content based on common columns of 2 files using grep
by bichonfrise74 (Vicar) on May 06, 2009 at 21:43 UTC
    Try this...
    #!/usr/bin/perl
    use strict;

    my %hash;

    my $raw_file_1 = <<FILE_1;
    file1:
    TST_01 sp|123|fts
    TST_02 sp|3438|rvs
    TST_03 sp|2744|rtp
    TST_04 sp|0123|not_here
    FILE_1

    my $raw_file_2 = <<FILE_2;
    file2:
    sp|123|fts checked_proved
    sp|3438|rvs proven_right
    sp|2744|rtp un_proved
    sp|0000|bogus bogus
    FILE_2

    open( my $file_1, "<", \$raw_file_1 )
        or die "Couldn't open \$raw_file_1\n";
    while (<$file_1>) {
        next if /^file/;
        my ($val, $key) = split;
        $hash{$key} = $val;
    }

    open( my $file_2, "<", \$raw_file_2 )
        or die "Couldn't open \$raw_file_2\n";
    while (<$file_2>) {
        next if /^file/;
        my ($key, $val) = split;
        print "$hash{$key}\t$key\t$val\n"
            if ( defined($hash{$key}) );
    }

Node Type: perlquestion [id://762219]
Approved by targetsmart