Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Adding a particular column from one file to a second file

by ZWcarp (Beadle)
on Jul 14, 2011 at 17:27 UTC ( [id://914399]=perlquestion: print w/replies, xml ) Need Help??

ZWcarp has asked for the wisdom of the Perl Monks concerning the following question:

Ok so lets say I have two files that are tab delimited. The file1 has 20 columns, the first column is the gene. I have a file2 who's first column is also Gene(albeit in a different order), but the second column of file2 is a value that is not in the file1. Is there a faster way to add the file2 second column to the 21st column of file1 then reading in both as arrays and matching the Gene column, then printing out the variables that I have split respective columns into? A command line command would be preferable because the files layout are always changing.. and thus a script must be continuously altered based on splitting... here is an example script
#!/usr/bin/perl -w use strict; open(FILE,"$ARGV[0]"); my @input=<FILE>; close(FILE); foreach my $line(@input){ chomp $line; my @batch=(); (my $gene, my $VAA, my $ CCDS ,my $chrposMut, my $o_acc, my $o_pos +, my $o_aa1, my $o_aa2)= split(/\t/,$line); (my $chrpos,my $Mut)=split(/\./,$chrposMut); my ($chr,$position)=split(/\:/,$chrpos); my $vAA=$o_aa1 . $o_pos . $o_aa2; $vAA=~s/\s//g; @batch=`perl -ne 'print if /$position/' Gene_Asign.txt`; # put the + text file here that contains the gene name as well as something that + can be used to identify the correct row when comparing it with the p +olyphen outfile foreach my $Line(@batch) { chomp $Line; # (my $snp, my $str, my $gene, my $v4, my $v5, my $CCDS)=split( +/\t/,$Line); #print $Line ."\n"; my($Sample,$v2)= split(/\s/,$Line); #if($AA=~m/$vAA/){ print $Sample . "\t" . $line . "\n"; # print $gene . "\t" . $vAA . "\t" . $CCDS. "\t" . $line . "\n" +; # } } }

Replies are listed 'Best First'.
Re: Adding a particular column from one file to a second file
by Marshall (Canon) on Jul 14, 2011 at 18:52 UTC
    Open the 2nd file and make a hash of the Gene to the column 2. Then read each line from file1, look up the gene's value in the hash, then add that to the end of the line from file1. It is not necessary to know how many columns are in file1 or even file2. Column 1 in file1 matters. Both Column 1 and Column2 in file2 matter.
    #!/usr/bin/perl -w use strict; my %col2; #gene in column 1 to special column 2 my ($file1, $file2) = @ARGV; if (@ARGV != 2) { print "some usage here\n"; exit(); }; open (my $two_cols, '<', $file2) or die "unable to open $file2 $!"; open (my $big_file, '<', $file1) or die "unable to open $file1 $!"; %col2 = map{ s/\s+$//; #chomp + zap trailing white space my ($gene, $v) = split(/\t/,$_,3); }<$two_cols>; while (<$big_file>) { s/\s+$//; my ($gene, $rest) = split(/\t/,$_,2); if (my $lastcol = $col2{$gene}) { print "$gene\t$rest\t$lastcol\n"; } else { warn "$gene not found in $file2\n"; } } __END__ C:\TEMP>type f1.txt gene1 1234 78975 gene13 9876 ldlgfjk C:\TEMP>type f2.txt gene1 abc gene13 xyz C:\TEMP>update.pl f1.txt f2.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz
Re: Adding a particular column from one file to a second file
by Not_a_Number (Prior) on Jul 14, 2011 at 20:40 UTC
    use strict; use warnings; my $file1 = "Wilder 1933 Col2 Col3 xyz Pitney 1940 Col2 Col3 xyz Hackman 1930 Col2 Col3 xyz Kelly 1912 Col2 Col3 xyz"; my $file2 = "Hackman Col2 Col3 xyz Kelly Col2 Col3 xyz Simmons Col2 Col3 zyx Wilder Col2 Col3 xyz"; open my $fh1, '<', \$file1; my %gene_dob; while ( my $line = <$fh1> ) { my ( $gene, $dob ) = ( split /\t/, $line )[0, 1]; $gene_dob{ $gene } = $dob; } close $fh1; open my $fh2, '<', \$file2; while ( my $line = <$fh2> ) { chomp $line; my $gene = ( split /\t/, $line )[0]; $line .= "\t$gene_dob{$gene}" if $gene_dob{ $gene }; print "$line\n"; }

    Note that this code omits to issue any warning if any of your Genes are present in file1 but not in file2, or vice versa. Error checking is left as an exercise...

    Update: Yeah, well. Marshall beat me to it by a couple of hours for the algorithm. My excuse: I was researching birth dates ;).

Re: Adding a particular column from one file to a second file
by Jim (Curate) on Jul 14, 2011 at 21:30 UTC

    Here's a demonstration of the join operation performed using two tools: join, which assumes a common ordering of the keys (i.e., sorted files), and AWK, which does not.

    $ cat file1.txt gene1 1234 78975 gene13 9876 ldlgfjk $ cat file2.txt gene1 abc gene13 xyz $ join -t ' ' file1.txt file2.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz $ awk 'FNR == NR { a[$1] = $2; next } { print $0, a[$1] }' \ > FS='\t' OFS='\t' file2.txt file1.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz $

    And here's a ham-handed translation of the AWK idiom into Perl.

    $ perl -F'\t' -lane ' > $a{$F[0]} = $F[1], next if $#ARGV == 0; > print "$_\t$a{$F[0]}"; > ' file2.txt file1.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz $

    I use the AWK idiom a lot. It ain't Perl, but it's darned handy.

    Jim

Re: Adding a particular column from one file to a second file
by Jim (Curate) on Jul 14, 2011 at 20:47 UTC

    You have two tab-delimited data files. They have a common key in their leftmost columns. You need to join the data in the two flat-file database tables (i.e., the two tab-delimited text files). You need an easily-configured command-line utility.

    Use the join utility. Based on your description of your problem and your needs, there's no good reason to reinvent the join utility in Perl.

    The command may be as simple as this:

        join -t '<tab>' file1.txt file2.txt > file3.txt

    Come join us on JoinMonks.org! ;-)

    Jim

      Excellent!! what a useful thing to know. Ok this is probably a dumb question but why is it telling me
      join -t '\t' -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_Gene.txt join: illegal tab character specification <\code> or <code>join -t '<tab>' -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_ +Gene.txt join: illegal tab character specification
      Isn't -t ' CHAR' the correct format for specifying the delimiter in the output? Also if I use join  -t ',' -a1 -1 16 -2 2   PTCL_All.txt Cosmic_Counts_By_Gene.txt The output seems to be much more different than just a comma delimeter being added to the command  join   -a1 -1 16 -2 2   PTCL_All.txt Cosmic_Counts_By_Gene.txt Am I doing this correctly? the latter just looks like what I want with spaces separating the columns( the same number of lines as file 1), while the former seems to create a million repeats of each line (many many many more lines than what is in either file).
        I think this works though.... join   -a1 -1 16 -2 2   PTCL_All.txt Cosmic_Counts_By_Gene.txt |perl -pe 's/ /\t/g'
Re: Adding a particular column from one file to a second file
by Jim (Curate) on Jul 14, 2011 at 20:45 UTC

    Please ignore this accidental post.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://914399]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (9)
As of 2024-04-23 13:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found