Adding a particular column from one file to a second file

ZWcarp has asked for the wisdom of the Perl Monks concerning the following question:

Ok so lets say I have two files that are tab delimited. The file1 has 20 columns, the first column is the gene. I have a file2 who's first column is also Gene(albeit in a different order), but the second column of file2 is a value that is not in the file1. Is there a faster way to add the file2 second column to the 21st column of file1 then reading in both as arrays and matching the Gene column, then printing out the variables that I have split respective columns into? A command line command would be preferable because the files layout are always changing.. and thus a script must be continuously altered based on splitting... here is an example script

#!/usr/bin/perl -w
use strict;

open(FILE,"$ARGV[0]");
my @input=<FILE>;
close(FILE);
foreach my $line(@input){
    chomp $line;
     my @batch=();
    (my $gene, my $VAA, my $ CCDS ,my $chrposMut, my $o_acc, my $o_pos
+, my $o_aa1, my $o_aa2)= split(/\t/,$line);
    (my $chrpos,my $Mut)=split(/\./,$chrposMut);
    
    my ($chr,$position)=split(/\:/,$chrpos);
    my $vAA=$o_aa1 . $o_pos . $o_aa2;
    $vAA=~s/\s//g;
       
    @batch=`perl -ne 'print if /$position/' Gene_Asign.txt`; # put the
+ text file here that contains the gene name as well as something that
+ can be used to identify the correct row when comparing it with the p
+olyphen outfile
    foreach my $Line(@batch)
    {
        chomp $Line;
       # (my $snp, my $str, my $gene, my $v4, my $v5, my $CCDS)=split(
+/\t/,$Line);
    #print $Line ."\n";
        my($Sample,$v2)= split(/\s/,$Line);
        
   
    #if($AA=~m/$vAA/){
        print $Sample . "\t" . $line . "\n";
     #   print $gene . "\t" . $vAA . "\t" . $CCDS. "\t" . $line . "\n"
+;
       # }
    }
    
}
[download]

Comment on Adding a particular column from one file to a second file Download Code

Replies are listed 'Best First'.
Re: Adding a particular column from one file to a second file by Marshall (Canon) on Jul 14, 2011 at 18:52 UTC
Open the 2nd file and make a hash of the Gene to the column 2. Then read each line from file1, look up the gene's value in the hash, then add that to the end of the line from file1. It is not necessary to know how many columns are in file1 or even file2. Column 1 in file1 matters. Both Column 1 and Column2 in file2 matter. #!/usr/bin/perl -w use strict; my %col2; #gene in column 1 to special column 2 my ($file1, $file2) = @ARGV; if (@ARGV != 2) { print "some usage here\n"; exit(); }; open (my $two_cols, '<', $file2) or die "unable to open $file2 $!"; open (my $big_file, '<', $file1) or die "unable to open $file1 $!"; %col2 = map{ s/\s+$//; #chomp + zap trailing white space my ($gene, $v) = split(/\t/,$_,3); }<$two_cols>; while (<$big_file>) { s/\s+$//; my ($gene, $rest) = split(/\t/,$_,2); if (my $lastcol = $col2{$gene}) { print "$gene\t$rest\t$lastcol\n"; } else { warn "$gene not found in $file2\n"; } } __END__ C:\TEMP>type f1.txt gene1 1234 78975 gene13 9876 ldlgfjk C:\TEMP>type f2.txt gene1 abc gene13 xyz C:\TEMP>update.pl f1.txt f2.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz [download]	[reply] [d/l]
Re: Adding a particular column from one file to a second file by Not_a_Number (Prior) on Jul 14, 2011 at 20:40 UTC
use strict; use warnings; my $file1 = "Wilder 1933 Col2 Col3 xyz Pitney 1940 Col2 Col3 xyz Hackman 1930 Col2 Col3 xyz Kelly 1912 Col2 Col3 xyz"; my $file2 = "Hackman Col2 Col3 xyz Kelly Col2 Col3 xyz Simmons Col2 Col3 zyx Wilder Col2 Col3 xyz"; open my $fh1, '<', \$file1; my %gene_dob; while ( my $line = <$fh1> ) { my ( $gene, $dob ) = ( split /\t/, $line )[0, 1]; $gene_dob{ $gene } = $dob; } close $fh1; open my $fh2, '<', \$file2; while ( my $line = <$fh2> ) { chomp $line; my $gene = ( split /\t/, $line )[0]; $line .= "\t$gene_dob{$gene}" if $gene_dob{ $gene }; print "$line\n"; } [download] Note that this code omits to issue any warning if any of your Genes are present in file1 but not in file2, or vice versa. Error checking is left as an exercise... Update: Yeah, well. Marshall beat me to it by a couple of hours for the algorithm. My excuse: I was researching birth dates ;).	[reply] [d/l]
Re: Adding a particular column from one file to a second file by Jim (Curate) on Jul 14, 2011 at 21:30 UTC
Here's a demonstration of the join operation performed using two tools: join, which assumes a common ordering of the keys (i.e., sorted files), and AWK, which does not. `$ cat file1.txt gene1 1234 78975 gene13 9876 ldlgfjk $ cat file2.txt gene1 abc gene13 xyz $ join -t ' ' file1.txt file2.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz $ awk 'FNR == NR { a[$1] = $2; next } { print $0, a[$1] }' \ > FS='\t' OFS='\t' file2.txt file1.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz $` [download] And here's a ham-handed translation of the AWK idiom into Perl. `$ perl -F'\t' -lane ' > $a{$F[0]} = $F[1], next if $#ARGV == 0; > print "$_\t$a{$F[0]}"; > ' file2.txt file1.txt gene1 1234 78975 abc gene13 9876 ldlgfjk xyz $` [download] I use the AWK idiom a lot. It ain't Perl, but it's darned handy. Jim	[reply] [d/l] [select]
Re: Adding a particular column from one file to a second file by Jim (Curate) on Jul 14, 2011 at 20:47 UTC
You have two tab-delimited data files. They have a common key in their leftmost columns. You need to join the data in the two flat-file database tables (i.e., the two tab-delimited text files). You need an easily-configured command-line utility. Use the join utility. Based on your description of your problem and your needs, there's no good reason to reinvent the join utility in Perl. The command may be as simple as this: `join -t '<tab>' file1.txt file2.txt > file3.txt` Come join us on JoinMonks.org! ;-) Jim	[reply] [d/l]
Re^2: Adding a particular column from one file to a second file by ZWcarp (Beadle) on Jul 15, 2011 at 17:09 UTC
Excellent!! what a useful thing to know. Ok this is probably a dumb question but why is it telling me `join -t '\t' -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_Gene.txt join: illegal tab character specification <\code> or <code>join -t '<tab>' -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_ +Gene.txt join: illegal tab character specification` [download] Isn't -t ' CHAR' the correct format for specifying the delimiter in the output? Also if I use `join -t ',' -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_Gene.txt` The output seems to be much more different than just a comma delimeter being added to the command `join -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_Gene.txt` Am I doing this correctly? the latter just looks like what I want with spaces separating the columns( the same number of lines as file 1), while the former seems to create a million repeats of each line (many many many more lines than what is in either file).	[reply] [d/l] [select]
Re^3: Adding a particular column from one file to a second file by ZWcarp (Beadle) on Jul 15, 2011 at 17:13 UTC
I think this works though.... `join -a1 -1 16 -2 2 PTCL_All.txt Cosmic_Counts_By_Gene.txt \|perl -pe 's/ /\t/g'`	[reply] [d/l]
Re: Adding a particular column from one file to a second file by Jim (Curate) on Jul 14, 2011 at 20:45 UTC
Please ignore this accidental post.	[reply]


Don't ask to ask, just ask
	PerlMonks