Anto_ch has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am quite new to Perl. I am writing a script to extract multiple columns from tab-delimited text files that are the output of a previous script. The output files look like this:

column1 column2 column3 .... column n

111111 222222 333333 ..... nnnnnnn

121212 212121 313131 ...... nnnnnnn

131313 232323 343434 ...... nnnnnnn

I would like to extract the second and third columns and put them in a new file, but the code I wrote opens the file and then reads and captures only the first number in column2. I don't know why it won't iterate through the lines. I need help, I am going crazy! :-) Previously I used the same approach on another file, where I had to subtract the values in column1 from those in column2, and it worked; but when I tried to use the same code to extract instead of subtract, it catches only the first element of the column. Thank you in advance for your help. The code I am using is below:

use strict;
use warnings;

open(APRI2, "path to file.txt");
my @out_runseq = <APRI2>;
close APRI2;
print @out_runseq, "\n";

foreach my $lines (@out_runseq) {
    my @splitlines = split("\t", $lines);
    my (@motif_position) = ( );
    if (my $splitlines = $splitlines[7]) {
        my $position1 = $splitlines[7];
        print ("$position1\t");
        push(@motif_position, $position1);
    }
}
exit;

Re: multicolumn extraction
by davido (Cardinal) on Jun 03, 2012 at 15:53 UTC

    Extract the 2nd and 3rd column, and write them to a new file:

    use strict;
    use warnings;

    open my $APRI2_in, '<', 'path_to_infile.txt' or die $!;
    open my $out_fh, '>', 'path_to_outfile.txt' or die $!;

    while ( <$APRI2_in> ) {
        chomp;    # strip the newline so a final column does not carry it along
        my @columns = split /\t/, $_;
        print $out_fh "$columns[1]\t$columns[2]\n";
    }

    close $APRI2_in;
    close $out_fh or die $!;

    ...or as a one-liner...

    perl -plaF/\t/ -e '$_ = "$F[1]\t$F[2]"' infile > outfile
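For readers unfamiliar with the switches, the one-liner behaves roughly like the loop below. This is only a sketch of what -p, -l, -a and -F stand for, not the exact code perl generates; the sample data and the in-memory filehandle are assumptions added so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real tab-delimited file.
my $sample = "c1\tc2\tc3\tc4\n111\t222\t333\t444\n121\t212\t313\t414\n";
open my $in, '<', \$sample or die "Cannot open in-memory input: $!";

my $result = '';
while ( my $line = <$in> ) {
    chomp $line;                   # -l: strip the trailing newline on input
    my @F = split /\t/, $line;     # -a with -F: autosplit the line into @F
    $line = "$F[1]\t$F[2]";        # the code given to -e
    $result .= "$line\n";          # -p prints the line; -l puts the newline back
}
close $in;

print $result;
```

Running `perl -MO=Deparse` on the one-liner shows the code perl really builds, which is a handy way to check a set of switches.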


      For the one-liner, the pattern for the -F flag needs to be escaped, since the shell will gobble the first "\":
      perl -apF\\t -e '$_="$F[1]\t$F[2]\n"'

                   I hope life isn't a big joke, because I don't get it.

      Nicely done, Dave. However, since we don't know the number of columns or the file size, do you think it would be better to limit the split to only the needed columns, as in the following?

      my @columns = (split /\t/)[1 .. 2];

      Depending on these factors and the machine, the script might otherwise choke.

      Just a thought...


      Now splitting on /\t/ based upon sauoq's good catch in his comment below.

        Depending on these factors and the machine, the script might otherwise choke.

        That's highly unlikely as the file is being handled line by line. And if there were a truly humongous line, your modification actually wouldn't be much better.
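If stopping the split early really matters, one option is the third argument to split, LIMIT, which caps the number of fields produced. A minimal sketch with an invented sample line: a limit of 4 keeps the first three columns separate and leaves the rest of the line unsplit in the fourth element.

```perl
use strict;
use warnings;

# Hypothetical sample line with six tab-separated fields.
my $line = "111\t222\t333\t444\t555\t666\n";
chomp $line;

# LIMIT = 4: at most four fields are produced, so splitting stops
# after the third tab and the remainder stays in one piece.
my @fields  = split /\t/, $line, 4;
my @columns = @fields[ 1 .. 2 ];    # second and third column

print join( "\t", @columns ), "\n";
print scalar(@fields), " fields in total\n";
```

The limit is the number of wanted leading columns plus one, so that the unsplit tail lands in its own element rather than being glued to the last wanted column.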

        And you've introduced a potential bug by splitting a tab delimited file on whitespace instead of on tabs.
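To illustrate the difference, here is a small sketch using an invented line whose second column is empty. Splitting on whitespace with split ' ' treats the two adjacent tabs as a single separator, so the later columns shift left; splitting on /\t/ keeps the empty field in place.

```perl
use strict;
use warnings;

# Hypothetical line: column 2 is empty, so two tabs sit side by side.
my $line = "111\t\t333\t444";

my @by_space = split ' ',  $line;    # a run of whitespace counts as one separator
my @by_tab   = split /\t/, $line;    # every single tab is a separator

print "whitespace: ", scalar(@by_space), " fields, second is '$by_space[1]'\n";
print "tab:        ", scalar(@by_tab),   " fields, second is '$by_tab[1]'\n";
```

A field containing an embedded space would be broken apart by the whitespace split in the same way.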

        "My two cents aren't worth a dime.";