http://www.perlmonks.org?node_id=1075119

hellohello1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a tab-delimited file and I want to split each column into an array so that I can do calculations on the individual values in each column. I'm new to Perl, so I hope for some guidance.
In my text file:

    ID    dataR1    dataR2    dataR3    ...
    1     324       445       654
    2     234       654       768.5
    3     542.12    764       98.2
    .
    .
    .
Right now, my code can only print out one value (dataR1) in OUT1. How do I put all the values of dataR1 into the array to print out in OUT1? In addition, how do I use a for loop to go from dataR1 to dataR3 and put each into a different array, instead of having to type them out manually, since different files have different numbers of dataRx columns?
    for(my $i = 0; $i < $originalfilecount; $i++)
    {
        #read in the current file
        open CURINFILE, "<$files[$i]" or die "Error couldn't open file $files[$i]\n";
        print "$files[$i]\n";

        while(<CURINFILE>)
        {
            chomp $_;
            my @columns = split('\t');
            push(@ID, $columns[0]);
            push(@data1, $columns[1]);
            push(@data2, $columns[2]);
            push(@data3, $columns[3]);

            print "\nWriting output...";
            mkdir "$pathname" or die "Error couldn't create new Directory";
            open OUT1, ">$pathname/column.txt" or die "error couldn't open output file";
            print OUT1 "$data[1]"
        }
        close OUT1
    }
Help appreciated :)

Replies are listed 'Best First'.
Re: How to add column into array from delimited tab file
by NetWallah (Canon) on Feb 17, 2014 at 04:05 UTC
    You have the right idea in collecting data into your (hopefully declared earlier) arrays: @data1, @data2, etc.

    The problem area is when you open and write to the output file WITHIN the read loop for each row.

    You need to move the output file open and write OUTSIDE the file read loop, something like this:

        # Outside the read loop, after closing input file....
        mkdir "$pathname" or die "Error couldn't create new Directory";
        open my $OUT1, ">", "$pathname/column.txt" or die "error couldn't open output file";
        print $OUT1 "$_\n" for @data1;
        close $OUT1;
    (Made some adjustments to localize the file handle and use the "3 argument" open. See other writeups to understand why this is a good idea.)

            What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
                  -Larry Wall, 1992

Re: How to add column into array from delimited tab file
by kcott (Archbishop) on Feb 17, 2014 at 04:13 UTC

    G'day hellohello1,

    You don't show any code populating the array @data. The code

    print OUT1 "$data[1]"

    is printing the second element of the array @data which, from your description, contains the value dataR1.

    I expect what you want is

    print OUT1 "@data1"

    -- Ken
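
The difference between the two print statements above comes down to string interpolation. A minimal sketch, using hypothetical sample values in place of the real dataR1 column (the variable names `$one_element` and `$all_elements` are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical sample values standing in for the dataR1 column.
my @data1 = ( 324, 234, 542.12 );

# "$data1[1]" interpolates a single element (index 1, the second value).
my $one_element = "$data1[1]";

# "@data1" interpolates every element, joined by the list separator
# (a single space by default).
my $all_elements = "@data1";

print "$one_element\n";
print "$all_elements\n";
```

Printing one value per line, as in `print $OUT1 "$_\n" for @data1;`, is an alternative when the output should not be space-separated.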

Re: How to add column into array from delimited tab file
by Kenosis (Priest) on Feb 17, 2014 at 19:02 UTC

    The following expresses my understanding of your situation, based upon your original posting:

    • You have multiple, tab-delimited files
    • The first line of each file contains column headers
    • Each file may have a different number of columns
    • The first column of each file is the ID, so can be discarded
    • You want to generate a file for each column (beyond the ID column), for each of the tab-delimited files

    If my understanding is correct, the following--which uses a hash of arrays (HoA)--provides one solution:

        use strict;
        use warnings;

        my ( @header, %hash );
        my @files = qw/File1.txt File2.txt/;
        local $, = "\n";

        for my $file (@files) {
            open my $fhIN, '<', $file or die $!;

            while ( my $line = <$fhIN> ) {
                my @columns = split ' ', $line;

                if ( $. == 1 ) {
                    @header = @columns;
                }
                else {
                    push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
                }
            }

            close $fhIN;

            for my $i ( 1 .. $#header ) {
                open my $fhOUT, '>', "$file\_$header[$i].txt" or die $!;
                print $fhOUT @{ $hash{ $header[$i] } };
                close $fhOUT;
            }

            undef %hash;
        }
      Hello Ken, thanks for explaining in your reply. It makes a bit more sense to me now!

      Yes to the following:

      • You have multiple, tab-delimited files
      • The first line of each file contains column headers
      • Each file may have a different number of columns

      However, I do want to keep the first column. I have columns that contain dataR(X) (e.g. dataR1, dataR2 ... dataR28), followed by several columns of links (some rows will be empty), which I also want to keep. So right now my problem is finding the headers that match dataS0XRx, so that I can grab those columns and perform some calculations:
      e.g. first file.txt:

          ID     dataS01R1    dataS01R2    dataS02R1    dataS02R2    Links
          M45    345.2        536          876.12       873          http://..
          M34    836          893          829          83.234
          M72    873          123          342.36       837
          M98    452          934          1237         938          http://..
          ===================================================
          Calculation:
          row2/row2, row3/row2, row4/row2 ... row3400/row2
          row2/row3, row3/row3, row4/row3 ... row3400/row3
          row2/row4, row3/row4 ... row3400/row4

          E.g. dataS01R1 becomes:

          ID     dataS01R1                ..dataS01R02...    Links
          M45    1     (345.2/345.2)                         http://..
          M34    2.42  (836/345.2)
          M72    2.53  (873/345.2)
          M98    1.309 (452/345.2)                           http://..
          M45    0.41  (345.2/836)                           http://..
          M34    1     (836/836)
          M72    1.04  (873/836)
          M98    0.54  (452/836)                             http://..
          .
          .    (loop through rows as denominator)
          .
      and then loop through the columns, print the results, and filter off unwanted rows based on the average coefficient of variation across all dataSXR0X rows (which I will figure out later, once I manage the beginning part). So my problem here: how do I find the column headers matching dataS0XR0X and put those columns into arrays for manipulation? Here is the code I had written initially, before posting to PerlMonks:
          if($first)
          {
              #if this is the first file, find the column locations
              my $firstline = <CURINFILE>;    #read in the header line
              chomp $firstline;
              my @columns = split(/\t/, $firstline);
              my $columncount = 0;
              while($columncount <= $#columns && !($columns[$columncount] =~ /ID/))
              {
                  $columncount++;
              }
              $ID= $columncount;

              while($columncount <= $#columns && !(($columns[$columncount] =~ /_dataS(\d+)R/) ))
              {
                  $columncount++;
              }
              $intensitydata = $columncount;

              #read in the remainder of the file
              while(<CURINFILE>)
              {
                  #add the id, intensity values to an array
                  chomp $_;
                  my @templine = split(/\t/,$_);
                  my @tempratio = ();
                  push(@tempratio, $templine[$ID]);
                  push(@tempratio, $templine[$intensitydata]);

                  print "\nWriting output...";
      I tried this code initially (before changing to the code in my first post), but it doesn't print out anything, so I don't know what went wrong. I am working on large databases; initially I worked in Excel, but it is too slow and lags my whole computer when performing calculations, so I decided to try Perl instead, as I read that it is good for manipulating large datasets. However, I am quite new to Perl (I started just two months ago), so I am not sure if what I am doing is okay. If there are other suggestions, let me know too. I hope my explanation is not confusing. :)

        Have made a few modifications:

            use strict;
            use warnings;

            my ( @header, %hash );
            my @files = qw/File1.txt File2.txt/;
            local $, = "\t";

            for my $file (@files) {
                open my $fhIN, '<', $file or die $!;

                while ( my $line = <$fhIN> ) {
                    my @columns = split ' ', $line;

                    if ( $. == 1 ) {
                        @header = @columns;
                    }
                    else {
                        push @{ $hash{ $header[$_] } }, $columns[$_] for 0 .. $#columns;
                    }
                }

                close $fhIN;

                for my $key ( keys %hash ) {
                    if ( $key =~ /^dataS\d\dR\d$/ ) {
                        print $key, @{ $hash{$key} }, "\n";
                    }
                }

                undef %hash;
            }

        All columns are kept. After the script has processed a file's lines, it iterates through the hash keys. Note that a regex attempts to match the heading pattern for the columns you're interested in processing. For now, when there's a match, it just prints the key and the associated list of values.
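
Once a matching column is in an array, the ratio calculation described earlier in the thread (every value divided by each row's value in turn) could be sketched roughly as follows. This is only an illustration under assumptions: the sample values are the dataS01R1 column from the example file, and `ratios_for_column` is a hypothetical helper name, not something from the original scripts.

```perl
use strict;
use warnings;

# Values taken from the dataS01R1 column in the example file earlier in the thread.
my @column = ( 345.2, 836, 873, 452 );

# Hypothetical helper: for each value used as the denominator, divide every
# value in the column by it. Returns one array ref of ratios per denominator.
sub ratios_for_column {
    my @values = @_;
    my @blocks;
    for my $denominator (@values) {
        die "Cannot divide by zero\n" if $denominator == 0;
        push @blocks, [ map { $_ / $denominator } @values ];
    }
    return @blocks;
}

my @blocks = ratios_for_column(@column);

# Print each block of ratios on one line, rounded to two decimal places.
for my $block (@blocks) {
    print join( "\t", map { sprintf '%.2f', $_ } @$block ), "\n";
}
```

Keeping each denominator's results in its own array ref makes it straightforward to loop over the blocks later, for example when filtering rows.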

Re: How to add column into array from delimited tab file
by hellohello1 (Sexton) on Feb 17, 2014 at 06:23 UTC
    Hi, I have tried putting print $OUT1 "$_\n" for @data1; as NetWallah suggested. It prints the output I want, but the command line shows:
        Processing data...
        Original.txt

        Writing output...Use of uninitialized value $metabolite[44] in string at
                G:\Metabolomics\Programming\PERL\Ratio Test\ratio test (updated 12 feb 14).pl line 101, <CURINFILE> line 1 (#1)
            (W uninitialized) An undefined value was used as if it were already
            defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
            To suppress this warning assign a defined value to your variables.

            To help you figure out what was undefined, perl will try to tell you
            the name of the variable (if any) that was undefined. In some cases it
            cannot do this, so it also tells you what operation you used the
            undefined value in. Note, however, that perl optimizes your program
            and the operation displayed in the warning may not necessarily appear
            literally in your program. For example, "that $foo" is usually
            optimized into "that " . $foo, and the warning will refer to the
            concatenation (.) operator, even though there is no . in your program.

        Writing output...Uncaught exception from user code:
                Error couldn't create new Directory at G:\Metabolomics\Programming\PERL\Ratio Test\ratio test (updated 12 feb 14).pl line 98, <CURINFILE> line 2.
         at G:\Metabolomics\Programming\PERL\Ratio Test\ratio test (updated 12 feb 14).pl line 98
        Press any key to continue . . .
    Is it some kind of error? In addition, how do I loop from dataR1 to dataR3 and put each into a different array, instead of having to type them out manually, since different files have different numbers of dataRx columns? I know it has something to do with this line: =~ /_data(\d+)R/ somewhere in the code, but I have no idea how to change

        push(@data1, $columns[1]);
        push(@data2, $columns[2]);
        push(@data3, $columns[3]);

    into something like this after finding the /_dataS(\d+)R/ in the column headers:

        push (@data[j], $columns[j]);

    I would appreciate any link on this that can point me in the right direction. :) Thanks for the help, by the way!
      The structure you seem to be looking for is a 2-dimensional array - which, in perl, is an array of arrays:
          my @aoa;    # Array of arrays (2-D array)
          open my $CURINFILE, "<", $files[$i] or die "Error couldn't open file $files[$i]\n";
          print "$files[$i]\n";
          while(<$CURINFILE>)
          {
              chomp $_;
              push @aoa, [ split('\t') ];    # Insert an array ref into the array (which is what makes it 2-D)
          }
          close $CURINFILE;
          print "\nWriting output...";
          # The first row of @aoa contains the titles, so skip that, and print the rest....
          for my $row (@aoa[1..$#aoa]) {    # That is a slice of the array, from index 1 till the end
              print $row->[0]."\n";    # $row->[0] contains the contents of the first column (ID)
                                       # Similarly, $row->[1] is the dataR1 column
          }


        OK. When I put the $row values into arrays, it doesn't print out anything?

            for my $row (@aoa[1..$#aoa])
            {
                @ID = $row->[0];
                @data = $row->[1..$#aoa];
            }
            print $OUT1 "$_\n" for @ID;
            print $OUT2 "$_\n" for @data;

        The reason for using arrays is so that I can calculate each value over the respective value in the same array in a loop.
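
Two things in the snippet above look likely to cause trouble: `@ID = $row->[0]` replaces the whole array on every pass instead of appending to it, and `$row->[1..$#aoa]` is not an array slice, so it does not return the columns from index 1 onward. A minimal sketch of one way to collect whole columns, assuming the array-of-arrays layout from the reply above; the sample rows and the `@columns` array are hypothetical and only there for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data in the array-of-arrays layout: header row first.
my @aoa = (
    [ 'ID', 'dataR1', 'dataR2', 'dataR3' ],
    [ 1,    324,      445,      654 ],
    [ 2,    234,      654,      768.5 ],
    [ 3,    542.12,   764,      98.2 ],
);

my @ID;         # values of the first column
my @columns;    # $columns[$c] is an array ref holding column $c of every data row

for my $row ( @aoa[ 1 .. $#aoa ] ) {
    push @ID, $row->[0];             # append, rather than overwrite
    for my $c ( 1 .. $#{$row} ) {    # every column after the ID
        push @{ $columns[$c] }, $row->[$c];
    }
}

print "$_\n" for @ID;
print "$_\n" for @{ $columns[1] };    # the dataR1 values
```

Because the inner loop runs up to the last index of each row, the same code should cope with files that have different numbers of dataRx columns without naming each array by hand.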