Re^3: How to add column into array from delimited tab file

for my $key ( keys %hash ) {
    if ( $key =~ /^dataS\d\dR\d$/ ) {
        print $key, @{ $hash{$key} }, "\n";
    }
}

my $first = 1;
for(my $i = 0; $i < $originalfilecount; $i++)
{
    #read in the current file
    open CURINFILE, "<$files[$i]" or die "Error couldn't open file $files[$i]\n";
    print "$files[$i]\n";
    if($first)
    {
        #if this is first file, find column locations
        my $firstline = <CURINFILE>;   #read headerline
        chomp $firstline;
        my @columns = split (/\t/, $firstline);
        my $columncount = 0;
        
        # print "$firstline\n"; #check if print headers correctly

####### Column Headers for ID, TIME #########     
       
        while ($columncount <= $#columns && !($columns[$columncount] =~ /ID/))
        {
                $columncount ++;
        }
        $ID = $columncount;
        while ($columncount <= $#columns && !($columns[$columncount] =~ /Time/))
        {
                $columncount ++;
        }
        $masstimes = $columncount;
        while ($columncount <= $#columns && !($columns[$columncount] =~ /Links/))
        {
                $columncount ++;           
        }
        $Links = $columncount;

#check if column position is correct (so far it is correct)      
print "ID is at column: $ID\n"; #output = 0
print "Time is at column: $masstimes\n"; #output = 1
print "Links is at column: $Links\n"; #output = 33
       
#DataR Columns (got ERROR here where I can't run script at all if I add this) #

        while($columncount <= $#columns && !(($columns[$columncount] =~ /_data/)))
        {
            $columncount++;
        }
        
        $columns[$columncount] =~ /_dataS(\d+)R/;
        my $currentReplicateID = $1;
        my $currentReplicateCount = 1;
        $ctrlStartCol = $columncount++;
        while($columncount <= $#columns)
        {
            $columns[$columncount] =~ /_dataS(\d+)R/;
            my $newReplicateID = $1;
            if($newReplicateID ne $currentReplicateID)
            {
                push(@replicateCount, $currentReplicateCount);
                 $currentReplicateID = $newReplicateID;
                 $currentReplicateCount = 1;
            }
            else
            {
                $currentReplicateCount++;
            }
            $columncount++;
        }
        #add the last replicate in
        push(@replicateCount, $currentReplicateCount);
              
        
 ###### End of Data Column Headers ####     
   


####### Read remainder of the file ##############
        while (<CURINFILE>)
        {
            #add metabolite ID, MZ, RT to an array
            chomp $_;
            my @templine = split (/\t/, $_);
            push(@tempratio, $templine[$metabolite]);
            push(@tempratio, $templine[$masstimes]);
            push(@tempratio, $templine[$rt]);
            
            #ERROR
            #add intensities from the samples
            my $columnIndex = $ctrlStartCol;
            for(my $k = 0; $k <= $i; $k++)
            {
                $columnIndex += $replicateCount[$k];
            }
            for(my $j = 0; $j < $replicateCount[$i+1]; $j++)
            {
                push(@tempratio, $templine[$columnIndex+$j]);
            }
         
        }

} # end of main if loop 



close CURINFILE;
} #end of main for loop 

    
       
  ############## Start of output ##################      
                                         
                     
print "\nWriting output...";

#create a new Directory and open output files and print out those values from the hash that meet the filtering criteria
#filtering criteria: defaults set at pvalue < 0.05, 0.5 <ratio > 1.5. User specified
mkdir "$pathname" or die "Error couldn't create new Directory";
open my $OUT1, ">$pathname/Metabolite ID.txt" or die "error couldn't open output file";
open my $OUT2, ">$pathname/masstimes.txt" or die "error couldn't open output file";
open my $OUT3, ">$pathname/retentiontimes.txt" or die "error couldn't open output file";
open my $OUT4, ">$pathname/intensitydata.txt" or die "error couldn't open output file";

print $OUT1 "$tempratio[0]";
print $OUT2 "$tempratio[1]";
print $OUT3 "$tempratio[2]";
print $OUT4 "$tempratio[3]";


close $OUT1;
close $OUT2;
close $OUT3;
close $OUT4;

8899_Neg_Rep01_dataS01R01
8889_Neg_Rep02_dataS01R02
8889_Neg_Rep03_dataS01R03
7499_Neg_Rep01_dataS02R01
7499_Neg_Rep02_dataS02R02
7499_Neg_Rep03_dataS02R03
7709_Neg_Rep01_dataS05R01
7709_Neg_Rep02_dataS05R02
7709_Neg_Rep03_dataS05R03

(and so on...)


I want to be able to manipulate the columns individually (hence my attempt with arrays), so I am not sure how a hash can help in this case.

With a hash of arrays (HoA), you can think of each array as being named by its hash key, although Perl itself has no "named array" construct.

For example, I ran the modified script, which only prints the headers and their values, on the following single file:

ID  dataS01R1  dataS01R2  dataS02R1 
1      324       445        654 
2      234       654        768.5
3      542.12    764        98.2

And here is the script's output:

dataS02R1    654    768.5    98.2    
dataS01R2    445    654    764    
dataS01R1    324    234    542.12

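Each line shows a header (the hash key) followed by the elements of the array stored under it. The lines are not in file order because keys returns hash keys in no particular order.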

The regex /^dataS\d\dR\d$/ is anchored at both ends, so it matches only column headings of the exact form dataSnnRn, where "n" signifies a digit 0-9, like those in the sample table above.
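Note that the longer headers in your real files (e.g., 8899_Neg_Rep01_dataS01R01) have a prefix and a two-digit replicate number, so the anchored pattern would not match them. Here is a small check, as a sketch only: the looser pattern $loose and the helper header_matches are my own assumptions, not part of the script.

```perl
use strict;
use warnings;

# Pattern from the reply: exactly "dataS", two digits, "R", one digit.
my $strict = qr/^dataS\d\dR\d$/;

# Assumed looser pattern: allows any prefix and multi-digit numbers.
my $loose = qr/dataS\d+R\d+$/;

# Hypothetical helper: returns 1 if the header matches the pattern, else 0.
sub header_matches {
    my ( $header, $pattern ) = @_;
    return $header =~ $pattern ? 1 : 0;
}

for my $header (qw(dataS01R1 ID 8899_Neg_Rep01_dataS01R01)) {
    printf "%-28s strict=%d loose=%d\n", $header,
        header_matches( $header, $strict ),
        header_matches( $header, $loose );
}
```

With these three headers, only dataS01R1 passes the strict pattern, while both data headers pass the looser one and ID passes neither.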

Thus, the notation @{ $hash{$key} } dereferences the array reference stored under the heading contained in $key, giving you the list of values in that column.
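In case it helps to see the whole idea in one place, here is a minimal sketch of how such a HoA could be built from tab-delimited lines and then printed. It is not the script discussed in this thread: the helper name columns_to_hoa and the inline @table data (copied from the sample table) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes tab-delimited lines (header line first)
# and returns a hash of arrays keyed by column header.
sub columns_to_hoa {
    my @lines = @_;
    chomp @lines;
    my @headers = split /\t/, shift @lines;
    my %hoa;
    for my $line (@lines) {
        my @fields = split /\t/, $line;
        for my $i ( 0 .. $#headers ) {
            # Autovivifies an array reference under each header.
            push @{ $hoa{ $headers[$i] } }, $fields[$i];
        }
    }
    return %hoa;
}

# Sample table from this post, written as tab-delimited lines.
my @table = (
    "ID\tdataS01R1\tdataS01R2\tdataS02R1\n",
    "1\t324\t445\t654\n",
    "2\t234\t654\t768.5\n",
    "3\t542.12\t764\t98.2\n",
);

my %hash = columns_to_hoa(@table);

# Sorting the keys makes the output order predictable.
for my $key ( sort keys %hash ) {
    next unless $key =~ /^dataS\d\dR\d$/;
    print join( "\t", $key, @{ $hash{$key} } ), "\n";
}
```

Each column ends up as its own array, so you can still work on the columns individually; the hash just gives you a way to find each one by its header.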

If you want to process one specific column, compare with eq instead of using a regex, e.g., if ( $key eq 'dataS01R1' ) { ... }.

From here you can go back to the original, unmodified script, which generates a set of files for each processed table, and adapt it to fit your needs.
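Since the column name is known in advance, you can also look the array up directly by its key rather than looping over all keys. A short sketch of manipulating single columns, assuming the HoA already holds the sample values (the mean and the doubling are arbitrary example operations, not something from your script):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Assumed HoA, filled with the values from the sample table.
my %hash = (
    dataS01R1 => [ 324, 234,   542.12 ],
    dataS01R2 => [ 445, 654,   764 ],
    dataS02R1 => [ 654, 768.5, 98.2 ],
);

# Copy one column out by name and compute its mean.
my @column = @{ $hash{'dataS01R1'} };
my $mean   = sum(@column) / @column;
printf "mean of dataS01R1 = %.2f\n", $mean;

# Modify another column in place through the reference.
$_ *= 2 for @{ $hash{'dataS01R2'} };
print "dataS01R2 doubled: @{ $hash{'dataS01R2'} }\n";
```

The first operation works on a copy and leaves the hash untouched; the second changes the values stored in the hash, because the loop aliases the elements of the dereferenced array.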

I tried running it, but there is an error: the script keeps running without stopping.

The script executes locally as expected, given the datasets you've shared. Without knowing more about your data, it's difficult to troubleshoot the problem you're experiencing.

Hope this helps!


Ok Thanks :) Will figure it out somehow.



We don't bite newbies here... much
	PerlMonks