Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^3: How to add column into array from delimited tab file

by Kenosis (Priest)
on Feb 18, 2014 at 06:36 UTC ( [id://1075272]=note: print w/replies, xml ) Need Help??


in reply to Re^2: How to add column into array from delimited tab file
in thread How to add column into array from delimited tab file

Have made a few modifications:

use strict; use warnings; my ( @header, %hash ); my @files = qw/File1.txt File2.txt/; local $, = "\t"; for my $file (@files) { open my $fhIN, '<', $file or die $!; while ( my $line = <$fhIN> ) { my @columns = split ' ', $line; if ( $. == 1 ) { @header = @columns; } else { push @{ $hash{ $header[$_] } }, $columns[$_] for 0 .. $#co +lumns; } } close $fhIN; for my $key ( keys %hash ) { if ( $key =~ /^dataS\d\dR\d$/ ) { print $key, @{ $hash{$key} }, "\n"; } } undef %hash; }

All columns are kept. After the script has processed a file's lines, it iterates through the hash keys. Note that a regex attempts to match the heading pattern for the columns you're interested in processing. Now, when there a match, it just prints the key and the associated list of values.

Replies are listed 'Best First'.
Re^4: How to add column into array from delimited tab file
by hellohello1 (Sexton) on Feb 18, 2014 at 09:21 UTC
    I'm sorry, I do not understand this part...abit lost here:
    for my $key ( keys %hash ) { if ( $key =~ /^dataS\d\dR\d$/ ) { print $key, @{ $hash{$key} }, "\n"; }
    I want to be able manipulate the columns individually (thus attempting array) so I am not sure how hash can help in this case. I try running it but there's error which the script keep running without stopping. Instead, I attempted the matching of the colume names by using loop but I run into errors somewhere. I have my own code where I did a while loop to find the matching headers:
    my $first = 1; for(my $i = 0; $i < $originalfilecount; $i++) { #read in the current file open CURINFILE, "<$files[$i]" or die "Error couldn't open file $fi +les[$i]\n"; print "$files[$i]\n"; if($first) { #if this is first file, find column locations my $firstline = <CURINFILE>; #read headerline chomp $firstline; my @columns = split (/\t/, $firstline); my $columncount = 0; # print "$firstline\n"; #check if print headers correctly ####### Column Headers for ID, TIME ######### while ($columncount <= $#columns && !($columns[$columncount] = +~ /ID/)) { $columncount ++; } $ID = $columncount; while ($columncount <= $#columns && !($columns[$columncount] = +~ /Time/)) { $columncount ++; } $masstimes = $columncount; while ($columncount <= $#columns && !($columns[$columnco +unt] =~ /Links/)) { $columncount ++; } $Links = $columncount; #check if column position is correct (so far it is correct) print "ID is at column: $ID\n"; #output = 0 print "Time is at column: $masstimes\n"; #output = 1 print "Links is at column: $Links\n"; #output = 33 #DataR Columns (got ERROR here where I can't run script at all if I ad +d this) # while($columncount <= $#columns && !(($columns[$columncount] = +~ /_data/))) { $columncount++; } $columns[$columncount] =~ /_dataS(\d+)R/; my $currentReplicateID = $1; my $currentReplicateCount = 1; $ctrlStartCol = $columncount++; while($columncount <= $#columns) { $columns[$columncount] =~ /_dataS(\d+)R/; my $newReplicateID = $1; if($newReplicateID ne $currentReplicateID) { push(@replicateCount, $currentReplicateCount); $currentReplicateID = $newReplicateID; $currentReplicateCount = 1; } else { $currentReplicateCount++; } $columncount++; } #add the last replicate in push(@replicateCount, $currentReplicateCount); ###### End of Data Column Headers #### ####### Read remainder of the file ############## while (<CURINFILE>) { #add metabolite ID, MZ, RT to an array chomp $_; my @templine = split (/\t/, $_); push(@tempratio, $templine[$metabolite]); push(@tempratio, $templine[$masstimes]); push(@tempratio, $templine[$rt]); #ERROR #add intensities from the samples my $columnIndex = $ctrlStartCol; for(my $k = 0; $k <= $i; $k++) { $columnIndex += $replicateCount[$k]; } for(my $j = 0; $j < $replicateCount[$i+1]; $j++) { push(@tempratio, $templine[$columnIndex+$j]); } } } # end of main if loop close CURINFILE; } #end of main for loop ############## Start of output ################## print "\nWriting output..."; #create a new Directory and open output files and print out those valu +es from the hash that meet the filtering criteria #filtering criteria: defaults set at pvalue < 0.05, 0.5 <ratio > 1.5. +User specified mkdir "$pathname" or die "Error couldn't create new Directory"; open my $OUT1, ">$pathname/Metabolite ID.txt" or die "error couldn't o +pen output file"; open my $OUT2, ">$pathname/masstimes.txt" or die "error couldn't open +output file"; open my $OUT3, ">$pathname/retentiontimes.txt" or die "error couldn't +open output file"; open my $OUT4, ">$pathname/intensitydata.txt" or die "error couldn't o +pen output file"; print $OUT1 "$tempratio[0]"; print $OUT2 "$tempratio[1]"; print $OUT3 "$tempratio[2]"; print $OUT4 "$tempratio[3]"; close $OUT1; close $OUT2; close $OUT3;
    PS: The output only print 1st value for all OUT1 to OUT4 instead all the values in the whole columns...why?? my DataR names are something like this across the columns:
    8899_Neg_Rep01_dataS01R01 8889_Neg_Rep02_dataS01R02 8889_Neg_Rep03_dataS01R03 7499_Neg_Rep01_dataS02R01 7499_Neg_Rep02_dataS02R02 7499_Neg_Rep03_dataS02R03 7709_Neg_Rep01_dataS05R01 7709_Neg_Rep02_dataS05R02 7709_Neg_Rep03_dataS05R03 (and so on...)
    That's why I attempt a while loop to try match but I think somewhere is wrong? Not sure if it is the right way to do this.

      I want to be able manipulate the columns individually (thus attempting array) so I am not sure how hash can help in this case.

      In the case of using a hash of arrays (HoA), you can think of the arrays as named--although Perl has no such construct.

      For example, running the modified script that only prints the headers and values, the following, single file was used:

      ID dataS01R1 dataS01R2 dataS02R1 1 324 445 654 2 234 654 768.5 3 542.12 764 98.2

      And here the script's output:

      dataS02R1 654 768.5 98.2 dataS01R2 445 654 764 dataS01R1 324 234 542.12

      This shows the headers (the keys) followed by the list elements.

      The regex /^dataS\d\dR\d$/ matches only the column headings listed above, which have the pattern dataSnnRn, where "n" signifies a digit 0-9.

      Thus, the notation @{ $hash{$key} } represents the array of elements under the heading contained in $key.

      If you want to process a specific column, use eq instead of a regex, e.g., if ( $key eq 'dataS01R1' ) { ....

      This may now lead back to the original, unmodified script which generates a set of files for each processed table, as you can modify the script to fit your needs.

      I try running it but there's error which the script keep running without stopping.

      The script executes locally as expected, given the datasets you've shared. Without knowing more about your data, it's difficult to troubleshoot the problem you're experiencing.

      Hope this helps!

        Ok Thanks :) Will figure it out somehow.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1075272]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2024-04-23 21:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found