http://www.perlmonks.org?node_id=1075252


in reply to Re: How to add column into array from delimited tab file
in thread How to add column into array from delimited tab file

Hello ken, thanks for explaining in your reply. It makes a bit more sense to me now!
Yes to the following:
• You have multiple, tab-delimited files
• The first line of each file contains column headers
• Each file may have a different number of columns
However, I do want to keep the first column. I have columns that contain dataR(X) (e.g. dataR1, dataR2 ... dataR28), followed by several columns of links (some rows will be empty), which I also want to keep. So right now my problem is finding the headers that match dataS0XRx so that I can grab those columns and perform some calculations:
e.g. first file.txt:

ID     dataS01R1   dataS01R2   dataS02R1   dataS02R2   Links
M45    345.2       536         876.12      873         http://..
M34    836         893         829         83.234
M72    873         123         342.36      837
M98    452         934         1237        938         http://..

===================================================

Calculation:
row2/row2, row3/row2, row4/row2 ... row3400/row2
row2/row3, row3/row3, row4/row3 ... row3400/row3
row2/row4, row3/row4 ... row3400/row4

E.g. dataS01R1 becomes:

ID     dataS01R1              ..dataS01R02...   Links
M45    1      (345.2/345.2)                     http://..
M34    2.42   (836/345.2)
M72    2.53   (873/345.2)
M98    1.309  (452/345.2)                       http://..
M45    0.41   (345.2/836)                       http://..
M34    1      (836/836)
M72    1.04   (873/836)
M98    0.54   (452/836)                         http://..
.
.
(loop through rows as denominator)
.
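To make the calculation above concrete, here is a minimal sketch for a single column. The sub name `pairwise_ratios` is made up for illustration, the values are the dataS01R1 numbers from the example, and it assumes every cell is numeric (empty cells would need extra handling) and that rows with a zero value are simply skipped as denominators.

```perl
use strict;
use warnings;

# Sample IDs and values copied from the dataS01R1 column in the example.
my @ids    = qw(M45 M34 M72 M98);
my @values = (345.2, 836, 873, 452);

# Hypothetical helper: for each row used as the denominator, divide every
# row's value by it. Returns a list of [denominator_id, numerator_id, ratio].
sub pairwise_ratios {
    my ($ids, $values) = @_;
    my @ratios;
    for my $d (0 .. $#$values) {
        next if $values->[$d] == 0;    # assumption: skip zero denominators
        for my $n (0 .. $#$values) {
            push @ratios,
                [ $ids->[$d], $ids->[$n], $values->[$n] / $values->[$d] ];
        }
    }
    return @ratios;
}

# Print one line per ratio: denominator ID, numerator ID, ratio.
my @ratios = pairwise_ratios(\@ids, \@values);
printf "%s\t%s\t%.2f\n", @$_ for @ratios;
```

With 3400 rows this produces 3400 x 3400 ratios per column, so for the real files it may be better to print each ratio as it is computed rather than collect them all in one array.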
After that, I want to loop through the columns, print the results, and filter out unwanted rows based on the average coefficient of variation across all dataSXR0X rows (which I will figure out later, once I have the first part working). So my question is: how do I find the column headers matching dataS0XR0X and put those columns into arrays for manipulation? Here is the code I wrote initially, before posting to PerlMonks:
if($first)
{
    #if this is the first file, find the column locations
    my $firstline = <CURINFILE>;    #read in the header line
    chomp $firstline;
    my @columns = split(/\t/, $firstline);
    my $columncount = 0;
    while($columncount <= $#columns && !($columns[$columncount] =~ /ID/))
    {
        $columncount++;
    }
    $ID = $columncount;
    while($columncount <= $#columns && !(($columns[$columncount] =~ /_dataS(\d+)R/)))
    {
        $columncount++;
    }
    $intensitydata = $columncount;
    #read in the remainder of the file
    while(<CURINFILE>)
    {
        #add the id, intensity values to an array
        chomp $_;
        my @templine = split(/\t/,$_);
        my @tempratio = ();
        push(@tempratio, $templine[$ID]);
        push(@tempratio, $templine[$intensitydata]);
        print "\nWriting output...";
I tried this code initially (before changing to the code I posted in my first post), but it doesn't print anything, so I don't know what went wrong. I am working with large datasets. Initially I used Excel, but it is too slow and makes my whole computer lag when performing the calculations, so I decided to try Perl instead, as I read that it is good for manipulating large datasets. However, I am quite new to Perl (I started just two months ago), so I am not sure if what I am doing is okay. If there are other suggestions, let me know too. I hope my explanation is not confusing. :)
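For the header question, one possible approach is to collect the index of every matching header with `grep`, instead of stopping at the first match. This is only a sketch: the sub name `data_column_indices` and the hash `%column` are made up for illustration, and the header line and data row are taken from the example file. The pattern is anchored and has no leading underscore, because the sample headers shown have none; it would need adjusting if the real headers differ.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indices of all headers that look like
# dataS01R1, dataS02R2, and so on.
sub data_column_indices {
    my (@headers) = @_;
    return grep { $headers[$_] =~ /^dataS\d+R\d+$/ } 0 .. $#headers;
}

# Header line and first data row from the example file, tab-delimited.
my $header_line = join "\t",
    qw(ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links);
my $data_line = join "\t", 'M45', '345.2', '536', '876.12', '873', 'http://..';

my @headers = split /\t/, $header_line;
my @wanted  = data_column_indices(@headers);

# Collect each matching column into its own array, keyed by header name.
# In a real script this part would sit inside the while(<FILEHANDLE>) loop.
my %column;
my @fields = split /\t/, $data_line, -1;    # -1 keeps trailing empty fields
push @{ $column{ $headers[$_] } }, $fields[$_] for @wanted;

# Print each collected column under its header name.
print "$_: @{ $column{$_} }\n" for @headers[@wanted];
```

Because the columns are stored by header name, files with a different number of columns should not need hard-coded positions.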