snape has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I am facing an interesting problem that involves extracting data from multiple files, several times per file. To start with, I have a tab-delimited file with 4 columns. For example:
Col1    Col2    Col3    Col4
File1   abc     1000    1010
File2   xyz     2022    3000
File1   def     3211    2300
File4   ghi     4000    4100
File3   jkl     5002    5100
File4   mno     2001    2500
File5   pqr     100     150
File3   Ade     203     340
File2   Sea     101     201
The first column can take the values File1, File2 .. File40. The second column holds unique names. The third and fourth columns hold numbers.
I have about a million records, and for each one I need to extract the string between the positions given in column 3 and column 4 (inclusive) from the file named in column 1. The problem is that I want a method that does not open any file more than once, i.e. I open the files 40 times in total (since there are 40 files) and extract all the strings. I am thinking of using a hash table, but I have not been able to come up with good logic.
    my %hash;
    while (<$INPUT>) {
        chomp($_);
        my @arr = split('\t', $_);
        ## hash for keeping the records, keyed by "<file>_<name>"
        $hash{"$arr[0]_" . "$arr[1]"} = "$arr[0]\t$arr[1]\t$arr[2]\t$arr[3]";
    }
    close($INPUT);

    for (my $i = 1; $i <= 40; $i++) {
        open my $IN, '<', "File" . $i or die $!;    ## Fasta File
        while (<$IN>) {
            ## Reading the files and extracting
            ## but I am not able to use the hash table
            ## properly
        }
        close($IN);
    }
    close($OUTPUT);
Since there is more than one string to retrieve from each file, I am not able to do that. Also, please keep in mind that these files are about 100 MB each, so storing the files in memory is not a good technique either. Any hints and help will be appreciated.
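For reference, here is a minimal sketch of the one-pass-per-file idea described above. It is not a definitive solution and rests on several assumptions not stated in the question: the numbers in columns 3 and 4 are 1-based positions in the sequence (FASTA header lines starting with `>` are not counted), a reversed pair such as 3211/2300 means the same range as 2300/3211, and each file holds a single sequence. The sub names `group_by_file`, `extract_ranges`, and `extract_all` are hypothetical. The records are grouped into a hash of arrays keyed by file name, and each file is then streamed line by line, so only the requested substrings are held in memory.

```perl
use strict;
use warnings;

# Group index records by file name:
#   { File1 => [ [name, start, end], ... ], File2 => [...], ... }
sub group_by_file {
    my ($index_fh) = @_;
    my %by_file;
    while (my $line = <$index_fh>) {
        chomp $line;
        my ($file, $name, $start, $end) = split /\t/, $line;
        # Skip the header row and malformed lines.
        next unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;
        # Assumption: a reversed pair denotes the same range.
        ($start, $end) = ($end, $start) if $start > $end;
        push @{ $by_file{$file} }, [ $name, $start, $end ];
    }
    return \%by_file;
}

# Read one sequence file once and collect every requested range.
sub extract_ranges {
    my ($seq_fh, $ranges) = @_;
    my @pending = sort { $a->[1] <=> $b->[1] } @$ranges;   # by start position
    my (@active, %result);
    my $offset = 0;    # sequence characters consumed so far
    while (my $line = <$seq_fh>) {
        chomp $line;
        next if $line =~ /^>/;    # FASTA header line
        my ($first, $last) = ($offset + 1, $offset + length $line);
        # Activate ranges that begin on or before the end of this line.
        push @active, shift @pending while @pending && $pending[0][1] <= $last;
        my @still_open;
        for my $r (@active) {
            my ($name, $start, $end) = @$r;
            my $from = $start > $first ? $start : $first;
            my $to   = $end   < $last  ? $end   : $last;
            $result{$name} .= substr($line, $from - $first, $to - $from + 1)
                if $from <= $to;
            push @still_open, $r if $end > $last;   # continues on the next line
        }
        @active = @still_open;
        $offset = $last;
        last unless @active || @pending;    # nothing left to extract
    }
    return \%result;
}

# Driver: open the index once, then open each sequence file exactly once.
sub extract_all {
    my ($index_path) = @_;
    open my $index_fh, '<', $index_path or die "Cannot open $index_path: $!";
    my $by_file = group_by_file($index_fh);
    close $index_fh;
    for my $file (sort keys %$by_file) {
        open my $seq_fh, '<', $file or die "Cannot open $file: $!";
        my $found = extract_ranges($seq_fh, $by_file->{$file});
        close $seq_fh;
        print join("\t", $file, $_, $found->{$_}), "\n" for sort keys %$found;
    }
}
```

Calling `extract_all('index.txt')` (a hypothetical file name) would print one tab-separated line per record. Sorting the ranges by start position is what lets a single forward pass serve every record for that file. With roughly a million records the grouped hash itself still has to fit in memory, which should be far smaller than the 40 sequence files.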
Replies are listed 'Best First'.
Re: Multiple Extraction from Multiple Files
by toolic (Bishop) on Oct 07, 2010 at 01:06 UTC
by snape (Pilgrim) on Oct 07, 2010 at 21:07 UTC
Re: Multiple Extraction from Multiple Files
by JavaFan (Canon) on Oct 07, 2010 at 02:27 UTC
Re: Multiple Extraction from Multiple Files
by aquarium (Curate) on Oct 07, 2010 at 05:46 UTC
Re: Multiple Extraction from Multiple Files
by sundialsvc4 (Abbot) on Oct 07, 2010 at 14:03 UTC