PerlMonks
Re: Filter and writing error log file
by Laurent_R (Canon) on Jul 22, 2014 at 21:44 UTC ( [id://1094682] )
Could it be done while reading the file in a while loop (as I have tried below in the code)?
Yes, by all means: not only can it be done this way, but this is most often the best way to do it (i.e. when the data currently under examination does not need to know about the previous or next data chunk to be validated, examined or used). That is especially true with DNA files which, as far as I know (I am not a bio guy), can be very large. Using a while loop on your file (i.e. a file iterator) makes it possible to read absolutely huge files without ever running into out-of-memory problems (it might take time, but at least you have a very high probability of running your program to the end).

I am working almost daily with huge files (typically between 3 and 15 GB, sometimes as much as 200 GB). In such cases, slurping the file into memory is just not an option: my program would die. Reading the file with an iterator (a 'while (my $line = <$IN>) {' type of construct) is the only solution, and it uses no more memory than the size needed for the longest line.

I had a relatively similar problem over the last few days and found the solution this morning. It involved a proprietary database (with no Perl module/driver), but one implementing a protocol similar to SQL. I needed to load a rather large quantity of data into memory, and then process the main table using the data in memory. My first attempt last week ran out of memory and the program crashed. Before trying to load the main table, my original program was already using 155,000 blocks of memory (my best guess is that a memory block is 8 kB, but I am not sure). At any rate, after having loaded those 155,000 blocks, trying to load the main table failed for lack of memory. After some experimentation, I was able to reduce memory consumption (changing hashes of hashes to hashes of strings), but the main improvement was being able to use a while-style iterator on the main table as well. Having made these changes, my program never uses more than 62,000 blocks, so it can be considered fairly safe.
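A minimal sketch of the line-by-line iterator pattern described above; the file name and the count_lines helper are illustrative, not from the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a (possibly huge) file one line at a time; memory use is
# bounded by the length of the longest line, not by the file size.
sub count_lines {
    my ($path) = @_;
    open my $IN, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$IN>) {   # the iterator construct discussed above
        chomp $line;
        # ... validate or process $line here ...
        $count++;
    }
    close $IN;
    return $count;
}
```

Called as, say, count_lines('sequences.txt'), this processes the file in constant memory, whereas slurping the whole file into an array or scalar would need memory proportional to the file size.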
Other comments on your code: your identifiers are very poor; X, Y and A say nothing about their content. Similarly, $a, $t, $g and $c may seem OK when talking about DNA, but I would suggest you use at least two letters for better identification. Also, the $a (and $b) variable has a special meaning in Perl (it is used especially for sorting) and should probably be avoided for other purposes.

Not sure I answered your question, but then I am not sure what your question really was. I hope that I gave at least some useful indications.
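To illustrate the naming advice, here is a hypothetical count_bases helper (not from the original thread) that uses descriptive identifiers instead of $a, $t, $g and $c, and in particular stays away from $a and $b, which Perl reserves for sort:

```perl
use strict;
use warnings;

# Count nucleotides in a DNA string, keyed by base letter.
# %base_count replaces the ambiguous single-letter scalars.
sub count_bases {
    my ($dna_seq) = @_;
    my %base_count = (A => 0, C => 0, G => 0, T => 0);
    for my $base (split //, uc $dna_seq) {
        $base_count{$base}++ if exists $base_count{$base};
    }
    return \%base_count;
}

my $counts = count_bases('ATGCGATA');
printf "A=%d T=%d G=%d C=%d\n", @{$counts}{qw(A T G C)};
```

Besides readability, this avoids the subtle bugs you can get when a sort block elsewhere in the program clobbers $a or $b.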
In Section: Seekers of Perl Wisdom