Re: Out of Memory

by davido (Cardinal)
on Mar 27, 2013 at 18:12 UTC ( [id://1025778] )


in reply to Out of Memory

Those don't even do the same thing. In the first example, if the length of $_ is 100, in the end $nulls will contain the integer 100, after jumping through the pointless hoop of a pattern match.
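
The node being replied to isn't quoted here, so purely as an illustration (assuming the first example effectively matched every character), code along these lines just recovers length() the hard way:

    # Illustration only: matching every character counts the
    # string's length via a needless pattern match.
    my $str   = "x" x 100;
    my $count = () = $str =~ /./sg;          # one match per character
    print "$count == ", length($str), "\n";  # 100 == 100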


Dave

Re^2: Out of Memory
by Anonymous Monk on Mar 28, 2013 at 17:25 UTC
    Hmm, you're right, those don't do the same thing.

    Incidentally, this also caused the out-of-memory error:

    ($nulls) = $_ =~ /\0/g;

    However, I found another method that works and doesn't seem as likely to cause the extra memory overhead:

    while ($_ =~ /\0/g) { $nulls++ }
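
    For what it's worth, a minimal sketch of why the two spellings behave so differently: in list context, m//g builds one scalar per match before anything is counted, while in scalar context it yields one match per call in constant memory.

    my $data = "\0" x 1_000_000;

    # List context: ~1e6 temporary scalars exist before the count is taken.
    my $count_list = () = $data =~ /\0/g;

    # Scalar context: one match per call, constant memory.
    my $count_iter = 0;
    $count_iter++ while $data =~ /\0/g;

    print "$count_list $count_iter\n";   # 1000000 1000000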

      The simplest, fastest and most efficient way to count the nulls (or any character) in a string is:

      my $nulls = $string =~ tr[\0][\0];

      Update: corrected '0' to '\0'. Thanks to davido.


        Thanks... I did see a reference to it in the thread I got the while loop from, but it noted that tr can only be used with single characters, which works in this case but maybe not in others. I did try the tr approach, and it does work for my data set without a memory error.

        Are you sure this is the most efficient way to do this? It seems to me that it's creating a copy of the original string and trying to replace the matches before it outputs the count (as far as I can tell from a quick read of a page on the tr function). I wouldn't think that would be as memory-efficient as the while code... but I don't understand the internals of the while code either; if it instantiates a huge list and then iterates through it, I can see how that wouldn't be as efficient as the tr code.
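
        A quick check suggests tr/// in this counting form only scans the string: with an empty replacement list and no /d, it neither copies nor modifies anything, it just returns the count.

        my $s = "a\0b\0c";
        my $n = $s =~ tr/\0//;     # empty replacement, no /d: just counts
        print "$n\n";              # 2
        print $s eq "a\0b\0c" ? "unchanged\n" : "modified\n";   # unchanged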

      Whatever method you use, you're teetering on the edge. I would probably prefer reading smaller chunks and processing them individually rather than trying to hold the entire thing in memory at once. Even if while( $_ =~ /\0/g ) { $nulls++ } keeps you below the mark, if your file grows by some small amount you'll be back to bumping into the memory limit again.

      In other words, none of your methods really address the elephant in the corner, which is that holding the entire data set in memory at once is consuming all your wiggle-room.
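
      Something along these lines (a minimal sketch; the filename and chunk size are placeholders) keeps memory flat no matter how large the file grows, and since we're counting single characters, a chunk boundary can never split a match:

      open my $fh, '<:raw', 'data.bin' or die "open: $!";   # hypothetical file
      my ( $buf, $nulls ) = ( '', 0 );
      while ( read $fh, $buf, 1 << 20 ) {        # 1 MB per chunk
          $nulls += $buf =~ tr/\0//;             # tally nulls in this chunk
      }
      close $fh;
      print "$nulls\n";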


      Dave

        Holding a 5MB string in memory is hardly onerous.

        The problem is entirely down to creating a huge list of scalars each containing a single character in order to count those characters.
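
        If Devel::Size (a CPAN module) is installed, the overhead is easy to demonstrate; a rough sketch, and the exact numbers vary by perl build:

        use Devel::Size qw(total_size);
        my $str  = "\0" x 1_000_000;          # ~1 MB of data
        my @list = $str =~ /\0/g;             # one scalar per null
        print total_size(\$str),  "\n";       # a little over 1e6 bytes
        print total_size(\@list), "\n";       # tens of bytes per element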


        I should switch to reading it in as a stream for the reason you stated (although I never expected 70 million nulls on a line), but I haven't done that in Perl before, though I have used the while(<file>) syntax many times to read one line at a time. The idea was to do something quick and dirty, which worked fine until last week.

        Still, my real question and reason for posting was to learn what was happening internally that caused the second statement to use more memory than the first... and a lot more memory than I expected. Per the second response, running a 5-million-byte string through the second statement consumed 320 MB of memory. That seems like a lot to me; 5 million bytes is, what, 5 MB?

        I think the answer (as mentioned elsewhere in this thread) is that it's creating 5 million scalars with one character each. If there were, say, 20 bytes of overhead per scalar, I could see how 5 MB becomes 320 MB when you chain several statements together on a single line. Of course, this assumes scalars have a lot of overhead (again, something I don't know much about).
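
        (The arithmetic roughly works out: on a 64-bit perl, a one-character string scalar typically costs somewhere in the region of 50-70 bytes once the SV head, the string body, and allocator overhead are counted, plus a pointer per list element. At around 64 bytes apiece, 5 million of them comes to roughly 320 MB. Exact figures vary by build, so treat that as a ballpark.)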

        BTW, thank you to everyone who has responded so far. I appreciate the knowledge share.
