Re^3: Out of Memory

by davido (Archbishop)
on Mar 28, 2013 at 17:29 UTC

in reply to Re^2: Out of Memory
in thread Out of Memory

Whatever method you use, you're teetering on the edge. I would probably prefer taking in smaller chunks and processing them individually rather than trying to hold the entire thing in memory at the same time. Even if while( $_ =~ /\0/g ) { $null++ } keeps you below the mark, if your file grows by some small amount, you'll be back to bumping into the memory limit again.

In other words, none of your methods really address the elephant in the room, which is that holding the entire data set in memory at once is consuming all your wiggle-room.
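The "smaller chunks" idea above can be sketched roughly as follows. This is only an illustration under assumptions: the sub name count_nulls_in_file, the 1 MB default chunk size, and the file path are all made up for the example and do not come from the original script.

```perl
use strict;
use warnings;

# Count NUL bytes in a file without ever holding more than one chunk in memory.
# The sub name and the default chunk size are hypothetical choices for this sketch.
sub count_nulls_in_file {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;    # 1 MB per read unless the caller says otherwise

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my $count = 0;
    while (1) {
        my $read = read($fh, my $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $read;
        last if $read == 0;              # end of file

        # tr/// with an empty replacement list counts matches and leaves $buf alone.
        $count += ($buf =~ tr/\0//);
    }

    close $fh;
    return $count;
}
```

Because a single byte can never straddle two chunks, counting per chunk and summing gives the same total as counting over the whole file, and memory use stays bounded by the chunk size however large the file grows. A call would look like count_nulls_in_file('some_file.dat'), where the path is a placeholder.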


Replies are listed 'Best First'.
Re^4: Out of Memory
by BrowserUk (Pope) on Mar 28, 2013 at 17:40 UTC

    Holding a 5MB string in memory is hardly onerous.

    The problem is entirely down to creating a huge list of scalars each containing a single character in order to count those characters.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      You're right. :)

      This really should be a job for tr///, shouldn't it?

      my $string = '5mb of stuff';
      my $count  = $string =~ tr/\0/\0/;

      (I probably just missed seeing that proposed somewhere within the thread.)

      Update: Ah, I see it's the first thing you proposed (good job). This is probably a good time for me to walk away from the keyboard for a while. ;)
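To make the contrast in this sub-thread concrete, here is a small sketch of three ways to count NUL characters. The exact statements from the original post are not shown here, so the list-context form below is an assumption about the kind of code that builds "a huge list of scalars"; the 30-character test string is a stand-in for the real 5 MB one.

```perl
use strict;
use warnings;

# Small stand-in for the 5 MB string: every third character is a NUL.
my $string = join '', map { $_ % 3 ? 'x' : "\0" } 1 .. 30;

# 1. tr/// counts matching characters directly; no list of matches is built.
my $by_tr = ($string =~ tr/\0//);

# 2. A scalar-context //g loop visits one match at a time.
my $by_loop = 0;
$by_loop++ while $string =~ /\0/g;

# 3. A list-context //g match returns every match as its own scalar,
#    and only then is the list counted.
my $by_list = () = $string =~ /\0/g;

print "$by_tr $by_loop $by_list\n";
```

All three give the same count; the difference is that only the third one materializes one scalar per match before counting, which is where the memory goes on a large input.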


Re^4: Out of Memory
by Michael Kittrell (Novice) on Mar 28, 2013 at 19:19 UTC
    I should switch to reading it in as a stream for the reason you stated (although I never expected 70 million nulls on a line), but I haven't done that in Perl before, whereas I have used the while(<file>) syntax many times to read one line at a time. The idea was to write something short and dirty, which worked fine until last week.

    Still, my real question and reason for posting was a quest for the knowledge of what was happening internally that caused the 2nd statement to use more memory than the first... and a lot more memory than I expected. Per the second response, running a 5 million byte string through the 2nd statement consumed 320 MB of memory. That seems like a lot to me. 5 million bytes is what, 5 MB?

    I think the answer (as mentioned somewhere in this thread) is that it's creating 5 million scalars with 1 char each. If there were 20 bytes of overhead per scalar, I could see how 5 MB becomes 320 MB (when you chain several statements together in a single line). Of course, this assumes scalars have lots of overhead (again, something I don't know about).

    BTW thank you everyone who has responded so far. I appreciate the knowledge share.
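As a rough sanity check on the overhead estimate in this post, the reported figures can be divided out. This sketch assumes that the whole 320 MB is attributable to the 5 million one-character scalars and that "MB" means 10^6 bytes; neither assumption is confirmed in the thread.

```perl
use strict;
use warnings;

my $elements       = 5_000_000;          # one scalar per character
my $observed_bytes = 320 * 1_000_000;    # reported usage, taking 1 MB = 10**6 bytes

# Implied cost of each one-character scalar if the list accounts for all of it.
my $per_element = $observed_bytes / $elements;

# What the guessed 20 bytes of overhead per scalar would add up to instead.
my $guess_total_mb = 20 * $elements / 1_000_000;

printf "%.0f bytes per scalar implied; 20 bytes each would total %.0f MB\n",
    $per_element, $guess_total_mb;
```

Under those assumptions the numbers imply about 64 bytes per scalar, so a 20-byte guess would only account for roughly 100 MB of the total.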
