Re: Out of Memory

by davido (Cardinal)
on Mar 27, 2013 at 18:12 UTC ( [id://1025778] )


in reply to Out of Memory

Those don't even do the same thing. In the first example, if the length of $_ is 100, in the end $nulls will contain the integer 100, after jumping through the pointless hoop of a pattern match.
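
The node being replied to isn't quoted here, so purely as an illustration (assuming the first example effectively matched every character), code along these lines just recovers length() the hard way:

    # Illustration only: matching every character counts the
    # string's length via a needless pattern match.
    my $str   = "x" x 100;
    my $count = () = $str =~ /./sg;          # one match per character
    print "$count == ", length($str), "\n";  # 100 == 100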


Dave

Re^2: Out of Memory
by Anonymous Monk on Mar 28, 2013 at 17:25 UTC
    Hmm, you're right, those don't do the same thing.

    Incidentally, this also caused the out-of-memory error:

    ($nulls) = $_ =~ /\0/g;

    However, I found another method that works and doesn't seem as likely to cause the extra memory overhead:

    while ($_ =~ /\0/g) { $nulls++ }
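
    For what it's worth, a minimal sketch of why the two spellings behave so differently: in list context, m//g builds one scalar per match before anything is counted, while in scalar context it yields one match per call in constant memory.

    my $data = "\0" x 1_000_000;

    # List context: ~1e6 temporary scalars exist before the count is taken.
    my $count_list = () = $data =~ /\0/g;

    # Scalar context: one match per call, constant memory.
    my $count_iter = 0;
    $count_iter++ while $data =~ /\0/g;

    print "$count_list $count_iter\n";   # 1000000 1000000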

      The simplest, fastest and most efficient way to count the nulls (or any character) in a string is:

      my $nulls = $string =~ tr[\0][\0];

      Update: corrected '0' to '\0'. Thanks to davido.


        Thanks... I did see a reference to it in the thread I got the while loop from, but it noted that tr can only be used with single characters, which works in this case but maybe not in others. I did try the tr approach, and it does work for my data set without a memory error.

        Are you sure this is the most efficient way to do this? It seems to me that it's creating a copy of the original string and trying to replace the matches before it outputs the count (as far as I can tell from a quick read of a page on the tr function). I wouldn't think that would be as memory-efficient as the while code... but I don't understand the internals of the while code either; if it instantiates a huge list and then iterates through it, I can see how that wouldn't be as efficient as the tr code.
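
        A quick check suggests tr/// in this counting form only scans the string: with an empty replacement list and no /d, it neither copies nor modifies anything, it just returns the count.

        my $s = "a\0b\0c";
        my $n = $s =~ tr/\0//;     # empty replacement, no /d: just counts
        print "$n\n";              # 2
        print $s eq "a\0b\0c" ? "unchanged\n" : "modified\n";   # unchanged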

      Whatever method you use, you're teetering on the edge. I would probably prefer reading smaller chunks and processing them individually rather than trying to hold the entire thing in memory at once. Even if while( $_ =~ /\0/g ) { $nulls++ } keeps you below the mark, if your file grows by some small amount you'll be back to bumping into the memory limit again.

      In other words, none of your methods really address the elephant in the corner, which is that holding the entire data set in memory at once is consuming all your wiggle-room.
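
      Something along these lines (a minimal sketch; the filename and chunk size are placeholders) keeps memory flat no matter how large the file grows, and since we're counting single characters, a chunk boundary can never split a match:

      open my $fh, '<:raw', 'data.bin' or die "open: $!";   # hypothetical file
      my ( $buf, $nulls ) = ( '', 0 );
      while ( read $fh, $buf, 1 << 20 ) {        # 1 MB per chunk
          $nulls += $buf =~ tr/\0//;             # tally nulls in this chunk
      }
      close $fh;
      print "$nulls\n";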


      Dave

        Holding a 5MB string in memory is hardly onerous.

        The problem is entirely down to creating a huge list of scalars each containing a single character in order to count those characters.
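
        If Devel::Size (a CPAN module) is installed, the overhead is easy to demonstrate; a rough sketch, and the exact numbers vary by perl build:

        use Devel::Size qw(total_size);
        my $str  = "\0" x 1_000_000;          # ~1 MB of data
        my @list = $str =~ /\0/g;             # one scalar per null
        print total_size(\$str),  "\n";       # a little over 1e6 bytes
        print total_size(\@list), "\n";       # tens of bytes per element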


        I should switch to reading it in as a stream for the reason you stated (although I never expected 70 million nulls on a line), but I haven't done that in Perl before, though I have used the while(<file>) syntax many times to read one line at a time. The idea was to do something quick and dirty, which worked fine until last week.

        Still, my real question and reason for posting was to learn what was happening internally that caused the second statement to use more memory than the first... and a lot more memory than I expected. Per the second response, running a 5-million-byte string through the second statement consumed 320 MB of memory. That seems like a lot to me; 5 million bytes is, what, 5 MB?

        I think the answer (as mentioned elsewhere in this thread) is that it's creating 5 million scalars with one character each. If there were, say, 20 bytes of overhead per scalar, I could see how 5 MB becomes 320 MB when you chain several statements together on a single line. Of course, this assumes scalars have a lot of overhead (again, something I don't know much about).
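
        (The arithmetic roughly works out: on a 64-bit perl, a one-character string scalar typically costs somewhere in the region of 50-70 bytes once the SV head, the string body, and allocator overhead are counted, plus a pointer per list element. At around 64 bytes apiece, 5 million of them comes to roughly 320 MB. Exact figures vary by build, so treat that as a ballpark.)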

        BTW, thank you to everyone who has responded so far. I appreciate the knowledge share.
