PerlMonks  

how apply large memory with perl?

by xiaoyafeng (Deacon)
on Aug 08, 2012 at 10:01 UTC

xiaoyafeng has asked for the wisdom of the Perl Monks concerning the following question:

This question stems from "When Perl is not applicable", which started a war on the Perl blog site. ;) Contrary to the author's opinion, I do believe Perl can handle large data, but I'm more interested in how Perl handles it, so I did some tests:

env: Win XP, perl 5.12.3, 512M memory, available: 350M

    C:\>perl -e "@a = (1..1_000_000_000_000);"
    Range iterator outside integer range at -e line 1.    # of course!

    C:\>perl -e "@a = (1..1_000_000_000);"
    Out of memory!    # fair enough ;)

    C:\>perl -e "@a = (1..9_000_000);"
    Terminating on signal SIGINT(2)    # took too long, and I had to kill it :(

When I ran the third one, perl occupied almost 340M of memory at first. After 3 or 4 seconds, it seemed to realize there was not enough memory to allocate; the utilization then decreased to about 60M and the process hung. :(

So how does Perl handle allocating large amounts of memory? It seems Perl does some swapping when memory is not enough, but that isn't productive. Is there a better way to make Perl smarter? (I'd rather Perl told me "out of memory" than kept me waiting for a long time.)

Any cents?




I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction

Replies are listed 'Best First'.
Re: how apply large memory with perl?
by BrowserUk (Patriarch) on Aug 08, 2012 at 10:52 UTC

    When loading large volumes of data, a little care can go a very long way.

    • If I load an array with 9e6 values your way it requires 905MB(*) of ram to complete the process:
      [11:30:52.92] C:\test>perl -E" @a=1..9e6; say for grep /$$/, `tasklist`"
      perl.exe 5632 Console 1 937,368 K
      [11:30:56.08] C:\test>

      (*YMMV. I'm using 64-bit Perl. It will be less on 32-bit.)

    • But, with just a little effort, I can populate that same data using under 300MB:
      [11:31:01.54] C:\test>perl -E" $#a=9e6; $a[$_-1]=$_ for 1..9e6; say for grep /$$/, `tasklist`"
      perl.exe 3396 Console 1 292,188 K
      [11:31:04.24] C:\test>

      And if you look closely at the timestamps, it even runs around 20% faster.

    • And if I'm really pressed for space, I can cut that down to less than 40MB with very little extra effort or time cost:
      [11:47:33.84] C:\test>perl -E" $a=chr(0); $a x=9e6*4;substr($a,4*($_-1),4,pack'N*',$_) for 1..9e6; say for grep /$$/, `tasklist`"
      perl.exe 8028 Console 1 40,308 K
      [11:47:37.79] C:\test>

      Accesses will be a tad slower, but not so much as to take me anywhere near 24 hours to sum and count a few million data points.

    All commands wrapped for clarity!
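The packed-string approach from the last bullet can be wrapped in a pair of accessors. This is only a minimal sketch: `set_slot` and `get_slot` are hypothetical helper names that do not appear in the post, the element count is reduced so it runs instantly, and it assumes every value fits in an unsigned 32-bit integer (the `N` pack format).

```perl
use strict;
use warnings;

my $n = 1_000;    # small for illustration; the post uses 9e6

# One string holding $n slots of 4 bytes each, initially all zero bytes.
my $buf = "\0" x ( 4 * $n );

# Hypothetical helpers (not from the post): store and fetch an unsigned
# 32-bit integer at a zero-based slot, big-endian ('N') as in the one-liner.
sub set_slot { substr( $buf, 4 * $_[0], 4, pack( 'N', $_[1] ) ); return }
sub get_slot { return unpack( 'N', substr( $buf, 4 * $_[0], 4 ) ) }

# Populate through the iterating form of for, so no big list is built.
set_slot( $_ - 1, $_ ) for 1 .. $n;

# Read everything back and sum it.
my $sum = 0;
$sum += get_slot($_) for 0 .. $n - 1;

print "payload bytes: ", length($buf), "\n";
print "sum: $sum\n";
```

Each access now costs a `substr` plus a `pack` or `unpack`, which is the "tad slower" trade-off mentioned above; the payload itself is a flat 4 bytes per value.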

    And the same or similar techniques can be used for most every aggregate population task. It is just a case of knowing when to use them.

    Most of the time we don't bother because our data sizes are such that it isn't worth the (small) effort; but it behooves us to know when the small extra effort will pay big dividends.

    As for the blogger; quite why he feels the need to load all his DB-held data into his program in order to do bread and butter SQL queries is beyond me.

    Whilst I don't entirely disagree with the premise that there are times when Perl isn't the right choice; making the fundamental error of pulling all his DB-held data into perl in order to perform processing that he actually describes as "all you need is sum(field), count(field) where date between date1 and date2,", just makes me doubt the veracity of his conclusions.
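When the aggregation cannot be left to the database, the same sum and count can still be computed as running totals, one record at a time, without holding every record in memory. The following is a sketch of that alternative, not the blogger's code or an SQL query: the record layout (`date value` per line), the sample data, and the date range are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "date value" record per line.
my $data = <<'END';
2012-07-30 10
2012-08-01 20
2012-08-05 30
2012-08-09 40
END

# Hypothetical date range; ISO-8601 dates compare correctly as strings.
my ( $date1, $date2 ) = ( '2012-08-01', '2012-08-08' );

# Read from an in-memory filehandle; a real file or DB cursor works the same way.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my ( $sum, $count ) = ( 0, 0 );
while ( my $line = <$fh> ) {
    my ( $date, $value ) = split ' ', $line;
    next if $date lt $date1 || $date gt $date2;    # outside the range
    $sum += $value;
    $count++;
}
close $fh;

print "sum=$sum count=$count\n";
```

Only one line and two scalars are live at any moment, so memory use does not grow with the number of records.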


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      ++!!

      BTW, I don't know the internal details of the .. op and $#, but what makes such a big difference?






        I don't know the internal details of the .. op and $#, but what makes such a big difference?

        When you use the range operator outside of a for loop, Perl creates that list on one of its internal stacks first. This requires ~650MB, and is best illustrated by creating that mythically non-existent "list in a scalar context":

        C:\test>perl -E"scalar( ()=1..9e6 ); say grep /$$/,`tasklist`"
        perl.exe 2548 Console 1 650,108 K

        Now that we've got the list, it needs to be copied to the array, which takes the other ~300MB:

        C:\test>perl -E"@a=1..9e6; say grep /$$/,`tasklist`"
        perl.exe 2596 Console 1 937,388 K

        When you use the range operator in the context of a for statement, it acts as an iterator, thus completely avoiding the creation of the stack-based list:

        C:\test>perl -E"1 for 1..9e6; say grep /$$/,`tasklist`"
        perl.exe 4112 Console 1 4,776 K

        If we just assigned the values to the array one at a time, the array would have to keep doubling in size each time it filled in order to accommodate new values. The memory from previous resizings is freed to the heap, but the old and new allocations are both needed at one instant in time, resulting in an overall memory usage of 400MB:

        C:\test>perl -E"$a[$_-1]=$_ for 1..9e6; say grep /$$/,`tasklist`"
        perl.exe 2800 Console 1 402,312 K

        By pre-sizing the array to its final size we save those intermediate resizings and another 100+MB:

        C:\test>perl -wE"$#a=9e6; $a[$_-1]=$_ for 1..9e6; say grep /$$/,`tasklist`"
        perl.exe 4880 Console 1 292,196 K

        Simple steps with big gains.
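The two population styles compared above can be put side by side in a small script. This is a sketch only: the variable names are illustrative, the element count is reduced, and it checks that both styles produce the same array rather than measuring memory (the `tasklist` figures above are specific to that machine and build).

```perl
use strict;
use warnings;

my $n = 100_000;    # small for illustration; the post uses 9e6

# Built from a list: the whole range is generated first, then copied.
my @from_list = ( 1 .. $n );

# Pre-sized, then filled through the iterating form of for.
my @presized;
$#presized = $n - 1;    # sets the last index, giving exactly $n slots
$presized[ $_ - 1 ] = $_ for 1 .. $n;

print "same length\n" if @from_list == @presized;
print "same ends\n"
    if $from_list[0] == $presized[0] && $from_list[-1] == $presized[-1];
```

Note that `$#a` is the last index, not the element count, so `$#a=9e6` as written in the one-liners reserves one slot more than the 9e6 values stored; the sketch uses `$n - 1` to get exactly `$n` slots.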



      Ok, I'm stumped. Why in the world does the second line below result in so much less memory usage than the first? Just because it allocates the space before starting to fill it?

      @a=1..9e6;
      $#a=9e6; $a[$_-1]=$_ for 1..9e6;

      Aaron B.
      Available for small or large Perl jobs; see my home node.

        Ok, I'm stumped. Why in the world does the second line below result in so much less memory usage than the first? Just because it allocates the space before starting to fill it?

        Yup. It's discussed in keys and perldata...

        Used as an lvalue, "keys" allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to pre-extending an array by assigning a larger number to $#array.) If you say keys %hash = 200; then %hash will have at least 200 buckets allocated for it--256 of them, in fact, since it rounds up to the next power of two. These buckets will be retained even if you do "%hash = ()", use "undef %hash" if you want to free the storage while %hash is still in scope. You can't shrink the number of buckets allocated for the hash using "keys" in this way (but you needn't worry about doing this by accident, as trying has no effect). "keys @array" in an lvalue context is a syntax error.
        pre-extend, buckets, http://search.cpan.org/dist/illguts/index.html, http://search.cpan.org/perldoc/Devel::Size#UNDERSTANDING_MEMORY_ALLOCATION
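The lvalue form of `keys` quoted above can be tried with a few lines. This is a minimal sketch; the hash name, the key format, and the count of 200 are illustrative, and the script only confirms that the hash works normally after pre-sizing, since the way to inspect bucket counts differs between Perl versions.

```perl
use strict;
use warnings;

my $expected = 200;    # hypothetical number of keys we plan to insert

my %hash;
keys(%hash) = $expected;    # ask for at least this many buckets up front

# Fill the hash; no bucket growth should be needed along the way.
$hash{"key$_"} = $_ for 1 .. $expected;

print scalar( keys %hash ), " keys stored\n";
```

As with `$#array`, this only pays off when the final size is known, or can be estimated, before the data is loaded.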

        See Re^3: how apply large memory with perl? for the step-by-step guide :)



Re: how apply large memory with perl?
by Corion (Patriarch) on Aug 08, 2012 at 10:16 UTC

    Perl lets the operating system handle large memory situations. You will need to set limits in your operating system (or remove them) if you want to change how programs get allocated resources like memory. For unixish OSes, there is ulimit.

Re: how apply large memory with perl?
by Anonymous Monk on Aug 08, 2012 at 10:34 UTC
Re: how apply large memory with perl?
by sundialsvc4 (Abbot) on Aug 08, 2012 at 13:31 UTC

    Yes, there are sometimes situations where you legitimately must have millions of data points in memory, with instantaneous access to all of them. In those cases, you must have more than sufficient RAM with uncontested access to it. Otherwise you are inevitably going to hit the “thrash point,” and when that happens, the performance degradation is not linear: it is exponential. The curve looks like an up-turned elbow, and you “hit the wall.” That is certainly what is happening to the OP.

    BrowserUk’s algorithm is of course more efficient, and he has the RAM. In the absence of that resource, no algorithm would do. (And in this case, the prerequisite of sufficient RAM is implicitly understood.) You can still see just how much time it takes just to allocate that amount of data, even in the complete absence of paging contention. And the real work has not yet begun!

    Frequently, large arrays are “sparse,” with large swaths of missing values or known default values. In those cases, a hash or other data structure might be preferable. Solving the problem in sections across multiple runs might be possible. You must benchmark your proposed approaches as early as possible, because with “big data,” wrong is “big” wrong.
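The sparse case can be sketched with a hash that stores only the entries differing from a known default. The names `set_point` and `get_point` and the default of 0 are assumptions made for illustration, not something from the thread.

```perl
use strict;
use warnings;

my $default = 0;    # hypothetical known default for missing points
my %sparse;         # index => value, only for non-default entries

# Hypothetical helpers: keep only values that differ from the default.
sub set_point {
    my ( $i, $v ) = @_;
    if ( $v == $default ) { delete $sparse{$i} }
    else                  { $sparse{$i} = $v }
    return;
}

sub get_point {
    my ($i) = @_;
    return exists $sparse{$i} ? $sparse{$i} : $default;
}

set_point( 5,         42 );
set_point( 8_999_999, 7 );

print "stored entries: ", scalar( keys %sparse ), "\n";
print "point 5: ", get_point(5), ", point 6: ", get_point(6), "\n";
```

A hash entry costs considerably more memory than an array slot, so this only wins when most positions really are empty or default; benchmarking both layouts on representative data, as suggested above, settles which one fits.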
