PerlMonks  

  1. The first problem

    You are using File::Slurp wrongly. (For a file of this size!)

    When you call my $s = read_file( $filename ); the module first reads the entire 500MB into an internal scalar, and then returns it to you.

    You then assign that return value to a scalar in your own context.

    You now have 2 copies of the data in memory: 1GB! And you haven't done anything with it yet.

    You then run your regex on it, which takes around half a second on my machine and causes no memory growth.

    Then you pass your copy of the data into write_file(), which means it gets copied onto the stack.

    You now have 3 copies of the data in memory: 1.5GB!

    And internally to write_file(), it gets copied again. You now have 4 copies of the data in memory: 2GB!

    And if you are on a 32-bit perl, you've blown your heap and get the infamous "Out of memory!".

    And if you are on a 64-bit perl with enough memory, it then spends an inordinate amount of time(*) futzing with the copied data, "fixing up" that which isn't broken. Dog knows why it does this. It doesn't need to. Just typical O'Woe over-engineering!

    More than 2 hours(**) (before I ^C'd it) to write 500MB of data to disk is ridiculous!

    (** For a job that can be completed in 8 seconds simply, without trickery, 2 hours is as close to 'Never completes' as makes no difference.)
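To recap the copy count from the walk-through above, here is a small tally. It only does the arithmetic described in the post (one 500MB copy per step); the labels and the helper name peak_mb are made up for illustration, and it does not measure real memory use.

```perl
use strict;
use warnings;

my $file_mb = 500;    # size of the file discussed in the thread

# One entry per copy described in the post, in the order they are made.
# (Labels are illustrative, not taken from the File::Slurp source.)
my @copies = (
    'read_file() internal scalar',
    'caller scalar ($s)',
    'argument passed to write_file()',
    'write_file() internal copy',
);

# Hypothetical helper: memory held once $n copies of the data exist
sub peak_mb {
    my( $size_mb, $n ) = @_;
    return $size_mb * $n;
}

for my $i ( 0 .. $#copies ) {
    printf "%d. %-32s %4d MB\n",
        $i + 1, $copies[ $i ], peak_mb( $file_mb, $i + 1 );
}
```

The last line of output is the 2000 MB (roughly 2GB) total the post arrives at.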

    How to use File::Slurp correctly. (For a file of this size!)

    File::Slurp goes to (extraordinary) lengths in an attempt to "be efficient". (It fails miserably, but I'll get back to that!).

    When reading the file, you can ask the module to return a reference to the data, which avoids the copy made by the return.

    And when writing the file, you can pass that reference back. The module will (for no good reason) still copy the data internally before writing it out, but you do save another copy.
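A minimal sketch of that usage follows. It is not the code from the original post. It assumes File::Slurp is installed, and relies on its documented scalar_ref option for read_file() and on write_file() accepting a scalar reference as its data. The sub name detab_with_slurp is hypothetical.

```perl
use strict;
use warnings;
use File::Slurp qw( read_file write_file );

# Hypothetical helper name; not from the original post.
sub detab_with_slurp {
    my( $in, $out ) = @_;

    # scalar_ref => 1 asks read_file() to hand back a reference to the
    # data instead of returning the string itself
    my $ref = read_file( $in, scalar_ref => 1 );

    # Edit the data in place through the reference
    $$ref =~ tr[\t][ ];

    # Pass the same reference back rather than the string
    write_file( $out, $ref );

    return length $$ref;
}

# Run as: script.pl infile outfile
detab_with_slurp( @ARGV ) if @ARGV == 2;
```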

    This way, you only have one redundant copy of the data in memory, for a saving of 1GB. Your process won't run out of memory.

    However, it will still take more than 2 hours (I didn't wait any longer) to actually write 500MB to disk!

  2. Your second mistake was using File::Slurp!

    How about we try the same thing without the assistance of any overhyped, over-engineered, overblown modules?

        #! perl -slw
        use strict;
        use Time::HiRes qw[ time ];

        print STDERR time;

        # Slurp the whole input file: @ARGV gets the file name, $/ is left undef
        my $s;
        do{ local( @ARGV, $/ ) = $ARGV[0]; $s = <>; };

        print STDERR time;

        # Replace every tab with a space, in place
        $s =~ tr[\t][ ];

        print STDERR time;

        # Write it back out; local $\ stops -l appending an extra newline
        open O, '>', $ARGV[1] or die $!;
        { local $\; print( O $s ); }
        close O;

        print STDERR time;

        __END__
        [ 0:57:20.47] C:\test>984648-3 500MB.csv junk.txt
        1343779056.03211
        1343779058.22142
        1343779058.70098
        1343779061.99852

        [ 0:57:42.05] C:\test>

    About 2 seconds to read it; 1/2 second to process it; a little over 3 seconds to write it; and only 510MB memory used in the process!
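For anyone who wants to check those figures, the per-phase times fall straight out of the four timestamps printed by the run. A small sketch that does the subtraction (the helper name phase_times is made up for illustration):

```perl
use strict;
use warnings;

# The four Time::HiRes timestamps printed by the run quoted in the post
my @t = ( 1343779056.03211, 1343779058.22142,
          1343779058.70098, 1343779061.99852 );
my @phase = qw( read process write );

# Hypothetical helper: differences between consecutive timestamps
sub phase_times {
    my @stamps = @_;
    return map { $stamps[ $_ ] - $stamps[ $_ - 1 ] } 1 .. $#stamps;
}

my @d = phase_times( @t );
printf "%-8s %.2fs\n", $phase[ $_ ], $d[ $_ ] for 0 .. $#d;
printf "%-8s %.2fs\n", 'total', $t[-1] - $t[0];
```

This prints 2.19s for read, 0.48s for process, 3.30s for write and 5.97s in total.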

    That's efficient!
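If you want the same slurp, tr, print sequence as a reusable sub, something like the following would do. This is a sketch, not the code that was timed above: the name detab_file is hypothetical, it uses lexical filehandles instead of the bareword O, and it leaves the I/O layers at their defaults.

```perl
use strict;
use warnings;

# Hypothetical helper: same three steps as the script in the post
sub detab_file {
    my( $in, $out ) = @_;

    open my $ifh, '<', $in or die "Cannot read '$in': $!";
    my $s = do { local $/; <$ifh> };    # undef $/ reads the whole file at once
    close $ifh;

    my $count = ( $s =~ tr[\t][ ] );    # in place; tr returns the number replaced

    open my $ofh, '>', $out or die "Cannot write '$out': $!";
    { local $\; print {$ofh} $s; }      # make sure nothing is appended on output
    close $ofh or die "Cannot close '$out': $!";

    return $count;
}

# Run as: script.pl infile outfile
detab_file( @ARGV ) if @ARGV == 2;
```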

Bottom line: When you consider using a module for something -- LOOK INSIDE! If it looks too complicated for what it does, it probably is.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


In reply to Re: Windows 7 Remove Tabs Out of Memory by BrowserUk
in thread Windows 7 Remove Tabs Out of Memory by tallums
