PerlMonks  

Re^2: How to deal with Huge data

by chromatic (Archbishop)
on Jan 23, 2007 at 23:06 UTC ( #596170=note )


in reply to Re: How to deal with Huge data
in thread How to deal with Huge data

Do not declare a variable with my every time the program enters a loop. Declare the variable before the loop and then, inside the loop, once the variable will not be used anymore, just clear the value it holds. This is faster.

The OP is doing I/O. How could this possibly matter, if it's even true?


Re^3: How to deal with Huge data
by glasswalk3r (Pilgrim) on Jan 24, 2007 at 12:28 UTC

    I'm not sure what "OP" means, but anyway... the tip is a bit off-topic, since it isn't related to the memory issue. But it's a tip anyway.

    Doing things like the code below:

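    my @t;    # declared once, outside the loop
    T: while ( my $line = <GSE> ) {
        $line =~ s/[\r\n]//g;         # strip line endings
        @t = split( /\t/, $line );    # was $ligne, which is never set
        if ( $. == 1 ) {              # first line holds the sample names
            shift(@t);
            @samples = @t;
            next T;
        }
        @t = ();                      # clear the array instead of redeclaring it
    }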

    This should avoid a memory allocation every time the variable is created and destroyed. If it doesn't really work like that, please let me know.

    Alceu Rodrigues de Freitas Junior
    ---------------------------------
    "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

      That's just noise. Even if Perl doesn't keep around an AV internally to avoid the cost of reallocating a variable (and I believe there's an optimization which does exactly that), look at all of the other, more expensive, work in that snippet:

      • Reading a line from a file. Here's the biggest time sink: doing system calls, seek times, transferring data across multiple busses, checking for cache hits and paying for cache misses, running through any IO layers....
      • Doing an unanchored regular expression with a character class; that means examining every character in the string and allocating and building an entirely new string--and just try to guess beforehand how long that new string needs to be.
      • Creating new SVs for every tab-separated element in the line.

      You have to do a tremendous amount of optimization before hoisting your variable declaration out of the loop makes any measurable difference, and that's if Perl doesn't do that optimization already. Besides that, changing the memory layout of your program probably has a bigger effect on performance, if you take I/O out of the picture. What if you create an extra page fault per loop by needing an extra page? What if you fragment memory more this way? How do you even measure this in a meaningful way?

      Thus I say it's a silly pseudo-optimization.

        I don't know how to measure the memory usage or fragmentation caused by a single variable in a Perl program... I don't even know if it's possible. Again, the tip was about making the program run faster, not about saving memory.

        Of course, I agree it may be a premature optimization. As you said, this is not a silver bullet... it may work, it may not. It's up to the programmer to put this to the test, but there are certainly more important changes to make first.
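
        For anyone who wants to try it, here is a rough sketch of such a test using the core Benchmark module. It is only an illustration, not a definitive measurement: the sub names and the in-memory @lines data are made up for this example, and the data is kept in memory on purpose so that file I/O stays out of the picture. Results will vary with the perl version and the machine.

        ```perl
        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Hypothetical input: 1,000 tab-separated lines of 20 fields each,
        # held in memory so that file I/O does not dominate the timing.
        my @lines = map { join( "\t", 1 .. 20 ) . "\r\n" } 1 .. 1_000;

        # Variant 1: declare the array inside the loop on every pass.
        sub declared_inside {
            my $fields = 0;
            for my $line (@lines) {
                # Work on a copy so the shared input is not modified.
                ( my $copy = $line ) =~ s/[\r\n]//g;
                my @t = split /\t/, $copy;
                $fields += @t;
            }
            return $fields;
        }

        # Variant 2: declare the array once and clear it on every pass.
        sub hoisted_outside {
            my $fields = 0;
            my @t;
            for my $line (@lines) {
                ( my $copy = $line ) =~ s/[\r\n]//g;
                @t = split /\t/, $copy;
                $fields += @t;
                @t = ();
            }
            return $fields;
        }

        # Both variants must do the same work, or the comparison means nothing.
        die "variants disagree" unless declared_inside() == hoisted_outside();

        # Run each variant for at least 1 CPU second and print a comparison table.
        cmpthese( -1, {
            inside  => \&declared_inside,
            hoisted => \&hoisted_outside,
        } );
        ```

        This only compares CPU time for the two declaration styles. It says nothing about memory layout, page faults or fragmentation, and it leaves out the cost of actually reading the file.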

        Alceu Rodrigues de Freitas Junior
