PerlMonks  

Re^3: How to deal with Huge data

by glasswalk3r (Pilgrim)
on Jan 24, 2007 at 12:28 UTC (#596247)


in reply to Re^2: How to deal with Huge data
in thread How to deal with Huge data

I'm not sure what "OP" means, but anyway... the tip is a bit off-topic, since it isn't related to the memory issue. But it's a tip anyway.

Writing the loop like the code below:

    my @t;    # declared once, outside the loop
    T: while ( my $line = <GSE> ) {
        $line =~ s/[\r\n]//g;
        @t = split( /\t/, $line );
        if ( $. == 1 ) {    # header row: keep the sample names
            shift(@t);
            @samples = @t;
            next T;
        }
        # ... process @t here ...
        @t = ();    # empty the array for reuse on the next line
    }

should avoid allocating memory every time the variable is created and destroyed. If it doesn't actually work like that, please let me know.

Alceu Rodrigues de Freitas Junior
---------------------------------
"You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill


Pseudo-Optimizations
by chromatic (Archbishop) on Jan 24, 2007 at 20:42 UTC

    That's just noise. Even if Perl doesn't keep around an AV internally to avoid the cost of reallocating a variable (and I believe there's an optimization which does exactly that), look at all of the other, more expensive, work in that snippet:

    • Reading a line from a file. Here's the biggest time sink: doing system calls, seek times, transferring data across multiple busses, checking for cache hits and paying for cache misses, running through any IO layers....
    • Doing an unanchored regular expression with a character class; that means examining every character in the string and allocating and building an entirely new string--and just try to guess beforehand how long that new string needs to be.
    • Creating new SVs for every tab-separated element in the line.

    You have to do a tremendous amount of optimization before hoisting your variable declaration out of the loop makes any measurable difference, and that's if Perl doesn't do that optimization already. Besides that, changing the memory layout of your program probably has a bigger effect on performance, if you take I/O out of the picture. What if you create an extra page fault per loop by needing an extra page? What if you fragment memory more this way? How do you even measure this in a meaningful way?

    Thus I say it's a silly pseudo-optimization.

      I don't know how to measure the memory usage or fragmentation caused by a single variable in a Perl program... I don't even know if it's possible. Again, the tip was about making the program run faster, not about saving memory.

      Of course, I agree it may be a premature optimization. As you said, this is not a silver bullet... it may work, it may not. It's up to the programmer to put this to the test, but there are certainly more important changes to make first.

      Alceu Rodrigues de Freitas Junior
      ---------------------------------
      "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

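For anyone who wants to put the hoisting tip to the test, here is a minimal sketch using the core Benchmark module. It is not from the original posts: the in-memory @lines data and the sub names hoisted and in_loop are made up for illustration, and the lines are kept in memory to take I/O out of the picture. It only compares CPU time; it says nothing about memory layout or fragmentation.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical in-memory data standing in for the tab-separated file,
# so that disk I/O does not dominate the comparison.
my @lines = map { join( "\t", "row$_", 1 .. 20 ) . "\n" } 1 .. 1000;

# Variant 1: array declared once, outside the loop, and emptied each pass.
sub hoisted {
    my $fields = 0;
    my @t;
    for my $line (@lines) {
        ( my $copy = $line ) =~ s/[\r\n]//g;    # strip line endings from a copy
        @t = split /\t/, $copy;
        $fields += @t;                          # count the fields seen
        @t = ();
    }
    return $fields;
}

# Variant 2: array declared inside the loop on every pass.
sub in_loop {
    my $fields = 0;
    for my $line (@lines) {
        ( my $copy = $line ) =~ s/[\r\n]//g;
        my @t = split /\t/, $copy;
        $fields += @t;
    }
    return $fields;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, { hoisted => \&hoisted, in_loop => \&in_loop } );
```

Run it several times: if the two rates swap places or differ by only a few percent between runs, the difference is within noise for this workload.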