Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Re: Re: Re: Randomize lines with limited memory

by Corion (Pope)
on Nov 03, 2003 at 08:01 UTC ( #304043=note: print w/replies, xml ) Need Help??

in reply to Re: Re: Randomize lines with limited memory
in thread Randomize lines with limited memory

I did mention that I trade space (memory) for time, didn't I?

Also, my method dosen't involve random seeks, it only involves rewinding the file to the start point, which is a seek to a fixed position in the file. I think you could halve the number of times the file by checking whether the next line number is greater than the current line number and then winding forward to that one, as I think that there is a 50% chance that the next line number to be written is higher than the current number.

That point is moot though, as Tie::File implements a caching scheme and is a much cleaner solution than my naive but memory conserving approach.

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

Replies are listed 'Best First'.
Re: Re: Re: Re: Randomize lines with limited memory
by BrowserUk (Pope) on Nov 03, 2003 at 08:18 UTC

    Unfortunately, Tie::File is not a solution to this problem.

    For a file of 100,000 lines, it completes, but takes around 800 times longer than List::Util::shuffle on an in memory array. So what, trade time for space. The problem is that the increase in time is exponential, meaning that shuffling the OP's 27,000,000 record file would take close to a week!

    However, more fundemental than that is that Tie::File simply isn't able to cope with a file this large. It runs out of memory and segfaults trying this with a 1,000,000 record file on my machine, where as I can comfortably load a 2,000,000 record file completelly into memory and sort even using the copy semantics of List:Util::shuffle. This is true regardles of what settings (default, bigger or smaller) I tried using for the memory parameter.

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://304043]
and monks are getting baked in the sun...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2018-05-26 22:12 GMT
Find Nodes?
    Voting Booth?