Divide and conquer. Chop the big file into as many smaller files as it takes to make each one a manageable size, i.e. small enough to fit in memory. Then shuffle each of them normally (e.g. with a Fisher-Yates shuffle). The tricky part is the initial chopping up. You can't simply take the first 10k lines, then the second 10k lines, etc. That wouldn't give you adequate randomization, because lines that started out near each other would stay clustered together in the output. Instead, I think I would read each line, choose an output file at random, and append the line to it. The files won't come out exactly the same size (except perhaps rarely), but that doesn't matter. You could similarly randomize the order of the files in the final joining step, but I'm not sure that would actually buy you anything. You could try it.
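Here is a minimal sketch of the approach described above, not a tested production script. The names `shuffle_big_file`, `fisher_yates_shuffle`, the `bucket.N` file names and the `$n_buckets` parameter are all made up for illustration. It assumes every bucket file fits in memory and that `$n_buckets` stays below your system's open-file limit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# In-place Fisher-Yates shuffle of the array referenced by $deck.
sub fisher_yates_shuffle {
    my ($deck) = @_;
    my $i = @$deck;
    while ( --$i > 0 ) {
        my $j = int( rand( $i + 1 ) );    # 0 <= $j <= $i
        @$deck[ $i, $j ] = @$deck[ $j, $i ];
    }
    return;
}

# Shuffle the lines of file $in into file $out via $n_buckets temp files.
# (Hypothetical helper: assumes each bucket is small enough for memory.)
sub shuffle_big_file {
    my ( $in, $out, $n_buckets ) = @_;
    my $dir = tempdir( CLEANUP => 1 );

    # Pass 1: append each input line to a randomly chosen bucket file.
    my @bucket_fh;
    for my $i ( 0 .. $n_buckets - 1 ) {
        open $bucket_fh[$i], '>', "$dir/bucket.$i"
            or die "Cannot create bucket $i: $!";
    }
    open my $in_fh, '<', $in or die "Cannot read $in: $!";
    while ( my $line = <$in_fh> ) {
        # The last line may lack a newline; add one so lines stay separate.
        $line .= "\n" unless $line =~ /\n\z/;
        print { $bucket_fh[ int( rand($n_buckets) ) ] } $line;
    }
    close $in_fh;
    close $_ or die "Cannot close bucket: $!" for @bucket_fh;

    # Pass 2: shuffle each bucket in memory and append it to the output.
    open my $out_fh, '>', $out or die "Cannot write $out: $!";
    for my $i ( 0 .. $n_buckets - 1 ) {
        open my $fh, '<', "$dir/bucket.$i"
            or die "Cannot read bucket $i: $!";
        my @lines = <$fh>;
        close $fh;
        fisher_yates_shuffle( \@lines );
        print {$out_fh} @lines;
    }
    close $out_fh or die "Cannot close $out: $!";
    return;
}
```

You would call it as something like `shuffle_big_file('big.txt', 'shuffled.txt', 100);`, picking the bucket count so that the input size divided by the count fits comfortably in memory. The sketch joins the buckets in a fixed order: since each line picks its bucket independently and uniformly, and each bucket is then shuffled, that should already give an unbiased result (assuming a decent `rand`).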

jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.


In reply to Re: Randomize lines with limited memory by jdporter
in thread Randomize lines with limited memory by natch
