http://www.perlmonks.org?node_id=170990

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there any way to mix up a text file line by line?

Ex.

1::something1
2::something2
3::something3

to..

3::something3
1::something1
2::something2

Replies are listed 'Best First'.
Re: Mixing Up a Text File
by grinder (Bishop) on Jun 02, 2002 at 10:40 UTC
    The solutions presented so far shuffle the lines as you go along. This is not good, although I can't recall where I read this. I thought it was on Perl Monks, but I can't tease the thread out of the search box.

    The flaw (as I understand it) is that lines appearing earlier in the file get selected more often than lines appearing later in the file, so not all possible shuffles are equally likely.

    To do it right, you have to read all the lines in, and then apply a shuffling function to each entry. The canonical algorithm to do this is the Fisher-Yates shuffle.

    You would want to do something like:

    my @file = <>;
    my $i = @file;
    while( $i-- ) {
        my $j = int rand(1+$i);
        @file[$i, $j] = @file[$j, $i];
    }
    print @file;

    As it turns out, this is one of the rare classes of algorithms where you must slurp the entire file into memory in order to process it.

    update: I think MeowChow's and Screamer's solutions are ok, but they do have the effect of destroying the array. If all you need to do is print the lines then that is sufficient. Also, I'm not sure that creating repeated new arrays from the splice operation is wonderfully efficient. I haven't benchmarked it, it's just a gut feeling.

    If you need to process the array in other ways, however, before writing it out then you have to push the spliced-out array elements onto another array, which means even more gymnastics going on behind the scenes.

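    If the original array has to survive, one option is to run the same Fisher-Yates loop on a copy. This is only a sketch: shuffled_copy is a made-up helper name, not something from the thread or from a module.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns a shuffled
# copy of its arguments and leaves the caller's array untouched.
sub shuffled_copy {
    my @copy = @_;                        # work on a copy, not the original
    my $i = @copy;
    while ( $i-- ) {
        my $j = int rand( $i + 1 );       # 0 <= $j <= $i
        @copy[ $i, $j ] = @copy[ $j, $i ];  # swap the two elements
    }
    return @copy;
}

# Sample data modelled on the question; a real script would read <> instead.
my @lines    = map { "${_}::something${_}\n" } 1 .. 3;
my @shuffled = shuffled_copy(@lines);

print @shuffled;                          # the three lines in some order
print scalar(@lines), " original lines still intact\n";
```

    Nothing is spliced out here, so there is no second array to rebuild afterwards; the cost is one extra copy of the lines in memory.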

    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
      Shouldn't this also have the same effect?
      my @file = <>; print splice @file, rand @file, 1 while @file;
      ____________
      Makeshifts last the longest.
(MeowChow) Re: Mixing Up a Text File
by MeowChow (Vicar) on Jun 02, 2002 at 08:02 UTC
    Reiterating an old solution:
    perl -ne'splice@l,rand$.,0,$_}{print@l'
    or more readably:
    my @lines; splice @lines, rand $., 0, $_ while <>; print @lines;
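
    One way to probe whether inserting as you go favours early lines is to enumerate every insertion position the loop could pick for a tiny input and count the resulting orders. The sketch below does that without rand; insertion_outcomes is a hypothetical helper written for this check only.

```perl
use strict;
use warnings;

# Hypothetical helper: walks through every sequence of insertion positions
# the "splice @lines, rand $., 0, $_" loop could choose, and returns a hash
# mapping each final order to the number of ways it can be produced.
sub insertion_outcomes {
    my @items   = @_;
    my @partial = ( [] );                 # arrangements built so far
    for my $k ( 0 .. $#items ) {
        my @next;
        for my $arr (@partial) {
            # When item $k arrives there are $k items already placed, so the
            # truncated rand($.) can yield any position from 0 to $k.
            for my $pos ( 0 .. $k ) {
                my @copy = @$arr;
                splice @copy, $pos, 0, $items[$k];
                push @next, \@copy;
            }
        }
        @partial = @next;
    }
    my %count;
    $count{ join ' ', @$_ }++ for @partial;
    return %count;
}

my %count = insertion_outcomes( 1 .. 3 );
printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```

    For three lines there are 1 x 2 x 3 = 6 equally likely choice sequences, and each of the six possible orders is produced by exactly one of them, so this particular as-you-go method is no less uniform than shuffling after the fact.
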
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
Re: Mixing Up a Text File
by I0 (Priest) on Jun 02, 2002 at 07:54 UTC
    my @array;
    while( <> ){
        my $j = int rand $.;                     # 0 <= $j <= $. - 1
        @array[$. - 1, $j] = ($array[$j], $_);   # arrays are zero-based, $. starts at 1
    }
    print @array;
Re: Mixing Up a Text File
by Zaxo (Archbishop) on Jun 02, 2002 at 07:55 UTC

    Sure, read into an array and use a slice:

    my @lines;
    {
        my $fh;
        open $fh, '<', '/path/to/file.txt' or die $!;
        chomp( @lines = <$fh> );              # strip newlines; join puts them back
    }
    {
        my $fh;
        open $fh, '>', '/path/to/newfile.txt' or die $!;
        print $fh join $/, @lines[2,0,1];     # zero-based: third, first, second line
        print $fh $/;
    }
    There are lots of other tricks you can do; this is just a sample. It uses an array slice to select the output.

    After Compline,
    Zaxo

Re: Mixing Up a Text File
by samtregar (Abbot) on Jun 02, 2002 at 17:00 UTC
    Here's a nice clean one:

    use List::Util qw(shuffle);
    use Tie::File;
    tie @file, 'Tie::File', "file.txt" or die $!;
    @file = shuffle @file;

    -sam

Re: Mixing Up a Text File
by belg4mit (Prior) on Jun 02, 2002 at 16:54 UTC
    Fisher-Yates++. There are modules to keep one from having to cargo-cult/copy-n-paste it often.

    You'd think your "earlier lines get picked more often" argument would make sense, but if you look at perlfaq5, "How do I select a random line from a file?", the algorithm given there is very similar. (Admittedly I can't really say either way, as when I've tried to adapt it to do this in the past I ended up with other anomalies.)

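    For reference, the perlfaq5 idiom is "rand($.) < 1 && ($line = $_) while <>;". A sketch of it wrapped in a sub follows; random_line and the in-memory sample text are assumptions made for illustration, not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the perlfaq5 idiom; reads from any filehandle
# and keeps only one line in memory at a time.
sub random_line {
    my ($fh) = @_;
    my $line;
    while ( my $candidate = <$fh> ) {
        # $. is the number of lines read so far, so line N replaces the
        # current pick with probability 1/N.
        $line = $candidate if rand($.) < 1;
    }
    return $line;
}

# Sample data modelled on the question, read through an in-memory filehandle.
my $text = "1::something1\n2::something2\n3::something3\n";
open my $fh, '<', \$text or die $!;
print random_line($fh);
```

    This picks a single line, and each line ends up chosen with equal probability despite the file being read in order. It does not by itself produce a full shuffle.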
    --
    perl -pew "s/\b;([mnst])/'$1/g"

Re: Mixing Up a Text File
by grexman (Beadle) on Jun 02, 2002 at 18:29 UTC
    Dear Monks, another idea on this topic suddenly came to mind! I fear it is not a real solution, and when I tested it, it didn't work!
    But wouldn't it be nice to just store the lines in a hash and then print out the values of the hash, hoping Perl disorders the elements, as it sometimes does?
    (I know it does not help if it is absolutely necessary to mix them up in a different way each time.)
    I'm rather sure that this is the shortest solution. Could someone more skilled please tell me whether this is possible, or whether it is just one of my silly ideas?
      #!/usr/bin/perl -w
      use strict;
      my(%hash,$i);
      @hash{map($i++.":$_", <>)} = ();
      print "$_\n" for map /^\d+:(.*)/, keys %hash;
      NB: the $i++ and regex trickery is necessary to preserve identical lines.

      Update: Since the above code will produce the same "randomized" file every time you run it with the same input data:
      #!/usr/bin/perl -w
      use strict;
      my(%hash,$i);
      @hash{map join(":", $i++, int rand $i, $_), <>} = ();
      print "$_\n" for map /^\d+:\d+:(.*)/, keys %hash;
      Which I guess was not the point, since we're back to using rand().
      ____________
      Makeshifts last the longest.