PerlMonks  

How to reorder a text file

by hennesse (Sexton)
on Jun 11, 2018 at 21:53 UTC (#1216436)
hennesse has asked for the wisdom of the Perl Monks concerning the following question:

I have a 15,000-line text file in which I want to reverse the order of groups of lines. In the example below, a group is a Title line followed by zero or more lines of text. What is an efficient approach to do this?

Input file:

    Title 1
    Line of text A
    Line of text B
    Title 2
    Line of text C
    Title 3
    Title 4
    Line of text D

Desired Output:

    Title 4
    Line of text D
    Title 3
    Line of text D
    Title 3
    Title 2
    Line of text C
    Title 1
    Line of text A
    Line of text B

Thanks, Dave

Re: How to reorder a text file
by tybalt89 (Priest) on Jun 11, 2018 at 22:06 UTC
    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=1216436
    use strict;
    use warnings;

    print reverse split /^(?=Title)/m, do{ local $/; <DATA>};

    __DATA__
    Title 1
    Line of text A
    Line of text B
    Title 2
    Line of text C
    Title 3
    Title 4
    Line of text D

    Outputs:

    Title 4
    Line of text D
    Title 3
    Title 2
    Line of text C
    Title 1
    Line of text A
    Line of text B

    Which I think is more valid than your Desired Output

      Works perfectly - Thank you!
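For reuse, tybalt89's one-liner above can be wrapped in a sub. This is a minimal sketch, not code from the thread: the name `reverse_groups` and the optional `$marker` argument (defaulting to `Title`) are hypothetical additions, and like the original it assumes the whole file fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the order of groups in $text, where a
# group starts at each line beginning with $marker (default "Title").
# Any lines before the first marker line form a leading group of their
# own, which ends up last like any other group.
sub reverse_groups {
    my ($text, $marker) = @_;
    $marker = 'Title' unless defined $marker;
    # (?=...) is a zero-width lookahead, so the marker text is kept in
    # the group; /m makes ^ match at the start of every line.
    my @groups = split /^(?=\Q$marker\E)/m, $text;
    return join '', reverse @groups;
}

my $input = "Title 1\nLine of text A\nLine of text B\nTitle 2\n"
          . "Line of text C\nTitle 3\nTitle 4\nLine of text D\n";
print reverse_groups($input);
```

The lookahead is the important choice here: splitting on a plain /^Title/ would consume the title text as the separator and drop it from the output.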
Re: How to reorder a text file
by Cristoforo (Curate) on Jun 11, 2018 at 23:43 UTC
    Taken from Re: Split and print hash based on regex
    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', \<<EOF;
    Title 1
    Line of text A
    Line of text B
    Title 2
    Line of text C
    Title 3
    Title 4
    Line of text D
    EOF

    {
        local $/;
        print reverse <$fh> =~ /(^Title(?:(?!^Title).)*)/msg;
    }
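The pattern in Cristoforo's reply uses a "tempered" token, `(?:(?!^Title).)*`, that consumes characters until the next line starting with Title. To make that visible, here is a hedged sketch (the sub name `title_groups` is hypothetical) that applies the same idea but captures each title line and its body separately:

```perl
use strict;
use warnings;

# Hypothetical helper: return [title_line, body] pairs.
# (?:(?!^Title).)* takes one character at a time but refuses to step
# onto a position where a new line begins with "Title".
# /s lets . match newlines; /m makes ^ match at every line start.
sub title_groups {
    my ($text) = @_;
    my @pairs;
    while ($text =~ /^(Title[^\n]*\n?)((?:(?!^Title).)*)/msg) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

my $input = "Title 1\nLine of text A\nLine of text B\nTitle 2\n"
          . "Line of text C\nTitle 3\nTitle 4\nLine of text D\n";
for my $pair (reverse title_groups($input)) {
    my ($title, $body) = @$pair;
    print $title, $body;
}
```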
Re: How to reorder a text file
by kcott (Chancellor) on Jun 12, 2018 at 07:40 UTC

    G'day Dave,

    Take a look at the recent "Sort text by Chapter names" thread. A number of the solutions presented there are easily adapted to your requirements.

    — Ken

Re: How to reorder a text file
by tybalt89 (Priest) on Jun 12, 2018 at 19:08 UTC

    Just for fun, here's a solution that doesn't keep the blocks of text in memory (only their offsets and lengths), and therefore may work for larger files.

    Besides, how often does one get to use unshift, and set $/ to a reference :)

    Notice also that you can seek around in DATA.

    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=1216436
    use strict;
    use warnings;

    my @sections;
    while( <DATA> )
    {
        if( /^Title/ )
        {
            unshift @sections, [ tell(DATA) - length, length ];
        }
        elsif( @sections )
        {
            $sections[0][1] += length;
        }
    }

    for ( @sections )
    {
        seek DATA, $_->[0], 0 or die;
        local $/ = \$_->[1];
        print scalar <DATA>;
    }

    __DATA__
    Title 1
    Line of text A
    Line of text B
    Title 2
    Line of text C
    Title 3
    Title 4
    Line of text D
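The `local $/ = \$_->[1]` line is what lets each section come back from a single `scalar <DATA>`: when `$/` holds a reference to an integer, readline returns fixed-size records instead of lines. A small standalone illustration, using a hypothetical helper `fixed_records` that reads from an in-memory string rather than DATA:

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in fixed-size records. When $/ holds a
# reference to an integer, readline returns at most that many characters
# per call instead of one line; the final record may be shorter.
sub fixed_records {
    my ($fh, $size) = @_;
    local $/ = \$size;
    my @records;
    while (defined(my $rec = <$fh>)) {
        push @records, $rec;
    }
    return @records;
}

my $text = "abcdefgh";
open my $fh, '<', \$text or die "can't open in-memory handle: $!";
print join('|', fixed_records($fh, 3)), "\n";   # prints abc|def|gh
```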
Re: How to reorder a text file
by johngg (Abbot) on Jun 12, 2018 at 17:34 UTC

    An approach reading a line at a time rather than slurping the whole file.

    use strict;
    use warnings;

    open my $inFH, q{<}, \ <<__EOD__ or die $!;
    Title 1
    Line of text A
    Line of text B
    Title 2
    Line of text C
    Title 3
    Title 4
    Line of text D
    __EOD__

    my @groups;
    while ( <$inFH> )
    {
        if ( m{^Title} )
        {
            unshift @groups, [ $_ ];
        }
        else
        {
            push @{ $groups[ 0 ] }, $_;
        }
    }

    while ( my $group = shift @groups )
    {
        print @{ $group };
    }

    The output.

    Title 4
    Line of text D
    Title 3
    Title 2
    Line of text C
    Title 1
    Line of text A
    Line of text B

    I hope this is of interest.

    Cheers,

    JohnGG

      ... rather than slurping the whole file.

      But won't the while ( <$inFH> ) { ... } loop end up reading the entire file into the @groups array (actually an array-of-arrays, or AoA) before it finishes? That is not what I think of as "line-by-line" processing of a file. (Actually, it looks a lot like slurping! :)


      Give a man a fish:  <%-{-{-{-<
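One way to address that objection is to keep only a byte offset per group instead of the group text. The following is a hedged sketch, not code from the thread: the sub name `reverse_groups_by_offset` is hypothetical, it assumes a seekable input, and it is similar in spirit to tybalt89's seek/tell reply but uses `read` between consecutive offsets instead of a reference-valued `$/`.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the groups from $in to $out in reverse order.
# Only one byte offset per group is kept in memory, plus the text of the
# single group currently being copied. Assumes $in is seekable (for a real
# file, binmode it so offsets are byte counts) and that a group starts at
# every line beginning with "Title"; lines before the first Title are skipped.
sub reverse_groups_by_offset {
    my ($in, $out) = @_;
    my @starts;
    my $pos = tell $in;
    while (defined(my $line = <$in>)) {
        push @starts, $pos if $line =~ /^Title/;
        $pos = tell $in;                  # offset of the next line
    }
    my $end = $pos;                       # offset just past the last line
    for my $i (reverse 0 .. $#starts) {
        my $from = $starts[$i];
        my $to   = $i < $#starts ? $starts[$i + 1] : $end;
        seek $in, $from, 0 or die "seek failed: $!";
        my $got = read $in, my $buf, $to - $from;
        die "short read" unless defined $got && $got == $to - $from;
        print {$out} $buf;
    }
}

my $input = "Title 1\nLine of text A\nLine of text B\nTitle 2\n"
          . "Line of text C\nTitle 3\nTitle 4\nLine of text D\n";
open my $in_fh, '<', \$input or die "can't open in-memory handle: $!";
reverse_groups_by_offset($in_fh, \*STDOUT);
```

The input is read twice (one scan for offsets, then one seek per group), which trades some I/O for memory use that grows with the number of groups rather than the size of the file.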

Re: How to reorder a text file
by tybalt89 (Priest) on Jun 13, 2018 at 03:26 UTC

    I wanted to do two things - read a section at a time, and write the output file from back to front. This allows a single pass through the input file.
    Since the sections don't have an obvious ending, reading into the next section is required, but that's easy to fix when it happens :)

    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=1216436
    use strict;
    use warnings;

    my $inputfile = 'd.1216436.in';
    my $outputfile = 'd.1216436.out';
    my $pointer = -s $inputfile; # output will be same size as input

    open my $in, '<', $inputfile or die "$! opening $inputfile";
    open my $out, '>', $outputfile or die "$! opening $outputfile";

    local $/ = "\nTitle"; # too far, but we will fix it
    while( <$in> ) # read whole section
    {
        s/\n\KTitle\z// and seek $in, -5, 1; # if read too far, fix & backup
        seek $out, $pointer -= length, 0;    # backup output pointer
        print $out $_;                       # put in proper place
    }
    close $out;

    Now if I can only stop giggling :)
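A hedged generalization of the single-pass reply above: the same read-one-section, write-from-the-back algorithm, but with the file names and the section marker passed in (the sub name `reverse_sections_file` and the `$marker` parameter are hypothetical; the original hard-codes "Title" and its length of 5). The two `binmode` calls are an added assumption so that lengths and offsets are byte counts on platforms with CRLF translation.

```perl
use strict;
use warnings;

# Hypothetical generalization: same algorithm as the reply above, with
# the file names and the section marker as parameters.
sub reverse_sections_file {
    my ($inputfile, $outputfile, $marker) = @_;
    my $pointer = -s $inputfile;          # output is exactly as large as input
    open my $in,  '<', $inputfile  or die "$! opening $inputfile";
    open my $out, '>', $outputfile or die "$! opening $outputfile";
    binmode $in;                          # assumption: byte-exact lengths and
    binmode $out;                         # offsets, i.e. no CRLF translation
    local $/ = "\n$marker";               # reads the next marker too ...
    my $mlen = length $marker;
    while (defined(my $section = <$in>)) {
        # ... so strip it and step the input back to the start of that line
        if ($section =~ s/\n\K\Q$marker\E\z//) {
            seek $in, -$mlen, 1 or die "seek failed: $!";
        }
        # place this section just before the one written previously
        $pointer -= length $section;
        seek $out, $pointer, 0 or die "seek failed: $!";
        print {$out} $section;
    }
    close $out or die "$! closing $outputfile";
}

# Usage (file names taken from the reply above):
# reverse_sections_file('d.1216436.in', 'd.1216436.out', 'Title');
```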

Re: How to reorder a text file
by monsenhor (Novice) on Jun 13, 2018 at 01:48 UTC
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $txt = "
    Title 1
    Line of text A
    Line of text B
    Title 2
    Line of text C
    Title 3
    Title 4
    Line of text D
    ";

    my @reverse;
    my @normal = split /(Title \d)/, $txt;
    foreach my $n (0 .. @normal){
        if ($n%2){
            push @reverse, $normal[$n].$normal[$n+1];
        }
    }
    chomp @reverse;
    @reverse = reverse @reverse;
    map { print $_."\n" } @reverse;

      In the

      foreach my $n (0 .. @normal){
          if ($n%2){
              push @reverse, $normal[$n].$normal[$n+1];
          }
      }

      loop, aren't you indexing beyond the end of the @normal array? Won't this produce "Use of uninitialized value..." warnings if warnings are enabled? Shouldn't the loop range be for my $n (0 .. $#normal-1) { ... } instead?

      (And BTW: aren't you missing an opportunity to avoid a foreach-loop by rewriting the loop as a void-context map statement? :)


      Give a man a fish:  <%-{-{-{-<
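A hedged sketch of monsenhor's split-based approach with the index range suggested above (`0 .. $#normal-1`), written with map/grep in list context. The sub name `reversed_groups` is hypothetical, and two changes are my own assumptions rather than part of either post: `\d+` instead of `\d`, so that titles like "Title 10" are not cut apart, and a `-1` limit on split, so a final title with no text after it is not dropped.

```perl
use strict;
use warnings;

# split with a capturing group returns
#   (text before first title, title, body, title, body, ...)
# so titles sit at the odd indices and each body follows at index + 1.
# The -1 limit keeps a trailing empty body, so a final title with no
# text after it is not lost; \d+ (rather than \d) allows "Title 10".
sub reversed_groups {
    my ($txt) = @_;
    my @normal = split /(Title \d+)/, $txt, -1;
    my @groups = map  { $normal[$_] . $normal[$_ + 1] }
                 grep { $_ % 2 } 0 .. $#normal - 1;
    return reverse @groups;
}

my $txt = "\nTitle 1\nLine of text A\nLine of text B\nTitle 2\n"
        . "Line of text C\nTitle 3\nTitle 4\nLine of text D\n";
print for reversed_groups($txt);
```

Because each body keeps its own newlines, there is no need for the chomp / re-append step of the original.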

Re: How to reorder a text file
by rminner (Chaplain) on Jun 13, 2018 at 23:57 UTC

    There are already plenty of good answers; I'm adding another one for the fun of it. This answer uses the CPAN module File::ReadBackwards.

    It reads the file backwards and prints each block as soon as it is complete (that is, when its "Title" line is found), so memory usage should be low. It exits if no "Title" line is found within $MAX_LENGTH characters; otherwise a large input file without "Title" lines could still cause massive memory usage.

    Caveat: the length is only updated after each complete "line" has been read. If there is a huge input line (e.g. several gigabytes without a newline), this program will still crash and burn :).

    use strict;
    use warnings;
    use File::ReadBackwards;

    my $MAX_LENGTH = 1_000_000;
    my $infile = shift @ARGV;
    die "'$infile' is not a file!" unless (-f $infile);
    my $bw = File::ReadBackwards->new( $infile )
        or die "can't read '$infile' $!";

    my (@block, $length, $line);
    while( defined( $line = $bw->readline ) ) {
        unshift @block, $line;
        $length += length $line;
        if ($line =~ m/^Title\b/) {
            print @block;
            @block = ();
            $length = 0;
        }
        if ($length >= $MAX_LENGTH) {
            die "$length chars read, without finding a 'Title' line\n".
                "Input file '$infile' probably has the wrong file format.";
        }
    }
    if (@block) {
        print STDERR scalar(@block) ." lines not printed out yet!";
    }
Re: How to reorder a text file
by Anonymous Monk on Jun 11, 2018 at 21:59 UTC

      Dang - Fried Brain Syndrome. How's this?

      Input file:
        Title 1
        Line of text A
        Line of text B
        Title 2
        Line of text C
        Title 3
        Title 4
        Line of text D
      Desired Output:
        Title 4
        Line of text D
        Title 3
        Title 2
        Line of text C
        Title 1
        Line of text A
        Line of text B
        $ tac file | perl -ne'push@a,$_;if(/^Title\b/){print reverse@a;@a=()}'
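Where tac is not available, the same algorithm can be sketched in pure Perl by walking an array of lines from the end. This is a hedged illustration (the sub name `reverse_groups_like_tac` is hypothetical); unlike the streaming pipeline, it holds all lines in memory.

```perl
use strict;
use warnings;

# Pure-Perl stand-in for:
#   tac file | perl -ne'push@a,$_;if(/^Title\b/){print reverse@a;@a=()}'
# Walk the lines from last to first (what tac provides), collect them,
# and emit the collected lines in their original order whenever the
# Title line that opens the group is reached.
sub reverse_groups_like_tac {
    my @lines = @_;
    my (@a, @out);
    for my $line (reverse @lines) {
        push @a, $line;
        if ($line =~ /^Title\b/) {
            push @out, reverse @a;
            @a = ();
        }
    }
    # as in the one-liner, lines before the first Title (if any) are dropped
    return @out;
}

my @lines = ("Title 1\n", "Line of text A\n", "Line of text B\n", "Title 2\n",
             "Line of text C\n", "Title 3\n", "Title 4\n", "Line of text D\n");
print reverse_groups_like_tac(@lines);
```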

Node Type: perlquestion [id://1216436]
Approved by Paladin
Front-paged by haukex