PerlMonks  

Remove Blank Lines Off the End of a File

by jbisbee (Pilgrim)
on Feb 27, 2002 at 20:29 UTC (#148025=perlquestion)
jbisbee has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I'm looking for the most efficient way to remove blank lines from the end of a text file without reading the whole thing (the files I'm working with can be up to 3 megs). I need the solution to be fast.

-biz-

Re: Remove Blank Lines Off the End of a File
by dragonchild (Archbishop) on Feb 27, 2002 at 20:54 UTC
    Don't use Perl if you don't want to read the whole file in. Use some manner of shell scripting or C. (In C, the function you're looking for is fseek(), or lseek() at the POSIX level, similar to Perl's seek.)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

      "Don't use Perl if you don't want to read the whole file in"
      Sure you can do this in Perl without reading the whole file. Here's some code (untested, but since updated to fix a couple of mistakes):
          #!/usr/bin/perl -w
          use strict;

          my $file = shift or die;
          open my $fh, "+<$file" or die "$!";
          my $size = 4096;
          my ($cur_pos, $buf);
          seek $fh, -$size, 2;
          while (1) {
              $cur_pos = tell $fh;
              read $fh, $buf, $size;
              last if $buf =~ m/\S/s;
              seek $fh, -$size, 1;
          }
          $buf =~ m/(\s+)$/s;
          $cur_pos += $-[0];
          truncate $fh, $cur_pos;
          close $fh;
          exit 0;
      This will read only what is necessary and does not keep what it has already processed.

      I am sure there are better ways of doing this.

      /prakash

      Update: I finally got some time, tested the above, and found another bug (fixed in the code above). I was not supposed to use sysread and seek together (sysseek would probably do fine), so I changed the sysread to read, and it worked OK.

      (Note to self: Never post untested code.)
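The update above touches on a real rule: sysread bypasses Perl's buffered I/O, so it should be paired with sysseek rather than seek (the perlfunc entries for sysread and sysseek both warn against mixing them with the buffered calls). Below is a minimal sketch of the unbuffered pair, assuming all you need is the last block of a file; `read_tail` is a hypothetical helper name, and the 4096-byte size in the usage lines just mirrors the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper: return the last $n bytes of $file (the whole file if
# it is shorter), using only sysseek/sysread so that the two unbuffered calls
# never disagree with a buffered layer about the current position.
sub read_tail {
    my ($file, $n) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    binmode $fh;
    my $size = (-s $fh) || 0;
    $n = $size if $n > $size;              # never seek before byte 0

    # sysseek returns "0 but true" for position 0, so "or die" is safe here.
    sysseek($fh, -$n, SEEK_END) or die "sysseek: $!";

    my $buf = '';
    while (length($buf) < $n) {
        # Append at the end of $buf (the 4th argument is the offset).
        my $got = sysread($fh, $buf, $n - length($buf), length($buf));
        die "sysread: $!" unless defined $got;
        last if $got == 0;                 # EOF earlier than expected
    }
    close $fh;
    return $buf;
}

# Usage: perl tail_check.pl FILE
if (@ARGV) {
    my $tail = read_tail($ARGV[0], 4096);
    printf "last %d bytes %s non-whitespace\n", length $tail,
        ($tail =~ /\S/ ? 'contain' : 'do not contain');
}
```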

        This works! Thanks Prakash!

        -biz-


        Very nice.

        However, it strips the last \n from the file (which probably isn't desirable), and it truncates non-whitespace data if there isn't a final \n. This means that if you run the program twice in a row, it will strip non-whitespace data from the end of the file.

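        Also, your tell() is followed by a read(), so the next relative seek() starts from the end of the block you just read (EOF on the first pass) and not from $cur_pos. It lands back at the start of the same block. ;-)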
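The point about tell() and read() is easy to check with an in-memory filehandle. This small demo (the 16-byte string and the 4-byte block size are arbitrary choices) records the position after a relative seek of -$size, as in PrakashK's loop, and after -$size*2, as in the changes that follow.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Four 4-byte "blocks" in an in-memory file; the sizes are arbitrary.
my $data = 'aaaabbbbccccdddd';
my $size = 4;
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

seek $fh, -$size, 2;              # start of the last block
my $cur_pos = tell $fh;           # 12
read $fh, my $buf, $size;         # $buf is 'dddd'; the position is now 16

seek $fh, -$size, 1;              # step back by one block
my $after_one = tell $fh;         # 12 again: the same block as before

read $fh, $buf, $size;            # the loop would re-read 'dddd' (position 16)
seek $fh, -$size * 2, 1;          # step back by two blocks
my $after_two = tell $fh;         # 8: the start of the 'cccc' block

print "cur_pos=$cur_pos after -size=$after_one after -2*size=$after_two\n";
```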

        The following changes fix these problems:

            #!/usr/bin/perl -w
            use strict;

            my $file = shift or die;
            open my $fh, "+<$file" or die "$!";
            binmode $fh; # Just in case
            my $size = 4096;
            my ($cur_pos, $buf);
            seek $fh, -$size, 2;
            while (1) {
                $cur_pos = tell $fh;
                read $fh, $buf, $size;
                last if $buf =~ m/\S/s;
                seek $fh, -$size*2, 1;
            }
            $buf =~ m/(\s+)$/s;
            $cur_pos += $-[0] || 0;
            truncate $fh, ++$cur_pos if $cur_pos;
            close $fh;
            exit 0;

        --
        John.
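As far as I can tell, a few edge cases remain in both versions above. If the search has to walk back past the first block (for example, a file that is all whitespace), the backward seek fails and the loop never ends. If a file larger than one block does not end in whitespace, the trailing-whitespace match fails and the truncate still cuts into the last block. Here is a minimal sketch that tries to cover these cases. It assumes a regular file that nobody else is writing to, treats "blank" as whitespace-only (`\s`), and keeps the newline that ends the last non-blank line (so CRLF files stay CRLF) instead of keeping the first trailing whitespace byte. `strip_trailing_blank_lines` is a hypothetical name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip trailing whitespace-only lines from $file in
# place, reading only the tail of the file. Returns the new length in bytes.
sub strip_trailing_blank_lines {
    my ($file, $block) = @_;
    $block ||= 4096;

    open my $fh, '+<', $file or die "Can't open $file: $!";
    binmode $fh;
    my $size = (-s $fh) || 0;

    # 1. Walk backwards one block at a time, never seeking before byte 0,
    #    until a block contains a non-whitespace byte.
    my $pos = $size;
    my $end = 0;                          # stays 0 if the file is all whitespace
    while ($pos > 0) {
        my $len = $pos < $block ? $pos : $block;
        $pos -= $len;
        seek $fh, $pos, 0 or die "seek: $!";
        defined read($fh, my $buf, $len) or die "read: $!";
        if ($buf =~ /\S\s*\z/) {          # matches at the LAST non-space byte
            $end = $pos + $-[0] + 1;      # keep up to and including that byte
            last;
        }
    }

    # 2. Keep the terminator of the last non-blank line: move $end past the
    #    first "\n" in the trailing whitespace, if there is one. A "\r" in
    #    front of it is kept as well.
    if ($end > 0 && $end < $size) {
        seek $fh, $end, 0 or die "seek: $!";
        my $start = $end;
        while (1) {
            my $got = read($fh, my $chunk, $block);
            die "read: $!" unless defined $got;
            last if $got == 0;
            my $nl = index $chunk, "\n";
            if ($nl >= 0) { $end = $start + $nl + 1; last; }
            $start += $got;
        }
    }

    # 3. Cut the file at $end (a no-op if nothing trailing was found).
    truncate $fh, $end or die "truncate: $!";
    close $fh or die "close: $!";
    return $end;
}

# Usage: perl strip_tail.pl FILE [FILE ...]
if (@ARGV) {
    for my $file (@ARGV) {
        my $len = strip_trailing_blank_lines($file);
        print "$file: $len bytes\n";
    }
}
```

Because it only truncates, it never rewrites the data it keeps; running it a second time on the same file leaves the file unchanged.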

Re: Remove Blank Lines Off the End of a File
by jmcnamara (Monsignor) on Feb 28, 2002 at 14:12 UTC

    Here is a one-liner. On an arbitrary test system it took 4 seconds to process a 3 meg file.

    The changes are made in place, so change -i to -i.bak if you want to keep a backup.

        perl -i -ne '/^$/ ? $i++ : do{print "\n" x $i, $_; $i=0}' file
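For reference, here is the one-liner's logic written out as a script. It counts runs of empty lines and prints them only once a non-empty line follows, so empty lines at the very end are never printed. Unlike the seek-based programs, it has to stream every line of the file. `strip_trailing_empty_lines` is a hypothetical name, and this version writes to STDOUT instead of editing in place the way -i does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical name for the one-liner's logic: copy lines from $in to $out,
# holding back runs of empty lines and printing them only when a non-empty
# line follows, so empty lines at the very end are never printed.
sub strip_trailing_empty_lines {
    my ($in, $out) = @_;
    my $pending = 0;    # the one-liner's $i: empty lines held back so far
    while (defined(my $line = <$in>)) {
        if ($line =~ /^$/) {
            # Only "\n" matches, because $ can match just before a final
            # newline. Lines holding spaces or tabs are NOT empty here.
            $pending++;
        }
        else {
            print {$out} "\n" x $pending, $line;
            $pending = 0;
        }
    }
    # Anything still pending was trailing, so it is dropped by doing nothing.
}

# Usage (not in place, unlike perl -i): perl strip_empty.pl FILE > FILE.new
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    strip_trailing_empty_lines($fh, \*STDOUT);
    close $fh;
}
```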

    Update: I just ran PrakashK's program on the same test data. It took 0.1 sec!!

    --
    John.
