PerlMonks  

How do I remove blank lines from text files?

by dumpest (Novice)
on Jul 10, 2000 at 21:35 UTC (#21826=perlquestion)
dumpest has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks, I have a program that parses a file line by line. I was wondering how to use regular expressions to remove blank lines from this file. I'm not sure how to do this.

Re: How do I remove blank lines from text files?
by ahunter (Monk) on Jul 10, 2000 at 21:42 UTC
    (This will take input from STDIN and go to STDOUT). Depends on how you define blank lines. If you just want to remove lines that contain absolutely nothing, this will do the trick:
    while (<STDIN>) { print if (!/^$/); }
    ^ and $ are anchors and indicate the start and end of the current record (which will be the current line with the default record separator). If you mean to delete lines that only contain whitespace, this will do the trick:
    while (<STDIN>) { print if (!/^\s*$/); }
    (\s matches any whitespace character, including newlines, spaces and tabs)
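A minimal sketch of how the two patterns above differ, using a made-up four-line sample. The helper names `drop_empty` and `drop_blank` are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep every line that is not strictly empty.
sub drop_empty { return grep { !/^$/ } @_; }

# Hypothetical helper: keep every line with at least one non-whitespace character.
sub drop_blank { return grep { !/^\s*$/ } @_; }

# Sample input: a text line, an empty line, a whitespace-only line, a text line.
my @sample = ("one\n", "\n", "  \t\n", "two\n");

# In scalar context each helper returns the number of lines kept.
print scalar(drop_empty(@sample)), "\n";   # 3: the whitespace-only line survives
print scalar(drop_blank(@sample)), "\n";   # 2: only the two text lines survive
```

The whitespace-only line is the one case where the two patterns disagree, so which pattern to pick depends on whether such lines count as blank in your data.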

    Andrew.

Re: How do I remove blank lines from text files?
by davorg (Chancellor) on Jul 10, 2000 at 21:43 UTC

    Depends what exactly you mean by a 'blank line'. Assuming that you mean a line that contains only white space characters then you can do something like this:

    while (<>) { print if /\S/; }

    And you can do the same thing from the command line

    perl -i.bak -n -e "print if /\S/" filename
    --
    <http://www.dave.org.uk>

    European Perl Conference - Sept 22/24 2000, ICA, London
    <http://www.yapc.org/Europe/>
Re: How do I remove blank lines from text files?
by c-era (Curate) on Jul 10, 2000 at 22:38 UTC
    A regex may not be the best answer. A simple loop will work better.
    my @data;
    open (FILE,"+</path/file") || die "Unable to open file";
    flock (FILE,2) || die "Unable to lock file";
    foreach(<FILE>){
        push @data,$_ unless ($_ eq "\n");
    }
    seek (FILE,0,0) || die "Unable to seek";
    print FILE @data;
    truncate (FILE,tell(FILE)) || die "Unable to truncate file";
    close (FILE) || die "Unable to close file";
      I don't like this for the following reasons:
      • You are reading the whole file in memory TWICE! Once for the arguments to the foreach and another one when pushing everything into @data. In any case, you want to use while instead of foreach.
      • You are overwriting the file in place, without creating a backup. What if the machine crashes in the middle of the "print FILE @data"? You have lost your data.
      I personally like the technique of writing the results to a new temporary file, and then renaming the original to a backup name and the temporary file to the original. Something like this:

      open FILE, "/path/to/file" or die "$!\n";
      open OUT, ">/tmp/tmpfile.$$" or die "$!\n";   # open for writing
      while(<FILE>) {
          next if /^\s*$/;
          print OUT $_;                             # no comma after the filehandle
      }
      close FILE;
      close OUT;
      rename("/path/to/file", "/path/to/file.bak") or die "Error in rename: $!\n";
      rename("/tmp/tmpfile.$$", "/path/to/file") or die "Error in rename: $!";
      Side note: using "/tmp/tmpfile.$$" as a temporary file name could have security implications if the program is running set-uid. For better ways of creating a temporary file name, see the FAQ How do I make a temporary file name?
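A hedged sketch of the same write-then-rename idea, assuming a Perl recent enough to ship File::Temp and three-argument open. The helper name `strip_blank_lines` is hypothetical. Creating the temporary file in the target's own directory is a design choice of this sketch, not something stated above; it keeps both rename calls on one filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $path without whitespace-only lines,
# leaving the original contents behind as "$path.bak".
sub strip_blank_lines {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot open $path: $!\n";

    # tempfile() picks an unused name and opens it for writing.
    # UNLINK => 0 because the file is renamed into place, not deleted.
    my ($out, $tmp) = tempfile(DIR => dirname($path), UNLINK => 0);

    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;    # skip empty and whitespace-only lines
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $tmp: $!\n";

    # Keep a backup first, then move the filtered copy into place.
    rename $path, "$path.bak" or die "Error in rename: $!\n";
    rename $tmp, $path        or die "Error in rename: $!\n";
    return;
}
```

Calling `strip_blank_lines("/path/to/file")` would leave the filtered text in the original name and the untouched original in `/path/to/file.bak`. The new file gets the permissions File::Temp assigns rather than those of the original, which may or may not matter for your use.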

      Also, a regex is not necessary if you are looking for strictly empty lines. But many times a line is considered empty even if it contains white space; in that case, using regular expressions is the best way to do it.

      --ZZamboni

        ? ? ? ?

        How did you come up with those reasons?

        It is true that if you open a file for writing and the system crashes, the file is destroyed. But the file is opened in read/write mode. When the system crashes, the file will be unchanged unless the file is closed properly. There may be other reasons to make a backup, but your reason is not one of them.

        As for foreach vs. while, on my Solaris machine there is no difference in memory usage between the two. I ran the two programs with a 40MB file and the memory usage was the same.

        I can understand the file write vs. read/write mistake, but next time could you please check your facts before you post?

Re: How do I remove blank lines from text files?
by chromatic (Archbishop) on Jul 10, 2000 at 23:09 UTC
    Don't forget that our friend next will work in a while loop. Combining some of these answers gets you the technique I prefer:
    while (<FILE>) {
        next if /^#/;       # skip lines starting with comments
        next unless /\S/;
        # do something with a real line here
    }
    This allows you to use a normal loop construct (makes sense to me) and short-circuit it at the beginning if certain conditions are met. It makes cleaner code than a lot of if-elsif statements.
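A small self-contained sketch of that short-circuit style. The helper name `real_lines` and the in-memory sample are assumptions made for illustration, and reading from a string through `open` assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that are neither comments nor blank.
sub real_lines {
    my ($fh) = @_;
    my @kept;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;        # skip lines starting with a comment marker
        next unless $line =~ /\S/;    # skip empty and whitespace-only lines
        push @kept, $line;            # a real line: keep it
    }
    return @kept;
}

# Illustration only: read from an in-memory string instead of a real file.
my $sample = "# header\n\nalpha\n   \n# note\nbeta\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print real_lines($fh);    # prints the lines "alpha" and "beta"
close $fh;
```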
RE: How do I remove blank lines from text files?
by Russ (Deacon) on Jul 10, 2000 at 23:16 UTC
    Update: as davorg correctly points out, I had a strange and (mostly) unnecessary construct. This is my updated version.

    As an explanation: I had the <IN> in a while loop, to take advantage of the diamond operator's auto-assign-to-$_ feature. This code just makes the assignment explicit.

    Thanks, davorg!

    This will write a new file with blank lines (where there are two or more newlines together) removed:

    local $/ = undef;
    open IN, "<junk" or die "Couldn't open junk: $!";
    open OUT, ">fixed" or die "Couldn't open fixed: $!";
    ($_ = <IN>) =~ s/\n{2,}/\n/g;
    print OUT;
    close IN;
    close OUT;
    Variations on this theme (like whitespace, as noted elsewhere in this thread) would be similar...

    Russ

      As you've undefined $/ and are therefore acting in 'slurp' mode, you'll read the whole file in with your first call to <IN>. Doesn't that make your while loop unnecessary?

      --
      <http://www.dave.org.uk>
      Just to nitpick a tiny bit, I want to warn about your modification to $/.

      By doing things the way you did, you are setting yourself up for crazy-ass bugs. You properly localize the variable, but the scope of the localization is way too broad.

      I suggest getting in the habit of using anonymous blocks every time you localize a global variable. Like this:
      open OUT, ">fixed" or die "Couldn't open fixed: $!";
      open IN, "<junk" or die "Couldn't open junk: $!";
      {
          local $/ = undef;
          ($_ = <IN>) =~ s/\n{2,}/\n/g;
          print OUT;
      }
      close IN;
      close OUT;
      This way you properly limit the scope of local $/ to just the lines where you need it, and you won't accidentally screw yourself up later.
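A minimal sketch of that scoping in action, assuming Perl 5.8 or later for the in-memory filehandle. The helper name `squeeze_newlines` is hypothetical. A `do` block plays the role of the anonymous block here, so `$/` is undefined only for the single read.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a filehandle and collapse runs of newlines into one.
sub squeeze_newlines {
    my ($fh) = @_;
    # local $/ with no value sets it to undef, but only inside this do-block.
    my $text = do { local $/; <$fh> };
    $text = '' unless defined $text;    # an empty file slurps as undef
    $text =~ s/\n{2,}/\n/g;
    return $text;
}

# Illustration only: an in-memory file with blank lines between the words.
my $sample = "a\n\n\nb\n\nc\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print squeeze_newlines($fh);
close $fh;

# Outside the helper the record separator is back to its default newline.
print defined $/ ? "separator restored\n" : "separator still undef\n";
```

Because the localization ends with the block, any line-by-line reads later in the program behave normally.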
        Good point. I should have included that in my snippet.

        Russ
        Brainbench 'Most Valuable Professional' for Perl

(jeffa) Re: How do I remove blank lines from text files?
by jeffa (Chancellor) on Jul 11, 2000 at 01:42 UTC
    I realize that I might be chastised for this, but
    I feel the need to mention that this is a trivial
    task for vi.

    Type : and then issue this command

    %g/^$/d
    The percent sign tells vi 'do this for all lines in the
    file.' The g operator means 'globally', and the
    d operator means 'perform deletion on the lines
    that match'.
      You got a ++ from me -- vi rules! :)
      Of course, if you're going to bother to start up vi, you might as well just do perl -pi -e 's/^\n$//' file.txt
      But I think this exceeds the scope of the original poster's question, since his program is already parsing the file.
RE: How do I remove blank lines from text files?
by alawishus (Initiate) on Jul 11, 2000 at 20:46 UTC
    open (NEWFILE,">newfile") or die "yikes: $!\n";
    open (FILE,"<myfile") or die "yikes: my file $!\n";
    while (<FILE>) {
        # /\w/ keeps only lines with a word character, so punctuation-only lines are dropped too
        if ($_ =~ /\w/) { print NEWFILE; }
    }
    close FILE;
    close NEWFILE;
RE: How do I remove blank lines from text files?
by Nimster (Sexton) on Jul 11, 2000 at 20:49 UTC
    m/^$/ : ^ matches at the beginning of the line, and $ at the end. Thus it matches the beginning of the line immediately followed by the end of it.

    -Nimster
    Freelance writer for GameSpy industries LTD. (www.gamespy.com)
