http://www.perlmonks.org?node_id=313257

aloomens has asked for the wisdom of the Perl Monks concerning the following question:

I am a Perl beginner and am trying to write a script that removes single blank lines from text files but leaves runs of two or more consecutive blank lines alone. What I have so far is shown below, but it removes one blank line from every run of one or more consecutive blank lines (not quite what I need):

#!/usr/bin/perl
while (<>) {
    print unless ($_ eq "\n") && ($_ ne $last);
    $last = $_;
}

Re: remove single blank lines?
by Coruscate (Sexton) on Dec 08, 2003 at 22:09 UTC

    I bid you welcome to the wonderful world of regular expressions (aka regexes). You can learn more about them from perlre. Here's my version:

    my $string = q{ this is a sample text. all single blank lines will be removed. all double, triple, or more blank lines will not be touched. };
    $string =~ s#\n(\n+)#"\n" x (length($1)>1 ? length($1)+1 : length($1))#eg;
    print $string;
    __END__
    this is a sample text. all single blank lines will be removed. all double, triple, or more blank lines will not be touched.
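For readers who find the one-line substitution hard to follow, here is the same pattern and replacement spread out with the `/x` modifier so each part can carry a comment. The wrapper name `squeeze_single_blanks` is hypothetical; the regex logic is unchanged, only the delimiters switch to `s{}{}` so that `#` can start comments.

```perl
use strict;
use warnings;

# Coruscate's substitution, spread out with /x so each piece can be commented.
# squeeze_single_blanks is a hypothetical wrapper name, not from the thread.
sub squeeze_single_blanks {
    my ($string) = @_;

    # Replacement (evaluated as code because of /e):
    #   exactly one blank line  -> a single "\n" (the blank line is dropped)
    #   two or more blank lines -> as many "\n" as before (the run is kept)
    $string =~ s{
        \n      # newline ending a line of text
        (\n+)   # capture 1: one extra newline per following blank line
    }{
        "\n" x (length($1) > 1 ? length($1) + 1 : length($1))
    }egx;

    return $string;
}
```

For example, `squeeze_single_blanks("a\n\nb\n")` should give `"a\nb\n"`, while a run of two blank lines passes through unchanged.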

      That may work, but that regex hurts my head. The only way I would use that code is if it were well commented, explaining what each part of the regex does, and even then I might not use it.

      A much clearer method, albeit with more code and more steps, would be to:
      1. read from the input file,
      2. write only the lines you want to keep to a temporary output file,
      3. close the input file, and
      4. rename the temporary output file to the input file's name.
      This method lets you read ahead one or more lines to see whether there are two or more blank lines together.
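A minimal sketch of the read / write-temp / rename steps listed above, assuming a "blank" line is exactly `"\n"`. When a blank line turns up, an inner loop reads ahead until the run of blank lines ends, and only then decides whether to write the run. The sub name and the `.tmp` suffix are hypothetical choices.

```perl
use strict;
use warnings;

# Sketch of the read / write-temp / rename approach.
# Assumptions: a blank line is exactly "\n"; the sub name and the ".tmp"
# suffix are hypothetical, not taken from the thread.
sub remove_single_blanks_in_place {
    my ($file) = @_;
    my $tmp = "$file.tmp";

    open my $in,  '<', $file or die "Cannot read $file: $!";
    open my $out, '>', $tmp  or die "Cannot write $tmp: $!";

    while (my $line = <$in>) {
        if ($line ne "\n") {
            print {$out} $line;
            next;
        }
        # Found a blank line: read ahead until the run of blank lines ends.
        my $run = 1;
        my $after;
        while (defined($after = <$in>) && $after eq "\n") {
            ++$run;
        }
        print {$out} "\n" x $run if $run > 1;   # keep runs of two or more
        print {$out} $after if defined $after;  # the line that ended the run
    }

    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";
}
```

Writing to a separate file and renaming at the end means the original is only replaced once the new contents are complete.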

      Just my preference, but IMHO this is much clearer than trying to decipher that regex. I always try to make my code as understandable as possible to *everyone* (including newbies). Obfuscated regexes tend to scare people away from Perl. Just my 2c.

      HTH.
Re: remove single blank lines?
by BrowserUk (Patriarch) on Dec 08, 2003 at 23:33 UTC

    As a one-liner

    perl -lne"length or ++$x and next; $x==1 or printf $/ x $x; $x=0; print"

    Caveat: Different quotes for different folks:)
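The same logic unrolled into a sub, as a sketch for reading the one-liner: `-n` supplies the `while (<>)` loop and `-l` chomps each line and appends `"\n"` to every `print`. The sub name and the filehandle arguments are hypothetical. The quoting caveat presumably refers to the shell: the double quotes suit Windows `cmd.exe`, whereas a Unix shell would expand `$x` inside double quotes, so single quotes are needed there.

```perl
use strict;
use warnings;

# Expanded form of the one-liner (hypothetical sub name).
#   -n : wrap the code in a while (<>) { ... } loop
#   -l : chomp each input line, and end each print with "\n"
sub squeeze_blank_runs {
    my ($in, $out) = @_;
    my $x = 0;                      # blank lines seen since the last text line
    while (my $line = <$in>) {
        chomp $line;                # what -l does to each input line
        if (!length $line) {        # "length or ++$x and next"
            ++$x;
            next;
        }
        print {$out} "\n" x $x if $x != 1;   # "$x==1 or printf $/ x $x"
        $x = 0;
        print {$out} $line, "\n";   # "print", with -l supplying the newline
    }
    # Like the one-liner, a run of blank lines at the very end is never printed.
}
```

Note the design consequence of printing a run only when the next text line arrives: blank lines at the end of the input are dropped entirely, whatever their count.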


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

Re: remove single blank lines?
by asarih (Hermit) on Dec 08, 2003 at 22:32 UTC
    If the file you want to examine does not fit in memory, you will have to remember two lines at a time.
    while (<>) {
        print unless ($_ eq "\n") && ($last ne "\n") && ($secondlast ne "\n");
        $secondlast = $last;
        $last = $_;
    }
    update: This doesn't quite work. We'll have to defer deciding whether or not to print the current line until we have examined the next line.
    while (<>) {
        $last = $this;
        $this = $next;
        $next = $_;
        print $this unless ($last ne "\n") && ($this eq "\n") && ($next ne "\n");
    }
    print $next unless ($next eq "\n") && ($this ne "\n");   # last line is special
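The updated loop is a three-line sliding window: each line is printed one iteration late, once its successor is known. Here is the same window wrapped in a sub that reads from a filehandle and returns the result, as a sketch; the sub name is hypothetical, and a missing previous line is treated as non-blank, just as an undefined `$last` compares unequal to `"\n"` in the original.

```perl
use strict;
use warnings;

# The sliding window from the reply above (hypothetical sub name).
# A line is emitted one iteration late, once the line after it is known.
sub drop_single_blanks_window {
    my ($in) = @_;
    my ($prev, $this, $next);
    my $result = '';
    while (my $line = <$in>) {
        ($prev, $this, $next) = ($this, $next, $line);
        next unless defined $this;   # window not full yet
        my $prev_blank = defined $prev && $prev eq "\n";
        my $next_blank = $next eq "\n";
        # Skip $this only if it is blank with non-blank lines on both sides.
        $result .= $this unless $this eq "\n" && !$prev_blank && !$next_blank;
    }
    # The last line has no successor: drop it only if it is a lone trailing blank.
    if (defined $next) {
        my $this_blank = defined $this && $this eq "\n";
        $result .= $next unless $next eq "\n" && !$this_blank;
    }
    return $result;
}
```

Only three lines are ever held in memory, which is what makes this approach suitable for files that do not fit in memory.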
Re: remove single blank lines?
by Roy Johnson (Monsignor) on Dec 08, 2003 at 22:44 UTC
    my $blank_lines = 0;
    while (<>) {
        if ($_ eq "\n") {
            ++$blank_lines;
        }
        else {
            print "\n" x $blank_lines if $blank_lines != 1;
            print;
            $blank_lines = 0;
        }
    }
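The counter above only writes a run of blank lines when the next non-blank line arrives, so blank lines at the very end of the input are never printed. If trailing runs of two or more should survive, one possible variation is to flush the counter once more after the loop. This sketch works on a list of lines; the sub name is hypothetical.

```perl
use strict;
use warnings;

# Roy Johnson's counter, wrapped in a hypothetical sub that takes a list of
# lines and also flushes a run of blank lines at the end of the input.
sub squeeze_lines {
    my @lines = @_;
    my $blank_lines = 0;
    my $result = '';
    for my $line (@lines) {
        if ($line eq "\n") {
            ++$blank_lines;
            next;
        }
        $result .= "\n" x $blank_lines if $blank_lines != 1;
        $result .= $line;
        $blank_lines = 0;
    }
    # Added step: a trailing run of two or more blank lines is kept too.
    $result .= "\n" x $blank_lines if $blank_lines != 1;
    return $result;
}
```

With this change, `squeeze_lines("a\n", "\n", "\n")` keeps both trailing blank lines, while a single trailing blank line is still dropped.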

    The PerlMonk tr/// Advocate
Re: remove single blank lines?
by Anonymous Monk on Dec 08, 2003 at 23:29 UTC
    perl -p -0 -e 's/(?<!\n)\n\n(?!\n)/\n/g;' singleblank.txt

    See perldoc perlrun for the flags, and perldoc perlre for the regex.

    Here is the longer version:

    #!/usr/bin/perl -w
    use strict;

    # set input record separator $/ (see perldoc perlvar) to undef
    # to "slurp" the file
    my $t = do {local $/; (<DATA>)};

    # (?<!\n) is negative lookbehind, no \n before
    # (?!=\n) is negative lookahead, no \n after
    # in between is \n\n - two consecutive newlines eq a single blank line
    $t =~ s/(?<!\n)\n{2}(?!=\n)/\n/g;
    print $t;

    __DATA__
    none one two three one none three two none one

    which gives

    none one two three one none three two none one

      oops, forgot to login.

      qq

      Update: as ysth shows below, my code performs poorly. Why the forward and negative lookarounds aren't working as I expected them to defeats me right now. This regex works much better:

      s/(^|[^\n])\n{2}([^\n]|$)/$1\n$2/g;
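A likely explanation for the puzzling lookaround behaviour: in the longer version the lookahead is written `(?!=\n)`, which asserts that the next two characters are not `=` followed by a newline; it never tests for a lone `\n`. The one-liner's `(?!\n)` is the intended assertion. The updated regex has a weakness of its own: it consumes the character on each side of the newline pair, so when two single blank lines are separated by a one-character line (as in `"a\n\nb\n\nc\n"`), both matches need the `b` and only the first blank line is removed. Lookarounds are zero-width and avoid that. A sketch with a hypothetical sub name; like both versions above, it treats the start of the string as if a text line preceded it, so a leading pair of blank lines is shortened to one.

```perl
use strict;
use warnings;

# Remove single blank lines from a whole-file string (hypothetical sub name).
#   (?<!\n)  the newline pair is not preceded by another newline
#   \n\n     a line ending plus one blank line
#   (?!\n)   ...and is not followed by another newline
# The lookarounds are zero-width, so adjacent matches do not compete for
# the characters between them.
sub remove_single_blank_lines {
    my ($text) = @_;
    $text =~ s/(?<!\n)\n\n(?!\n)/\n/g;
    return $text;
}
```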

Re: remove single blank lines?
by ysth (Canon) on Dec 09, 2003 at 02:13 UTC
    For fun, I decided not to try to answer the question, but to test the existing answers. Some had problems with what I consider reasonable definitions of blank lines at the beginning and ending; if your definition was different, you may see a failure reported for what you consider a success. (If answers were updated recently, I won't have the updated version, and I may have made transcription errors in putting the code in my testbed. Apologies for any errors.)

    Update: /msg me if you want your code updated; BrowserUk, sorry about the "line1\nline2" test, which completely defeats the -l switch; regard it as a case of GIGO rather than a failure if you want.

    Here is the test program:

    and here is the output: Some of the tests issued warnings (which I have omitted); I mean to fix the test program to check for those and report them if I have time.
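ysth's test program and its output are not reproduced above. Purely to illustrate the idea, here is a small harness in the same spirit. The cases, their expected outputs, and the definition they encode (a blank line is exactly `"\n"`, and a run of exactly one blank line is removed wherever it appears, including at the start and end) are assumptions of this sketch, not ysth's; so are the two candidate filters, which paraphrase approaches from this thread.

```perl
use strict;
use warnings;

# Hypothetical harness: run each candidate filter over edge cases and
# collect the names of the cases it gets wrong.
my @cases = (
    # [ name,                    input,        expected     ]
    [ 'single blank in middle', "a\n\nb\n",   "a\nb\n"     ],
    [ 'double blank kept',      "a\n\n\nb\n", "a\n\n\nb\n" ],
    [ 'single blank at start',  "\nb\n",      "b\n"        ],
    [ 'double blank at start',  "\n\nb\n",    "\n\nb\n"    ],
    [ 'single blank at end',    "a\n\n",      "a\n"        ],
    [ 'double blank at end',    "a\n\n\n",    "a\n\n\n"    ],
);

my %candidates = (
    # Whole-string lookaround regex, as in the -0 one-liner.
    lookaround => sub {
        my ($t) = @_;
        $t =~ s/(?<!\n)\n\n(?!\n)/\n/g;
        return $t;
    },
    # Line-by-line counter that also flushes a trailing run.
    counter => sub {
        my ($t) = @_;
        my ($out, $blanks) = ('', 0);
        for my $line (split /^/, $t) {   # split /^/ keeps each line's "\n"
            if ($line eq "\n") { ++$blanks; next }
            $out .= "\n" x $blanks if $blanks != 1;
            $out .= $line;
            $blanks = 0;
        }
        $out .= "\n" x $blanks if $blanks != 1;
        return $out;
    },
);

sub failures_for {
    my ($filter) = @_;
    return map { $_->[0] } grep { $filter->($_->[1]) ne $_->[2] } @cases;
}

for my $name (sort keys %candidates) {
    my @failed = failures_for($candidates{$name});
    print "$name: ", (@failed ? 'failed: ' . join(', ', @failed) : 'ok'), "\n";
}
```

Under these assumptions the counter passes every case and the lookaround regex fails the two start-of-file cases; as ysth notes, a different definition of blank lines at the beginning would change that verdict.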
Re: remove single blank lines?
by Roger (Parson) on Dec 09, 2003 at 00:24 UTC
    Here's my implementation -
    $text =~ s/(?<=\b)\n(?=\n\b)//g;
    Consider the string below -
    This is line1\n
    \n
    line3\n
    \n
    \n
    The regex removes the '\n' at the end of line1, which is preceded by a word boundary and followed by another \n and a word boundary.

    This solution is similar to qq's answer above, but with subtle differences. qq's solution will replace two consecutive \n's with a single \n, while my solution will get rid of the first \n.
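A quick check of the substitution, wrapped in a hypothetical sub. One consequence of anchoring on `\b` is worth knowing: `\b` only matches next to a word character, so a single blank line after a line that ends in punctuation (such as a sentence ending in `.`), or before a line that starts with whitespace, is left alone.

```perl
use strict;
use warnings;

# Roger's substitution (hypothetical wrapper name). It deletes the newline that
# ends a text line when that line ends in a word character and the newline is
# followed by exactly one blank line and then a word character.
sub roger_squeeze {
    my ($text) = @_;
    $text =~ s/(?<=\b)\n(?=\n\b)//g;
    return $text;
}
```

For the example string above, `roger_squeeze("line1\n\nline3\n\n\n")` should give `"line1\nline3\n\n\n"`, while `"end.\n\nnext\n"` comes back unchanged.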

Re: remove single blank lines?
by graff (Chancellor) on Dec 09, 2003 at 05:18 UTC
    No one else brought this up, so... How confident are you that all of your blank lines will consist of just "\n"?

    If an input file has been created manually, the odds are close to 50-50 that at least some "blank" lines contain spaces and/or tabs as well as the terminating newline (LF, CR or CRLF, depending on your OS). In this case, you may be better off using one of the line-oriented approaches (rather than a "slurp" approach), and using the regex conditional (/^\s*$/) instead of ($_ eq "\n")
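A sketch of that suggestion in a line-oriented filter: any line matching `/^\s*$/` counts as blank, each run of such lines is buffered as read, and the run is dropped only if it is exactly one line long. Keeping the buffered lines themselves, rather than printing plain `"\n"`, preserves whatever spaces, tabs, or CRLF endings the kept runs had. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Treat any whitespace-only line as blank (/^\s*$/), buffer each run of such
# lines, and drop the run only if it is exactly one line long.
sub drop_single_blank_lines {
    my ($in) = @_;
    my @run;                 # the current run of blank lines, as read
    my $result = '';
    while (my $line = <$in>) {
        if ($line =~ /^\s*$/) {
            push @run, $line;
            next;
        }
        $result .= join '', @run if @run != 1;
        @run = ();
        $result .= $line;
    }
    $result .= join '', @run if @run != 1;   # trailing run
    return $result;
}
```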

Re: remove single blank lines?
by aloomens (Initiate) on Dec 09, 2003 at 18:44 UTC
    What I've settled on is shown below. It won't remove a single blank line at the beginning of the file, but I don't need that. It may not be the most efficient way to do this, but it makes sense to me. I learned a bunch. Thanks!
    #!/usr/bin/perl
    while (<>) {
        $last = $this;
        $this = $next;
        $next = $_;
        $lastblank = ($last =~ (/^\s*$/));
        $thisblank = ($this =~ (/^\s*$/));
        $nextblank = ($next =~ (/^\s*$/));
        print $this unless (! $lastblank) && ($thisblank) && (! $nextblank);
    }
    print $next unless ($nextblank) && (! $thisblank);