http://www.perlmonks.org?node_id=1061063

wegelin has asked for the wisdom of the Perl Monks concerning the following question:

I work at the Unix command line (bash shell) on a Mac running OS X 10.6.8. In a text file, I want to replace every run of multiple newlines, even if the newlines have other whitespace between them, with EEEEE. I want to leave all single newlines alone. Thus the following file, called textfile:
dogs rats
cats

fish
must be transformed into
dogs rats
catsEEEEEfish
But as you see from the example below, the following regular expression, issued at the command line, doesn't do it.
s/\n\s*\n/EEEEE/g
Here is the example:
> cat textfile
dogs rats
cats

fish
> perl -p -e 's/\n\s*\n/EEEEE/g' textfile
dogs rats
cats

fish

Is there a simple or elegant solution?

Re: regular expression: match multiple newlines
by Cristoforo (Curate) on Nov 03, 2013 at 19:42 UTC
    Your code is reading one line at a time. You need to 'slurp' the file and apply the regular expression to the whole thing:
    perl -0 -p -e 's/\n\s*\n/EEEEE/g' textfile

      Good point! However, \n\s*\n would also replace \n\x20\n (two newlines with a space between them), which is not two consecutive newlines; hence the earlier \n{2,} suggestion.

      Edit: My apologies. I didn't notice the OP's mention of possible other whitespace between the newlines.

        He stated that there might be spaces between the newlines :-)
Re: regular expression: match multiple newlines
by Lennotoecom (Pilgrim) on Nov 04, 2013 at 01:22 UTC
    /(?<=\S)\s*$/ ? ({print "$a$`"},$a="\n") : ($a='EEEEE') while <DATA>;
    __DATA__
    dogs rats
    cats

    fish
    text
    cats2

    fish2
    text3
    output:
    dogs rats
    catsEEEEEfish
    text
    cats2EEEEEfish2
    text3
Re: regular expression: match multiple newlines
by Kenosis (Priest) on Nov 03, 2013 at 19:38 UTC

    Try:

    s/\n{2,}/EEEEE/g

    Output on your dataset:

    dogs rats
    catsEEEEEfish

    The \n{2,} notation matches 2+ newlines.

    Edit: My apologies. I didn't notice the OP's mention of possible other whitespace between the newlines.