
Removing multiple newlines from a line using regex

by fbicknel (Beadle)
on Jul 18, 2011 at 22:38 UTC ( #915297=CUFP )

I sought, but did not find; so I came up with my own solution. Say you have a $tring that holds a multi-line (yet short) config file. You may have obtained such a string with:
my $foo = do { local $/; <$fh> };
for example.
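For anyone unfamiliar with that slurp idiom, here is a minimal sketch of it in isolation. The in-memory filehandle and the sample config text are assumptions made up for illustration; a real script would open an actual file instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a short config file.
my $config_text = "name = demo\nbaz = (one two,\n three four)\n";

# Open a read handle on the string itself (an in-memory file).
open my $fh, '<', \$config_text or die "Cannot open in-memory file: $!";

# local $/ undefines the input record separator inside the do block only,
# so a single <$fh> read returns the whole file instead of one line.
my $foo = do { local $/; <$fh> };
close $fh;

print $foo;
```

Because `local` restores `$/` when the block exits, later line-by-line reads elsewhere in the program are unaffected.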

Anyway, you want to join some lines that are delimited by some pair of characters, such as (parens). Thus a line looking thus:
baz = (one two,
 three four,
 five six)
becomes:
baz = (one two, three four, five six)
Why? You may want to gather all the baz info in a separate r.e. session following this one. I'm not going to talk about that step here, just the step that gathers the multiple lines into a single line.

Here's my solution. I offer it in case it's a good one: [edit] - replaced the deprecated \1 notation with $1 (but see additional suggestion without loop from jwkrahn, below).
while ($foo =~ s/(\([^)]*)\n(.*?\))/$1$2/gs) {}

The while loop capitalizes on the fact that the s/// inside will continue to return a non-zero result until it can find no more lines to join. The r.e. looks for an open paren, \(, followed by anything but a close paren, [^)], and zero or more of these, * . If found, that part becomes group 1 ($1 later). That must be followed by a newline, \n, then any characters preceding a closing paren, \), which becomes group 2 ($2 later). If we find all that, substitute it with group 1 and group 2 without the intervening newline ($1$2).

The modifiers /sg tell Perl to let . match newlines (/s) and to apply the substitution to every match in the string (/g). Note that each match removes only one newline per pass through the loop. Thus, if you have several formations of the form above (see baz example), each pass removes one newline from each of them, and the loop repeats until no newlines are left inside any of them.
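To make the one-newline-per-formation-per-pass behaviour visible, here is a small sketch that counts the substitutions made on each pass. The sample string, the `qux` entry and the variable names `$passes` and `$count` are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input: two parenthesised groups, split over 3 and 2 lines.
my $foo = "baz = (one two,\n three four,\n five six)\nqux = (a,\n b)\n";

my $passes = 0;

# s///g returns the number of substitutions made, or a false value when
# nothing matched, so the loop ends once no group contains a newline.
while (my $count = ($foo =~ s/(\([^)]*)\n(.*?\))/$1$2/gs)) {
    $passes++;
    print "pass $passes: removed $count newline(s)\n";
}

print $foo;
```

With this input the first pass joins one line in each group and the second pass finishes the longer one, which is why the enclosing loop is needed at all.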

I hope this will be helpful to someone in the future. If you know a better way, feel free to chime in.

humbly submitted,

Replies are listed 'Best First'.
Re: Removing multiple newlines from a line using regex
by jwkrahn (Monsignor) on Jul 19, 2011 at 01:55 UTC
    while ($foo =~ s/(\([^)]*)\n(.*?\))/\1\2/gs) {}

    The /\1\2/ in the double quoted string should be /$1$2/.    Using \1 and \2 in double quoted strings is deprecated.

    The usual form for that is:

    1 while $foo =~ s/(\([^)]*)\n(.*?\))/$1$2/gs;

    And you could do that without a while loop:

    $foo =~ s{ ( \( [^()]* \) ) }{ ( my $x = $1 ) =~ tr/\n//d; $x }gex;
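As a rough sketch of how the loop-free form behaves, the following runs a single-pass substitution on a made-up sample string. This variant keeps the parentheses inside the capturing group so that they survive in the output; the sample data and the name `$group` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with one group spread over three lines.
my $foo = "baz = (one two,\n three four,\n five six)\nqux = 1\n";

# Match each whole (...) group, parentheses included, in a single pass.
# /x permits the spacing in the pattern, /g repeats for every group, and
# /e evaluates the replacement as Perl code: copy the group (since $1 is
# read-only), delete its newlines with tr, and return the cleaned copy.
$foo =~ s{ ( \( [^()]* \) ) }{
    my $group = $1;
    $group =~ tr/\n//d;
    $group;
}gex;

print $foo;
```

Since `[^()]*` excludes both kinds of parenthesis, this only handles groups that are not nested.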


      Maybe I misunderstand the problem, but wouldn't it be simpler to use two substitutions like this:

      $foo =~ s/\n//gs && $foo =~ s/\)/)\n/g;
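One point worth checking with the two-substitution idea is what happens to lines that contain no parentheses at all. The sketch below compares it with the loop from the original post on a made-up config; the sample data and the names `$input`, `$two_step` and `$looped` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical config: one plain line, one multi-line group, one plain line.
my $input = "name = demo\nbaz = (one two,\n three four)\nqux = 1\n";

# Two-substitution approach: drop every newline, then add one after each ")".
my $two_step = $input;
$two_step =~ s/\n//g && $two_step =~ s/\)/)\n/g;

# Loop approach from the original post, for comparison.
my $looped = $input;
1 while $looped =~ s/(\([^)]*)\n(.*?\))/$1$2/gs;

print "two-step:\n$two_step\n\n";
print "loop:\n$looped";
```

On this input the two-step version also merges `name = demo` into the `baz` line, so it seems to fit only when every line of the file ends with a closing paren.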
Re: Removing multiple newlines from a line using regex
by ikegami (Pope) on Jul 18, 2011 at 23:17 UTC

    Text::Balanced can be of use.

    It would make multiple kinds of delimiters, and nested cases like the following, easy to handle.

    baz = ((one two, three four), five six)
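A minimal sketch of how that might look with `extract_bracketed` from Text::Balanced, assuming the goal is still to delete the newlines inside each balanced group. The helper name `join_bracketed` and the sample input are hypothetical, and this is one possible arrangement rather than the code ikegami had in mind.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: remove newlines inside every balanced (...) group,
# including nested ones, and leave everything else untouched.
sub join_bracketed {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # In list context extract_bracketed returns the extracted group,
        # the remainder of the text, and the skipped prefix. The third
        # argument is the prefix pattern: skip anything up to the next "(".
        my ($group, $rest, $prefix) =
            extract_bracketed($text, '()', qr/[^(]*/);

        # No balanced group left: keep what remains as it is and stop.
        if (!defined $group) {
            $out .= $text;
            last;
        }

        $group =~ tr/\n//d;          # join the lines of this group
        $out  .= $prefix . $group;
        $text  = $rest;
    }
    return $out;
}

my $foo = "baz = ((one two,\n three four),\n five six)\nqux = 1\n";
my $joined = join_bracketed($foo);
print $joined;
```

Adding other bracket types should only require extending the delimiter string (for example `'()[]'`) and the prefix pattern to match.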

Node Type: CUFP [id://915297]
Approved by ikegami
Front-paged by Arunbear