http://www.perlmonks.org?node_id=1018133

jbryan has asked for the wisdom of the Perl Monks concerning the following question:

Regex gurus: How do I NOT match something inside something else? E.g. I want to match X, but not if it's in braces, as in {X}.

Example:

I want to replace all newlines (\n) with (\n<br>), but not if they are inside double-braces. For example:

my $data = q|
foo
bar
{{
alpha
beta
}}
baz
|;

In $data, above, the lines with 'foo' and 'bar' should have the <br> added, but the lines with 'alpha' and 'beta' should not be touched.

Am I just missing something simple in perlre that would make this work? I don't even know where to start - other than some non-regex solution entirely.

Re: Replace newlines only if not inside braces
by LanX (Saint) on Feb 11, 2013 at 13:25 UTC
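    As long as the brace groups are not nested, you could do it by splitting the text into blocks and handling the two kinds of block differently.

    use Data::Dump;

    my $data = q|
    foo
    bar
    {{
    alpha
    beta
    }}
    baz
    |;

    # The capturing parentheses make split return the {{...}} blocks too.
    my @splits = split /({{.*?}})/s, $data;
    dd \@splits;

    my $result = "";
    # Loop on the array itself: an empty leading block (data that starts
    # with "{{") is false and would otherwise end the loop early.
    while (@splits) {
        my $block = shift @splits;              # text outside the braces
        $block =~ s/\n/<br>\n/gs;
        $result .= $block;
        $result .= shift @splits if @splits;    # {{...}} block, untouched
    }
    print $result;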

    output
    ["\nfoo\nbar\n", "{{\nalpha\nbeta\n}}", "\nbaz\n"]
    <br>
    foo<br>
    bar<br>
    {{
    alpha
    beta
    }}<br>
    baz<br>

    I refrain from trying a complicated and potentially unmaintainable one-line regex solution.

    Some come to mind¹, but I don't see the necessity if there are no other restrictions (like lack of memory) involved.

    Cheers Rolf

    UPDATE
    ¹) like
    • looping with while (/({{.*?}})/gs) (and \G and pos)
    • using /e to do substitution within substitutions
    • complicated look-ahead and look-behind assertions
    • using \K somehow to restrict replacement
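
To make the /e idea from the list above concrete, here is a minimal sketch (not code from the thread). It assumes non-nested blocks and the same sample data as the split example; the variable name $result and the backslash-escaped braces are choices made for this illustration. The pattern matches either a whole {{...}} block or a single newline, and the replacement code decides which one it got.

```perl
use strict;
use warnings;

# Same sample text as in the split example above.
my $data = "\nfoo\nbar\n{{\nalpha\nbeta\n}}\nbaz\n";

# Either a whole {{...}} block is matched (and captured in $1),
# or a lone newline outside any block is matched.
# /e evaluates the replacement as code: a captured block is put back
# unchanged, a bare newline becomes "<br>\n".
(my $result = $data) =~ s/(\{\{.*?\}\})|\n/ defined $1 ? $1 : "<br>\n" /gse;

print $result;
```

Because a block is consumed in one step, the newlines inside it are never seen by the second alternative. For this input the result should be the same as the output of the split version.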
Re: Replace newlines only if not inside braces
by smls (Friar) on Feb 11, 2013 at 13:54 UTC

    Replacing a target pattern everywhere except inside specific chunks can be achieved with a regex of the following form:

    s/((?:CHUNK_TO_BE_EXCLUDED|.)*?)TARGET/$1REPLACEMENT/gs

    In your example, TARGET would be \n and REPLACEMENT would be \n<br>. The CHUNK_TO_BE_EXCLUDED pattern would have to match a whole block wrapped in double braces. You can use {{.*?}}\n, unless brackets can be nested and you need to guarantee that you match properly balanced pairs, in which case you can find a howto for constructing the pattern you need in perlfaq6.
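
As a hedged illustration of that generic form, the sketch below fills in the placeholders for the sample data from the question. It assumes the blocks are not nested and uses \{\{.*?\}\} (braces escaped, no trailing newline) as the chunk pattern; that choice, like the variable name $result, is an assumption of this example.

```perl
use strict;
use warnings;

my $data = "\nfoo\nbar\n{{\nalpha\nbeta\n}}\nbaz\n";

# CHUNK_TO_BE_EXCLUDED = \{\{.*?\}\}   (assumption: blocks are not nested)
# TARGET               = \n
# REPLACEMENT          = \n<br>
#
# The lazy group walks forward one step at a time. At each step it
# prefers to swallow a whole {{...}} block, otherwise a single character,
# and it stops at the first newline it can reach that way.
(my $result = $data) =~ s/((?:\{\{.*?\}\}|.)*?)\n/$1\n<br>/gs;

print $result;
```

One caveat with this sketch: if a {{...}} block is not followed by any newline later in the string, the engine backtracks, lets the dot match the opening braces one at a time, and then finds a newline inside the block. Input that always ends with a newline, like the sample, does not hit this case.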

      Works! =)

      But it took me some time to understand why all edge cases are covered, though I'm pretty sure I have seen this technique before.

      Cheers Rolf

Re: Replace newlines only if not inside braces
by tmharish (Friar) on Feb 11, 2013 at 13:37 UTC

    And if they are nested see this thread.

    And specifically this excellent post by 7stud
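
For the nested case, one possible approach (a sketch, and not necessarily what the linked thread does) is a recursive pattern, available since Perl 5.10. Here (?1) recurses into the first capture group, so a {{...}} block may contain further balanced {{...}} blocks. The sample string and the name $result are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample with one block nested inside another.
my $data = "a\n{{\nb\n{{\nc\n}}\nd\n}}\ne\n";

# Group 1 matches "{{", then any mix of
#   - single characters that do not start a "{{" or "}}" pair, and
#   - complete nested blocks, via recursion into group 1 with (?1),
# and finally the closing "}}".
# As in the non-nested case, /e puts a matched block back unchanged
# and turns a newline outside any block into "<br>\n".
(my $result = $data)
    =~ s/(\{\{(?:(?!\{\{|\}\}).|(?1))*\}\})|\n/ defined $1 ? $1 : "<br>\n" /gse;

print $result;
```

Only the three newlines outside the outer block should receive a <br>. Unbalanced input is not handled specially here: an opening {{ without a matching }} is simply treated as ordinary text.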

Re: Replace newlines only if not inside braces
by ww (Archbishop) on Feb 11, 2013 at 14:01 UTC

    As an example, not as a direct response to your specific goal, one way would be to use a negated character class:

    C:\>perl -E "use 5.016; use strict; use warnings; my $str='abXcd{X}efXyz'; my @matches; while ($str =~ /[^{](X)[^}]/g) {push @matches, $1;} say @matches;"
    XX
Re: Replace newlines only if not inside braces
by trizen (Hermit) on Feb 11, 2013 at 14:01 UTC
    You can match and discard something that you don't want to replace. For example, match the group {{...}} and use the \K to replace only the right side, keeping the left side of \K as it is.

    Code:
    $data =~ s<(?:{{.*?}}\K)?\n>{<br>\n}gs; print $data;
      Works as long as }} is immediately followed by a newline (\n)! ¹

      Is there a reason why you've put the \K within the group?

      I suppose this does the same and is more readable!

      $data =~ s<(?:{{.*?}})?\K\n>{<br>\n}gs; print $data;

      Cheers Rolf

      1) see or clause in Re: Replace newlines only if not inside braces for a work around

        No, there is no difference. I just thought it would be a little bit more efficient. I thought that, if the \K is outside, the $& variable is cleaned up for every substitution, which is not really necessary. It should be cleaned up only when something on the left side has been matched. Anyway, your way is more readable and does basically the same thing. :)

        Alternatively, to work with strings that contain {{...}} groups, which are not followed by a newline, this code should do it:
        $data =~ s<(?:{{.*?}}|[^\n])*\K\n>{<br>\n}gs; print $data;

        Works

        Not quite.
      Did that s/// win an obfuscation contest somewhere?
      use warnings;
      use strict;
      use 5.012;

      my $data = <<'END_OF_TEXT';
      foo
      bar
      {{
      alpha
      beta
      }}
      baz
      END_OF_TEXT

      $data =~ s/
          (?:            #Non-capturing group
              {{.*?}}    #Text enclosed by double braces
              \K         #Exclude what's to the left of \K from match
          )?             #Match whole group 0 or 1 time
          \n
      /\n<br>/gxms;

      say $data;

      --output:--
      foo
      <br>bar
      <br>{{
      alpha
      beta
      }}
      <br>baz
      <br>

      It would make more sense to put the newlines after the breaks if you were trying to pretty print some html.

        Did that s/// win an obfuscation contest somewhere?
        so what's your contribution?
        It would make more sense to put the newlines after the breaks if you were trying to pretty print some html.
        minor problems of minor minds...