
Regex bug? matching multiple newline with /m

by NERDVANA (Deacon)
on Aug 09, 2021 at 08:15 UTC ( [id://11135722]=perlquestion )

NERDVANA has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I ran across a strange behavior and I'm wondering if it is a bug, or some obscure documented behavior I should keep in mind.

These match:

# (not using 'say' because I wanted to try it on old perl versions)
perl -e 'printf "%s\n", "ab\nab\n" =~ /^ab$ab$/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^(ab$ab$)/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^(ab$ab)$/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^(ab)$ab$/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^(a)b$ab$/m'

(with '$' consuming a "\n" because it is /m mode)

These do not match:

perl -e 'printf "%s\n", "ab\nab\n" =~ /^(ab$a)b$/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^(ab$)ab$/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^ab$(ab)$/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /^(ab$){2}/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /(ab$){2}/m'
perl -e 'printf "%s\n", "ab\nab\n" =~ /(?^:^(?^m:(ab$){2}))/m'

This one is greedy but only matches one line:

perl -e 'printf "%s\n", "ab\nab\n" =~ /^((?:ab$)+)/m'

It seems to me that all of them should match. This happens on perl versions as old as 5.8.9 and as new as 5.32.1.

Any insights?

Replies are listed 'Best First'.
Re: Regex bug? matching multiple newline with /m
by tybalt89 (Monsignor) on Aug 09, 2021 at 08:31 UTC

    $ is a zero-width assertion; it does not consume a \n.

    [rick@ry ~]$ perl -e 'use strict; printf "%s\n", "ab\nab\n" =~ /^ab$ab$/m'
    Global symbol "$ab" requires explicit package name (did you forget to declare "my $ab"?) at -e line 1.
    Execution of -e aborted due to compilation errors.
    [rick@ry ~]$

    Try adding 'use strict;' to your other examples to see what they are really doing.
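To make the interpolation visible, here is a minimal sketch of what the first one-liner seems to be doing once strict is out of the way. The lexical $ab below is only a stand-in (an assumption for illustration) for the undeclared package variable, which is empty in the original one-liners.

```perl
use strict;
use warnings;

# Stand-in for the undeclared $ab of the one-liners. Without strict it is an
# empty package variable, so it interpolates as nothing. An empty string is
# used here instead of undef to keep warnings quiet.
my $ab = '';

# After interpolation the pattern is just ^ab$ with the /m flag.
my $re = qr/^ab$ab$/m;

my $text    = "ab\nab\n";
my $matched = $text =~ $re ? 1 : 0;
my $what    = $matched ? $& : '';

print "pattern: $re\n";      # stringified form shows the interpolated pattern
print "matched: '$what'\n";  # only the first "ab", without any newline
```

So the examples that "match" are matching a single line, and the ones that fail are the ones where nothing gets interpolated away.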

      Ha, yes bitten by interpolation. Thanks.

      Now I'm wondering how many times in the past I've had difficulties with a multiline regex and it was due to the incorrect assumption that it would consume the \n...

        ... bitten by interpolation.

        Perhaps better to say "bitten by failure to enable strict" (and I would also recommend warnings).



        The problem with interpolation is covered in the documentation for /PATTERN/msixpodualngc in "Regexp Quote-Like Operators":
        If "'" is used as the delimiter, no variable interpolation is done.
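A small sketch of that rule, assuming the same test string as the original post: with ' as the delimiter the pattern compiles under strict, and it is taken literally as ^ab$ab$.

```perl
use strict;
use warnings;

my $text = "ab\nab\n";

# With ' as the delimiter nothing is interpolated, so there is no complaint
# about $ab and the regex engine sees the pattern exactly as written.
my $hit = $text =~ m'^ab$ab$'m ? 1 : 0;

# The first $ asserts "end of line" but leaves the newline unconsumed, so the
# following literal "ab" is compared against "\n" and the match fails.
print "single-quoted pattern matched: $hit\n";
```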

        This is not enough to solve your problem with the newline, because $ is a zero-width assertion. See "Assertions" in perlre:

        Besides "^" and "$", Perl defines the following zero-width assertions:

        In this case, it is probably best to match the newline explicitly and not use the anchor at all.
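As a rough sketch of that suggestion, using the test string from the original post (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $text = "ab\nab\n";

# \n consumes the newline, so consecutive lines can be matched in one go.
my ($two)  = $text =~ /^((?:ab\n){2})/m;   # exactly two lines
my ($many) = $text =~ /^((?:ab\n)+)/m;     # greedy, as many lines as possible

# An anchor can still close the pattern, as long as the newline between the
# two lines is matched explicitly.
my ($mix)  = $text =~ /^(ab\nab)$/m;

printf "%d %d %d\n", length($two), length($many), length($mix);
```

The first two captures include both newlines; the third stops before the final newline because the trailing $ only asserts the position.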

        Bill
Re: Regex bug? matching multiple newline with /m
by haukex (Archbishop) on Aug 09, 2021 at 18:58 UTC
Re: Regex bug? matching multiple newline with /m (rxrxplainer)
by Anonymous Monk on Aug 09, 2021 at 09:07 UTC
