
Unexpected qr// behavior

by pbeckingham (Parson)
on Apr 20, 2004 at 14:43 UTC ( #346651=perlquestion )
pbeckingham has asked for the wisdom of the Perl Monks concerning the following question:

I am searching for multi-line entities in a string, and while I now have working code, I see a quirk in the use of qr// that I did not expect. Here is the example - I have changed the code to look for HTML comments, which illustrates the quirk.

#! /usr/bin/perl -w
use strict;

my $html = qq{
blah
<!--
comment
-->
blah
<!-- comment -->
blah
};

print "m: $_\n" for $html =~ /(<!--.*?-->)/sg;

my $r = qr/<!--.*?-->/;
print "qr: $_\n" for $html =~ /($r)/sg;

$r = qr/<!--.*?-->/s;
print "qrs: $_\n" for $html =~ /($r)/g;
The output is:
m: <!--
comment
-->
m: <!-- comment -->
qr: <!-- comment -->
qrs: <!--
comment
-->
qrs: <!-- comment -->
The quirk is that the s modifier to the regex must be applied to the qr// construct, and may not be applied to the matching later. The g modifier, and the capturing parens may be added, but not the s.

Do I have this right? Is there sense, order and logic behind not being able to override these qr// modifiers later?

Replies are listed 'Best First'.
Re: Unexpected qr// behavior
by Paladin (Priest) on Apr 20, 2004 at 14:53 UTC
    [~]$ perl -le '$re = qr/<!--.*?-->/; print $re;'
    (?-xism:<!--.*?-->)
    As you can see, qr// specifically turns off every option you don't specify for the regex, which is why adding the /s afterwards doesn't make any difference.
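The effect described above can be reproduced with a minimal sketch. The sample string is an assumption made for illustration (its first comment spans several lines), and the stringified form of a qr// object differs between Perl versions (newer perls print something like (?^:...) rather than (?-xism:...)), so it is printed here rather than relied upon.

```perl
use strict;
use warnings;

# Sample input (an assumption for illustration): the first comment spans lines.
my $html = "blah\n<!--\ncomment\n-->\nblah\n<!-- comment -->\nblah\n";

my $plain  = qr/<!--.*?-->/;     # no /s at compile time
my $dotall = qr/<!--.*?-->/s;    # /s at compile time

# Stringifying a qr// object shows the flags it carries with it.
print "plain:  $plain\n";
print "dotall: $dotall\n";

my @outer_s = $html =~ /($plain)/sg;   # outer /s does not reach inside $plain
my @inner_s = $html =~ /($dotall)/g;   # $dotall brought its own /s

printf "outer /s: %d match(es)\n", scalar @outer_s;
printf "qr//s:    %d match(es)\n", scalar @inner_s;
```

With the outer /s only the single-line comment is found; with /s given to qr// both are.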

    As for whether this is a good thing or not, I can see arguments either way.

    • For: When you created the RE, you didn't say you wanted that option, so Perl makes sure that this particular part of the RE doesn't have that option. This is probably more useful when building a large RE up from smaller parts, and you want the different parts to act differently with regard to the various options.
    • Against: Like you saw, adding the /s didn't do what you meant. And Perl, after all, is usually very DWIMy.
      To make things even more clear (I hope), I'll just add that you can pass modifiers to qr, just like you can for normal regexes, like this:
      $re = qr/<!--.*?-->/s; print $re;
      which prints
      (?s-xim:<!--.*?-->)
      That is: /s enabled, and /x, /i and /m disabled.
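The "For" argument above, building a larger regex from parts that each keep their own options, can be sketched as follows. The keyword and name patterns are hypothetical and chosen only to make the difference visible; this is an illustration, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical building blocks: a case-insensitive keyword and a
# case-sensitive name that must start with an uppercase letter.
my $keyword = qr/select/i;
my $name    = qr/[A-Z]\w*/;

# The combined pattern is compiled without /i; each part keeps its own flags.
my $stmt = qr/^$keyword\s+($name)$/;

for my $input ('SeLeCt Users', 'select users') {
    if ($input =~ $stmt) {
        printf "%-14s matched, name = %s\n", $input, $1;
    } else {
        printf "%-14s did not match\n", $input;
    }
}

# Adding /i at match time still does not reach inside $name.
my $leaked = ('select users' =~ /^$keyword\s+($name)$/i) ? 1 : 0;
printf "outer /i leaked into the name part: %s\n", $leaked ? 'yes' : 'no';
```

The keyword matches in any case because /i was given to its own qr//, while the name part stays case-sensitive even when the enclosing match adds /i.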
Re: Unexpected qr// behavior
by diotalevi (Canon) on Apr 20, 2004 at 15:41 UTC

    Other people said this, but in more jargon than was necessary. qr// is a request to compile a regular expression, and /s is a modifier on the expression being compiled. Unless you give the compiler the /s at the moment the expression is compiled, it has no effect.

    It might be worth having perl throw away the previous compilation work if the current match's flags don't match the expression's flags. It'd then behave somewhat like you expected (except that then you're compiling more often than you thought).
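Perl does not do this itself, but the idea of throwing away the earlier compilation can be approximated by hand. The sketch below assumes a Perl recent enough to provide re::regexp_pattern (5.10 or later); recompile_with is a hypothetical helper name, and the sample string is made up for illustration.

```perl
use strict;
use warnings;
use re 'regexp_pattern';   # assumes Perl 5.10 or later

# Hypothetical helper: rebuild a qr// object from its source text with
# extra flags added. This compiles the pattern a second time.
sub recompile_with {
    my ($re, $extra) = @_;
    my ($pattern, $flags) = regexp_pattern($re);

    # Merge the original flags with the extra ones, dropping duplicates.
    my %seen;
    my $all = join '', grep { !$seen{$_}++ } split //, $flags . $extra;

    my $source = "(?$all:$pattern)";
    return qr/$source/;
}

my $html = "blah\n<!--\ncomment\n-->\nblah\n<!-- comment -->\nblah\n";

my $r  = qr/<!--.*?-->/;          # compiled without /s
my $rs = recompile_with($r, 's'); # same source, recompiled with /s

my @before = $html =~ /($r)/g;
my @after  = $html =~ /($rs)/g;

printf "original:   %d match(es)\n", scalar @before;
printf "recompiled: %d match(es)\n", scalar @after;
```

As the reply notes, the cost of this approach is an extra compilation each time the flags are changed.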

      IIRC, Ilya Z has said that qr// compiling immediately instead of at first use is an implementation detail. According to that view, qr// is a request to define a reusable regex, not to compile one.

      And the key part of reusability is having it maintain its own flags; Regexp::Common would break seriously if the external flags leaked in.

        I suppose, but the implementation has three cases (in order from best to worst): re-used as a complete regex, not used at all, re-used as a component of a regex. If reusability is the motivation, I would have thought it more important to swap the last two cases, so that re-using it as a component isn't worse than not using it at all.

        I'm thinking of /o is dead, long live qr//! for the basis of the sort.

        1. the qr// is compiled once and then used
        2. the qr// is compiled once and then not used
        3. the qr// is compiled twice and used once

Re: Unexpected qr// behavior
by rnahi (Curate) on Apr 20, 2004 at 16:14 UTC
Re: Unexpected qr// behavior
by matija (Priest) on Apr 20, 2004 at 14:55 UTC
    Yes, there is sense in that.

    The regexp works by constructing a finite state engine through which you feed characters to see if it will stop in one of the final states (CS terminology, here. Sorry I can't use simpler terms).

    The transitions in this state machine are not altered by the captures. They are not altered by the g modifier - since that modifier merely states that the machine will be rerun until it no longer matches.

    The s modifier, however, alters the way the state machine behaves - "." now matches the newline. Any transitions in the state engine which involve matching a "." are altered.

    The i affects the state machine pretty radically, too. I bet you can't add it later, just like you can't add the s.
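The bet about /i is easy to check. The following is a minimal sketch with a made-up test string: /g and the capturing parentheses are supplied at match time and work as expected, while an outer /i leaves the already-compiled pattern case-sensitive.

```perl
use strict;
use warnings;

my $text = "abc ABC abc";

my $r  = qr/abc/;     # compiled without /i
my $ri = qr/abc/i;    # compiled with /i

my @with_g_only = $text =~ /($r)/g;    # /g and the parens are added at match time
my @with_gi     = $text =~ /($r)/gi;   # outer /i does not change $r
my @compiled_i  = $text =~ /($ri)/g;   # /i given to qr// itself

printf "/g only:        %d match(es)\n", scalar @with_g_only;
printf "/gi at match:   %d match(es)\n", scalar @with_gi;
printf "qr//i, then /g: %d match(es)\n", scalar @compiled_i;
```

Only the pattern compiled with /i picks up the uppercase "ABC".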

Node Type: perlquestion [id://346651]
Approved by Thelonius
Front-paged by halley