PerlMonks  

Re: Tokenizing and qr// <=> /g interplay

by japhy (Canon)
on Apr 23, 2005 at 15:21 UTC


in reply to Tokenizing and qr// <=> /g interplay

The /g modifier is not part of a regex; it's part of the pattern matching operation. The only flags that affect a regex are /i, /m, /s, and /x. In addition, because the contents of qr// are interpolated, the /o flag can be used with the qr// operator.

A compiled regex (qr//) does not take the place of the pattern matching operation. It merely holds the regex.

If you want to know when qr// is useful, that's another question that I'll answer (if you ask).

_____________________________________________________
Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^2: Tokenizing and qr// <=> /g interplay
by skyknight (Hermit) on Apr 23, 2005 at 15:42 UTC
    OK, consider it asked. I presume that the most useful thing about qr// is that it allows you to pass regular expressions around as arguments to functions and such, but honestly my lack of experience with it leaves me wondering.
      When you have variables in a regex, Perl examines the contents of those variables to see if the overall representation of the regex has changed:
      for ("ab", "cd") {
          if ($str =~ /$_/) { ... }
      }
      In the above code, the regex 'ab' is compiled and executed, and then the regex 'cd' is compiled and executed. Compare that with:
      ($x, $y) = ("ab", "c");
      for (1, 2) {
          if ($str =~ /$x$y/) { ... }
          ($x, $y) = ("a", "bc");
      }
      Here, even though $x and $y change, the ACTUAL regex ('abc') does not change, so the regex is compiled only once. The process that Perl does internally is this:
      1. take regex at this opcode
      2. interpolate any variables
      3. compare with previous value of the regex at this opcode
      4. compile if different
      5. execute this regex
      When you use the /o modifier, it tells Perl that after it has compiled the regex, it should SKIP steps 2-4 of this process, meaning that the regex at this opcode will NEVER change.
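A minimal sketch of the /o behaviour described above. The helper names are hypothetical and exist only for illustration; the point is that the first pattern seen at a /o match stays in force on later calls, while the plain match keeps following the variable.

```perl
use strict;
use warnings;

# Hypothetical helper: /o freezes the regex at this opcode after the
# first compilation, so later values of $pat are ignored.
sub matches_frozen {
    my ($str, $pat) = @_;
    return $str =~ /$pat/o ? 1 : 0;
}

# Same helper without /o: the interpolated pattern is re-checked each call.
sub matches_live {
    my ($str, $pat) = @_;
    return $str =~ /$pat/ ? 1 : 0;
}

my $frozen_first  = matches_frozen("abc", "ab");   # compiles /ab/
my $frozen_second = matches_frozen("xyz", "xy");   # still tests /ab/
my $live_first    = matches_live("abc", "ab");
my $live_second   = matches_live("xyz", "xy");     # now tests /xy/

print "frozen: $frozen_first $frozen_second\n";
print "live:   $live_first $live_second\n";
```

The second frozen call fails even though "xyz" contains "xy", because the match is still using the pattern from the first call.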

      So what is qr// good for? Consider this:

      my @strings = make_10_strings();
      for (@strings) {
          for my $p ('x+', 'yz?y', 'xz+y') {
              if ($_ =~ $p) { handle($_) }
          }
      }
      This code compiles a grand total of 30 regexes. Why? Because for each string in @strings we've got three patterns to execute, and because the contents of $p have changed each time $_ =~ $p is encountered, the regex is compared and recompiled every time. Now sure, you could reverse the order of the loops, but that will result in the calls to handle() happening in a different order.

      So enter the qr//.

      When Perl sees a regex comprised solely of a single variable, Perl checks to see if that variable is a Regexp object (what qr// returns). If it is, Perl knows that the regex has already been compiled, so it simply uses the compiled regex in the place of the regex. That means doing:

      my @strings = make_10_strings();
      for (@strings) {
          for my $p (qr/x+/, qr/yz?y/, qr/xz+y/) {
              if ($_ =~ $p) { handle($_) }
          }
      }
      is considerably faster. There is no additional compilation happening. It's probably even better to move the qr// values into an array, but that might be moot since they're made of constant strings in this example. The point is, the use of qr// in a looping construct is the primary benefit it offers. Yes, it helps break a regex up into pieces too, but that's just a matter of convenience.
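A self-contained sketch of the "move the qr// values into an array" variant mentioned above. The sample strings are made up, and the @handled list is a stand-in for the post's handle() calls so the order of matches can be inspected.

```perl
use strict;
use warnings;

# Compile each pattern once, outside both loops.
my @patterns = (qr/x+/, qr/yz?y/, qr/xz+y/);

# Hypothetical input; the original post uses make_10_strings().
my @strings = ('xx', 'yzy', 'abc', 'xzzy');

# Stand-in for handle(): record which string matched which pattern index.
my @handled;
for my $string (@strings) {
    for my $i (0 .. $#patterns) {
        # The pattern is a lone Regexp object, so it is used as compiled.
        push @handled, "$string:$i" if $string =~ $patterns[$i];
    }
}

print join(' ', @handled), "\n";
```

The strings stay in the outer loop, so matches are still reported string by string, which is the ordering the post wants to preserve.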

      Be warned that the benefit of qr// objects is lost if there is additional text in the pattern match. I mean that $foo =~ /^$rx_obj$/ suffers from the same problem as $foo =~ /^$text$/.

        When Perl sees a regex comprised solely of a single variable, Perl checks to see if that variable is a Regexp object (what qr// returns). If it is, Perl knows that the regex has already been compiled, so it simply uses the compiled regex in the place of the regex.

        Really? That is very interesting. So if I have:

        my $true  = qr/y(?:es|up)?|1|enabled?/i;
        my $false = qr/n(?:o(?:pe)?)?|0|disabled?/i;
        die "Need boolean input" unless /^(?:$true|$false)$/;
        if (/$true/) {
            do_stuff();
        }
        you're saying that in the die's unless clause, it will need to completely recompile the regex? That is not my interpretation, but I could be completely wrong here.

        My assumption is that both $true and $false are compiled once, and only once, and the unless modifier above would not need to recompile either one.

        Even if that is the case, I use code like the above because I like to be able to reuse common criteria for truth and falseness across many expressions: sometimes, as in the die statement above, to validate that the value is one of the two (i.e., not a typo; if someone had "y]", we'd not accidentally treat that as a false value, we'd simply reject it so the user could fix the typo), and at other times, as in the if statement above, just to see which one it was. Which goes to the OP's question on why it's useful, somewhat in agreement with other posts here. I'm just showing a concrete example of real, live, production code where I use this construct.
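A runnable sketch of that reuse pattern. The classify() helper is hypothetical, not part of the production code being described, and it says nothing about how often Perl recompiles the combined pattern; it only shows the two qr// objects being embedded in larger, anchored regexes with their /i flag intact.

```perl
use strict;
use warnings;

my $true  = qr/y(?:es|up)?|1|enabled?/i;
my $false = qr/n(?:o(?:pe)?)?|0|disabled?/i;

# Hypothetical helper: returns 'true', 'false', or undef when the input
# is neither (for example a typo such as "y]").
sub classify {
    my ($input) = @_;
    return unless $input =~ /^(?:$true|$false)$/;    # validation step
    return $input =~ /^$true$/ ? 'true' : 'false';   # which one was it?
}

for my $input ('Yes', 'nope', 'ENABLED', '0', 'y]') {
    my $result = classify($input);
    printf "%-8s => %s\n", $input, defined $result ? $result : 'rejected';
}
```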

        Hi, I was just refactoring some code and saw a possible opportunity to use this advice. But. Instead of having multiple strings in @strings to process, I leave all the lines from my file joined together as one giant string with embedded \n chars. From this angle, since I only have to use each regex once across all the strings via them being 'joined' into one string, I won't benefit from qr.

        my ( $crummy, $good );
        foreach my $crummy_good_ar ( @corrections_to_make ) {
            ( $crummy, $good ) = @$crummy_good_ar;
            $file_in_string_form =~ s/\b(\Q$crummy\E)\b/$good/ig;
        }
        However, as you can see (?) from the example above, I have lots of crummy/good switchouts to do, and is my plodding approach above the best that can be expected?
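One possible alternative, offered only as a sketch and not as an answer given in this thread: fold all the crummy words into a single qr// alternation and look the replacement up in a hash, so the big string is scanned once. The sample data and the %good_for and $any_crummy names are made up. Note that this is not strictly equivalent to the loop above: a single pass will not re-correct text produced by an earlier correction.

```perl
use strict;
use warnings;

# Hypothetical data standing in for @corrections_to_make and the file text.
my @corrections_to_make = ( [ 'teh', 'the' ], [ 'recieve', 'receive' ] );
my $file_in_string_form = "Teh cat will recieve teh prize.\n";

# Map each lowercased crummy word to its good replacement.
my %good_for = map { lc $_->[0] => $_->[1] } @corrections_to_make;

# Longest alternatives first, so a short word never shadows a longer one.
my $alternatives = join '|',
    map  { quotemeta($_) }
    sort { length $b <=> length $a || $a cmp $b }
    keys %good_for;

# One compiled regex covering every correction.
my $any_crummy = qr/\b($alternatives)\b/i;

# /e evaluates the replacement as code so the hash lookup can use lc($1).
$file_in_string_form =~ s/$any_crummy/$good_for{ lc $1 }/ge;

print $file_in_string_form;
```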

        P.S. Can you clarify/update what you meant by:
        the benefit of qr// objects is lost if there is additional text in the pattern match

        I think you are saying that a precompiled/qr regex used in a follow-on regex will have to be recompiled if you snap additional text on to the qr'd variable, because the overall text of the new regex will be different. Although at least one would still have the benefit of 'concentrated regex logic' within the qr'd variable?

      Another use for qr// is to break up unmanageably complex regular expressions into simpler, named, self-contained pieces. (There's a direct parallel here with subs, which do the same for 'ordinary' Perl code. In fact, you can consider a named regex to be just a function written with a funny-looking syntax: its input is a string and its output is either a Boolean value or one or more strings, depending on whether it captures anything.)
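A small illustration of the named-pieces idea. This is a made-up example, not the assertions-module code the post goes on to mention; the pattern names and parse_assignment() are assumptions chosen for the sketch.

```perl
use strict;
use warnings;

# Small, named building blocks.
my $identifier = qr/[A-Za-z_]\w*/;
my $number     = qr/-?\d+(?:\.\d+)?/;

# A larger regex assembled from the named pieces.
my $assignment = qr/^\s*($identifier)\s*=\s*($number)\s*$/;

# Hypothetical helper: returns (name, value), or an empty list on no match.
sub parse_assignment {
    my ($line) = @_;
    return $line =~ $assignment ? ($1, $2) : ();
}

my ($name, $value) = parse_assignment("rate = -3.5");
my @rejected       = parse_assignment("9lives = 1");

print "name=$name value=$value\n";
print "rejected line gave ", scalar(@rejected), " fields\n";
```

Each piece can be tested on its own, much like a small sub, before it is composed into the full pattern.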

      Here's an example from a code-filtering assertions module (yes, another one) that's not yet tested thoroughly enough to submit to CPAN:
