PerlMonks  

Re^3: Tokenizing and qr// <=> /g interplay

by japhy (Canon) on Apr 23, 2005 at 17:09 UTC


in reply to Re^2: Tokenizing and qr// <=> /g interplay
in thread Tokenizing and qr// <=> /g interplay

When you have variables in a regex, Perl examines the contents of those variables to see if the overall representation of the regex has changed:
    for ("ab", "cd") {
        if ($str =~ /$_/) { ... }
    }
In the above code, the regex 'ab' is compiled and executed, and then the regex 'cd' is compiled and executed. Compare that with:
    ($x, $y) = ("ab", "c");
    for (1, 2) {
        if ($str =~ /$x$y/) { ... }
        ($x, $y) = ("a", "bc");
    }
Here, even though $x and $y change, the ACTUAL regex ('abc') does not change, so the regex is compiled only once. The process that Perl does internally is this:
  1. take regex at this opcode
  2. interpolate any variables
  3. compare with previous value of the regex at this opcode
  4. compile if different
  5. execute this regex
When you use the /o modifier, it tells Perl that after it has compiled the regex, it should SKIP steps 2-4 of this process, meaning that the regex at this opcode will NEVER change.
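The effect of /o can be seen directly with a minimal sketch. The two helper subs below are hypothetical names made up for illustration: one interpolates its pattern normally, the other uses /o, so its regex is frozen after the first call.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern is re-interpolated and compared on
# every call, so a changed $pat produces a new regex.
sub matches_current {
    my ($str, $pat) = @_;
    return $str =~ /$pat/ ? 1 : 0;
}

# Hypothetical helper: /o freezes the regex at this opcode after the
# first compilation, so later values of $pat are ignored.
sub matches_frozen {
    my ($str, $pat) = @_;
    return $str =~ /$pat/o ? 1 : 0;
}

my @without_o = (
    matches_current('ab',   'ab'),   # regex is 'ab'
    matches_current('cd',   'cd'),   # regex changes to 'cd'
    matches_current('xaby', 'cd'),   # 'cd' is not in 'xaby'
);

my @with_o = (
    matches_frozen('ab',   'ab'),    # compiles 'ab' once
    matches_frozen('cd',   'cd'),    # still matching against 'ab'
    matches_frozen('xaby', 'cd'),    # still 'ab', which is in 'xaby'
);

print "without /o: @without_o\n";    # without /o: 1 1 0
print "with /o:    @with_o\n";       # with /o:    1 0 1
```

The second and third calls to matches_frozen() show why /o is only safe when the interpolated value really never changes.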

So what is qr// good for? Consider this:

    my @strings = make_10_strings();
    for (@strings) {
        for my $p ('x+', 'yz?y', 'xz+y') {
            if ($_ =~ $p) { handle($_) }
        }
    }
This code compiles a grand total of 30 regexes. Why? Because for each string in @strings we've got three patterns to execute, and because the contents of $p have changed each time $_ =~ $p is encountered, the regex is compared and recompiled each time. Now sure, you could reverse the order of the loops, but then the calls to handle() would happen in a different order.

So enter the qr//.

When Perl sees a regex comprised solely of a single variable, Perl checks to see if that variable is a Regexp object (what qr// returns). If it is, Perl knows that the regex has already been compiled, so it simply uses the compiled regex in the place of the regex. That means doing:

    my @strings = make_10_strings();
    for (@strings) {
        for my $p (qr/x+/, qr/yz?y/, qr/xz+y/) {
            if ($_ =~ $p) { handle($_) }
        }
    }
is considerably faster. There is no additional compilation happening. It's probably even better to move the qr// values into an array, but that might be moot since they're made of constant strings in this example. The point is, the use of qr// in a looping construct is the primary benefit it offers. Yes, it helps break a regex up into pieces too, but that's just a matter of convenience.
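Here is a self-contained sketch of the "move the qr// values into an array" variant mentioned above. The sample strings and the @handled bookkeeping are assumptions standing in for make_10_strings() and handle(), which the post does not define.

```perl
use strict;
use warnings;

# Compile each pattern once, outside the loops.
my @patterns = (qr/x+/, qr/yz?y/, qr/xz+y/);

# Stand-in data; the original make_10_strings() is not shown in the post.
my @strings = ('xx', 'yzy', 'xzzy', 'abc');

# Stand-in for handle(): record which string matched which pattern index.
my @handled;

for my $s (@strings) {
    for my $i (0 .. $#patterns) {
        # The pattern is a lone Regexp object, so it is used as-is.
        push @handled, "$s:$i" if $s =~ $patterns[$i];
    }
}

print "$_\n" for @handled;
```

The strings are still the outer loop, so the matches are handled in the same order as in the original nested loops.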

Be warned that the benefit of qr// objects is lost if there is additional text in the pattern match. I mean that $foo =~ /^$rx_obj$/ suffers from the same problem as $foo =~ /^$text$/.

_____________________________________________________
Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

Replies are listed 'Best First'.
Re^4: Tokenizing and qr// <=> /g interplay
by Tanktalus (Canon) on Apr 23, 2005 at 19:22 UTC
    When Perl sees a regex comprised solely of a single variable, Perl checks to see if that variable is a Regexp object (what qr// returns). If it is, Perl knows that the regex has already been compiled, so it simply uses the compiled regex in the place of the regex.

    Really? That is very interesting. So if I have:

        my $true  = qr/y(?:es|up)?|1|enabled?/i;
        my $false = qr/n(?:o(?:pe)?)?|0|disabled?/i;
        die "Need boolean input" unless /^(?:$true|$false)$/;
        if (/$true/) {
            do_stuff();
        }
    you're saying that in the die's unless clause, it will need to completely recompile the regex? That is not my interpretation, but I could be completely wrong here.

    My assumption is that both $true and $false are compiled once, and only once, and the unless modifier above would not need to recompile either one.

    Even if that is the case, I use code like the above because I like to be able to reuse common criteria for truth and falseness across many expressions. Sometimes, as in the die statement above, it's for validating that the value is something recognizable (i.e., not a typo: if someone had "y]", we wouldn't accidentally treat that as a false value, we'd simply reject it so the user could fix the typo). At other times, as in the if statement above, it's just to see which one it was. That goes to the OP's question on why it's useful, somewhat in agreement with other posts here. I'm just showing a concrete example of real, live, production code where I use this construct.

      It won't need to recompile the entire regex each time, because $true and $false haven't changed, but Perl will need to compare the string representation of the regex with the representation of it the last time it was at this opcode. And the first time through, yes, it will compile the regex.
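One way to sidestep even the string comparison, sketched here as a suggestion and not as something the code above already does, is to build the combined validator once as its own qr// object and then match against it alone. The names $boolean and parse_boolean() are hypothetical.

```perl
use strict;
use warnings;

my $true  = qr/y(?:es|up)?|1|enabled?/i;
my $false = qr/n(?:o(?:pe)?)?|0|disabled?/i;

# Built once; each piece keeps its own /i flag when interpolated.
my $boolean = qr/^(?:$true|$false)$/;

# Hypothetical helper: returns 1 or 0 for recognized input,
# and undef for anything else (e.g. a typo such as "y]").
sub parse_boolean {
    my ($input) = @_;
    return undef unless $input =~ $boolean;   # lone Regexp object
    return $input =~ $true ? 1 : 0;           # lone Regexp object
}

for my $input ('Yes', 'nope', 'enabled', '0', 'y]') {
    my $value = parse_boolean($input);
    printf "%-8s => %s\n", $input, defined $value ? $value : 'rejected';
}
```

Both matches inside parse_boolean() use a lone Regexp object, which is the case described earlier in the thread as needing no further compilation.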
Re^4: Tokenizing and qr// <=> /g interplay
by ff (Hermit) on Apr 25, 2005 at 11:59 UTC
    Hi, I was just refactoring some code and saw a possible opportunity to use this advice. But. Instead of having multiple strings in @strings to process, I leave all the lines from my file joined together as one giant string with embedded \n chars. From this angle, since I only have to use each regex once across all the strings via them being 'joined' into one string, I won't benefit from qr.

        my ( $crummy, $good );
        foreach my $crummy_good_ar ( @corrections_to_make ) {
            ( $crummy, $good ) = @$crummy_good_ar;
            $file_in_string_form =~ s/\b(\Q$crummy\E)\b/$good/ig;
        }
    However, as you can see (?) from the example above, I have lots of crummy/good switchouts to do, and is my plodding approach above the best that can be expected?

    P.S. Can you clarify/update what you meant by:
    the benefit of qr// objects is lost if there is additional text in the pattern match

    I think you are saying that a precompiled/qr regex used in a follow-on regex will have to be recompiled if you snap additional text on to the qr'd variable, because the overall text of the new regex will be different. Although at least one would still have the benefit of 'concentrated regex logic' within the qr'd variable?

      I can answer your first question by answering your second one. If you made all the first elements of your array references Regexp objects:
          $_->[0] = qr/\b(\Q$_->[0]\E)\b/i for @corrections_to_make;
      then you could do your loop as
          for my $crummy_good_ar (@corrections_to_make) {
              my ($crummy, $good) = @$crummy_good_ar;
              $file_in_string_form =~ s/$crummy/$good/g;
          }
      (The /i goes on the qr// itself, because flags on the enclosing s/// do not apply inside an interpolated Regexp object.)
      This way, even if you end up looping over THAT code, you'd still be dealing with already-compiled regexes. As soon as you put additional text into a regex with qr// in it:
          my $rx = qr/abc/;
          if ($str =~ /^($rx)$/) { ... }
      Perl has to do the "compare physical regex forms" test. Only if the qr// object is all alone will it have the entire benefits it was made for.
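Putting the pieces together, here is a runnable sketch of the precompiled-corrections loop applied to more than one file. The correction pairs and file contents are made-up sample data, and @compiled is a hypothetical name. Note that the /i flag is placed on the qr// itself, since a flag on the enclosing s/// does not reach inside an interpolated Regexp object.

```perl
use strict;
use warnings;

# Made-up sample data: [ misspelling, replacement ] pairs.
my @corrections_to_make = (
    [ 'teh',     'the'     ],
    [ 'recieve', 'receive' ],
);

# Compile every left-hand side exactly once, with /i baked in.
my @compiled = map { [ qr/\b\Q$_->[0]\E\b/i, $_->[1] ] } @corrections_to_make;

# Made-up file contents, each file held as one string.
my @all_files = (
    "Teh cat will recieve teh prize.\n",
    "I recieve mail.\n",
);

for my $file_in_string_form (@all_files) {    # aliased, so edits stick
    for my $pair (@compiled) {
        my ($crummy, $good) = @$pair;
        # $crummy is alone in the pattern, so it is used as compiled.
        $file_in_string_form =~ s/$crummy/$good/g;
    }
}

print for @all_files;
```

With a single file each pattern runs only once anyway, so the precompilation pays off mainly when the outer loop visits several files.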

        Okay, so an array of [ qr, $good ] combos could help me if my example was something like:

            for my $file_in_string_form ( @all_files ) {
                for my $crummy_good_ar (@corrections_to_make) {
                    my ($crummy, $good) = @$crummy_good_ar;
                    $file_in_string_form =~ s/$crummy/$good/ig;
                }
            }
        And in the meantime, if I have no additional files to process, I might as well compile the regexen as needed?

        Thanks! I think I understand now. :-)
