in reply to Tokenizing and qr// <=> /g interplay
The /g modifier is not part of a regex; it's part of the pattern-matching operation. The only flags that affect a regex are /i, /m, /s, and /x. In addition, because the contents of qr// are interpolated, the /o flag can be used with the qr// operator.
A compiled regex (qr//) does not take the place of the pattern matching operation. It merely holds the regex.
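To illustrate the split between "holding the regex" and "doing the match", here is a minimal sketch. The variable names and the sample input are made up for illustration; the point is only that /g goes on the match operator while qr// supplies the pattern.

```perl
use strict;
use warnings;

# qr// only holds the compiled pattern (here with the /i flag).
# Writing qr/.../g would be a compile-time error, since /g is not a regex flag.
my $token = qr/\d+|[a-z]+/i;

my $input = "abc 123 def";

# /g belongs to the match operator. In list context, with no capture
# groups in the pattern, it returns every complete match.
my @tokens = ($input =~ /$token/g);

print join(",", @tokens), "\n";   # abc,123,def
```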
If you want to know when qr// is useful, that's another question that I'll answer (if you ask).
_____________________________________________________
Jeff japhy Pinyan,
P.L., P.M., P.O.D, X.S.:
Perl,
regex,
and perl
hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^2: Tokenizing and qr// <=> /g interplay
by skyknight (Hermit) on Apr 23, 2005 at 15:42 UTC
OK, consider it asked. I presume that the most useful thing about qr// is that it allows you to pass regular expressions around as arguments to functions and such, but honestly my lack of experience with it leaves me wondering.
When you have variables in a regex, Perl examines the contents of those variables to see if the overall representation of the regex has changed:
for ("ab", "cd") {
    if ($str =~ /$_/) { ... }
}
In the above code, the regex 'ab' is compiled and executed, and then the regex 'cd' is compiled and executed. Compare that with:
($x, $y) = ("ab", "c");
for (1, 2) {
    if ($str =~ /$x$y/) { ... }
    ($x, $y) = ("a", "bc");
}
Here, even though $x and $y change, the ACTUAL regex ('abc') does not change, so the regex is compiled only once. The process that Perl does internally is this:
1. take regex at this opcode
2. interpolate any variables
3. compare with previous value of the regex at this opcode
4. compile if different
5. execute this regex
When you use the /o modifier, it tells Perl that after it has compiled the regex, it should SKIP steps 2-4 of this process, meaning that the regex at this opcode will NEVER change.
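The process can be imitated in plain Perl to make the "compile only if different" step visible. This is a toy model, not how perl is implemented internally: match_at_opcode and the $compiles counter are hypothetical names invented for this sketch.

```perl
use strict;
use warnings;

# Toy model of a single match opcode: it remembers the last interpolated
# pattern text and the regex that was compiled from it.
my ($last_text, $last_rx);
my $compiles = 0;

sub match_at_opcode {
    my ($str, @parts) = @_;
    my $text = join '', @parts;                          # step 2: interpolate
    if (!defined $last_text || $text ne $last_text) {    # step 3: compare
        $last_rx   = qr/$text/;                          # step 4: compile if different
        $last_text = $text;
        $compiles++;
    }
    return $str =~ $last_rx;                             # step 5: execute
}

my ($x, $y) = ("ab", "c");
for (1, 2) {
    match_at_opcode("xxabcxx", $x, $y);
    ($x, $y) = ("a", "bc");    # different pieces, same overall text 'abc'
}
print "compiles: $compiles\n";   # compiles: 1
```

In this model, /o would amount to returning the cached $last_rx without ever looking at the pieces again.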
So what is qr// good for? Consider this:
my @strings = make_10_strings();
for (@strings) {
    for my $p ('x+', 'yz?y', 'xz+y') {
        if ($_ =~ $p) { handle($_) }
    }
}
This code compiles a grand total of 30 regexes. Why? Because for each of the 10 strings in @strings we have three patterns to execute, and each time $_ =~ $p is encountered the contents of $p have changed, so the regex is compared and recompiled every time. Sure, you could reverse the order of the loops, but then the calls to handle() would happen in a different order.
So enter the qr//.
When Perl sees a regex consisting solely of a single variable, Perl checks to see if that variable is a Regexp object (what qr// returns). If it is, Perl knows that the regex has already been compiled, so it simply uses the compiled regex in place of the regex. That means doing:
my @strings = make_10_strings();
for (@strings) {
    for my $p (qr/x+/, qr/yz?y/, qr/xz+y/) {
        if ($_ =~ $p) { handle($_) }
    }
}
is considerably faster. There is no additional compilation happening. It's probably even better to move the qr// values into an array, but that might be moot since they're made of constant strings in this example. The point is, the use of qr// in a looping construct is the primary benefit it offers. Yes, it helps break a regex up into pieces too, but that's just a matter of convenience.
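A runnable variant of the loop above, with the qr// values moved into an array, might look like this. The sample strings stand in for make_10_strings() and the @handled array stands in for handle(); both are assumptions made so the sketch is self-contained.

```perl
use strict;
use warnings;

# Each pattern is compiled exactly once, outside both loops.
my @patterns = (qr/x+/, qr/yz?y/, qr/xz+y/);

# Hypothetical sample data standing in for make_10_strings().
my @strings = ("xx", "yzy", "abc", "xzzy");

my @handled;
for my $s (@strings) {
    for my $p (@patterns) {
        # $p is already a Regexp object, so it is used as-is.
        push @handled, $s if $s =~ $p;   # stand-in for handle($_)
    }
}
print join(",", @handled), "\n";   # xx,yzy,xzzy,xzzy
```

The strings are still visited in their original order, so the handling order is the same as in the nested-loop version.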
Be warned that the benefit of qr// objects is lost if there is additional text in the pattern match. I mean that $foo =~ /^$rx_obj$/ suffers from the same problem as $foo =~ /^$text$/.
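One way around this, sketched here with made-up data, is to build the larger pattern once with another qr// and then match against that object alone. The names $anchored and @matches are illustrative only.

```perl
use strict;
use warnings;

my $rx_obj = qr/\d+/;

# Interpolating $rx_obj into a larger qr// compiles the combined
# pattern once, at this point.
my $anchored = qr/^$rx_obj$/;

# The match now consists solely of a Regexp object again.
my @matches = grep { $_ =~ $anchored } ("42", "4x2", "007");
print join(",", @matches), "\n";   # 42,007
```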
my $true = qr/y(?:es|up)?|1|enabled?/i;
my $false = qr/n(?:o(?:pe)?)?|0|disabled?/i;
die "Need boolean input"
    unless /^(?:$true|$false)$/;
if (/$true/)
{
    do_stuff();
}
Given the code above, are you saying that in the die's unless clause, Perl will need to completely recompile the regex? That is not my interpretation, but I could be completely wrong here.
My assumption is that both $true and $false are compiled once, and only once, and the unless modifier above would not need to recompile either one.
Even if that is the case, I use code like the above because I like to reuse common criteria for truth and falsehood across many expressions. Sometimes, as in the die statement above, it is to validate that the value is something recognizable (i.e., not a typo: if someone had "y]", we'd not accidentally treat that as a false value; we'd simply reject it so the user could fix the typo). At other times, as in the if statement above, it is just to see which one it was. This goes to the OP's question of why it's useful, somewhat in agreement with other posts here. I'm just showing a concrete example of real, live, production code where I use this construct.
Hi, I was just refactoring some code and saw a possible opportunity to use this advice. However, instead of having multiple strings in @strings to process, I leave all the lines from my file joined together as one giant string with embedded \n chars. Seen from this angle, since each regex is applied only once to the single joined string, I won't benefit from qr//.
my ( $crummy, $good );
foreach my $crummy_good_ar ( @corrections_to_make ) {
    ( $crummy, $good ) = @$crummy_good_ar;
    $file_in_string_form =~ s/\b(\Q$crummy\E)\b/$good/ig;
}
However, as you can (hopefully) see from the example above, I have lots of crummy/good swaps to make. Is my plodding approach above the best that can be expected?
P.S. Can you clarify/update what you meant by:
"the benefit of qr// objects is lost if there is additional text in the pattern match"
I think you are saying that a precompiled (qr//) regex used inside a follow-on regex has to be recompiled if you attach additional text to the qr'd variable, because the overall text of the new regex is different. But at least one would still have the benefit of 'concentrated regex logic' within the qr'd variable?
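For comparison with the one-substitution-per-correction loop above, a single-pass variant might look like the following sketch. The sample corrections and text are invented, and %good_for and $crummy_rx are hypothetical names. It assumes the corrections are independent: unlike the sequential loop, a replacement here is never re-examined by a later correction.

```perl
use strict;
use warnings;

# Hypothetical corrections list in the same [crummy, good] shape as above.
my @corrections_to_make = (["teh", "the"], ["recieve", "receive"]);
my $file_in_string_form = "Teh cat will recieve teh prize.\n";

# Lower-cased lookup table, since the match is case-insensitive.
my %good_for;
$good_for{ lc $_->[0] } = $_->[1] for @corrections_to_make;

# One alternation of all the crummy words, longest first, compiled once.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %good_for;
my $crummy_rx = qr/\b($alternation)\b/i;

# A single pass over the big string; /e looks up the replacement per match.
$file_in_string_form =~ s/$crummy_rx/$good_for{ lc $1 }/ge;
print $file_in_string_form;   # the cat will receive the prize.
```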