loris has asked for the wisdom of the Perl Monks concerning the following question:
Hello all,
I wanted to remove the final section of an XPath string using a regex. I
obviously don't understand non-greediness, because I thought this would work:
my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';
$xpath =~ s|\/(.*?)$||;
print $xpath . "\n";
I had hoped to get
BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]
Can anyone help?
Thanks,
loris
"It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ."
Re: Regex: Matching last of repeated character to end of string
by jbware (Chaplain) on Nov 04, 2005 at 07:30 UTC
Your regex finds the first "/" it can, and then the ".*?" has to keep matching characters until the "$" can succeed at the end of the string. It sounds like you might want something more like this:
$xpathOther =~ s|/[^/]*$||;
This ensures that it matches on the last "/" it can find, and blanks it and everything after it out.
-jbWare
Re: Regex: Matching last of repeated character to end of string
by davis (Vicar) on Nov 04, 2005 at 07:32 UTC
I wouldn't use greediness at all. I'd use "Not this character" matches:
use warnings;
use strict;
my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';
$xpath =~ s|\/[^/]+$||;
print $xpath . "\n";
That's "any character that isn't a slash"
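One difference between this pattern and the [^/]* variant in the earlier reply only shows up when the string ends in a slash. A small sketch, using a made-up input with a trailing slash (not from the original question):

```perl
use strict;
use warnings;

# Hypothetical input with a trailing slash, chosen only for illustration.
my $trailing = 'BoringNode[1]/InterestingNode[2]/';

# [^/]* may match zero characters, so the final slash itself is removed.
(my $star = $trailing) =~ s{/[^/]*$}{};

# [^/]+ needs at least one non-slash character after the slash,
# so nothing matches and the string is left unchanged.
(my $plus = $trailing) =~ s{/[^/]+$}{};

print "star: $star\n";
print "plus: $plus\n";
```

Which behaviour is preferable depends on whether a trailing slash can occur in the data at all.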
davis
Kids, you tried your hardest, and you failed miserably. The lesson is: Never try.
Re: Regex: Matching last of repeated character to end of string
by Perl Mouse (Chaplain) on Nov 04, 2005 at 07:53 UTC
Greediness or non-greediness is only one rule that Perl applies, and it's not the most important rule. The most important rules are (in this order):
1. Find a match.
2. Of all the possible matches, find one that starts leftmost in the query string.
3. If a regex can make choices (alternation, repetition) in multiple places (for instance two alternations, two repetitions, or an alternation and a repetition), choices on the left are more significant than choices on the right. That is, it will try all possibilities of the choice on the right before trying the second alternative of the choice on the left.
4. In alternation, choices on the left are tried before choices on the right.
5. In repetition, greedy choices are preferred over non-greedy ones, except when there is a ? modifier; then non-greedy choices are preferred over greedy ones.
Your regex could match in two places: starting from the first slash, and starting from the second (last) slash. Starting at the first slash obeys rule 2. Starting at the second slash obeys rule 5. Rule 2 wins.
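The interaction of these two rules can be observed directly by capturing what each pattern matches. This is only a sketch; the variable names $lazy and $last are mine, and the string is the one from the question:

```perl
use strict;
use warnings;

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# Non-greedy quantifier: the match still starts at the leftmost slash,
# so .*? has to expand until $ can succeed at the end of the string.
my ($lazy) = $xpath =~ m{(/.*?)$};

# Negated character class: the match may not contain another slash,
# so only the last slash can start a successful match.
my ($last) = $xpath =~ m{(/[^/]*)$};

print "lazy: $lazy\n";   # everything from the first slash onwards
print "last: $last\n";   # /AnotherBoringNode[5]
```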
Re: Regex: Matching last of repeated character to end of string
by Aristotle (Chancellor) on Nov 04, 2005 at 08:52 UTC
Pattern matches always find the leftmost possible match. Greediness does not change this; it only affects how soon a quantifier will consider itself satisfied. In your case, / finds the leftmost slash in the string. The .*? then tries to match nothing, which fails because the following $ needs the end of string to succeed, so it keeps trying to match one more character at a time, failing each time because the $ does not succeed, until the .*? has matched all of the string after the first slash. Since all parts of the pattern then succeed, you have a match.
You can use this knowledge to combine the leftmost match behaviour with greediness to make them work for you. Try this:
m{ .* / (.*) $ }msx
What happens here is that the first greedy .* will match the entire string. But then the / wants a slash, and that fails, since there’s no slash after the end of the string; so the .* is forced to concede one character. The / will keep failing and the .* will keep conceding characters, until it is matching only up to the character before the last slash in the string, at which point the / can match that slash and thus succeed. The parenthesised .* will then swallow the rest of the string, which means the $ immediately succeeds.
Now the parenthesised quantifier has matched exactly what you wanted.
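As a sketch of how that pattern might be used for the original task: the capture around the first .* is my addition (the pattern above only captures the tail), and the names $head and $tail are illustrative.

```perl
use strict;
use warnings;

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# The first (greedy) .* backs off until the / can match the last slash;
# the second .* then takes whatever follows that slash.
my ($head, $tail) = $xpath =~ m{ (.*) / (.*) $ }msx;

print "head: $head\n";   # the path without its final step
print "tail: $tail\n";   # AnotherBoringNode[5]
```

Here $head holds the string the question was after, and $tail holds the section that was removed.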
Makeshifts last the longest.
Re: Regex: Matching last of repeated character to end of string
by radiantmatrix (Parson) on Nov 04, 2005 at 09:57 UTC
There are three approaches I can think of that would solve your issue, and only one is a regex.
- The regex solution:
$xpath =~ s{/[^/]+$}{};
- The split solution:
my @temp = split('/', $xpath);
$xpath = join('/',@temp[0..$#temp-1]);
- The rindex solution:
$xpath = substr($xpath,0,rindex($xpath,'/'));
The last one (rindex) strikes me as the most efficient way to go, but I haven't benchmarked it.
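The three approaches above can be checked side by side. The sketch below also looks at one edge case that is my own addition, not something raised in the thread: when the string contains no slash, rindex returns -1, and substr with a length of -1 drops the last character. The names $single, $naive and $guarded are illustrative.

```perl
use strict;
use warnings;

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# 1. Regex: remove the last slash and everything after it.
(my $by_regex = $xpath) =~ s{/[^/]+$}{};

# 2. Split on slashes and rejoin all but the last piece.
my @parts    = split m{/}, $xpath;
my $by_split = join '/', @parts[ 0 .. $#parts - 1 ];

# 3. rindex: keep everything before the last slash.
my $by_rindex = substr $xpath, 0, rindex( $xpath, '/' );

print "$_\n" for $by_regex, $by_split, $by_rindex;

# Edge case (an assumption about possible input): no slash at all.
my $single = 'BoringNode[1]';
my $cut    = rindex $single, '/';                  # -1, nothing found
my $naive  = substr $single, 0, $cut;              # loses the final ']'
my $guarded = $cut >= 0 ? substr( $single, 0, $cut ) : $single;

print "naive:   $naive\n";
print "guarded: $guarded\n";
```

For input that always contains at least one slash, all three give the same result; the guard only matters if single-step paths can occur.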
Or even substr($xpath, rindex($xpath, '/')) = "";!
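For comparison, a sketch of that lvalue form next to the four-argument form of substr, which does the same replacement without assigning to a function call. The variable names and the guard against a missing slash are my additions:

```perl
use strict;
use warnings;

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# Lvalue substr: assign the empty string to the tail of the string.
my $lvalue = $xpath;
substr( $lvalue, rindex( $lvalue, '/' ) ) = "";

# Four-argument substr: offset, length, replacement.
my $four_arg = $xpath;
my $pos      = rindex $four_arg, '/';
substr( $four_arg, $pos, length($four_arg) - $pos, "" ) if $pos >= 0;

print "$lvalue\n";
print "$four_arg\n";
```

Both modify the string in place, so they avoid building a copy of the head as the substr($xpath, 0, ...) version does.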