Regex: Matching last of repeated character to end of string

by loris (Hermit)
on Nov 04, 2005 at 12:16 UTC

loris has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I wanted to remove the final section of an XPath string using a regex. I obviously don't understand non-greediness, because I thought this would work:

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';
$xpath =~ s|\/(.*?)$||;
print $xpath . "\n";

I had hoped to get

BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]

Can anyone help?



"It took Loris ten minutes to eat a satsuma . . . twenty minutes to get from one end of his branch to the other . . . and an hour to scratch his bottom. But Slow Loris didn't care. He had a secret . . ."

Replies are listed 'Best First'.
Re: Regex: Matching last of repeated character to end of string
by jbware (Chaplain) on Nov 04, 2005 at 12:30 UTC
    What happens with your regex is it finds the first "/" it can, and then the ".*?" matches any character until the end. It sounds like you might want something more like this:
    $xpathOther =~ s|/[^/]*$||;
    This ensures that it matches at the last "/" in the string, and removes it and everything after it.
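To see the difference side by side, here is a minimal sketch that runs both substitutions on copies of the string from the question. The variable names `$original`, `$lazy` and `$negated` are made up for illustration.

```perl
use strict;
use warnings;

my $original = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# Non-greedy attempt: the match starts at the FIRST slash, and .*? is
# stretched until $ can succeed, so everything after the first slash goes.
(my $lazy = $original) =~ s|/(.*?)$||;

# Negated class: [^/]* cannot cross a slash, so only the last segment goes.
(my $negated = $original) =~ s|/[^/]*$||;

print "lazy:    $lazy\n";
print "negated: $negated\n";
```

The first line printed should be just `BoringNode[1]`, the second the path with only `/AnotherBoringNode[5]` removed.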

Re: Regex: Matching last of repeated character to end of string
by davis (Vicar) on Nov 04, 2005 at 12:32 UTC
    I wouldn't use greediness at all. I'd use "Not this character" matches:
    use warnings;
    use strict;
    my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';
    $xpath =~ s|\/[^/]+$||;
    print $xpath . "\n";
    That [^/]+ is "one or more characters that aren't a slash".
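One place where the `[^/]*` and `[^/]+` variants in this thread behave differently is a path that ends in a slash. A small sketch, assuming a made-up input `$trailing` that is not from the original question:

```perl
use strict;
use warnings;

# Hypothetical input with a trailing slash.
my $trailing = 'BoringNode[1]/InterestingNode[2]/';

# [^/]* may match zero characters, so a bare trailing slash is removed.
(my $star = $trailing) =~ s{/[^/]*$}{};

# [^/]+ needs at least one non-slash character after the slash, so
# nothing matches and the string is left unchanged.
(my $plus = $trailing) =~ s{/[^/]+$}{};

print "star: $star\n";
print "plus: $plus\n";
```

Which behaviour is preferable depends on whether a trailing slash should count as an empty final section.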

    Kids, you tried your hardest, and you failed miserably. The lesson is: Never try.
Re: Regex: Matching last of repeated character to end of string
by Perl Mouse (Chaplain) on Nov 04, 2005 at 12:53 UTC
    Greediness or non-greediness is only one rule that Perl applies, and it's not the most important rule. The most important rules are (in this order):
    1. Find a match.
    2. Of all the possible matches, find one that starts leftmost in the query string.
    3. If a regex can make choices (alternation, repetition) in multiple places (for instance two alternations, two repetitions, or an alternation and a repetition), choices on the left are more significant than choices on the right. That is, it will try all possibilities of the choice on the right before trying the second alternative of the choice on the left.
    4. In alternation, choices on the left are tried before choices on the right.
    5. In repetition, greedy choices are preferred over non-greedy ones, except when there is a ? modifier; then non-greedy choices are preferred over greedy ones.
    Your regex could match in two places: starting from the first slash, and starting from the second (last) slash. Starting at the first slash obeys rule 2. Starting at the second slash obeys rule 5. Rule 2 wins.
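A rough way to watch rule 2 beat rule 5 is to look at where each match starts, using the `@-` array of match offsets. This is only a sketch on a made-up three-segment string `a/b/c`; the variable names are illustrative.

```perl
use strict;
use warnings;

my $path = 'a/b/c';

# Rule 2 beats rule 5: the match begins at the leftmost slash (offset 1),
# and the non-greedy .*? is stretched until $ can succeed.
my ($lazy_start, $lazy_text);
if ($path =~ m{/(.*?)$}) {
    $lazy_start = $-[0];   # offset where the whole match began
    $lazy_text  = $1;
}

# A negated class cannot cross a slash, so the attempt at offset 1 fails
# and the engine moves on to the last slash (offset 3).
my ($last_start, $last_text);
if ($path =~ m{/([^/]*)$}) {
    $last_start = $-[0];
    $last_text  = $1;
}

print "lazy:    offset $lazy_start, captured '$lazy_text'\n";
print "negated: offset $last_start, captured '$last_text'\n";
```

The non-greedy version should report offset 1 with `b/c` captured, the negated class offset 3 with `c` captured.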
    Perl --((8:>*

      While it doesn't really matter here, rule 4 has the same importance as rule 5.

Re: Regex: Matching last of repeated character to end of string
by Aristotle (Chancellor) on Nov 04, 2005 at 13:52 UTC

    Pattern matches always find the leftmost possible match. Greediness does not change this, it only affects how soon a quantifier will consider itself satisfied. In your case, / finds the leftmost slash in the string, .*? then tries to match nothing, which fails because the following $ needs the end of string to succeed, and then successively keeps trying to match only one more character, which keeps failing because the $ does not succeed, until the .*? has matched all of the string after the first slash. Since all parts of the pattern then succeed, you have a match.

    You can use this knowledge to combine the leftmost match behaviour with greediness to make them work for you. Try this:

    m{ .* / (.*) $ }msx

    What happens here is that the first greedy .* will match the entire string. But then the / wants a slash, and that fails, since there’s no slash after the end of the string; so the .* is forced to concede one character. The / will keep failing and the .* will keep conceding characters, until it is matching only up to the character before the last slash in the string, at which point the / can match that slash and thus succeed. The parenthesised .* will then swallow the rest of the string, which means the $ immediately succeeds.

    Now the parenthesised quantifier has matched exactly what you wanted.
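As a sketch of how that could be applied to the original string: the pattern below is a slight variation that also puts the first `.*` in parentheses, so both the part before the last slash and the part after it are captured. The names `$head` and `$tail` are invented for this example.

```perl
use strict;
use warnings;

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# The greedy first (.*) backs off only as far as the final slash;
# the second (.*) then takes whatever follows that slash.
my ($head, $tail) = $xpath =~ m{ (.*) / (.*) $ }msx;

print "head: $head\n";
print "tail: $tail\n";
```

Here `$head` should hold the path without its final section and `$tail` the final section itself. If the string contains no slash at all, the match fails and both are undefined.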

    Makeshifts last the longest.

Re: Regex: Matching last of repeated character to end of string
by radiantmatrix (Parson) on Nov 04, 2005 at 14:57 UTC

    There are three approaches I can think of that would solve your issue, and only one is a regex.

    1. The regex solution:
      $xpath =~ s{/[^/]+$}{};
    2. The split solution:
      my @temp = split('/', $xpath); $xpath = join('/',@temp[0..$#temp-1]);
    3. The rindex solution:
      $xpath = substr($xpath,0,rindex($xpath,'/'));

    The last one (rindex) strikes me as the most efficient way to go, but I haven't Benchmarked it.
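For anyone who wants to measure it, here is a rough sketch using the core Benchmark module. The wrapper subs `trim_regex`, `trim_split` and `trim_rindex` are names invented here, and the iteration count is arbitrary; no timings are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $xpath = 'BoringNode[1]/InterestingNode[@InterestingAttribute="outgrabe"]/AnotherBoringNode[5]';

# Each sub returns a trimmed copy and leaves its argument untouched.
sub trim_regex {
    my ($s) = @_;
    $s =~ s{/[^/]+$}{};
    return $s;
}

sub trim_split {
    my ($s) = @_;
    my @parts = split m{/}, $s;
    return join '/', @parts[0 .. $#parts - 1];
}

sub trim_rindex {
    my ($s) = @_;
    return substr $s, 0, rindex($s, '/');
}

# Sanity check before timing: all three should agree on this input.
my @results = map { $_->($xpath) } \&trim_regex, \&trim_split, \&trim_rindex;
print "$_\n" for @results;

# Iteration count is arbitrary; raise it if Benchmark warns about too few.
cmpthese(100_000, {
    regex  => sub { trim_regex($xpath) },
    split  => sub { trim_split($xpath) },
    rindex => sub { trim_rindex($xpath) },
});
```

The sub call overhead is the same for all three candidates, so the comparison table should still give a fair relative picture.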

    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
      Or even substr($xpath, rindex($xpath, '/')) = "";!
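One caveat with both rindex versions: rindex returns -1 when the string has no slash, and substr then counts from the end, so the last character gets removed. A minimal sketch of a guarded variant follows; `chop_last_segment` is a hypothetical helper, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the last '/'-separated segment in place,
# leaving the string alone when it contains no slash at all.
sub chop_last_segment {
    my ($ref) = @_;
    my $pos = rindex $$ref, '/';
    return if $pos < 0;              # no slash: nothing to remove
    substr($$ref, $pos) = '';        # lvalue substr truncates from $pos on
    return;
}

my $xpath = 'BoringNode[1]/AnotherBoringNode[5]';
chop_last_segment(\$xpath);

my $single = 'BoringNode[1]';
chop_last_segment(\$single);

# Without the guard, rindex's -1 makes substr remove the last character.
my $unguarded = 'BoringNode[1]';
substr($unguarded, rindex($unguarded, '/')) = '';

print "guarded, with slash:    $xpath\n";
print "guarded, without slash: $single\n";
print "unguarded, no slash:    $unguarded\n";
```

The unguarded line should come out as `BoringNode[1` with the closing bracket missing, which is probably not what anyone wants.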

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
