
Why is this regex greedy?

by pbeckingham (Parson)
on Jul 02, 2004 at 20:16 UTC ( #371497=perlquestion )
pbeckingham has asked for the wisdom of the Perl Monks concerning the following question:

Given a string that looks suspiciously like a path name:

/a/b/c/d/e/abc00000

yet is not a path name, I am trying to extract the abc00000 identifier from the end. My regex doesn't work, and while I know how to write a better one that works, I don't know specifically why this one does not:
my $s = '/a/b/c/d/e/abc00000';
my ($id) = $s =~ m{/(.+?\d+)$};
print $id, "\n";
__OUTPUT__
a/b/c/d/e/abc00000
I can fix it easily by using:
my ($id) = $s =~ m{/([^/]+\d+)$};
Which I understand. It's just that I don't understand why the first version is greedy.

Replies are listed 'Best First'.
Re: Why is this regex greedy?
by chromatic (Archbishop) on Jul 02, 2004 at 20:32 UTC

    The regex engine prefers leftmost, longest matches. Nothing in your regex prevents it from matching everything between the first slash and the end of line.

      So it is not treating that .+? the way I expected? Even though there are more-minimal matches?

      I guess leftmost-longest trumps non-greedy.

        Right and right.

        I don't understand why people expect non-greedy matching to actually mean "globally shortest match". Perhaps it's in the language we use. Just keep the left-to-rightness as the most prominent feature of your mental model of how perl's RE engine works and you shouldn't go wrong though.
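The left-to-right point can be checked directly. The sketch below is an illustration rather than part of the original replies: the variable names and the second test string 'x1y22' are made up for the example. It uses Perl's built-in @- array to record where each match starts.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';

# Original regex: the first '/' (offset 0) is the earliest place a match
# can start, and .+? is allowed to keep growing until \d+$ succeeds.
my ($lazy) = $s =~ m{/(.+?\d+)$};
my $lazy_start = $-[0];

# Fixed regex: [^/]+ cannot cross a slash, so every earlier starting
# position fails and the engine moves on to the last '/'.
my ($fixed) = $s =~ m{/([^/]+\d+)$};
my $fixed_start = $-[0];

# Without the trailing anchor, the non-greedy quantifier does stop early.
my ($short) = 'x1y22' =~ m{(.+?\d+)};
my ($long)  = 'x1y22' =~ m{(.+\d+)};

print "lazy:  '$lazy' starts at offset $lazy_start\n";
print "fixed: '$fixed' starts at offset $fixed_start\n";
print "short: '$short'\n";
print "long:  '$long'\n";
```

The lazy version starts at offset 0 and captures the whole string after the first slash, while the fixed version starts at offset 10 and captures only abc00000. Non-greedy matching controls how far a quantifier extends from a given starting point; it does not make the engine look for a later starting point.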

Re: Why is this regex greedy?
by ercparker (Hermit) on Jul 02, 2004 at 20:28 UTC
    your regex is matching:
    / followed by one or more of any character, up to the first digit, then digits until the end of the line

    updated: removed regex example since you just want it explained

      Thanks, but I have no shortage of correctly functioning regexes - I want to know precisely why the one listed doesn't work. chromatic knows.

        It's simple: the .+ is capturing everything.
Re: Why is this regex greedy?
by dpavlin (Friar) on Jul 02, 2004 at 23:32 UTC
    If you are running perl 5.6 or newer (and you are, right?) you might be able to insert use re 'debug'; which will give you detailed output from the regex engine. That's a good way to debug your regular expressions.
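A minimal sketch of that suggestion, applied to the regex from the question. Wrapping the match in a bare block is my own choice here, to keep the trace limited to one regex; it is not something the reply specifies.

```perl
use strict;
use warnings;

my $s = '/a/b/c/d/e/abc00000';
my $id;

{
    # The pragma is lexically scoped, so only regexes compiled inside
    # this block produce debugging output (written to STDERR).
    use re 'debug';
    ($id) = $s =~ m{/(.+?\d+)$};
}

print "$id\n";
```

The exact trace format varies between perl versions, so no sample output is shown. The trace should show the match attempt beginning at the first slash.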

Re: Why is this regex greedy?
by kscaldef (Pilgrim) on Jul 03, 2004 at 00:15 UTC

    Short answer, the regex engine works left to right.

    Long answer, go read Friedl's articles in The Perl Journal or his book on Mastering Regular Expressions.

Re: Why is this regex greedy?
by heroin_bob (Sexton) on Jul 03, 2004 at 01:51 UTC
    Here's a good tutorial by chromatic that might be of help, if you haven't already checked it out.
Re: Why is this regex greedy?
by japhy (Canon) on Jul 03, 2004 at 02:50 UTC
    I'd suggest you use m{.*/(.*)} to get whatever is after the last '/' (assuming there are no newlines in your data). Or perhaps a module like File::Basename.
    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
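Both suggestions can be tried side by side. This is only a sketch with made-up variable names. Note that basename() treats the string as a path, which the original poster said it is not, so the regex form may be the safer fit.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $s = '/a/b/c/d/e/abc00000';

# Greedy .* runs to the end and backtracks to the last slash;
# the capture group then takes whatever follows it.
my ($after_slash) = $s =~ m{.*/(.*)};

# basename() from the core File::Basename module returns the final
# path component, assuming the string really behaves like a path.
my $base = basename($s);

print "$after_slash\n";
print "$base\n";
```

Here the greediness of the leading .* is what does the work: it pushes the slash in the pattern as far right as possible.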
Re: Why is this regex greedy?
by Stevie-O (Friar) on Jul 04, 2004 at 17:04 UTC
    Using a regular expression alone isn't necessarily what you want.

    Remember: Perl is more than just regular expressions :)

    my $str = '/a/b/c/d/e/abc00000';
    my $end = (split '/', $str)[-1];   # pull the last element split() returns
    print "we wanted '$end'";
    $"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

Node Type: perlquestion [id://371497]
Approved by Old_Gray_Bear
Front-paged by grinder