PerlMonks  

Why does this non-greedy match not work?

by Special_K (Scribe)
on Jun 29, 2020 at 17:07 UTC (#11118667=perlquestion)

Special_K has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I have the following in a text file "fbb_test":

/foo/bar/baz

Now suppose I have the following script:

    #!/tool/bin/perl -w
    use strict;

    my $file = "/tmp/fbb_test";
    open(FILE, $file) || die("ERROR: Unable to open $file for read, exiting...\n");
    while (<FILE>) {
        chomp($_);
        if ($_ =~ /\/(.+?)$/) {
            printf("captured $1\n");
        }
    }
    close(FILE);

The script is returning "captured foo/bar/baz". Given that I specified the non-greedy operator (?), I would have expected the result to be "baz", as the non-greedy operator would have matched as few characters as possible between a forward slash and the end of the line. What am I missing here?

Re: Why does this non-greedy match not work?
by Fletch (Chancellor) on Jun 29, 2020 at 17:41 UTC

    Making the .+ non-greedy doesn't change that the dot is perfectly willing to match any / in the inputs (since a dot just matches any (non-newline, by default) character). It's still going to start at the left-most "/" that it sees (the first character) and then start gobbling up one or more things-which-match-dot (which includes all the subsequent letters and "/"s) until it hits the end of the line. If you'd explicitly asked for things-which-are-not-"/" using [^/]+ instead you'd have had more luck (e.g. m{/ ([^/]+) $}x).
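    A minimal sketch of the difference, using the sample line from the question (the two subroutine names are made up for illustration). Both patterns begin their match at the first "/"; only the negated character class is prevented from crossing the later slashes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the question's non-greedy pattern.
sub capture_nongreedy {
    my ($s) = @_;
    # The match begins at the leftmost "/"; .+? then has to keep growing,
    # across "/" characters too, until $ can match at the end of the line.
    return $s =~ m{/(.+?)$} ? $1 : undef;
}

# Hypothetical helper: the negated character class from the reply above.
sub capture_last_segment {
    my ($s) = @_;
    # [^/]+ cannot cross a "/", so the attempts starting at the first two
    # slashes fail and only the final segment can be followed by $.
    return $s =~ m{/ ([^/]+) $}x ? $1 : undef;
}

my $line = "/foo/bar/baz";
print "non-greedy:    ", capture_nongreedy($line), "\n";      # foo/bar/baz
print "negated class: ", capture_last_segment($line), "\n";   # baz
```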

    Of course this could just be a slight XY problem and you really want File::Basename or Path::Tiny instead . . .
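    If the real goal is just the last component of a path, the core File::Basename module already provides it. A short sketch (Path::Tiny is not a core module, so it is left out here):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

my $path = "/foo/bar/baz";

# basename() returns the last component of the path; dirname() the rest.
print basename($path), "\n";   # baz
print dirname($path), "\n";    # /foo/bar
```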

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      I agree entirely, and just wanted to add: a common misconception (one that I was guilty of myself sometimes, before being enlightened about it) is that a regex that doesn't start with ^ but ends with $ somehow makes the regex engine start looking from the end of the string. That isn't how it works: the regex engine always tries starting positions from left to right and stops at the first match it finds. Matching "/foo/bar/baz" =~ m{/[^/]+$} will still cause the regex engine to attempt to match /foo and /bar before settling on /baz. You can see this in action at this link (JavaScript and a modern browser required) when you click the use re "debug"; button.

      Update: Corrected = to =~, thanks hippo!
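      One way to check where a successful match begins is the @- array: $-[0] holds the offset at which the whole match started. The sketch below (match_start is a hypothetical helper) shows the question's non-greedy pattern matching from offset 0, while m{/[^/]+$} matches from offset 8, the last "/". To watch the failed attempts at /foo and /bar themselves, running the match under use re 'debug'; prints a trace to STDERR, whose exact format varies between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the offset where $re first matches in $str,
# or -1 if it does not match at all. $-[0] is the start of the whole match.
sub match_start {
    my ($str, $re) = @_;
    return $str =~ $re ? $-[0] : -1;
}

my $str = "/foo/bar/baz";
for my $re (qr{/(.+?)$}, qr{/[^/]+$}) {
    printf "%-14s starts at offset %d\n", $re, match_start($str, $re);
}
```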

Re: Why does this non-greedy match not work?
by roho (Chancellor) on Jun 30, 2020 at 10:22 UTC
    Just a few code nits. Use lexical filehandles and the three-argument form of open. The "||" operator in the open statement works only because open's arguments are enclosed in parentheses; the "or" operator is safer because its lower precedence means it still tests open's return value when the parentheses are omitted.
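    A sketch of the question's loop with those nits applied, also using the [^/]+ pattern from Fletch's reply. So that it runs without /tmp/fbb_test, it opens an in-memory string in place of the file (an assumption for the example; in practice the real path goes in open's third argument), and last_segments is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the last path segment of each line read
# from the lexical filehandle $fh.
sub last_segments {
    my ($fh) = @_;
    my @captured;
    while (my $line = <$fh>) {
        chomp $line;
        push @captured, $1 if $line =~ m{/([^/]+)$};
    }
    return @captured;
}

# In-memory stand-in for the contents of /tmp/fbb_test.
my $data = "/foo/bar/baz\n";

# Three-argument open with a lexical filehandle. "or" binds more loosely
# than open's argument list, so it tests open's return value even without
# parentheses. By contrast, open my $fh, '<', $file || die ... parses as
# open(my $fh, '<', ($file || die ...)) and never reports a failed open.
open my $fh, '<', \$data
    or die "ERROR: Unable to open input for read: $!\n";

# print rather than printf: a "%" in the captured text would otherwise be
# treated as a format directive.
print "captured $_\n" for last_segments($fh);   # captured baz
close $fh;
```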

    "It's not how hard you work, it's how much you get done."

Node Type: perlquestion [id://11118667]
Approved by toolic
Front-paged by toolic