Those greedy quantifiers!

by Petras (Friar)
on Apr 01, 2010 at 00:59 UTC
Petras has asked for the wisdom of the Perl Monks concerning the following question:


Trying to generate a list of all extensions in a directory, sans filenames. So, it seems
should work, but it matches
So, I tried moving the quantifier inside the parens:
to the same results. No one ever promised regexes would be easy... What did I miss?

Re: Those greedy quantifiers!
by jwkrahn (Monsignor) on Apr 01, 2010 at 01:36 UTC
    What did I miss?

    The \. at the beginning of the pattern will match the first period from which the rest of the pattern can succeed; it doesn't matter whether the .* is greedy or not. You probably want /\.([^.]*)\z/ instead.
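    A minimal sketch of the point above. The original snippets are not shown in the thread, so the file name and the first two patterns here are stand-ins chosen for illustration, not the asker's code. It shows that greediness does not change where the match starts, while excluding periods from the capture does.

```perl
use strict;
use warnings;

# Hypothetical file name with two periods, chosen for illustration.
my $name = 'archive.tar.gz';

# Non-greedy: the engine still starts at the leftmost period from which
# the whole pattern can match, so the capture runs to the end of the string.
my ($lazy) = $name =~ /\.(.*?)\z/;

# Greedy: same starting point, same capture.
my ($greedy) = $name =~ /\.(.*)\z/;

# Excluding periods from the capture means only the last period can start
# a match that reaches the end of the string.
my ($last) = $name =~ /\.([^.]*)\z/;

print "$lazy\n";      # tar.gz
print "$greedy\n";    # tar.gz
print "$last\n";      # gz
```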

Re: Those greedy quantifiers!
by ikegami (Pope) on Apr 01, 2010 at 01:46 UTC

    ? is a quantifier in your first snippet (the quantified atom optionally matches once) because it's not preceded by another quantifier. (And it happens to be useless in your pattern.)

    ? is a greediness modifier (makes it non-greedy) in your second snippet because it's preceded by a quantifier.
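    The two roles of ? can be seen side by side in a small sketch. The string and patterns below are illustrative stand-ins, not the snippets from the question.

```perl
use strict;
use warnings;

# Hypothetical input, chosen for illustration.
my $text = 'aab';

# "?" after an atom is a quantifier: zero or one "a" before the "b".
my ($optional) = $text =~ /(a?)b/;

# "?" after a quantifier is a greediness modifier: "+?" takes as few
# "a" characters as it can.
my ($lazy) = $text =~ /(a+?)/;

# Without the modifier, "+" takes as many as it can.
my ($greedy) = $text =~ /(a+)/;

# Non-greedy still has to let the rest of the pattern match, so here it
# is forced to grow until the "b" can follow.
my ($forced) = $text =~ /(a+?)b/;

print "$optional\n";    # a
print "$lazy\n";        # a
print "$greedy\n";      # aa
print "$forced\n";      # aa
```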

    There's no reason to alter greediness here. Solution:

    my %seen;
    my @exts = grep !$seen{$_}++, map /\.([^.]+)\z/, @file_names;
Re: Those greedy quantifiers!
by Anonymous Monk on Apr 01, 2010 at 01:13 UTC
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( '\.([^.]+)$' )->explain;
    __END__
    The regular expression:

    (?-imsx:\.([^.]+)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \.                       '.'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^.]+                    any character except: '.' (1 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

