PerlMonks

### Oddness with regex quantifiers

by talexb (Canon) on Nov 23, 2010 at 21:03 UTC
talexb has asked for the wisdom of the Perl Monks concerning the following question:

I just got caught trying to use a regex quantifier {,5}, which I expected to get at most 5 elements. It didn't work.

Puzzled, I turned to my dog-eared Camel, flipped to p.176 and indeed found that the following cases were documented:

• {MIN,MAX} / at least MIN times, at most MAX times
• {MIN,} / at least MIN times
• {COUNT} / COUNT times
Missing was my situation,
• {,MAX} / at most MAX times
Can anyone tell me why this is missing?

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Replies are listed 'Best First'.
Re: Oddness with regex quantifiers
by moritz (Cardinal) on Nov 23, 2010 at 21:29 UTC
If you want at most MAX repetitions, use {0,MAX}

The reason for the asymmetry between {MIN,} and {,MAX} is probably that there's a zero in perl, but Inf isn't generally supported.
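A minimal sketch of the {0,MAX} spelling in use. The helper name `at_most` and the sample strings are made up for illustration. Note the `\A` and `\z` anchors: without them, `a{0,5}` can match the empty string at the start of any input, so the upper bound alone rejects nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str is nothing but 0..$max copies of 'a'.
sub at_most {
    my ($str, $max) = @_;
    return $str =~ /\Aa{0,$max}\z/ ? 1 : 0;
}

for my $s ('', 'aaa', 'aaaaa', 'aaaaaa') {
    printf "%-8s %s\n", "'$s'", at_most($s, 5) ? 'ok' : 'too many';
}

# Unanchored, the upper bound does not reject a longer run:
print "unanchored still matches\n" if 'aaaaaa' =~ /a{0,5}/;
```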

If you wonder what {,MAX} matches, here's the answer:

```
$ perl -Mre=debug -ce ' /a{,5}/'
Compiling REx "a{,5}"
Final program:
1: EXACT <a{,5}> (4)
4: END (0)
anchored "a{,5}" at 0 (checking anchored isall) minlen 5
-e syntax OK
Freeing REx: "a{,5}"

$ perl -wE 'say "yes" if "a{,4}" =~ /a{,4}/'
yes

# in contrast:
$ perl -Mre=debug -ce ' /a{0,5}/'
Compiling REx "a{0,5}"
Final program:
1: CURLY {0,5} (5)
3:   EXACT <a> (0)
5: END (0)
minlen 0
-e syntax OK
Freeing REx: "a{0,5}"
```

> The reason for the asymmetry between {MIN,} and {,MAX} is probably that there's a zero in perl, but Inf isn't generally supported.
The issue was raised not so long ago on p5p, it even caught old farts off guard. I don't think anyone recalled the reason why it does what it does. And people expressed the wish it would have been done otherwise in the past. Some have suggested a warning, but IIRC, nothing happened. It's unlikely the actual meaning is going to change. The advantages of not having to type a 0 don't outweigh the negative impact of potentially breaking code. It's one of the many things that with the benefit of hindsight would have been done differently.
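For anyone unsure what their own perl does with {,MAX}, a small probe along these lines can tell. It relies only on the behaviour in the debug output earlier in the thread: if the braces are taken literally, the pattern matches the literal text `a{,5}`. As far as I know, perl 5.34 and later accept an empty lower bound as zero, so the answer depends on the version; treat that as something to check against perldelta rather than take from this sketch. The function name `classify_empty_min` is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical probe: report how the running perl reads the pattern a{,5}.
sub classify_empty_min {
    my $src = 'a{,5}';
    # Compile at run time inside eval, so a perl that rejects the
    # pattern outright is reported instead of aborting the script.
    my $re = eval { no warnings; qr/\A$src\z/ };
    return 'rejected'   unless defined $re;
    return 'literal'    if 'a{,5}' =~ $re;   # braces matched as plain text
    return 'quantifier' if 'aaa'   =~ $re;   # read as "0 to 5 times"
    return 'unknown';
}

printf "perl %vd reads a{,5} as: %s\n", $^V, classify_empty_min();
```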
> The issue was raised not so long ago on p5p, it even caught old farts off guard.

Yeah .. caught this old fart off guard too. Glad to hear I wasn't the only one surprised. :)

Alex / talexb / Toronto


> The reason for the asymmetry between {MIN,} and {,MAX} is probably that there's a zero in perl, but Inf isn't generally supported.

Hmm .. but isn't the zero implied by the null field?

Alex / talexb / Toronto


> Hmm .. but isn't the zero implied by the null field?

Not if you expect .{MIN,} to match at least MIN characters. .{MIN,0} doesn't make much sense to me.

> Hmm .. but isn't the zero implied by the null field?

Not really, I don't think.
If perl accepted {,X}, I'm pretty sure some people would assume that it stood for {1,X}, not {0,X}. After all, one to X matches makes more sense than zero to X matches in many situations.
It's probably best to force people to make an explicit choice.
```
$ perl -E'say "aa" =~ /a{1,}/'
1

$ perl -E'say "aa" =~ /a{1,0}/'
Can't do {n,m} with n > m in regex; marked by <-- HERE in m/a{1,0} <-- HERE / at -e line 1.
```
Re: Oddness with regex quantifiers
by fisher (Priest) on Nov 23, 2010 at 21:28 UTC
Just use {0,5} quantifier =)
Re: Oddness with regex quantifiers
by BrowserUk (Pope) on Nov 23, 2010 at 22:06 UTC

I would think that one reason is that there is no reasonable default.

Just as we have * for zero or more and + for 1 or more; either 0 or 1 would be equally valid defaults for {,n}. Since there is no hamming difference between them, how would you pick one over the other to be the default?
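The two candidate defaults really do behave differently, and the empty string is where it shows. A quick sketch, with the patterns anchored so that the bounds decide the match; the sample strings are made up:

```perl
use strict;
use warnings;

# The two explicit spellings a hypothetical {,5} could stand for.
my %reading = (
    '{0,5}' => qr/\Aa{0,5}\z/,   # zero allowed, like *
    '{1,5}' => qr/\Aa{1,5}\z/,   # at least one, like +
);

for my $s ('', 'a', 'aaaaa') {
    for my $q (sort keys %reading) {
        printf "%-7s a%s: %s\n", "'$s'", $q,
            $s =~ $reading{$q} ? 'match' : 'no match';
    }
}
```

Only the empty string separates the two readings, which is exactly the case where a silent default would be most likely to surprise someone.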

Good point. I even suggested 0 earlier in this thread, but now I think 1 would be a better choice. I guess I'm spoiled by how well Perl deals with default values; I thought it would DWIM in this case.

And I was wrong. I'm now chastened, older and ~~wider~~ wiser; and I'm using {1,5} in my regex.

Alex / talexb / Toronto


Thank you for giving us the benefits of your experience.

-t.b.b.
Re: Oddness with regex quantifiers
by ikegami (Pope) on Nov 23, 2010 at 21:30 UTC

It's not missing: {0,MAX}

(It took forever to post!)

Re: Oddness with regex quantifiers
by cdarke (Prior) on Nov 25, 2010 at 13:34 UTC
This is not specific to Perl. In the POSIX standard for Extended Regular Expressions (The Open Group Base Specifications Issue 7):

...an interval expression of the format "{m}" , "{m,}" , or "{m,n}"

And yes, Perl supports many other extensions that are not in POSIX.
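For reference, a rough sketch of the three interval forms from that quote as Perl spells them, applied to runs of digits. The sample inputs and the `%form` table are just for illustration:

```perl
use strict;
use warnings;

# The three interval forms {m}, {m,} and {m,n}, with m=3 and n=5.
my %form = (
    '{3}'   => qr/\A\d{3}\z/,     # exactly 3
    '{3,}'  => qr/\A\d{3,}\z/,    # 3 or more
    '{3,5}' => qr/\A\d{3,5}\z/,   # between 3 and 5
);

for my $s (qw(12 123 12345 123456)) {
    my @hits = grep { $s =~ $form{$_} } sort keys %form;
    printf "%-6s matches: %s\n", $s, @hits ? "@hits" : '(none)';
}
```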

Too bad .. I just thought for the sake of orthogonality that {,m} should have been present, with an implied 1 for the missing value. But it sounds like there are equally good arguments for having the implied value be 0 .. so perhaps the decision was made not to implement that grammar at all.

Lesson learned .. thanks for the update.

Alex / talexb / Toronto

