 
PerlMonks  

what means this regex? $x = qr/[0-9a-f]{4|8}/

by kidongrok (Acolyte)
on Jun 09, 2006 at 08:16 UTC (#554443=perlquestion)

kidongrok has asked for the wisdom of the Perl Monks concerning the following question:

DB<1> $a="abcdabcd"
DB<2> p $a =~ /[0-9a-f]{4|8}/
DB<3> p $a =~ /[0-9a-f]{4}/
1
DB<4> p $a =~ /[0-9a-f]{8}/
1

NB-the display is eating the [] square brackets up there.

Lots can follow from this:
- why is there no error on the 1st line?
- am I really the 1st? ;-)
- what does it mean currently? (this I can answer)

 1: BRANCH(15)
 2:   ANYOF[0-9a-f](13)
13:   EXACT <{4>(18)
15: BRANCH(18)
16:   EXACT <8}>(18)
18: END(0)

That's not what I expected. Wouldn't you tacitly expect perl to do something similar to what it does with {4,8} there? Or is that ambiguous?

Given the availability of \, which is needed to get a literal reading of []() chars, what's the harm?

(He baits the hook, chums the water... Don't we have some regex monsters lurking here?)

Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Replies are listed 'Best First'.
Re: what means this regex? $x = qr/[0-9a-f]{4|8}/
by reasonablekeith (Deacon) on Jun 09, 2006 at 08:38 UTC
    Regarding
    {4|8}
    this is a nice idea, but it doesn't do anything special. A curly bracket is only a special character when it is found in one of these forms: {n}, {n,} or {n,m}.

    As your example isn't like this, the bracket is just matched as a plain character. What your first regex shows is just an alternation, equivalent to the following...

    if (/[0-9a-f]\{4/ or /8\}/) { print "matched\n"; }
    Note that I've escaped the curly brackets just to be explicit; it's not actually necessary.
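A minimal sketch of that parse, in case it helps to see it run. The sample strings are made up for illustration, and the braces are escaped so the pattern reads the same way however a given Perl version treats an unescaped literal brace.

```perl
use strict;
use warnings;

# Same meaning as the original pattern, with the literal braces escaped:
# either (one hex character followed by "{4") or the two characters "8}".
my $re = qr/[0-9a-f]\{4|8\}/;

# Hypothetical sample inputs, chosen to exercise each branch.
my @samples = ('abcdabcd', 'a{4', '8}', 'x8}y');

# Record which samples match so the result can be inspected or printed.
my %matched;
$matched{$_} = ($_ =~ $re) ? 1 : 0 for @samples;

printf "%-10s %s\n", $_, $matched{$_} ? 'matched' : 'no match' for @samples;
```

Only the first sample, the plain run of eight hex characters, fails to match; the other three each contain one of the two literal alternatives.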
    ---
    my name's not Keith, and I'm not reasonable.
Re: what means this regex? $x = qr/[0-9a-f]{4|8}/
by Zaxo (Archbishop) on Jun 09, 2006 at 08:30 UTC

    Did you expect to get bitwise-or evaluated to match twelve characters? Or did you expect regex alternation to get it to match exactly four or exactly eight?

    I think the regex engine gave up on compiling a quantifier when it hit incompatible syntax. That would leave the braces as literal.

    After Compline,
    Zaxo

      That was a poorly worded question you responded to. To be clear, it was a musing about a shorter way to write "4 or 8 hex chars" than the other, longer spellings. You saw a second interpretation that I had dismissed:
    • Doing bitwise-or: {4|8} = {12}. This gives only one match length, which is already doable with '{12}'.
    • Doing match-length alternation: insofar as it allows multiple lengths to be given, this seems more useful.

        If alternation worked in quantifiers, you'd want to put the eight first. The regex engine may be greedy, but it's also hasty. As soon as it matches an alternate it forgets about the remaining ones. Anything that would match the eight has already matched the four.

        Translating to the intended regex,

        $_ = "abc" x 4;
        $re_short = qr/([0-9a-f]{4}|[0-9a-f]{8})/;
        $re_long  = qr/([0-9a-f]{8}|[0-9a-f]{4})/;
        print $1, $/ if /$re_short/;
        print $1, $/ if /$re_long/;
        __END__
        abca
        abcabcab

        After Compline,
        Zaxo
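For the original goal of "4 or 8 hex chars", one possible spelling (a sketch, not something proposed in the thread) quantifies a four-character group once or twice and anchors both ends, so the order of alternatives no longer matters. The helper name is_hex_4_or_8 is hypothetical.

```perl
use strict;
use warnings;

# A group of exactly four hex characters, repeated once or twice,
# anchored at both ends so that lengths such as 6 or 12 are rejected.
my $hex_4_or_8 = qr/\A(?:[0-9a-f]{4}){1,2}\z/;

# Hypothetical helper: true when the whole string is 4 or 8 hex characters.
sub is_hex_4_or_8 {
    my ($str) = @_;
    return $str =~ $hex_4_or_8 ? 1 : 0;
}

for my $candidate ('abcd', 'abcdabcd', 'abcdab', 'abc' x 4) {
    printf "%-14s %s\n", $candidate, is_hex_4_or_8($candidate) ? 'ok' : 'rejected';
}
```

Without the \A and \z anchors the pattern would still find a four-character match inside longer strings, so the anchors are what make this a whole-string length check.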

Node Type: perlquestion [id://554443]
Approved by Corion