
Order of terms in square braces

by throop (Chaplain)
on Feb 03, 2007 at 00:40 UTC ( [id://598033]=perlquestion )

throop has asked for the wisdom of the Perl Monks concerning the following question:

Brethren,

I'm running Perl 5.8.6. I'm trying to pattern match some comma-delimited phrases in text. The phrases can have multiple words, plus the following punctuation characters:
  ( ) < > -

I'm using square braces in a regex, and seeing the following behavior that I don't understand (running in the interpreter within the debugger):

  DB<116> x 'cold, too cold' =~ /( [\(\)-<>\w]+ )/x
  0  'cold,'
  DB<117> x 'cold, too cold' =~ /( [-<>\w\(\)]+ )/x
  0  'cold'

Why am I matching the trailing comma in the first case but not the second? Is the matching within square braces supposed to be order-dependent? None of the refs I've looked at mention it.

throop

Replies are listed 'Best First'.
Re: Order of terms in square braces
by imp (Priest) on Feb 03, 2007 at 00:44 UTC
    Because dashes in character classes are used to express ranges, like [a-z], except when they are the first item. Because of this, you should always either put them as the first item or escape them with a backslash (\-).
      first or last element in the class.

        or following a POSIX class
        or preceding a POSIX class
        or following ^ if the ^ is the first character
        etc

        Basically, anywhere it doesn't make sense to treat it as a range separator.

        Escaping - (like the OP needlessly did for ( and )) will force it to match literally.
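
A minimal sketch of the placement rules discussed above, reusing the OP's test string. The variable names are only illustrative, and the parentheses are left unescaped in the last three classes since escaping them is optional inside a character class:

```perl
use strict;
use warnings;

my $text = 'cold, too cold';

# Dash between ')' and '<' forms a range, which happens to include ','.
my ($range)   = $text =~ /( [\(\)-<>\w]+ )/x;

# Dash first, last, or escaped is a literal hyphen, so ',' is not matched.
my ($first)   = $text =~ /( [-()<>\w]+ )/x;
my ($last)    = $text =~ /( [()<>\w-]+ )/x;
my ($escaped) = $text =~ /( [()\-<>\w]+ )/x;

print "range:   '$range'\n";     # 'cold,'
print "first:   '$first'\n";     # 'cold'
print "last:    '$last'\n";      # 'cold'
print "escaped: '$escaped'\n";   # 'cold'
```

Only the first pattern picks up the trailing comma; the other three stop at 'cold'.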

      d'Oh.

      I should have known that. Thanks.

      throop

      The people are bright and certain, Where I am dim and confused; The people are clever and wise, Where I am dull and ignorant,
Re: Order of terms in square braces
by BrowserUk (Patriarch) on Feb 03, 2007 at 00:51 UTC

    In the first case, this part of your character class \)-< is a character range. I.e., it is equivalent to all the characters between ')' (ASCII 41) and '<' (ASCII 60), which includes ASCII 44, ','.

    In the second case, by placing the '-' as the first character inside the brackets, you've disabled the range function and it is treated as a normal character.
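
To see which characters that range covers, a short sketch (assuming an ASCII-compatible platform; the variable names are only illustrative) can list them directly:

```perl
use strict;
use warnings;

# Characters covered by the range ')' .. '<' (code points 41 through 60).
my @range = map { chr } ord(')') .. ord('<');
print join(' ', @range), "\n";
# prints: ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; <

# The comma (code point 44) falls inside that range.
my $has_comma = grep { $_ eq ',' } @range;
print "comma in range: ", ($has_comma ? "yes" : "no"), "\n";
```

The same range also admits digits and several other punctuation characters, so a class like the first one can match more than intended.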


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
