PerlMonks  

Character Class Abbreviations

by root (Monk)
on Nov 12, 1999 at 01:29 UTC ( [id://968]=perltutorial )

Character class abbreviations allow you to match any one of a set of characters without too much hassle. One way to match a set of characters is to list them inside []. For instance, [0123456789] matches any one of those digits. This can be kind of cumbersome. You can also negate a character class by placing a caret at the front of it. For instance, [^0123456789] matches any character that is not a digit. You shouldn't be surprised that Perl makes your life much easier by defining some character class abbreviations. These are alphanumeric characters preceded by a backslash. For example, a \d in your regular expression matches any digit.
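As a quick illustration of the three forms just described, here is a minimal sketch. The helper name classify and the sample strings are made up for this example; they are not part of the tutorial.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of the three patterns match a string.
sub classify {
    my ($text) = @_;
    my @hits;
    push @hits, 'explicit class' if $text =~ /[0123456789]/;   # at least one digit
    push @hits, 'negated class'  if $text =~ /[^0123456789]/;  # at least one non-digit
    push @hits, '\d'             if $text =~ /\d/;             # abbreviation for the digit class
    return join ', ', @hits;
}

for my $sample ('42', 'abc', 'route 66') {
    printf "%-10s => %s\n", "'$sample'", classify($sample);
}
```

Note that 'route 66' satisfies all three patterns, because each pattern only needs one matching character somewhere in the string.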

Now for a quick word about metacharacters. Metacharacters are characters that have special meaning within regular expressions, so they won't match literally unless you precede them with a \. The metacharacters include \|()$^.?* Here is a quick word about the most common ones before we return to character class abbreviations.
Metacharacter(s)  Meaning
.                 Matches any character besides newline
()                Used for grouping characters
[]                Used for defining character classes
|                 Used for "or" (alternation) in a regular expression
\                 Begins a character class abbreviation, or makes the following metacharacter match literally
*                 Quantifier: matches 0 or more of the previous character or group of characters
?                 Quantifier: matches 0 or 1 of the previous character or group; placed after another quantifier, makes it nongreedy
^                 Matches the beginning of a string (or line if /m is used)
$                 Matches the end of a string (or line if /m is used)
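
The behaviour of several of these metacharacters can be sketched in a few lines. This is only an illustration under made-up inputs: the variable names and sample strings below are assumptions, not taken from the tutorial.

```perl
use strict;
use warnings;

my $price = 'Total: $4.50 (approx.)';

# Unescaped, "." matches any character; escaped, it matches only a literal dot.
my $any_char    = ('4x50' =~ /4.50/)  ? 1 : 0;   # matches: "." stands in for "x"
my $literal_dot = ('4x50' =~ /4\.50/) ? 1 : 0;   # no match: the string has no literal "."

# "$" must be escaped to match a dollar sign; "()" captures what is inside.
my ($amount) = $price =~ /\$(\d*\.\d*)/;

# "|" gives alternation; "^" and "$" anchor to the start and end of the string.
my $anchored = ('cat' =~ /^(cat|dog)$/) ? 1 : 0;

# "*" is greedy by default; a trailing "?" makes it nongreedy.
my ($greedy) = '<b>bold</b>' =~ /(<.*>)/;
my ($lazy)   = '<b>bold</b>' =~ /(<.*?>)/;

print "any_char=$any_char literal_dot=$literal_dot\n";
print "amount=$amount anchored=$anchored\n";
print "greedy=$greedy lazy=$lazy\n";
```

The greedy pattern grabs everything up to the last ">" in the string, while the nongreedy one stops at the first.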


Now let's define some character class abbreviations:

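Character Class  Meaning
\d               digit, or [0123456789]
\D               nondigit, or [^0123456789]
\w               word (alphanumeric or underscore) character, or [a-zA-Z_0-9]
\W               nonword character, or [^a-zA-Z_0-9]
\b               word boundary (a zero-width position between a \w and a \W character, not a character class)
\s               whitespace character, or [ \t\r\n\f]
\S               non-whitespace character, or [^ \t\r\n\f]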
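
To see the abbreviations at work, here is a small sketch run against a made-up sample line. It also uses two things the tables above do not cover: the + quantifier (one or more) and the /g modifier (find all matches). The variable names are assumptions chosen for this example.

```perl
use strict;
use warnings;

my $line = "Order 66:\tship 3 crates";

my @digits  = $line =~ /\d/g;        # each single digit
my @words   = $line =~ /\b\w+\b/g;   # runs of word characters between word boundaries
my @nonword = $line =~ /\W/g;        # every character that \w does not match
my @fields  = split /\s+/, $line;    # split on runs of whitespace (space or tab here)
(my $squeezed = $line) =~ s/\D//g;   # copy the line, then delete every nondigit

print "digits:   @digits\n";
print "words:    @words\n";
print "nonword:  ", scalar(@nonword), " characters\n";
print "fields:   ", join('|', @fields), "\n";
print "squeezed: $squeezed\n";
```

Compare the words and fields lists: \w+ leaves the colon out of "66", while splitting on \s+ keeps it, because ":" is neither a word character nor whitespace.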


That's a lot of information to get a handle on, so let's check out some pattern-matching examples.

Replies are listed 'Best First'.
Isn't '+' a metacharacter too?
by Anonymous Monk on Jun 17, 2002 at 17:40 UTC
    Similar to '*' only it matches 1 or more of the previous character.
Re: Character Class Abbreviations
by Terminal (Initiate) on Dec 23, 2005 at 21:29 UTC
    Perhaps you could show some examples of these? I'm a bit confused...
    I tried
    if ($_ =~ [[en]]) { print "yes\n";} else { print "no\n";}
    Didn't work :( Always printed no, even when I had "en" in the document :(

    Code tags added by Arunbear

      In replying to a 6-yr old node, your question is in danger of flying beneath everyone's radar. Better to post under Seekers of Perl Wisdom. To get an idea of how this site works, I recommend looking at the "Welcome to the Monastery" section of the Tutorials page.

      You might wish to check out some of the references listed here: Re: regexp: extracting info

      In your example, it sounds to me more like you are trying to match the literal string "en". But let's assume for a moment you really want a character class...

      One of the tools I didn't mention in the writeup above is the simple "pattern test" program from Learning Perl, 3rd Ed. I often use this when first constructing a regex because it's simple, easy to edit, and I get immediate feedback. So let's start with that program, modified slightly to match the character class [en].

      # From: Schwartz & Phoenix: Learning Perl, 3rd Ed (The Llama), pp. 103
      use strict;
      while (<>) {
          chomp;
          if (/[en]/) {
              print "Matched: |$`<$&>$'|\n";
          } else {
              print "No match.\n";
          }
      }

      Let's assume that this is our "test" file:

      English
      French
      Spanish
      German
      Aramaic
      Arabic

      Where and how the character class matches may surprise you:

      Matched: |E<n>glish|
      Matched: |Fr<e>nch|
      Matched: |Spa<n>ish|
      Matched: |G<e>rman|
      No match.
      No match.
      No match.

      If you were expecting output more like the following, matching the string "en":

      No match.
      Matched: |Fr<en>ch|
      No match.
      No match.
      No match.
      No match.
      No match.

      You will need to change the program as follows:

      # From: Schwartz & Phoenix: Learning Perl, 3rd Ed (The Llama), pp. 103
      use strict;
      while (<>) {
          chomp;
          if (/en/) {
              print "Matched: |$`<$&>$'|\n";
          } else {
              print "No match.\n";
          }
      }

      HTH,

      planetscape
Re: Character Class Abbreviations
by theantler (Sexton) on Mar 18, 2010 at 09:17 UTC
    I think this tutorial/intro of yours is really good and helpful. You say: The metacharacters are \|()$^.?* But [] are also metachars, since they don't match literally but have special meaning within the regex, so shouldn't they be in that list too? It would be nice if you listed all the metachars. Thanks - ta
