Character classes allow you to match any one of a set of characters without too much hassle.
One way to do this is to put the set of characters you want to match inside [].
For instance, [0123456789] matches any single digit. Writing out every character like this can be
kind of cumbersome. You can also negate a character class by placing a caret at the front of it. For
instance, [^0123456789] matches any character that is not a digit. You shouldn't be surprised that Perl makes your life much easier by
defining some character class abbreviations. These are alphanumeric characters preceded by a
backslash. For example, Perl allows you to match any digit with a \d in your regular expression.
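To make the three forms above concrete, here is a minimal sketch that tests a few strings against the explicit class, the negated class, and the \d abbreviation. The sample strings and the describe() helper are made up for illustration, and the comparison assumes plain ASCII text.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @samples = ('abc', 'route 66', '2024');

# Reports which kinds of characters a string contains.
sub describe {
    my ($text) = @_;
    my $has_digit_class  = $text =~ /[0123456789]/  ? 1 : 0;  # explicit class
    my $has_digit_abbrev = $text =~ /\d/            ? 1 : 0;  # abbreviation
    my $has_non_digit    = $text =~ /[^0123456789]/ ? 1 : 0;  # negated class
    return "class=$has_digit_class abbrev=$has_digit_abbrev nondigit=$has_non_digit";
}

print "$_: ", describe($_), "\n" for @samples;
# abc: class=0 abbrev=0 nondigit=1
# route 66: class=1 abbrev=1 nondigit=1
# 2024: class=1 abbrev=1 nondigit=0
```

For these strings the explicit class and \d always agree, which is the point of the abbreviation.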
Now for a quick word about metacharacters. Metacharacters are characters that have special meaning within regular
expressions. Therefore, if you put them into a regular expression they won't match literally unless you precede the
metacharacter with a \. The metacharacters covered here are \ | ( ) [ ] $ ^ . ? * Here is a quick word about each of them before
we return to character class abbreviations.
Metacharacter(s) | Meaning |
. | Matches any character besides newline |
() | Used for grouping characters |
[] | Used for defining character classes |
| | Alternation: matches either the pattern on its left or the one on its right |
\ | Begins a character class abbreviation, or makes the following metacharacter match literally |
* | Quantifier: matches 0 or more of the previous character or group of characters |
? | Quantifier: matches 0 or 1 of the previous character or group; placed after another quantifier, makes it nongreedy |
^ | Matches the beginning of a string (or line if /m is used) |
$ | Matches the end of a string (or line if /m is used) |
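The table rows above can be tried out with a short sketch like the one below. The input strings and variable names are invented for illustration; it shows escaping with a backslash, grouping and alternation, the $ anchor, and the difference between a greedy and a nongreedy quantifier.

```perl
use strict;
use warnings;

# Hypothetical input strings, invented for this illustration.
my $price = 'cost: $4.50';
my $tags  = '<b>bold</b> and <i>italic</i>';

# An unescaped . matches any character except newline, so "4x50" matches too.
my $loose  = '4x50' =~ /4.50/  ? 'match' : 'no match';
# Escaping the . with a backslash makes it match only a literal period.
my $strict = '4x50' =~ /4\.50/ ? 'match' : 'no match';

# \$ matches a literal dollar sign; the parentheses capture the amount.
my ($amount) = $price =~ /\$(\d*\.\d*)/;

# | offers alternatives, () limits them to part of the pattern,
# and the trailing $ anchors the match to the end of the string.
my $pet = 'I have a cat' =~ /a (cat|dog)$/ ? $1 : 'none';

# .* is greedy and takes as much as it can; .*? stops at the first chance.
my ($greedy) = $tags =~ /<(.*)>/;
my ($lazy)   = $tags =~ /<(.*?)>/;

print "loose:  $loose\n";    # loose:  match
print "strict: $strict\n";   # strict: no match
print "amount: $amount\n";   # amount: 4.50
print "pet:    $pet\n";      # pet:    cat
print "greedy: $greedy\n";   # greedy: b>bold</b> and <i>italic</i
print "lazy:   $lazy\n";     # lazy:   b
```

The greedy capture runs all the way to the last > in the string, while the nongreedy one stops at the first.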
Now let's define some character class abbreviations:
Character Class | Meaning |
\d | digit or [0123456789] |
\D | nondigit or [^0123456789] |
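\w | word character (alphanumeric or underscore) or [a-zA-Z_0-9] |
\W | nonword character or [^a-zA-Z_0-9] |
\b | word boundary (a zero-width position, not a character) |
\s | whitespace character or [ \t\r\n\f] |
\S | nonwhitespace character or [^ \t\r\n\f] |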
That's a lot of information to get a handle on, so let's check out some pattern-matching examples.
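As one possible illustration of the abbreviations in the table, the sketch below pulls pieces out of a single line of text. The log line and the variable names are hypothetical, and only the * quantifier introduced earlier is used.

```perl
use strict;
use warnings;

# Hypothetical log line, invented for this illustration.
my $line = "user_42 logged in\tat 09:15";

# \w* grabs a run of word characters (letters, digits, underscore).
my ($user) = $line =~ /^(\w*)/;

# \d\d:\d\d matches two digits, a colon, and two more digits.
my ($time) = $line =~ /(\d\d:\d\d)/;

# Splitting on \s breaks the line at each space or tab.
my @fields = split /\s/, $line;

# \bin\b matches "in" only as a whole word, not the "in" inside "logging".
my $whole_word = 'logging in' =~ /\bin\b/ ? 1 : 0;
my $inside     = 'logging'    =~ /\bin\b/ ? 1 : 0;

# \D and \S are the negated forms: delete every nondigit,
# then count the nonwhitespace characters.
(my $digits = $line) =~ s/\D//g;
my $visible = () = $line =~ /\S/g;

print "user:   $user\n";               # user:   user_42
print "time:   $time\n";               # time:   09:15
print "fields: ", scalar(@fields), "\n";  # fields: 5
print "word:   $whole_word $inside\n"; # word:   1 0
print "digits: $digits\n";             # digits: 420915
print "chars:  $visible\n";            # chars:  22
```

Note that \b consumes nothing: it only checks that a word character sits on one side of the position and not on the other.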