PerlMonks  

regexp question

by ramprasad27 (Sexton)
on Oct 28, 2011 at 05:06 UTC ( [id://934309]=perlquestion )

ramprasad27 has asked for the wisdom of the Perl Monks concerning the following question:

my $string = 'end.';
$string =~ /end[.]/;
print $&;

It matches and prints "end.". My question is: why has '.' lost its special meaning (match any character except \n) inside a character class?

Replies are listed 'Best First'.
Re: regexp question
by MVS (Monk) on Oct 28, 2011 at 06:32 UTC

    Regex special characters act differently when they are within a bracketed character class, and many of them (including .) no longer have a special meaning there. See "Special Characters Inside a Bracketed Character Class" in perlrecharclass for the full list.
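
A minimal Perl sketch of the point above (not taken from perlrecharclass itself): it checks that a few characters which are metacharacters outside a bracketed class match only themselves inside one, and that '-' and a leading '^' stay special there. The names @literal_in_class, %result, $range and $negated are made up for illustration.

```perl
use strict;
use warnings;

# Metacharacters outside a class that are plain literals inside one.
my @literal_in_class = ('.', '*', '+', '?', '(', ')', '|', '$');

my %result;
for my $char (@literal_in_class) {
    my $class = qr/\A[$char]\z/;    # e.g. qr/\A[.]\z/
    $result{$char} = {
        self  => ($char =~ $class) ? 1 : 0,   # the character itself
        other => ('x'   =~ $class) ? 1 : 0,   # some unrelated character
    };
    printf "[%s] matches '%s': %d, matches 'x': %d\n",
        $char, $char, $result{$char}{self}, $result{$char}{other};
}

# Two characters that do keep a special meaning inside a class.
my $range   = ('b' =~ /\A[a-c]\z/) ? 1 : 0;   # '-' between two chars forms a range
my $negated = ('a' =~ /\A[^a]\z/)  ? 1 : 0;   # a leading '^' negates the class
print "range: $range, negated: $negated\n";
```

Each literal character should match itself and not 'x'; the last line should show the range matching and the negated class rejecting 'a'.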

      Isn't it confusing which special characters keep their special meaning inside a character class and which don't? Isn't this the kind of thing that makes Perl look ugly sometimes?

        Maybe a little at first, but you'll quickly get used to that after looking at enough regular expressions. Besides, as johnny_carlos mentioned above, some of the special characters wouldn't make sense when you think of them as part of a character class.

Re: regexp question
by johnny_carlos (Scribe) on Oct 28, 2011 at 05:40 UTC
    I don't know if I'm wise enough yet, but I'll try. I think it's because it would be nonsensical to match any character within a character class. The character class is for matching a range (subset) of characters. So if one of those matches is "any character", then what would be the point of the character class? Does that make sense?
Re: regexp question
by anneli (Pilgrim) on Oct 28, 2011 at 06:45 UTC

    Note that, in addition to MVS' answer, your demonstration here would print "end." even if it behaved as you previously expected!

      Oh well, you are right, but if I have something other than '.' at the end, it doesn't match, which means '.' doesn't have its special meaning inside a character class.

        That's correct; but .. uh, you might as well just use . for that!
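
To make this exchange concrete, here is a small sketch that runs both patterns against the original string and against a hypothetical second string, 'endx', which is not from the thread. The hash %outcome is just an illustrative name.

```perl
use strict;
use warnings;

my %outcome;
for my $string ('end.', 'endx') {
    # Inside a class the dot is a literal period...
    my $in_class = ($string =~ /end[.]/) ? 'match' : 'no match';
    # ...outside a class it matches any character except a newline.
    my $bare_dot = ($string =~ /end./)   ? 'match' : 'no match';

    $outcome{$string} = [ $in_class, $bare_dot ];
    printf "%-5s /end[.]/ => %-8s  /end./ => %s\n",
        $string, $in_class, $bare_dot;
}
```

Both patterns match 'end.', so the original demonstration cannot tell them apart; only 'endx' shows the difference, because /end[.]/ rejects it while /end./ accepts it.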

Re: regexp question
by JavaFan (Canon) on Oct 28, 2011 at 08:36 UTC
    Because having a character inside the character class that matches every character but a newline doesn't make much sense. What would be the advantage of using /[.]/ over /./ if a dot meant the same thing inside a character class as outside it?
      I would ask you the same question: what's the advantage of /[\d\s]/ in this case? The meaning of \d and \s doesn't change outside of a character class.
        Of course it's different.
        [\d\s] # matches a single char - digit or whitespace
        \d\s   # matches two chars - a digit then whitespace
        Because one can use \d and \s to build larger character classes. A dot already matches every character except one (the newline).
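
A quick sketch of that difference, using sample strings chosen here for illustration (they are not from the thread). The patterns are anchored with \A and \z so that each one has to account for the whole string.

```perl
use strict;
use warnings;

my %seen;
for my $s ('7', ' ', '7 ', 'a') {
    my $class    = ($s =~ /\A[\d\s]\z/) ? 1 : 0;   # one char: digit OR whitespace
    my $sequence = ($s =~ /\A\d\s\z/)   ? 1 : 0;   # two chars: digit THEN whitespace
    $seen{$s} = [ $class, $sequence ];
    printf "%-4s [\\d\\s]: %d   \\d\\s: %d\n", "'$s'", $class, $sequence;
}

# Building a larger class out of \d and \s plus a literal comma.
my $list_ok = ('1, 2' =~ /\A[\d\s,]+\z/) ? 1 : 0;
print "list: $list_ok\n";
```

Only '7 ' satisfies the two-character sequence, while '7' and ' ' each satisfy the class on their own, and 'a' satisfies neither.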

        What's the point of having such a character inside a character class? All you can do is build 4 different character classes: /[.]/, /[.\n]/, /[^.]/, /[^.\n]/. But they can all be written in a simple, different way: /./, /(?s:.)/, /\n/, and /(*FAIL)/

        So, once again, what would be the advantage of having dot inside a character class mean the same thing as outside of it?
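
The four simpler spellings listed above can be checked against three sample characters. This is only a sketch: the sample set and the names @chars, @patterns and %matched are assumptions made here, and (*FAIL) needs Perl 5.10 or later. The last two rows show what /[.]/ and /[^.]/ really do in Perl, where the dot inside a class is a literal period.

```perl
use strict;
use warnings;

my @chars = ('a', '.', "\n");

my @patterns = (
    [ '/./',       qr/\A.\z/       ],   # any character except newline
    [ '/(?s:.)/',  qr/\A(?s:.)\z/  ],   # any character at all
    [ '/\n/',      qr/\A\n\z/      ],   # only a newline
    [ '/(*FAIL)/', qr/\A(*FAIL)\z/ ],   # never matches
    [ '/[.]/',     qr/\A[.]\z/     ],   # actual Perl: a literal period
    [ '/[^.]/',    qr/\A[^.]\z/    ],   # actual Perl: anything but a period
);

my %matched;
for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    # One digit per sample character, in the order: 'a', '.', newline.
    $matched{$label} = join '', map { $_ =~ $re ? 1 : 0 } @chars;
    printf "%-10s %s\n", $label, $matched{$label};
}
```

Each output row is a three-digit string of 1s and 0s, one digit per sample character, so the rows can be compared at a glance.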

Re: regexp question
by raybies (Chaplain) on Oct 28, 2011 at 13:20 UTC

    fwiw, the square bracket character class has been a part of regex since prior to Perl existing. I remember way back in the daze when I was learning to grep commandline in a basic Unix course this being a feature of regex. Seems kinda silly to debate an accepted convention that's been around since prior to when you were born.

    Honestly this is not the most confusing aspect of regex. Regex by its nature is an abbreviated notation and therefore a challenge to make readable. Perl has (imo) the best featureset of any language when it comes to making regexes understandable, but it's still regex... which means one treads a cryptic realm of magic and wonder... :)

Re: regexp question
by ikegami (Patriarch) on Oct 29, 2011 at 19:04 UTC

    In Perl, "." is the concatenation operator.

    In regex patterns, it's the "match (almost) any character" atom.

    In character classes, it's a period.

    All of these are wise choices. Just like it wouldn't make sense for "." to mean concatenation inside of regexp patterns, it wouldn't make sense for "." to mean "any character" inside of a character class. (e.g. [.abc] would be the same as just .)

    It didn't lose any special meaning; it was never given one there.
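
The three roles of "." described above, side by side in one short sketch. The variable names and the test string 'endx' are chosen here for illustration only.

```perl
use strict;
use warnings;

# 1. In Perl code, "." is the concatenation operator.
my $joined = 'end' . '.';

# 2. In a regex pattern, "." matches (almost) any character.
my $any_char = ('endx' =~ /\Aend.\z/) ? 1 : 0;

# 3. In a character class, "." is just a period.
my $period_only  = ('endx'  =~ /\Aend[.]\z/) ? 1 : 0;
my $period_match = ($joined =~ /\Aend[.]\z/) ? 1 : 0;

# [.abc] is a four-member class, not a synonym for a bare dot.
my $in_set     = ('.' =~ /\A[.abc]\z/) ? 1 : 0;
my $not_in_set = ('z' =~ /\A[.abc]\z/) ? 1 : 0;

print "joined: $joined\n";
print "any_char: $any_char, period_only: $period_only, period_match: $period_match\n";
print "in_set: $in_set, not_in_set: $not_in_set\n";
```

Here 'endx' satisfies the bare dot but not the bracketed one, and 'z' is rejected by [.abc] even though a bare dot would accept it.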

Node Type: perlquestion [id://934309]
Approved by rgiskard
Front-paged by moritz