PerlMonks  

Common RegExps

by Anonymous Monk
on Jul 31, 2000 at 22:25 UTC (#25309=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, is there a webpage somewhere that lists commonly used Perl regexps? For example, regexps for a (syntactically) valid email address, URL, IP address, etc.?
mb

Replies are listed 'Best First'.
Re: Common RegExps
by merlyn (Sage) on Jul 31, 2000 at 22:30 UTC
    The regex for a syntactically valid email address is about a full screenful. Ditto for a URL. And IP addresses are better checked algorithmically, not statically as text.

    So what problem are you really solving, for which you got to the step "use a regex"? Perhaps you should back up a step. {grin}

    -- Randal L. Schwartz, Perl hacker
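To illustrate what "checked algorithmically" might look like for a dotted-quad IPv4 address, here is a minimal sketch. The helper name is_ipv4 is made up for this example, and it assumes plain decimal octets are all that needs to be accepted (it does not reject leading zeros and knows nothing about IPv6).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text looks like a dotted-quad IPv4 address.
sub is_ipv4 {
    my ($text) = @_;
    return 0 unless defined $text;

    # The -1 limit keeps trailing empty fields, so "1.2.3." gives four
    # fields (the last one empty) instead of silently giving three.
    my @octets = split /\./, $text, -1;
    return 0 unless @octets == 4;

    for my $octet (@octets) {
        # The regex only handles the lexical part: one to three ASCII digits.
        return 0 unless $octet =~ /\A[0-9]{1,3}\z/;
        # The range check is plain arithmetic, not text matching.
        return 0 if $octet > 255;
    }
    return 1;
}

for my $candidate (qw(192.168.0.1 256.1.1.1 1.2.3 1.2.3.4.5)) {
    my $verdict = is_ipv4($candidate) ? "valid" : "invalid";
    print "$candidate: $verdict\n";
}
```

The regex still has a job here, but only the small one of checking that each field is digits; the numeric range is left to a comparison.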

      Well, I was looking for something that parses xyz@somewhere.something.com: rules such as the valid characters in the username and the IP address, and also a way to get the TLD, etc. What I did not know was that one can have nested structures in the email address, and that of course makes it context-free. Well, if that's the case, is there at least a module that checks the validity of email addresses? mb
        Yeah, there are a couple of email address parsing routines in the CPAN. search.cpan.org is having hissy fits right now or I'd give you the exact reference, but check under "email" or "RFC822" or something.

        -- Randal L. Schwartz, Perl hacker

Re: Common RegExps
by BlaisePascal (Monk) on Jul 31, 2000 at 22:38 UTC
    Is there a regex that recognises RFC822 email addresses? I'm not asking to see one, I'm asking if it is even possible! RFC822 is notoriously difficult. It wouldn't surprise me if it couldn't be done.

    (For instance, doesn't RFC822 allow nested comments? If so, that would ruin it right there...)

      The 'owl' book (Mastering Regular Expressions) is a great text for questions like this, and my answer comes from it (paraphrased):

      No, you can't really recognize a valid email address with a regex, because technically an email address can have arbitrarily nested comments in parentheses, and a regular expression can never recognize arbitrarily deep nested structures. When you start talking about balanced constructs, you are out of the land of regular languages and into the land of context-free languages.

      I wonder if it would be useful or just unnecessary to have native support for context-free grammars in Perl...

      To recognize all valid email addresses that have at most one level of comments requires something like a 5000-byte regular expression.

      The moral of this story is that regexes can't do everything.

      -Mark
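As a rough illustration of why balanced comments need more than a classical regular expression: the checker has to remember how many parentheses are still open, and that count has no upper bound. A minimal sketch follows, with a made-up helper name (comments_balanced) and a deliberate simplification: it ignores RFC 822 quoted strings and backslash escapes, so it is not a real address parser.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every "(" in $text has a matching ")".
# Simplification: quoted strings and backslash-escaped parentheses are
# treated like any other character.
sub comments_balanced {
    my ($text) = @_;
    my $depth = 0;    # number of comments currently open

    for my $char (split //, $text) {
        if ($char eq '(') {
            $depth++;
        }
        elsif ($char eq ')') {
            $depth--;
            return 0 if $depth < 0;    # closed a comment that was never opened
        }
    }
    return $depth == 0 ? 1 : 0;        # anything still open is an error
}

for my $candidate ('mb (a (nested) comment) @example.com',
                   'mb (an (unclosed comment) @example.com') {
    my $verdict = comments_balanced($candidate) ? "balanced" : "unbalanced";
    print "$candidate: $verdict\n";
}
```

The $depth counter is exactly the piece of state a finite automaton cannot keep for arbitrary depths.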

      There's an appendix in Mastering Regular Expressions which is a regex for RFC822 addresses, well, except for arbitrarily nested comments... I think it uses a max of 5 levels or something along those lines. It takes somewhere around 5 pages, and is commented quite well, but it's still not something I'd ever want to have to build.
      -Ted
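A regex that handles comments only up to a fixed depth can be built up one level at a time, which also hints at why the full thing gets so long. The following is only a sketch of that idea, not the expression from the book: the helper name comment_regex is made up, and it matches a bare parenthesized comment rather than a whole RFC 822 address.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a regex matching one parenthesized comment
# nested at most $max_depth levels deep.
sub comment_regex {
    my ($max_depth) = @_;

    # Depth 1: parentheses around text containing no parentheses at all.
    my $re = qr/\([^()]*\)/;

    for (2 .. $max_depth) {
        # Each further level also allows the previous level's comments inside.
        $re = qr/\((?:[^()]|$re)*\)/;
    }
    return $re;
}

my $two_levels = comment_regex(2);
for my $comment ('(plain)', '(one (two) deep)', '(one (two (three)) deep)') {
    my $verdict = $comment =~ /\A$two_levels\z/ ? "matches" : "no match";
    print "$comment: $verdict\n";
}
```

Every extra level embeds the whole previous pattern again, so the expression grows with each level and still gives up one level past whatever limit was chosen.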
RE: Common RegExps
by t0mas (Priest) on Aug 01, 2000 at 02:51 UTC
