zalezny has asked for the wisdom of the Perl Monks concerning the following question:

Dear Colleagues, at the moment I'm writing a script for an old RHEL 3 machine. When the script runs, the user will be prompted to enter an e-mail address. My problem is that I don't know how to check whether the entered string is an e-mail address. It should check that the entered string:
- is not empty
- looks like email@hostname
- looks like email@hostname.extension (for example: root@localhost.com)
I did something like this, but it's not working as expected.
print "E-mail address:";
my $email = <>;
chomp $email;
until ($email =~ m/^[a-z0-9]@[a-z0-9]$/ && $email ne "" && $email =~ m/^[a-z0-9]@[a-z0-9].*$/)   # While input is wrong...
{
    print "Uppps, sorry man, but it doesnt look like E-mail address :/";
    print "Wanna try again? ";   # Ask again
    $email = <STDIN>;            # Get input again
    chop $email;                 # Chop off newline again
}
Could somebody be so kind as to help me? Zalezny

Re: input - E-mail address - how to check string ?
by hippo (Chancellor) on Feb 25, 2015 at 09:44 UTC

    Looks like your first regex is bogus: without quantifiers it matches exactly one character on each side of the @, so it will only match one-letter local hostnames. But you have other problems too: you are reading from what may be different filehandles inside and outside the loop (<> outside, <STDIN> inside). You chomp outside, but chop inside.
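    To see the chomp/chop point concretely, here is a small self-contained demonstration; the sample strings are made up for illustration. chomp removes a trailing newline (strictly, the current value of $/) only when one is present, while chop always removes the last character. Input that arrives without a newline, such as the last line of piped input, therefore loses a real character. Separately, <> reads from the files named in @ARGV when there are any and from STDIN otherwise, so mixing it with <STDIN> can mean reading from two different places.

```perl
use strict;
use warnings;

# chomp removes a trailing input record separator ($/, normally "\n") if present;
# chop removes the last character unconditionally.
my $with_newline    = "root\@example.com\n";
my $without_newline = "root\@example.com";   # e.g. a last line with no final newline

my ($chomped_nl, $chomped_plain) = ($with_newline, $without_newline);
chomp $chomped_nl;
chomp $chomped_plain;

my ($chopped_nl, $chopped_plain) = ($with_newline, $without_newline);
chop $chopped_nl;
chop $chopped_plain;

print "chomp: '$chomped_nl' / '$chomped_plain'\n";   # both still end in ".com"
print "chop:  '$chopped_nl' / '$chopped_plain'\n";   # the second one lost its final "m"
```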

    Since "but its not working as expected." isn't really the best bug report in the world, there may be other issues. But if you fix the DRY violation (the prompt and the read are duplicated) and the regex, you should be halfway there. E.g.:

    my $email = '';
    until ($email =~ /^[^ ,@]+\@([a-z0-9-]+\.)+[a-z]+$/) {
        print 'E-mail address: ';
        $email = <>;
        die "No e-mail address given\n" unless defined $email;   # stop at EOF instead of looping forever
        chomp $email;
    }

    You can tweak the prompt and the regex until you're happy, but that should get you started.
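    One way to tweak the regex safely is to wrap it in a helper and try it on sample input before putting it in the loop. looks_like_email is a hypothetical name, and the sample addresses are chosen for illustration. As written, the domain part only allows lowercase, so an uppercase domain is rejected. Adding the /i modifier would be one possible tweak.

```perl
use strict;
use warnings;

# hippo's pattern, wrapped in a (hypothetical) helper so it can be tried on samples.
sub looks_like_email {
    my ($email) = @_;
    return $email =~ /^[^ ,@]+\@([a-z0-9-]+\.)+[a-z]+$/ ? 1 : 0;
}

for my $candidate ('root@localhost.com', 'joe.user+pizza@example.co.uk',
                   'root@localhost', '', 'User@Example.COM') {
    printf "%-30s %s\n", "'$candidate'",
        looks_like_email($candidate) ? 'accepted' : 'rejected';
}
```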

      Thank you very much, my friend! You've just won a big virtual beer for your help! It's working like a charm! http://nonjoiner.com/wp-content/uploads/2014/05/stockvault-beer-mug138814.jpg
Re: input - E-mail address - how to check string ?
by AppleFritter (Vicar) on Feb 25, 2015 at 10:46 UTC

    There are a number of email address validation modules on CPAN, e.g. Mail::RFC822::Address, RFC::RFC822::Address, Email::Valid and Mail::Address (which will attempt to extract an email address). None of these covers all the weird little corner cases.

    The best tool I've found for this job is isemail, which comes with a very comprehensive test suite designed to test all those corner cases. Unfortunately it's for PHP, not Perl.

    Then again, you need to ask yourself what you want to do in the first place. Do you want to make sure that an email address is syntactically valid, i.e. conforming to the stipulations of all the relevant RFCs? Or do you want to make sure that an email address works, i.e. that mail sent to it will actually be received?

    In practice it's almost always the latter, and your best bet then is to not bother with the RFCs at all and instead simply try to send an email to that address. An email address may be syntactically invalid, yet still work "in the wild". What's more, an email address that IS syntactically valid may not actually work; mail sent to that address may never get read by the intended recipient, or any person at all.

    Here are two articles arguing this point.

      Instead of trying to send an email to that address, you can strip off the domain part, look up that domain's MX record, establish an SMTP session, issue a "RCPT TO" or "VRFY" command and wait for a '250 OK' answer. Less spammy.

      HtH
      L*
      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.
        establish an SMTP session, issue a "RCPT TO" or "VRFY" command and wait for a '250 OK' answer.

        Nowadays, no sanely configured internet mail server will expose its account list by giving a clear positive or negative response to the VRFY or RCPT TO commands. Typically, the answer to VRFY is something like "I won't tell you" (my local Exim answers "252 Administrative prohibition"), and RCPT TO will be accepted no matter which mail account you try. The mail server decides after(!) the SMTP dialog how to handle that mail, and may send back a "mail not deliverable" message. That may take hours or days, depending on the mail server configuration (greylisting).

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: input - E-mail address - how to check string ?
by Ratazong (Monsignor) on Feb 25, 2015 at 09:14 UTC

    Hi Zalezny

    I suggest to search a bit in CPAN - there is surely a module for it that fits your need. Possibly Email::Valid.

    HTH, Rata

      Hi, I'm not allowed to use it on my server; I'm looking for a regular expression. My server is old crap and I'm not allowed to install anything new on it. It's RHEL 3.9, unsupported and without repos... That's why I'm looking for a regexp.

        I'm sorry if this sounds like I'm annoyed with you; I'm not, but this is a pet peeve of mine. Really. I'm trying very hard not to rant.

        Whatever you do, make sure that it supports (at least) all valid email addresses, which your regex doesn't. Read the RFCs. Many people assume that the left side of the "@" is as restricted as the right side. It isn't so. Among the many discussions, please see http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address?rq=1

        It really annoys me to run into some web-site which won't let me enter a perfectly valid email address because someone has done as you're trying to do and botched it, as you're probably going to do. No offence intended; an email address is a non-trivial thing to parse.

        By saying in effect "I'm not allowed to use CPAN", you're depriving yourself of one of the major reasons to use Perl, and inviting errors in implementation. If you're not allowed to install CPAN modules system-wide, there are lots of ways to install them for a single project, down to simply copying and pasting the code from CPAN.
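        A minimal sketch of the per-project route: it assumes Email::Valid and its dependencies have been copied into a lib directory next to the script. That layout is an assumption, not something stated in this thread. The sketch falls back to a deliberately loose regex when the module can't be loaded. is_valid_email is a hypothetical name, and the fallback pattern is an illustration, not Email::Valid's rules.

```perl
use strict;
use warnings;
use FindBin;
use lib "$FindBin::Bin/lib";   # assumption: Email/Valid.pm and its dependencies live in ./lib

# Use Email::Valid when it can be loaded; otherwise fall back to a loose regex.
my $have_email_valid = eval { require Email::Valid; 1 };

sub is_valid_email {
    my ($addr) = @_;
    return 0 unless defined $addr && length $addr;
    if ($have_email_valid) {
        # Email::Valid->address returns the address on success, a false value otherwise
        return Email::Valid->address($addr) ? 1 : 0;
    }
    # Loose fallback (an assumption): x@y.z with no whitespace and no extra @
    return $addr =~ /^[^@\s]+\@[^@\s]+\.[^@\s]+$/ ? 1 : 0;
}

print "Email::Valid ", ($have_email_valid ? "loaded" : "not found, using fallback"), "\n";
```

        Because the check is behind one function, the script keeps working on the old box whether or not the copied module is present.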

Re: input - E-mail address - how to check string ?
by kroach (Pilgrim) on Feb 25, 2015 at 10:05 UTC

    You have three conditions for email validation in your script:

    $email =~ m/^[a-z0-9]@[a-z0-9]$/

    This will not allow domain extensions, so "u@d" will match, but "u@d.org" will not. Your character sets are not quantified, so each of them matches only a single character; "user@domain" will not match either.

    If the previous condition matched, the string cannot be empty, so the second condition is superfluous. The last one matches everything the first does and more, but because && only evaluates it after the first condition has already matched, it never changes the result. The whole condition is effectively the same as the first regex alone.
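    A quick way to check both points is to run the original conditions over a few sample strings; the samples are chosen for illustration. The sketch prints whether the first condition alone matches and whether all three conditions from the original until() loop match. The two columns agree on every input.

```perl
use strict;
use warnings;

# The first and third conditions from the original until() loop.
my $first = qr/^[a-z0-9]\@[a-z0-9]$/;     # exactly one character on each side of the @
my $third = qr/^[a-z0-9]\@[a-z0-9].*$/;   # same start, but anything may follow

for my $s ('u@d', 'u@d.org', 'user@domain', '') {
    my $alone    = ($s =~ $first) ? 1 : 0;
    my $combined = ($s =~ $first && $s ne "" && $s =~ $third) ? 1 : 0;
    printf "%-15s first alone=%d  all three=%d\n", "'$s'", $alone, $combined;
}
```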

    Now for the solution. You need to add quantifiers to the character sets, add a dot (which needs to be escaped, since otherwise it matches any character) and add another set at the end. The condition could look like this:

    $email =~ /^[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+$/

    This will not match an empty string, since there needs to be at least one character from every character set, as well as a dot and a @.

    If you don't mind also allowing uppercase letters and underscores (for ASCII input, \w matches [A-Za-z0-9_]), you could simplify it to:

    $email =~ /^\w+@\w+\.\w+$/
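    The two patterns can be compared on a few sample addresses, chosen here for illustration. Besides uppercase, \w also admits the underscore. Both patterns allow exactly one dot after the @ and none before it, so they also reject addresses with a dotted local part or more than two domain labels, such as joe@example.co.uk.

```perl
use strict;
use warnings;

my $lower = qr/^[a-z0-9]+\@[a-z0-9]+\.[a-z0-9]+$/;   # lowercase letters and digits only
my $word  = qr/^\w+\@\w+\.\w+$/;                     # \w also admits uppercase and '_'

for my $s ('user@domain.org', 'User_1@Domain.org', '',
           'user@department.company.tld', 'joe@example.co.uk', 'joe.user@example.com') {
    printf "%-30s lower=%-3s word=%s\n", "'$s'",
        ($s =~ $lower ? 'yes' : 'no'), ($s =~ $word ? 'yes' : 'no');
}
```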

      See NaN's response above, and also consider the common address structure user@department.company.tld. This and other more complex structures are quite common.

      As is stated much earlier in this thread (and as I learned through experience as a bitnet/usenet/campus email routing gateway admin), email addresses are not trivial. The localpart of addresses is often horribly mangled or restricted ('-' in the $localpart converted to '_', case mangled from what is provided, etc.). I treat the manglings reasonably on my side, but the fact that they get mangled in the first place is irritating, and gives me pause when considering whether the organization's IT department is up to the challenge. Since I seed addresses given to companies to identify spam leakage, I also eschew companies that restrict or mangle '-' and '+' when possible.

      --MidLifeXis

        Yes, I did overlook that; thank you for pointing it out. The solution hippo came up with seems to be much better.
      $email =~ /^[a-z0-9]+@[a-z0-9]+\.[a-z0-9]+$/

      It ain't as easy as that, unfortunately. Among multiple other problems, this rejects the millions of Brits who have an address ending in @<whatever>.co.uk.

Re: input - E-mail address - how to check string ?
by Anonymous Monk on Feb 25, 2015 at 09:50 UTC

    You have probably lost some quantifiers in your regexp.

    Try, for example:

    while ( $email !~ m/^[\w\.\-]+\@[\w\.\-]+\.\w{2,3}$/ ) { ...
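    Running this pattern over a few sample addresses, chosen for illustration, shows where it bites. \w{2,3} limits the final label to two or three characters, so a TLD such as .info is rejected. The local-part class has no '+', so sub-addresses like joe.user+pizza are rejected too. Multi-label domains such as example.co.uk do pass.

```perl
use strict;
use warnings;

# The pattern from the reply above, unchanged.
my $anon = qr/^[\w\.\-]+\@[\w\.\-]+\.\w{2,3}$/;

for my $s ('root@localhost.com', 'joe.user@example.co.uk',
           'user@example.info', 'joe.user+pizza@example.com') {
    printf "%-30s %s\n", $s, ($s =~ $anon ? 'accepted' : 'rejected');
}
```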
Re: input - E-mail address - how to check string ?
by afoken (Canon) on Feb 26, 2015 at 19:25 UTC

    In addition to what thargas++ wrote: a very simple approach is to allow anything that has at least one arbitrary character left of the rightmost @, and, right of the rightmost @, at least one . with at least one arbitrary character on each side. A trailing . is allowed; a . right after the rightmost @ is not.

    Rationale:

    • The relevant RFCs allow nearly everything left of the rightmost @, and while most people use email addresses like joe.user@example.com, some people have more complex email addresses. Some mail servers allow using a part of the email address to help sort mail. (joe.user+pizza@example.com and joe.user+pasta@example.com both get delivered to joe.user@example.com, but are automatically sorted into the pizza and pasta folders respectively.) Many people have names that do not match your local idea of what a name is, so their mail address won't match your local idea of what a mail account is. The recent RFCs allow Unicode (UTF-8), so expect umlauts, accented characters, Japanese, Chinese, Arabic, and many other scripts in the account part of the mail address. In summary: don't restrict the left-hand side.
    • Rules for domains change, as more and more TLDs are invented. Restricting domains or even just TLDs to some list or character set won't work in the long run.
    • Domains may contain Unicode (encoded using Punycode).
    • Most times, if not always, you do not want to deliver to local computers, but to computers somewhere else. So the domain part must contain at least one dot. Some people still use an IPv4 address in their mail address, this is also matched by the "dot surrounded by any characters" rule.
    • A trailing dot is allowed in the domain part. In fact, most people omit it because DNS resolving usually does the right thing. Adding a trailing dot makes absolutely clear where the mail has to be delivered.
    • A leading dot in the domain part is not valid.
    • And my favorite one: Comments are allowed in both account and domain part of the email address.

    These simple rules will allow almost all email addresses. IPv6 addresses right of the rightmost "@" won't work, due to the "one dot required" rule. So you may want to relax that rule or extend it to require at least one dot or colon instead of just one dot.
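    Here is a minimal sketch of these rules in plain Perl with no CPAN modules, which may suit a locked-down server. plausible_email and its optional $allow_colon flag are hypothetical names. The flag implements the "at least one dot or colon" relaxation for IPv6 hosts. The sketch deliberately lets through many addresses that won't actually work.

```perl
use strict;
use warnings;

# plausible_email is a hypothetical helper sketching the loose rules above.
# $allow_colon is an assumption of this sketch: when true, a ':' may stand in
# for the required '.', so bracketed IPv6 hosts can pass.
sub plausible_email {
    my ($addr, $allow_colon) = @_;
    return 0 unless defined $addr;

    my $at = rindex $addr, '@';           # position of the rightmost @
    return 0 if $at < 1;                  # no @ at all, or nothing left of it

    my $domain = substr $addr, $at + 1;   # everything right of the rightmost @
    return 0 if $domain =~ /^\./;         # a dot right after the @ is not allowed

    my $sep = $allow_colon ? qr/[.:]/ : qr/\./;
    # at least one separator with at least one arbitrary character on each side;
    # a trailing dot as in "example.com." still passes
    return $domain =~ /.$sep./s ? 1 : 0;
}

for my $s ('joe.user+pizza@example.com', 'root@example.com.', 'user@192.168.0.1',
           'root@localhost', 'root@.example.com', 'user@[IPv6:2001:db8::1]') {
    printf "%-28s %s\n", $s, plausible_email($s) ? 'plausible' : 'rejected';
}
```

    Calling plausible_email($addr, 1) switches to the relaxed dot-or-colon rule, which lets a host such as [IPv6:2001:db8::1] through.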

    Of course, these simple rules allow a lot of false email addresses. You have to live with that. Your systems should already be able to handle undeliverable emails. Even syntactically valid email addresses may become undeliverable some day. People change their job or their mail provider, so the old email address will no longer be used or may be deleted.

    Alexander
