PerlMonks  

pattern matching and sendmail issues

by csorensen (Beadle)
on Jun 28, 2000 at 21:31 UTC ([id://20265])


csorensen has asked for the wisdom of the Perl Monks concerning the following question:

Here's the problem (with all the code, minus the comments). I have an HTML document with a few thousand email addresses in it. I need to extract these addresses from the document and send an email to each address (not send one email and cc everyone). Two issues:

1) I need a better pattern for email addresses, but I'm very weak on regular expression syntax. Is there a good place to learn more about regexes?

2) The second script would run MUCH faster if I could put the open and close of the sendmail pipe outside the loop and just send an email to each address, but I don't know how to send an EOF to sendmail. Whenever I move the open and close outside the loop, sendmail strings the messages for all the addresses together into ONE message and sends that one message to the first address. Very discouraging. Any ideas, please?

script 1 - get the addresses
use strict;
use warnings;

open my $in,  '<',  'addlist'  or die "can't open addlist: $!";
open my $out, '>>', 'emailist' or die "can't open emailist: $!";
while (my $line = <$in>) {
    # print the captured address ($1), not the whole line
    print $out "$1\n" if $line =~ /([^\s\@]+\@[^\s\@]+)/;
}
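The pattern in script 1 is applied without /g, so it finds at most one address per line, and a line of HTML can easily hold several addresses or the same address twice (once in a mailto: link, once as the link text). A minimal sketch of a helper that collects every match and drops duplicates. It assumes addresses are separated from the surrounding markup by whitespace or by one of < > " ' : ; , (that extra exclusion list is an addition, not part of the original pattern), and extract_addresses is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every unique address-like string in the given lines.
# Based on the loose pattern from script 1, with HTML delimiters (< > " ' : ; ,)
# also excluded so that markup such as href="mailto:..." is not swept in.
sub extract_addresses {
    my @lines = @_;
    my (%seen, @found);
    for my $line (@lines) {
        # List-context //g returns every match on the line, not just the first.
        for my $addr ($line =~ /([^\s\@<>"':;,]+\@[^\s\@<>"':;,]+)/g) {
            # Duplicates are detected case-insensitively (a simplification).
            push @found, $addr unless $seen{lc $addr}++;
        }
    }
    return @found;
}

my @sample = (
    qq{<a href="mailto:info\@example.com">info\@example.com</a>\n},
    qq{Contact: sales\@example.org, INFO\@example.com\n},
);
print "$_\n" for extract_addresses(@sample);
```

Addresses come back in the order they first appear; to keep them, print each one to the emailist handle instead of STDOUT.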
script 2 - send the mail
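use strict;
use warnings;

my $sendmail = "/usr/lib/sendmail -t";

open my $addr_fh, '<', 'address.txt' or die "can't open address.txt: $!";
chomp(my @mail_to = <$addr_fh>);    # strip newlines so they don't end up in the headers

open my $body_fh, '<', 'message.txt' or die "can't open message.txt: $!";
my $content = do { local $/; <$body_fh> };    # slurp the whole body, not just line 1

foreach my $to (@mail_to) {
    next unless $to =~ /\S/;    # skip blank lines
    # one pipe per message: sendmail -t finishes the message at EOF, i.e. on close
    open my $mail, '|-', $sendmail or die "Cannot open $sendmail: $!";
    print $mail "To: $to\n";
    print $mail "From: csorensen\@uptimeresources.net\n";
    print $mail "Subject: South African tourism survey\n";
    print $mail "Content-type: text/plain\n\n";
    print $mail $content;
    close $mail or warn "sendmail failed for $to (status $?)\n";
}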
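On question 2: with a plain pipe, sendmail -t reads one message from its standard input and treats end-of-file as the end of that message, so closing the pipe is the only way to say "this message is done". One sendmail process per recipient is therefore part of this approach. A minimal sketch that keeps one pipe per message but separates building the text from sending it, so the message can be checked without mailing anyone. build_message and send_message are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical helper: assemble one complete message for sendmail -t.
# Trailing whitespace (including "\n") is stripped from the recipient,
# because a stray newline inside the header block ends the headers early.
sub build_message {
    my ($to, $from, $subject, $body) = @_;
    $to =~ s/\s+\z//;
    return "To: $to\n"
         . "From: $from\n"
         . "Subject: $subject\n"
         . "Content-Type: text/plain\n"
         . "\n"                  # blank line separates headers from body
         . $body;
}

# Hypothetical sender: one pipe per message; closing it sends EOF,
# which is what tells sendmail -t the message is complete.
sub send_message {
    my ($sendmail, $message) = @_;
    open my $pipe, '|-', $sendmail or die "Cannot open $sendmail: $!";
    print $pipe $message;
    close $pipe or die "sendmail failed (exit status $?)\n";
}

my $msg = build_message("someone\@example.com\n", 'me@example.com',
                        'Survey', "Hello\n");
print $msg;
```

In the loop this becomes send_message('/usr/lib/sendmail -t -oi', build_message($_, $from, $subject, $content)) for each address. The -oi flag stops sendmail from treating a line containing only "." as the end of the message, which matters if the survey text could contain such a line.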

Re: pattern matching and sendmail issues
by Anonymous Monk on Jun 28, 2000 at 21:38 UTC
    With all due respect, are you asking for advice on spamming? It kind of looks that way...
      No. The South African tourism board sent me a list of companies that took part in a trade show earlier this year, and they want me to send a survey to all the participants to see what they thought of the show. The problem is that they sent me WAY too much information in this file; I just want to extract the email addresses from it.
Re: pattern matching and sendmail issues
by btrott (Parson) on Jun 28, 2000 at 21:49 UTC
    Matching email addresses is difficult. But you're not actually trying to validate them, so you can probably afford to just "do your best", as it were :). This is the regexp Pod::Html uses to spot email addresses; it's not going to catch everything, and it will probably match some things that aren't addresses. But it may help.
    if ($word =~ /[\w.-]+\@\w+\.\w/) {   # looks like an e-mail address
        # ... Pod::Html's handling of the address goes here
    }
    This is used on an individual "word", where a word is obtained by splitting a string on /\s+/. So that's one example. If you look around a bit more, you can probably find others.
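A minimal sketch of that split-then-test approach, using the Pod::Html test quoted above; email_like_words is a hypothetical name. Because the test is unanchored, it only says that a word contains something address-like, so punctuation around the address comes along with the word:

```perl
use strict;
use warnings;

# Hypothetical helper: split text into whitespace-separated "words" and keep
# those that pass the Pod::Html-style test for an e-mail address.
sub email_like_words {
    my ($text) = @_;
    return grep { /[\w.-]+\@\w+\.\w/ } split /\s+/, $text;
}

# Prints "jo@example.com" and "(ann@mail.example.org)." on separate lines;
# the second keeps its surrounding punctuation.
print "$_\n" for email_like_words("Write to jo\@example.com or (ann\@mail.example.org).");
```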

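    For part 2 (sending the email): if you're sending the same content to every address, you could use Bcc headers to put all of the addresses on a single message. With sendmail -t, the Bcc lines are read as recipients and then removed before delivery, so nobody sees the rest of the list.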

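    open(SENDMAIL, "|$sendmail") or die "Cannot open $sendmail: $!";
    for my $addr (@mail_to) {
        chomp $addr;   # a trailing newline here would end the headers early
        print SENDMAIL "Bcc: $addr\n";
    }
    print SENDMAIL "From: csorensen\@uptimeresources.net\n";
    print SENDMAIL "Subject: South African tourism survey\n";
    print SENDMAIL "Content-type: text/plain\n\n";
    print SENDMAIL $content;
    close(SENDMAIL);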
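The same Bcc idea as a self-contained function that returns the message text, so it can be inspected before it is piped to sendmail -t. build_bcc_message is a hypothetical name, and the Content-Type line mirrors the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: one message addressed to everybody via Bcc headers.
# Each address is chomped first; an address still ending in "\n" would
# produce an empty line, which ends the header block early.
sub build_bcc_message {
    my ($from, $subject, $body, @recipients) = @_;
    my $msg = '';
    for my $addr (@recipients) {     # @recipients is a copy, so chomp is safe
        chomp $addr;
        next unless $addr =~ /\S/;   # skip blank entries
        $msg .= "Bcc: $addr\n";
    }
    $msg .= "From: $from\n";
    $msg .= "Subject: $subject\n";
    $msg .= "Content-Type: text/plain\n\n";
    return $msg . $body;
}

print build_bcc_message('me@example.com', 'Survey', "Hello\n",
                        "a\@example.com\n", "b\@example.org\n");
```

This is one message and one sendmail process. Some mail servers cap the number of recipients per message, so with a few thousand addresses it may be necessary to send the list in batches.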
Re: pattern matching and sendmail issues
by lhoward (Vicar) on Jun 28, 2000 at 21:53 UTC
    A simple regular expression for extracting Internet e-mail addresses (not fully RFC-compliant, but it will handle 99% of the addresses you see out there):
    while ($data =~ /([\w.-]+\@(?:[\w-]+\.)+\w+)/g) {
        # e-mail address in $1
    }
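A sketch that wraps that loop in a function and collects every address, assuming the whole HTML file has been read into one string. The domain part is written as (?:[\w-]+\.)+ so that each label ("mail.", "example.", ...) can be longer than one character; find_emails is a hypothetical name:

```perl
use strict;
use warnings;

# Domain = one or more "label." pieces followed by a final label,
# so a bare "user@localhost" (no dot) is not matched.
my $email_re = qr/([\w.-]+\@(?:[\w-]+\.)+\w+)/;

# Hypothetical helper: all matches in a string, in order of appearance.
sub find_emails {
    my ($data) = @_;
    my @found;
    while ($data =~ /$email_re/g) {
        push @found, $1;   # e-mail address in $1
    }
    return @found;
}

my $html = '<td>jane.doe@mail.example.co.za</td><td>bob@example.com</td>';
print "$_\n" for find_emails($html);
```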

    If I were you, I'd consider using the Mail::Bulkmail module to do the sending. It is designed for mass mailings like the one you describe.

Re: pattern matching and sendmail issues
by chromatic (Archbishop) on Jun 29, 2000 at 00:51 UTC
    It depends on the structure of the HTML file, but how about using a module like HTML::Parser or HTML::TokeParser to chop up the data file and return the addresses to you? You're more likely to go mad trying to write a regex that handles all of the possibilities.
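A minimal sketch with HTML::TokeParser, assuming the addresses in the file are marked up as <a href="mailto:..."> links (if they are plain text in table cells, a regex over the text is still needed). mailto_addresses is a hypothetical name, and HTML::TokeParser comes with the HTML-Parser distribution from CPAN rather than with core Perl:

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN

# Hypothetical helper: collect addresses from <a href="mailto:..."> links.
# Plain-text addresses elsewhere in the page are not found by this approach.
sub mailto_addresses {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser: $!";
    my @addrs;
    # get_tag('a') returns [$tag, \%attr, \@attrseq, $text] for each <a> start tag
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};   # attribute names come back lowercased
        next unless defined $href;
        # keep the part after "mailto:" and before any "?subject=..." query
        push @addrs, $1 if $href =~ /^\s*mailto:([^?]+)/i;
    }
    return @addrs;
}

my $page = '<p><a href="mailto:info@example.com">Info</a> '
         . '<a href="http://example.com/">Home</a> '
         . '<A HREF="MAILTO:sales@example.org?subject=Survey">Sales</A></p>';
print "$_\n" for mailto_addresses($page);
```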
Re: pattern matching and sendmail issues
by t0mas (Priest) on Jun 29, 2000 at 06:02 UTC
    THE Internet email address matching regexp is the one written by Jeffrey E. F. Friedl, who also wrote the book Mastering Regular Expressions, which is a good place to learn more about regexes.

    /brother t0mas
      I picked up Mastering Regular Expressions on my way to work today. Thanks for the link to the regex!
