
Re: How to extract an email address from a mailto URL?

by eye (Chaplain)
on Dec 30, 2008 at 07:03 UTC

in reply to How to extract an email address from a mailto URL?

If you want to distinguish addresses in anchor tags from other uses of "mailto:" in the file, read the entire file into memory, so that a tag spanning several lines can still match, and use the match operator (m//). As suggested previously, you should use Regexp::Common::Email::Address to help compose a regular expression for the email address and the enclosing HTML. I would use "\s+" between the "a" and "href", and "\s*" on either side of the equal sign, to match HTML's treatment of whitespace. Note that HTML allows attribute values to be quoted with either single or double quotes. Older HTML also allowed the value after the equal sign to be left unquoted in some circumstances.
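A minimal sketch of the pattern described above, untested against real-world pages. To keep it runnable with core Perl only, it assumes a deliberately simplified address pattern ($addr) in place of $RE{Email}{Address}; the names $addr, $anchor and extract_mailto_addresses are hypothetical, and the branch-reset group requires Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for $RE{Email}{Address}; an assumption for this sketch,
# not a full RFC 2822 matcher. It stops at whitespace, quotes, angle brackets
# and "?" (the start of a mailto query string).
my $addr = qr/[^\s"'<>?]+\@[^\s"'<>?]+/;

# The branch reset group makes every alternative capture into group 1.
my $anchor = qr{
    <a \s+ (?: [^>]*? \s+ )?            # tag name, whitespace, any earlier attributes
    href \s* = \s*                      # optional whitespace around the equal sign
    (?| " \s* mailto: ($addr) [^"]* "   # double-quoted value
      | ' \s* mailto: ($addr) [^']* '   # single-quoted value
      |       mailto: ($addr) [^\s>]*   # unquoted value, as older HTML allowed
    )
}xi;

# Hypothetical helper: call it in list context to get one address per anchor.
sub extract_mailto_addresses {
    my ($html) = @_;
    return $html =~ /$anchor/g;
}

my $html = <<'HTML';
<p>Write to mailto:not-a-link@example.com for help.</p>
<a href="mailto:jane@example.com">Jane</a>
<A class='c'
   HREF = 'mailto:joe@example.net?subject=Hi'>Joe</A>
<a href=mailto:bob@example.org>Bob</a>
HTML

print "$_\n" for extract_mailto_addresses($html);
```

The bare "mailto:" in the first paragraph is skipped because it is not inside an anchor tag, while the anchor split across two lines still matches because the whole text is searched as one string. To run this on a file, slurp the contents first (for example by localizing $/ before reading the handle).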

Replies are listed 'Best First'.
Quoting attribute values in HTML
by dorward (Curate) on Dec 30, 2008 at 20:47 UTC
Re^2: How to extract an email address from a mailto URL?
by jdlev (Scribe) on Dec 30, 2008 at 13:17 UTC
    My experience in Perl is going on about 3, so some of what you are saying is Greek to me. Can you provide an example of how you would do it? The source file to pull the information from has the tag as follows:

    // -->
    Fax:  (301)931-1285 


    I'm sorry to have to be wet nursed through this...but I have learned a ton of stuff over the last few weeks...I feel like my brain is going to explode!

      Well, first install these two modules (and their unresolved dependencies if there are any):

      Then you can do something like this (Quickshot, untested):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Regexp::Common qw(Email::Address);
      use Email::Address;

      my $filename = 'file_to_parse.dat';
      open my $rh, '<', $filename or die "$filename: $!";

      # Requirement: href=, mailto: and the mail address must be in the same line!
      my @addresses =
          # Return nothing for a line whose address fails to match, rather than a stale or undefined $1
          map  { m/mailto:($RE{Email}{Address})/o ? $1 : () }
          grep { m/href=.+?mailto:/ }
          <$rh>;
      close $rh;

      {
          # Print one address per line
          local $, = local $\ = "\n";
          print @addresses;
      }
      __END__
        Thanks, works great!
