PerlMonks  

Re: Parsing HTML files to recover data...

by chinamox (Scribe)
on Nov 22, 2006 at 07:11 UTC


in reply to Parsing HTML files to recover data...

UrbanHick-

While I am still a newbie myself, this might help you in some way or other:

    # /g finds each match in turn; /s lets . match newlines inside a blockquote
    while ($page =~ /<blockquote>(.*?)<\/blockquote>/gs) {
        print "captured text: $1\n";
    }

I think this will at least get you started down the right road with regexes. However, I would suggest that you listen to the silverbacks around here and go with the HTML::* modules.
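The loop above can be wrapped in a small self-contained script. This is only a sketch: the sample page and the `extract_blockquotes` helper are hypothetical, made up for illustration. The `/s` modifier lets `.` match newlines, so a blockquote that spans several lines is still captured. `/i` also accepts `<BLOCKQUOTE>`, and the non-greedy `.*?` stops at the first closing tag.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every <blockquote>...</blockquote>
# in $html, in document order.
sub extract_blockquotes {
    my ($html) = @_;
    my @found;
    # /g walks through every match, /s lets . match newlines,
    # /i makes the tag names case-insensitive.
    while ($html =~ m{<blockquote>(.*?)</blockquote>}gsi) {
        push @found, $1;
    }
    return @found;
}

# Hypothetical sample page: one single-line and one multi-line blockquote.
my $page = <<'HTML';
<p>Intro</p>
<blockquote>First quote</blockquote>
<p>Middle</p>
<BLOCKQUOTE>Second
quote</BLOCKQUOTE>
HTML

my @quotes = extract_blockquotes($page);
print "captured text: $_\n" for @quotes;
```

A `<blockquote>` tag carrying attributes, or a nested blockquote, will still defeat this pattern. That is one reason the parser modules are the safer long-term choice.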

Good luck,

-mox


Re^2: Parsing HTML files to recover data...
by kaif (Friar) on Nov 22, 2006 at 12:07 UTC

    Is it just me, or should this response have been one of the first rather than the sixth? I mean, go HTML::TableContentParser, HTML::TokeParser, HTML::TreeBuilder, Template::Extract, and all the other modules; but seriously, as a first go...?

        # Assuming the page contents are in $_
        my ($jobname)     = m|<span class="jobname">\s*(.*?)\s*</span>|s;
        my ($jobserial)   = m|<span class="jobserial">\s*\((.*?)\)\s*</span>|s;
        my ($offices)     = m|<span name="offices">\s*(.*?)\s*</span>|s;
        my ($description) = m|<blockquote>\s*(.*?)\s*</blockquote>|s;

    Please excuse my surprise.
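Here is a self-contained, runnable take on the snippet above. The sample page is hypothetical because the original question's markup is not reproduced here. Collecting the results in a `%job` hash instead of separate scalars is simply a choice made for this illustration. Each match runs in list context, so the assignment stores the first capture group, or `undef` when the tag is absent. The `\s*` on either side of the capture trims surrounding whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample page, shaped to match the patterns in the reply.
my $page = <<'HTML';
<span class="jobname">
  Perl Developer
</span>
<span class="jobserial"> (JS-1234) </span>
<span name="offices">London, Paris</span>
<blockquote>
  Maintain the HTML scraping scripts.
</blockquote>
HTML

my %job;
for ($page) {   # alias $_ to $page, matching the "contents are in $_" assumption
    # List assignment keeps the first capture (or undef if there is no match).
    ($job{jobname})     = m|<span class="jobname">\s*(.*?)\s*</span>|s;
    ($job{jobserial})   = m|<span class="jobserial">\s*\((.*?)\)\s*</span>|s;
    ($job{offices})     = m|<span name="offices">\s*(.*?)\s*</span>|s;
    ($job{description}) = m|<blockquote>\s*(.*?)\s*</blockquote>|s;
}

for my $field (qw(jobname jobserial offices description)) {
    printf "%-11s: %s\n", $field,
        defined $job{$field} ? $job{$field} : '(not found)';
}
```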
