PerlMonks  

Re^3: Working with source of returned web page

by Popcorn Dave (Abbot)
on Jun 10, 2008 at 22:48 UTC


in reply to Re^2: Working with source of returned web page
in thread Working with source of returned web page

It's been a while since I used that module, but if I recall correctly, it parses everything into tokens, and anything that isn't an HTML tag should come back as a text token.

Take a look at HTML::TokeParser help - parsing headlines and you'll see a quick program I wrote to dump an HTML page as tokenized output. Run that on your page and I think you'll see that you don't need the regex per se; you just need to check the text tokens to find what you're after.
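For anyone who can't get to that node, here is a rough sketch of the same idea. It is not the program from the linked node, and it assumes the module in question is HTML::TokeParser, as that node's title suggests. The sample HTML, the sub names dump_tokens and text_tokens, and the /story/ pattern are all made up for illustration; swap in the page you fetched and whatever string you are hunting for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample page; substitute the HTML your request returned.
my $html = <<'HTML';
<html><body>
<h1>Headlines</h1>
<p class="story">First <b>big</b> story</p>
</body></html>
HTML

# Return one line per token: the token type, then the tag name or text.
# get_token hands back an array ref whose first element is the type:
# "S" start tag, "E" end tag, "T" text, "C" comment, "D" declaration.
sub dump_tokens {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @lines;
    while (my $token = $parser->get_token) {
        my ($type, $value) = @$token;
        $value =~ s/\s+/ /g;    # collapse whitespace so each token fits on a line
        push @lines, "$type: $value";
    }
    return @lines;
}

# Return only the non-blank text tokens, which is where page content lives.
sub text_tokens {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @texts;
    while (my $token = $parser->get_token) {
        next unless $token->[0] eq 'T';     # skip tags, comments, declarations
        next unless $token->[1] =~ /\S/;    # skip whitespace-only text
        push @texts, $token->[1];
    }
    return @texts;
}

print "$_\n" for dump_tokens($html);

# Check each text token instead of running a regex over the raw page.
print "match: $_\n" for grep { /story/ } text_tokens($html);
```

Note that the class="story" attribute never shows up in the text tokens, so the match only fires on visible text. That is the main win over pattern-matching the raw source.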

Good luck!

Update: Changed link from scratchpad to node as per suggestion by ww


Revolution. Today, 3 O'Clock. Meet behind the monkey bars.

I would love to change the world, but they won't give me the source code

