PerlMonks  

Re^3: HTML parsing OR capturing text from a string within tags

by Popcorn Dave (Abbot)
on Dec 24, 2006 at 09:12 UTC


in reply to Re^2: HTML parsing OR capturing text from a string within tags
in thread HTML parsing OR capturing text from a string within tags

All that code does is fetch an HTML page and parse it into tokens. It spits the whole mess out, so I ran it from the command line and redirected the output to a file, e.g. perl tokeparser.pl > output.txt

That way you can scan through the file and see how it's tokenizing the information you fed it.
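
For readers without the original script to hand, here is a minimal sketch of that kind of token dump. It assumes HTML::TokeParser (from the HTML::Parser distribution) is installed, and it parses a made-up inline string rather than fetching a page. The sample HTML and the helper names describe_token and tokenize are illustrative only and are not taken from the code discussed above.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; the script discussed above fetched a live page instead.
my $html = '<p class="intro">Hello <b>world</b></p>';

# Turn one token (an array ref returned by get_token) into a printable line.
sub describe_token {
    my ($token) = @_;
    my $type = $token->[0];
    return "START <$token->[1]>"  if $type eq 'S';   # start tag
    return "END   </$token->[1]>" if $type eq 'E';   # end tag
    return "TEXT  '$token->[1]'"  if $type eq 'T';   # text between tags
    return "OTHER ($type)";                          # comments, declarations, etc.
}

# Collect a description of every token in the document, in order.
sub tokenize {
    my ($doc) = @_;
    my $parser = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";
    my @lines;
    while (my $token = $parser->get_token) {
        push @lines, describe_token($token);
    }
    return @lines;
}

print "$_\n" for tokenize($html);
```

Saving this as a file and running it with the output redirected, as above, gives one line per token, which makes it easy to see where the text you want sits relative to the tags around it.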

Revolution. Today, 3 O'Clock. Meet behind the monkey bars.

If quizzes are quizzical, what are tests?

Re^4: HTML parsing OR capturing text from a string within tags
by kevyt (Scribe) on Jan 02, 2007 at 17:44 UTC
    Yahoo offers something that I can use. I can send Yahoo a request and it will send me back an XML file, BUT I am getting errors because Yahoo has URLs with &'s in the file. I can either replace all of the &'s with %26, save the file, and then let XML::Parser do the work, or I can look at the parser code, determine where it parses the file, and make the change there. I found where it parses the file, in Expat.pm :: sub parse. It then calls ParseString(), but I can't find the sub ParseString.

    http://local.yahooapis.com/LocalSearchService/V2/localSearch?appid=YahooDemo&query=plumbing&zip=22222&format=php&results=10
    Kevin
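
The pre-processing idea above can be sketched in a few lines of core Perl. Note that this sketch swaps in &amp; instead of %26: &amp; is the XML entity for a literal ampersand, so a parser hands the original URL back unchanged. The helper name escape_bare_ampersands and the sample fragment are hypothetical, and the pattern assumes anything shaped like &name; is already a valid entity, which may not hold for every feed.

```perl
use strict;
use warnings;

# Escape every "&" that does not already start an entity reference
# (named like &amp;, decimal like &#38;, or hex like &#x26;).
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

# Hypothetical fragment resembling a response with a raw URL in it.
my $raw = '<Url>http://example.com/search?query=plumbing&zip=22222</Url>';
print escape_bare_ampersands($raw), "\n";
```

Running the response text through a filter like this before handing it to the parser avoids patching the module itself, and text that is already well formed passes through untouched.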
      I'm not sure why XML::Parser would complain about the & between code tags, but I've never used it myself. You might have a go at that with XML::Simple. I've seen quite a few monks say positive things about that module.

      As far as the & goes, there are monks who are better equipped to handle that question. The code I pointed you to was for tearing HTML down into parsed tokens, not for dealing with XML.

        I think I had a bad install of the module. I was able to use DOM :) Thanks
