at the mercy of change
My problem is that I want data, not *pretty web pages*: raw data in a feed format that I can process. I'm pretty much getting the results you are looking for now, but without beating my head against parsing HTML and all its problems, namely that you are at the mercy of a web designer's whim to change the layout.

use rdf, rss or pda feeds

So I avoid HTML. I'm lazy. I look for the RSS, RDF or PDA HTML pages, point my spider at them and dump them in a directory for later parsing. Most news sites have RSS feeds (though my local newspaper, The Age, supplies RSS feeds for a fee, but produces a lite page for PDAs), so some parsing is still necessary.

Now suppose I want to parse a page (in Perl): why wouldn't I use Andy Lester's fine WWW::Mechanize? (WWW::Mechanize article).

questions, questions, devil's advocate

I'm not actually knocking the idea.
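For what it's worth, the "point my spider and dump them in a directory" step could look roughly like the sketch below. It is not my actual spider, just a minimal illustration using core modules only (HTTP::Tiny, File::Path, File::Spec). The feed name, the example.com URL and the helper names feed_filename, dump_feeds and http_fetch are all made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;                  # core module since Perl 5.14
use File::Path qw(make_path);
use File::Spec;

# Hypothetical feed list; replace with the feeds you actually follow.
my %feeds = (
    example_news => 'http://example.com/news.rss',
);

# Build a safe, dated file name for a feed, e.g. "example_news-19700101.xml".
sub feed_filename {
    my ($name, $time) = @_;
    (my $safe = $name) =~ s/[^A-Za-z0-9_.-]+/_/g;
    my @t = gmtime($time);
    return sprintf '%s-%04d%02d%02d.xml', $safe, $t[5] + 1900, $t[4] + 1, $t[3];
}

# Fetch every feed and dump the raw body into $dir for later parsing.
# $fetch is a code ref that takes a URL and returns the body, or undef on failure.
# Returns the list of files written.
sub dump_feeds {
    my ($feeds, $dir, $fetch, $time) = @_;
    $time //= time;
    make_path($dir);
    my @saved;
    for my $name (sort keys %$feeds) {
        my $body = $fetch->($feeds->{$name});
        next unless defined $body;          # skip feeds that failed to download
        my $path = File::Spec->catfile($dir, feed_filename($name, $time));
        open my $fh, '>:raw', $path or die "Cannot write $path: $!";
        print {$fh} $body;
        close $fh or die "Cannot close $path: $!";
        push @saved, $path;
    }
    return @saved;
}

# Default fetcher: a plain HTTP GET, undef on any non-success response.
sub http_fetch {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 20)->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Only hit the network when a target directory is given on the command line.
dump_feeds(\%feeds, $ARGV[0], \&http_fetch) if @ARGV;
```

Run it as `perl dump_feeds.pl feeds/` (the script name is arbitrary). Passing the fetcher in as a code ref keeps the download separate from the dumping, so the parsing side never has to care where the files came from.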
Now you may say, "goon, you're an idiot, be quiet." But ...
esr ~ <a href="http://catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s02.html">The Cathedral and the Bazaar v3.0</a>.

In reply to same prob. different approach
by g00n