Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump')
by bronto (Priest) on Oct 15, 2006 at 15:55 UTC
bronto has asked for the
wisdom of the Perl Monks concerning the following question:
I am writing a couple of web-page-scraping tools to help with my job search. I already have something working, but what I am missing is a nice pure-Perl solution that formats a web page as plain text, so that if an announcement is removed for any reason, I still have a copy of its contents.
And hence the question: is there anything like lynx -dump in Perl? I dug into CPAN for about half an hour and tried html2text, but it didn't really do a good job...
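For the few of you who don't know what lynx is and what it does: it is a text-mode web browser, and lynx -dump writes the rendered page to standard output as plain text.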
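As a point of comparison, the behaviour I am after can be had today by shelling out to lynx itself. What follows is only a minimal sketch: it assumes lynx is installed and on the PATH, and dump_with_lynx is just an illustrative name, not something from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return lynx's plain-text rendering of a URL.
# Assumes the external 'lynx' binary is available on the PATH.
sub dump_with_lynx {
    my ($url) = @_;

    # List-form pipe open: the URL is passed as an argument,
    # not interpolated into a shell command line.
    open(my $fh, '-|', 'lynx', '-dump', $url)
        or die "Cannot run lynx: $!";

    local $/;            # slurp mode: read all output at once
    my $text = <$fh>;

    # close() on a pipe returns false if the command exited non-zero.
    close($fh) or die "lynx exited with status $?";
    return $text;
}

# Only run when a URL is given on the command line.
print dump_with_lynx($ARGV[0]) if @ARGV;
```

Called as `perl dump.pl http://example.com/ > page.txt` (the script name and URL are placeholders), this saves the rendered text to a file. It works, but it depends on an external program, which is exactly what I would like to avoid.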
Thanks a lot in advance for your help.
In theory, there is no difference between theory and practice. In practice, there is.