Re^5: Breaking The Rules II by BrowserUk (Pope)
on Jul 03, 2007 at 21:18 UTC
Some monks do indeed argue that "you can parse HTML with a regex" (e.g. tye) but I've never seen any argue against using a parser.
You didn't read what I wrote. I never said "use regex to parse HTML". If I wanted to parse HTML, I would use HTML::Parser (which I mentioned above as a fine pragmatic module). But if I want to extract some strings from amongst some other strings, I use a regex.
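As a minimal sketch of that distinction (this is not the script referred to in this post; the markup, the class names and the sub name extract_forecast are all made up for illustration), pulling a few known strings out of a blob of text might look something like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical fragment, invented for this sketch. It is NOT real BBC markup.
my $html = <<'HTML';
<tr><td class="day">Mon</td><td class="max">21&deg;C</td><td class="min">12&deg;C</td></tr>
<tr><td class="day">Tue</td><td class="max">19&deg;C</td><td class="min">11&deg;C</td></tr>
HTML

# Extract (day, max, min) triples by matching the text around them.
# No attempt is made to understand the document structure.
sub extract_forecast {
    my ($text) = @_;
    my @rows;
    while ( $text =~ m{
        class="day"> ([^<]+) < .*?   # day name, up to the next tag
        class="max"> (-?\d+)   .*?   # maximum temperature
        class="min"> (-?\d+)         # minimum temperature
    }gxs ) {
        push @rows, { day => $1, max => $2, min => $3 };
    }
    return @rows;
}

# Print one line per extracted row.
printf "%-4s max %3d  min %3d\n", @{$_}{qw(day max min)}
    for extract_forecast($html);
```

The pattern depends entirely on the assumed layout of the text. That is the trade-off being argued about here: it is short and direct, and it needs adjusting if the surrounding strings change.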
To bring the linked example above up to date (the page it used went away many moons ago): one of the web pages I visit frequently is the BBC Weather forecast.
Here is a script to extract and print a selection of the data that page contains:
When run it produces output like this:
Your mission, should you choose to accept it, is to reproduce that script using HTML::TokeParser::Simple and post it here.
If you took the time to learn perl well you'll not need to. And I say that with no limitations or caveats. :-)
That's simply not true. For one, I couldn't replace HTML::Parser with Perl code, because it's a Perl wrapper around an XS wrapper around 41k of intensely involved C code. Nor GD for similar reasons. Nor Time::HiRes. Nor... about 60 more modules, but that would just belabour the point.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.