What is the fastest way to parse HTML?
by sri (Vicar) on Jul 22, 2003 at 22:37 UTC
sri has asked for the wisdom of the Perl Monks concerning the following question:
By "parse" I mean extracting the text together with formatting information such as font size, bold, whether the text is inside an anchor, etc.
The main requirements are speed and fault tolerance.
There are a few possible solutions that came to my mind:
- HTML::Parser would be the easiest, but is it the fastest?
- XML::Parser, but it is probably not fault tolerant enough
- Build something with flex or bison and write an XS binding for it
What do you think is the fastest way to parse real-world(tm) HTML?
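
For reference, here is a rough sketch of what I mean by the HTML::Parser option. The helper name extract_chunks and the shape of the returned chunks are made up for illustration. It only tracks <b>/<strong>, <a> and <font size>, and it says nothing about speed; that would still have to be measured against the other candidates.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not part of HTML::Parser): returns a list of
# [ $text, { bold => 0|1, anchor => 0|1, font_size => $size_or_undef } ].
sub extract_chunks {
    my ($html) = @_;
    my @chunks;
    my @fonts;                       # stack of <font size=...> values
    my %depth = (b => 0, a => 0);    # open <b>/<strong> and <a> counts

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            $depth{b}++ if $tag eq 'b' or $tag eq 'strong';
            $depth{a}++ if $tag eq 'a';
            push @fonts, $attr->{size} if $tag eq 'font';
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            # Never go below zero, so stray close tags do no harm.
            $depth{b}-- if ($tag eq 'b' or $tag eq 'strong') and $depth{b} > 0;
            $depth{a}-- if $tag eq 'a' and $depth{a} > 0;
            pop @fonts if $tag eq 'font' and @fonts;
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            return unless $text =~ /\S/;    # skip whitespace-only text
            push @chunks, [ $text, {
                bold      => $depth{b} > 0 ? 1 : 0,
                anchor    => $depth{a} > 0 ? 1 : 0,
                font_size => $fonts[-1],    # undef outside any <font>
            } ];
        }, 'dtext' ],
    );
    $p->unbroken_text(1);    # report each text run in one piece
    $p->parse($html);
    $p->eof;                 # flush any pending text
    return @chunks;
}

# Deliberately sloppy input: unclosed <p>, mis-nested <b>/<a>, no </a>.
for my $c (extract_chunks('<p>Hello <b>bold <a href="x">link</b> tail')) {
    my ($text, $fmt) = @$c;
    printf "%-8s bold=%d anchor=%d\n", "'$text'", $fmt->{bold}, $fmt->{anchor};
}
```

The parser just reports tags and text as events, so broken nesting does not make it bail out. The handlers above simply keep counters and carry on.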