Re^2: Parsing HTML/XML with Regular Expressions (HTML::Parser)

Thanks for your contribution! A few comments:

The output is unordered because you're using a hash; I'd suggest an array instead.
The way your code checks the id attribute ties the script to this one example file, whose contents could of course change.
As far as I can tell, you're missing Zero because when you encounter the first <div>, your start_handler just installs a new handler, which doesn't get called at that point. I'd recommend not swapping handlers around. Instead, use a single handler per event and keep track of state inside the handlers, kind of like tangent does here with $in_wanted_div. However, I would keep that state in the parser object, or at least in a more tightly scoped variable, rather than in a "global" variable.
You're not getting the right Sunday because you're using the text argument type instead of dtext ("decoded text"), which has HTML entities decoded.
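A minimal sketch of the suggestions above, assuming HTML::Parser 3.x and a made-up input snippet (the thread's actual example file isn't reproduced here; it only borrows the Zero and Sunday values). It uses one handler per event, keeps its state in the parser object, pushes results onto an array so document order is kept, collects any <div> that has an id instead of checking specific id values, and requests dtext so entities are decoded. The extract_divs/on_start/on_end/on_text subs and the _my_ key names are illustrative choices, not taken from anyone's code in this thread.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical input, not the real example file. &#x53; is a
# character reference for "S", so dtext should yield "Sunday".
my $html = <<'HTML';
<div id="zero">Zero</div>
<p>not wanted</p>
<div id="one">&#x53;unday <b>bold</b></div>
HTML

# Collect [id, text] pairs, in document order, for every <div> with an
# id attribute. State is stored in the parser object itself; the "_my_"
# prefix is just a naming choice to stay clear of the parser's own keys.
sub extract_divs {
    my ($source) = @_;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ \&on_start, 'self, tagname, attr' ],
        end_h   => [ \&on_end,   'self, tagname' ],
        text_h  => [ \&on_text,  'self, dtext' ],    # decoded text
    );
    $p->{_my_results} = [];    # array, so order is preserved
    $p->{_my_depth}   = 0;     # > 0 while inside a wanted div
    $p->parse($source);
    $p->eof;
    return @{ $p->{_my_results} };
}

sub on_start {
    my ($self, $tag, $attr) = @_;
    return unless $tag eq 'div';
    if ($self->{_my_depth}) {
        $self->{_my_depth}++;    # nested div inside a wanted one
    }
    elsif (defined $attr->{id}) {
        $self->{_my_depth} = 1;  # entering a wanted div
        push @{ $self->{_my_results} }, [ $attr->{id}, '' ];
    }
}

sub on_end {
    my ($self, $tag) = @_;
    $self->{_my_depth}-- if $tag eq 'div' && $self->{_my_depth};
}

sub on_text {
    my ($self, $text) = @_;
    # Text may arrive in several chunks, so append rather than assign.
    $self->{_my_results}[-1][1] .= $text if $self->{_my_depth};
}

for my $pair (extract_divs($html)) {
    printf "%s: %s\n", $pair->[0], $pair->[1];
}
```

Swapping dtext for text in the text_h argspec would leave the &#x53; undecoded in the output, which is the same kind of problem as the Sunday issue.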

Re^3: Parsing HTML/XML with Regular Expressions (HTML::Parser) by fishy (Friar) on Oct 19, 2017 at 08:01 UTC
Thank you, haukex, for your comments and for your interesting OP. Yes, tangent's code boosted my knowledge. Cheers