
Re^2: Parsing HTML/XML with Regular Expressions (HTML::Parser)

by haukex (Archbishop)
on Oct 18, 2017 at 21:47 UTC ( [id://1201622]=note )


in reply to Re: Parsing HTML/XML with Regular Expressions (HTML::Parser)
in thread Parsing HTML/XML with Regular Expressions

Thanks for your contribution! A few comments:

  • The output is unordered because you're using a hash; I'd suggest an array instead.
  • The way your code checks the id attribute ties the script to the one example file, which could of course change.
  • As far as I can tell, you're missing Zero because, when you encounter the first <div>, your start_handler only installs a new handler, and that new handler doesn't get called for the element that triggered the switch. I'd recommend not swapping handlers around; instead, use a single handler per event and keep state inside the handler, much like tangent does here with $in_wanted_div. I would, however, keep the state in the parser object, or at least in a more tightly scoped variable, rather than in a "global" variable.
  • You're not getting the right Sunday because you're using the text argument type instead of dtext ("decoded text"), so entities in the text are passed through undecoded.
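
To make the points above concrete, here is a minimal sketch of what I mean, not a drop-in fix for your script. The sample HTML, the class="data" selection, and the hash keys wanted_depth and results are assumptions made up for illustration; the thread's real example file is not reproduced here. It uses one handler per event, keeps its state in the parser object (which is a hash reference), collects results in an array so the order is preserved, and asks for dtext so entities arrive decoded.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input (NOT the thread's example file).
# "&#100;" is the character "d", so the decoded text is "Sunday".
my $html = <<'HTML';
<div class="data">Zero</div>
<p>ignored</p>
<div class="data">Sun&#100;ay</div>
HTML

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ \&on_start, 'self, tagname, attr' ],
    text_h  => [ \&on_text,  'self, dtext' ],   # dtext = decoded text
    end_h   => [ \&on_end,   'self, tagname' ],
);

# State lives in the parser object rather than in a global variable.
# (Key names are illustrative; avoid keys starting with "_hparser".)
$p->{wanted_depth} = 0;     # > 0 while inside a wanted <div>
$p->{results}      = [];    # array keeps document order

sub on_start {
    my ($self, $tag, $attr) = @_;
    return unless $tag eq 'div';
    if ($self->{wanted_depth}) {
        $self->{wanted_depth}++;            # nested <div> inside a wanted one
    }
    elsif (($attr->{class} // '') eq 'data') {
        $self->{wanted_depth} = 1;
        push @{ $self->{results} }, '';     # start a new entry
    }
}

sub on_text {
    my ($self, $text) = @_;
    return unless $self->{wanted_depth};
    # Text may arrive in several chunks, so append to the current entry.
    $self->{results}[-1] .= $text;
}

sub on_end {
    my ($self, $tag) = @_;
    $self->{wanted_depth}-- if $tag eq 'div' && $self->{wanted_depth};
}

$p->parse($html);
$p->eof;

my @results = @{ $p->{results} };
print "$_\n" for @results;
```

Because the very first wanted <div> is handled by the same start handler that handles every other one, nothing is skipped, and selecting on a class instead of one specific id value is just one way to avoid tying the script to a single file.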

Replies are listed 'Best First'.
Re^3: Parsing HTML/XML with Regular Expressions (HTML::Parser)
by fishy (Friar) on Oct 19, 2017 at 08:01 UTC
    Thank you, haukex, for your comments and for your interesting OP.
    Yes, tangent's code boosted my knowledge.


    Cheers
