Thanks for your contribution! A few comments:
- The output is unordered because you're storing the results in a hash; I'd suggest an array instead.
- The way your code checks the id attribute ties the script to the one example file, which could of course change.
- As far as I can tell, you're missing Zero because when you encounter the first <div>, your start_handler just installs a new handler, and that new handler doesn't get called for the <div> that triggered the switch. I'd recommend not swapping handlers around, but instead using a single handler per event and keeping state inside the handler, kind of like tangent does here with $in_wanted_div. The one change I'd make is to keep the state in the parser object, or at least in a more tightly scoped variable, instead of in a "global" variable.
- You're not getting the right Sunday because you're using the text argument type, which passes the raw source text, instead of dtext ("decoded text"), which passes the text with HTML entities decoded.
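
To make the points above concrete, here is a minimal sketch of how they could fit together: one handler per event, state kept in the parser object, results in an array, and dtext for the text handler. I don't have your input file, so the sample HTML, the class name "wanted", the handler names, and the my_* hash keys are all made up for illustration. It relies on the parser object being a hash ref that you can stash your own keys in (as far as I know HTML::Parser only reserves keys starting with "_hparser"), and it doesn't try to handle nested <div>s.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical sample input; your real file will look different.
my $html = <<'HTML';
<div class="wanted">Zero</div>
<div class="other">skip me</div>
<div class="wanted">Sunday &amp; Monday</div>
HTML

# One handler per event, installed once and never swapped out.
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&on_start, 'self, tagname, attr' ],
    text_h      => [ \&on_text,  'self, dtext' ],    # dtext = entities decoded
    end_h       => [ \&on_end,   'self, tagname' ],
);

# State lives in the parser object instead of a file-level variable.
# The key names are arbitrary (assumption: they don't clash with anything).
$parser->{my_in_wanted} = 0;
$parser->{my_results}   = [];    # array, so document order is preserved

sub on_start {
    my ($self, $tag, $attr) = @_;
    return unless $tag eq 'div';
    # Match on whatever criterion fits your data; a class is just an example.
    if ( ( $attr->{class} // '' ) eq 'wanted' ) {
        $self->{my_in_wanted} = 1;
        push @{ $self->{my_results} }, '';    # start a new entry
    }
}

sub on_text {
    my ($self, $text) = @_;
    # Text can arrive in several chunks, so append rather than assign.
    $self->{my_results}[-1] .= $text if $self->{my_in_wanted};
}

sub on_end {
    my ($self, $tag) = @_;
    $self->{my_in_wanted} = 0 if $tag eq 'div';
}

$parser->parse($html);
$parser->eof;

print "$_\n" for @{ $parser->{my_results} };
```

Because the start handler itself flips the flag on the very <div> that matches, the first entry isn't lost, and because the text handler receives dtext, &amp; comes through as a plain ampersand.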