Perl HTML confusion...
by AI Cowboy (Sexton)
on Sep 17, 2013 at 17:36 UTC
AI Cowboy has asked for the wisdom of the Perl Monks concerning the following question:
I'm having trouble using Perl to parse an HTML file. I'm trying to grab all <a> and <div> tags whose link or text content matches a certain format (I use a regex for this). However, WWW::Mechanize can only find links (<a> tags), not <div> tags, so it doesn't do what I need. I've tried learning HTML::TreeBuilder, but I can't make much sense of its documentation.
I'm wondering if you chaps can either direct me to a better, cleaner Perl module that can extract all tags and let me analyze their attributes/text, or help me with my problem with HTML::TreeBuilder?
My problem is that in, for example, http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/Element.pm#find_by_tag_name, I have no idea what $h is or where it comes from. It seems to me that the documentation for TreeBuilder and Element uses variables without explaining explicitly what they are, and this hurts my brain. Some help would be wonderful: I need to finish this project for my job by the end of the week, and I'm not sure what to do or why I'm not understanding this.
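To make the question concrete, here is a minimal sketch of what I think I'm after. It assumes that $h in the documentation is simply any HTML::Element object, such as the tree root that HTML::TreeBuilder returns after parsing (TreeBuilder is a subclass of HTML::Element). The sample HTML and the $pattern regex are made up for illustration and stand in for my real file and format check.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample document, for illustration only.
my $html = <<'HTML';
<html><body>
<a href="/item/42">Item 42</a>
<a href="/about">About</a>
<div>Item 7</div>
<div>Nothing to see</div>
</body></html>
HTML

# Hypothetical pattern standing in for the real format check.
my $pattern = qr/item\W*\d+/i;

# new_from_content parses the string and returns the root of the tree.
# Assumption: this root element is what the docs call $h.
my $h = HTML::TreeBuilder->new_from_content($html);

my @matches;

# In list context, find_by_tag_name returns every element at or under $h
# whose tag is one of the given names.
for my $el ( $h->find_by_tag_name( 'a', 'div' ) ) {
    my $href = $el->attr('href') // '';    # <div> has no href
    my $text = $el->as_text;               # text content of the element
    next unless $href =~ $pattern or $text =~ $pattern;
    push @matches, { tag => $el->tag, href => $href, text => $text };
}

printf "%-3s href=%-10s text=%s\n", @{$_}{qw(tag href text)} for @matches;

$h->delete;    # release the tree when done
```

If that assumption holds, then for a file on disk HTML::TreeBuilder->new_from_file($path) should give the same kind of root object, and the loop would stay unchanged.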