Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I found one node here when I searched for how to do this (code shown below), but it runs with warnings. It produces the right output, but it also prints some 30 lines of warnings about an uninitialized value.
Ideally, I want to collect all meta tags and store them in a hash with the meta name as the key. I'm already using TokeParser for scraping, so please don't suggest I also use TokeParser::Simple. I read through the docs and can't seem to find any information on what I am looking for.

    my %meta;
    my $htm2 = HTML::TokeParser->new( \$src );
    while (my $token = $htm2->get_token) {
        next if $token->[1] ne 'meta' && $token->[0] ne 'S';
        $meta{$token->[2]{name}} = $token->[2]{content};
    }
Also, if a modified version of the code above works, can you explain line by line what it's doing? I'm having trouble piecing things together.
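One possible modified version, as a minimal sketch rather than a definitive answer. The likely cause of the warnings is the `&&` in the `next` line: a token is skipped only when it is both not named 'meta' and not a start tag, so every other start tag (html, head, p and so on) falls through and is looked up by a `name` attribute it does not have. Testing the two conditions separately avoids that. Meta tags without a `name` attribute, such as `http-equiv` ones, are also skipped here. The sample HTML and the variable name `$parser` are assumptions for illustration; `$src` stands in for the page source from the original post.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; in the original post $src holds the fetched page.
my $src = <<'HTML';
<html><head>
<title>Example page</title>
<meta name="description" content="A sample page">
<meta name="keywords" content="perl, html">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body><p>Hello</p></body></html>
HTML

my %meta;

# Build a parser over the in-memory document (passed as a scalar reference).
my $parser = HTML::TokeParser->new( \$src )
    or die "Cannot create parser";

# get_token returns one array reference per token until the document ends.
while ( my $token = $parser->get_token ) {

    # Element 0 is the token type; 'S' marks a start tag.
    next unless $token->[0] eq 'S';

    # For start tags, element 1 is the lowercased tag name.
    next unless $token->[1] eq 'meta';

    # For start tags, element 2 is a hash reference of the attributes.
    my $attr = $token->[2];

    # Skip meta tags that have no name attribute (e.g. http-equiv).
    next unless defined $attr->{name};

    $meta{ $attr->{name} } = $attr->{content};
}

# Print what was collected, one "name => content" pair per line.
for my $name ( sort keys %meta ) {
    my $content = defined $meta{$name} ? $meta{$name} : '';
    print "$name => $content\n";
}
```

Writing the filter as two `next unless` lines is equivalent to `next if $token->[1] ne 'meta' || $token->[0] ne 'S';` but makes it harder to mix up the logic.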
My last question is this. Can I extract different parts of an HTML document with TokeParser in one run? Or must I run them all separately?
I can extract the title tag just fine, but only when I create a new TokeParser object. It seems like a waste of resources to call the module AGAIN when the HTML dump is still in memory, right? Or does the data change each time you loop over the tokens?
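On the single-run question, a sketch under stated assumptions: the parser object hands out each token once, so after a loop has run to the end there is nothing left to read from that object, which would explain why a second object seemed necessary. As far as I know the source string itself is not modified. Both pieces can be picked up in one loop by branching on the tag name. The sample HTML and the names `$parser` and `$title` are hypothetical; `get_trimmed_text` is the HTML::TokeParser method for reading text up to a given end tag.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input standing in for the fetched page.
my $src = <<'HTML';
<html><head>
<title>  Example page  </title>
<meta name="description" content="A sample page">
<meta name="keywords" content="perl, html">
</head><body><p>Hello</p></body></html>
HTML

my ( $title, %meta );
my $parser = HTML::TokeParser->new( \$src )
    or die "Cannot create parser";

# One pass over the token stream collects both the title and the meta tags.
while ( my $token = $parser->get_token ) {
    next unless $token->[0] eq 'S';    # only start tags matter here

    my ( $tag, $attr ) = @{$token}[ 1, 2 ];

    if ( $tag eq 'title' ) {
        # Read the text up to </title> from the same stream, trimmed.
        $title = $parser->get_trimmed_text('/title');
    }
    elsif ( $tag eq 'meta' && defined $attr->{name} ) {
        $meta{ $attr->{name} } = $attr->{content};
    }
}

print "title: $title\n" if defined $title;
print "$_ => $meta{$_}\n" for sort keys %meta;
```

If separate loops are preferred, creating a second parser over the same `\$src` is cheap, since no new copy of the document is fetched.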
Replies are listed 'Best First'.
What's wrong with HTML::TokeParser::Simple?
by Ovid (Cardinal) on Mar 20, 2006 at 23:34 UTC
Re: meta tag extraction with TokeParser
by Thelonius (Priest) on Mar 21, 2006 at 00:18 UTC
Re: meta tag extraction with TokeParser
by saintmike (Vicar) on Mar 20, 2006 at 20:59 UTC
by Anonymous Monk on Mar 20, 2006 at 21:44 UTC
by saintmike (Vicar) on Mar 20, 2006 at 21:47 UTC
by Anonymous Monk on Mar 20, 2006 at 21:51 UTC
by Anonymous Monk on Mar 20, 2006 at 21:54 UTC