PerlMonks  

I found one node here when I searched for how to do this (code shown below), but it doesn't run cleanly. It produces the right output, but it also prints some 30 lines of "uninitialized value" warnings.

    my %meta;
    my $htm2 = HTML::TokeParser->new( \$src );
    while (my $token = $htm2->get_token) {
        next if $token->[1] ne 'meta' && $token->[0] ne 'S';
        $meta{$token->[2]{name}} = $token->[2]{content};
    }
Ideally, I want to collect all meta tags and store each of them into a hash with the meta name as the key. I'm already using TokeParser for scraping, so please don't suggest I also use TokeParser::Simple. I read through the docs and can't seem to find any information on what I am looking for.

Also, if a modified version of the code above works, can you explain line for line what it's doing? I'm having trouble piecing things together.
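
For what it's worth, here is a minimal sketch of how the loop could be tightened, with a comment on each step. The likely source of the warnings is the `&&` in the `next` line: it only skips tokens that are both not named 'meta' and not start tags, so every other start tag (html, head, title, ...) falls through and gets stored under an undefined `name`. The sample `$src` and the variable name `$parser` are my own stand-ins, not anything from the original node.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document; stands in for the $src from the question.
my $src = <<'HTML';
<html><head>
<title>Example page</title>
<meta charset="utf-8">
<meta name="description" content="A short summary">
<meta name="keywords" content="perl, html, parsing">
<meta http-equiv="refresh" content="30">
</head><body><p>Hello</p></body></html>
HTML

my %meta;

# Passing a reference to a string makes TokeParser parse that string.
my $parser = HTML::TokeParser->new( \$src )
    or die "Could not create parser";

# get_token returns one array reference per token, undef at end of input.
while ( my $token = $parser->get_token ) {

    # Element 0 is the token type: 'S' start tag, 'E' end tag, 'T' text, ...
    next unless $token->[0] eq 'S';

    # For start tags, element 1 is the lowercased tag name.
    next unless $token->[1] eq 'meta';

    # For start tags, element 2 is a hash reference of the attributes.
    my $attr = $token->[2];

    # Some meta tags (charset, http-equiv) have no name; skip those
    # instead of using an undefined value as a hash key.
    next unless defined $attr->{name} && defined $attr->{content};

    $meta{ $attr->{name} } = $attr->{content};
}

print "$_ => $meta{$_}\n" for sort keys %meta;
```

With the sample above, `%meta` should end up holding only the `description` and `keywords` entries. Changing `&&` to `||` in the original `next` line would be the smallest fix for non-meta tags, but meta tags without a `name` attribute would still warn, hence the extra `defined` check here.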

My last question is this. Can I extract different parts of an HTML document with TokeParser in one run? Or must I run them all separately?

I can extract the title tag just fine, but only when I create a second TokeParser object. It seems like a waste of resources to set up the parser AGAIN when the HTML dump is still in memory, right? Or does the data change each time you loop over the tokens?
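
As a rough sketch of the one-pass idea: a TokeParser object reads through the document once, so after a `get_token` loop finishes there are no tokens left for a second loop, which would explain why a fresh object seemed necessary. As far as I know the string in `$src` itself is not altered. Collecting everything in a single loop avoids the second object. The sample `$src` and the names `$parser` and `$title` are assumptions for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document; stands in for the $src from the question.
my $src = <<'HTML';
<html><head>
<title>  Example page </title>
<meta name="description" content="A short summary">
<meta name="keywords" content="perl, html, parsing">
</head><body><p>Hello</p></body></html>
HTML

my ( $title, %meta );
my $parser = HTML::TokeParser->new( \$src )
    or die "Could not create parser";

while ( my $token = $parser->get_token ) {

    # Only start tags are interesting here.
    next unless $token->[0] eq 'S';

    my ( $tag, $attr ) = @{$token}[ 1, 2 ];

    if ( $tag eq 'title' ) {
        # Read the text up to </title> from the same token stream,
        # with leading/trailing whitespace removed.
        $title = $parser->get_trimmed_text('/title');
    }
    elsif ( $tag eq 'meta'
        && defined $attr->{name}
        && defined $attr->{content} )
    {
        $meta{ $attr->{name} } = $attr->{content};
    }
}

print "title: $title\n" if defined $title;
print "$_ => $meta{$_}\n" for sort keys %meta;
```

The same `if`/`elsif` chain can be extended with further tag names, so one loop can serve as many extractions as needed.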


In reply to meta tag extraction with TokeParser by Anonymous Monk
