PerlMonks  

Re^2: Extracting data-structure from HTML using Web::Scraper

by Anonymous Monk
on Jul 14, 2012 at 08:27 UTC (#981784)


in reply to Re: Extracting data-structure from HTML using Web::Scraper
in thread Extracting data-structure from HTML using Web::Scraper

And here is an XML::Twig version, since the logic is the same:

#!/usr/bin/perl --
use strict;
use warnings;
use Data::Dump;    # exports dd
use XML::Twig;

my $sample = q{
<html><body>
<h4 class="bla">July 12</h4>
<p>Tim</p>
<p>Jon</p>
<h4 class="bla">July 13</h4>
<p>James</p>
<p>Eric</p>
<p>Jerry</p>
<p>Susie</p>
<h4 class="date">July 14</h4>
<p>Kami</p>
<p>Darryl</p>
</body></html>
};

# While parsing, @root ends with ( {hash for current heading}, "heading text" ).
my @root;
my $xml = XML::Twig->new(
    twig_handlers => {
        '//body/h4' => sub {
            dd $_->path;
            pop @root;                   # drop the previous heading text, if any
            push @root, {}, $_->text;    # new hash, then the new heading as its key
        },
        '//body/p' => sub {
            dd $_->path;
            # append this paragraph to the list stored under the current heading
            push @{
                $root[-2]->{
                    $root[-1]    # key
                }
            }, $_->text;
        },
    },
);
$xml->xparse( $sample );
pop @root if not ref $root[-1];    # remove the trailing heading text
dd \@root;
__END__
"/html/body/h4"
"/html/body/p"
"/html/body/p"
"/html/body/h4"
"/html/body/p"
"/html/body/p"
"/html/body/p"
"/html/body/p"
"/html/body/h4"
"/html/body/p"
"/html/body/p"
[
  { "July 12" => ["Tim", "Jon"] },
  { "July 13" => ["James", "Eric", "Jerry", "Susie"] },
  { "July 14" => ["Kami", "Darryl"] },
]

