PerlMonks  

Mojo::DOM find tag after another tag

by luxs (Beadle)
on May 28, 2016 at 17:23 UTC ( [id://1164401]=perlquestion )

luxs has asked for the wisdom of the Perl Monks concerning the following question:

I've got some simple HTML:
yada...yada...yada.. <img src="111"> yada...yada...yada.. <img src="222"> yada...yada...yada.. <h1>Some title</h1> yada...yada...yada.. <img src="333"> yada...yada...yada.. <img src="444">
How can I find "333" and "444" (i.e. the src attributes of the IMG tags after the H1 tag) with Mojo::DOM functions? PS. With a regex this is a trivial task.

Replies are listed 'Best First'.
Re: Mojo::DOM find tag after another tag
by haukex (Archbishop) on May 28, 2016 at 17:50 UTC
      The only script I've managed to write is:

      my $tmpi = 0;
      my $fimg = $dom->find('img, h1')->map(
          sub {
              # Count every <h1> seen so far
              if ( $_->tag eq 'h1' ) { $tmpi++; }
              # After the first <h1>, return the src attribute of each <img>
              if ( $tmpi > 0 && defined $_->attr->{'src'} ) {
                  $_->attr->{'src'}    # without a semicolon!
              }
          }
      );
      for ( my $ii = 0; $ii <= $#$fimg; $ii++ ) {
          if ( length( $fimg->[$ii] ) > 0 ) { print $fimg->[$ii]; }
      }

      It looks too complicated and not nice, but it works.

        Hi luxs,

        Since I'm guessing the <img> tags can be nested arbitrarily deep, the CSS selector "img, h1" was going to be my next suggestion.

        There are many other ways to write the logic; here's one:

        use warnings;
        use strict;
        use Mojo::DOM;

        my $data = <<'ENDDATA';
        yada...yada...yada..
        <img src="111">
        yada...yada...yada..
        <img src="222">
        yada...yada...yada..
        <h1>Some title</h1>
        yada...yada...yada..
        <a href="444444">
        <img src="333">
        yada...yada...yada..
        <img src="444">
        ENDDATA

        my $dom = Mojo::DOM->new($data);
        my (@imgs,$seen_h1);
        for my $tag ( $dom->find('h1, img')->each ) {
            if ($seen_h1 && $tag->tag eq 'img' && length $tag->attr('src'))
                { push @imgs, $tag }
            elsif ($tag->tag eq 'h1')
                { $seen_h1 = 'true' }
        }
        print $_->attr('src'),"\n" for @imgs;

        __END__
        333
        444

        You don't even need the intermediate @imgs array and can print directly from the first loop if you like.

        Hope this helps,
        -- Hauke D
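
The direct-printing variant mentioned in this reply could be sketched roughly as follows. It is a minimal sketch, assuming that find returns its matches in document order, which the loop above already relies on. The helper name print_srcs_after_h1 and its optional filehandle argument are hypothetical, introduced only for illustration; they are not part of Mojo::DOM.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper (not a Mojo::DOM API): print the src of every <img>
# that follows the first <h1> in document order, writing straight to the
# given filehandle (STDOUT by default) with no intermediate array.
sub print_srcs_after_h1 {
    my ($html, $out) = @_;
    $out ||= \*STDOUT;
    my $seen_h1;
    for my $tag (Mojo::DOM->new($html)->find('h1, img')->each) {
        if ($tag->tag eq 'h1') { $seen_h1 = 1; next }
        next unless $seen_h1;                  # still before the first <h1>
        my $src = $tag->attr('src');
        print {$out} "$src\n" if defined $src && length $src;
    }
    return;
}

# Sample markup adapted from the thread.
print_srcs_after_h1(<<'ENDDATA');
<img src="111"> <img src="222">
<h1>Some title</h1>
<a href="444444"> <img src="333"> yada <img src="444">
ENDDATA
```

Passing a filehandle as the second argument redirects the output, which is convenient when the result needs to be captured rather than printed.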

Re: Mojo::DOM find tag after another tag
by Anonymous Monk on May 29, 2016 at 00:04 UTC
    In the DOM, tags that are not parent and child may be siblings, so the sibling combinator can select them:
      $ perl -Mojo -le " my $dom = x(b(q{2.html})->slurp); for my $img ( $dom->find(q{h1 ~ img})->each ){ print $img->attr(q{src}); } "
      333
      444
        Nice script, but it fails if other tags occur between the H1 and the IMG.
        my $data = 'yada...yada...yada.. <img src="111"> yada...yada...yada.. <img src="222"> yada...yada...yada.. <h1>Some title</h1> yada...yada...yada.. <a href="444444"> <img src="333"> yada...yada...yada.. <img src="444">';
        my $dom = Mojo::DOM->new($data);
        for my $img ( $dom->find(q{h1 ~ img})->each ) {
            print $img->attr(q{src});
        }
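
The sibling selector finds nothing here because the unclosed <a> becomes the parent of both later images, so they are no longer siblings of the H1. One possible workaround, sketched below, is to also match images nested inside any later sibling of the H1. It assumes every image of interest is either a later sibling of the H1 or a descendant of one; if the H1 is itself nested deeper than the images, the document-order loop shown earlier in the thread is the more general choice. srcs_after_h1 is a hypothetical helper name.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper (not a Mojo::DOM API): return the src of every <img>
# that is a later sibling of an <h1>, or nested inside a later sibling.
sub srcs_after_h1 {
    my ($html) = @_;
    return Mojo::DOM->new($html)
        ->find('h1 ~ img, h1 ~ * img')   # later siblings, or anything inside them
        ->map(attr => 'src')             # call ->attr('src') on each match
        ->compact                        # drop undefined or empty values
        ->each;                          # plain list in list context
}

# Sample markup adapted from the thread, including the unclosed <a>.
print "$_\n" for srcs_after_h1(
    '<img src="111"> <h1>Some title</h1> <a href="444444"> <img src="333"> <img src="444">'
);
```

The selector group keeps the whole lookup in a single find call, at the cost of the structural assumption stated above.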

Node Type: perlquestion [id://1164401]
Approved by stevieb
Front-paged by Corion