HTML::TreeBuilder::XPath returns things I don't need?

by szabgab (Priest)
on Oct 06, 2014 at 15:23 UTC ( #1102984=perlquestion )

szabgab has asked for the wisdom of the Perl Monks concerning the following question:

use 5.010;
use HTML::TreeBuilder::XPath;
my $tree= HTML::TreeBuilder::XPath->new;
my $html = <<'HTML';
<html>
<title>four</title>
<head>
<title>one</title>
</head>
<body>
<title>two</title>
</body>
<title>three</title>
</html>
HTML
$tree->parse($html);
say $tree->findvalue( '/html/head/title');

I was expecting this to print 'one', but instead it printed 'fouronetwothree'. Am I misunderstanding what XPath is supposed to do?

Should I use some other module?

Replies are listed 'Best First'.
Re: HTML::TreeBuilder::XPath returns things I don't need?
by choroba (Archbishop) on Oct 06, 2014 at 15:38 UTC
    In HTML, the <title> tag is valid only inside <head>. The parser tries to fix the misplaced titles for you; try dumping $tree to see how.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      I see. I tried it inside the body with elements in different locations and it worked as I expected. Thanks.
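A minimal sketch of the suggestion above, assuming the same markup as in the question. It dumps the tree as the parser built it and then lists every node the original expression matches. The `$tree->eof` call and the `@titles` variable are additions for illustration and are not part of the original post.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<html>
<title>four</title>
<head>
<title>one</title>
</head>
<body>
<title>two</title>
</body>
<title>three</title>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the parser can finish the tree

# Show the tree as the parser built it, not as the markup was written
$tree->dump;

# List each node that the expression from the question matches
my @titles = $tree->findnodes('/html/head/title');
printf "%d: %s\n", $_ + 1, $titles[$_]->as_text for 0 .. $#titles;
```

If the dump shows all four titles under <head>, that accounts for findvalue returning their concatenated text.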
Re: HTML::TreeBuilder::XPath returns things I don't need?
by toolic (Bishop) on Oct 06, 2014 at 15:46 UTC
    XML::Twig (by the same module author) can give you what you want:
    use warnings;
    use strict;
    use XML::Twig;

    my $xml = <<XML;
    <html>
    <title>four</title>
    <head>
    <title>one</title>
    </head>
    <body>
    <title>two</title>
    </body>
    <title>three</title>
    </html>
    XML

    my $twig = XML::Twig->new(
        twig_handlers => {
            'html/head/title' => sub { print $_->text(), "\n" }
        },
    );
    $twig->parse($xml);

    __END__
    one
      Thanks
Re: HTML::TreeBuilder::XPath returns things I don't need?
by Anonymous Monk on Oct 06, 2014 at 23:20 UTC

    Am I misunderstanding what XPath is supposed to do?

    Trees are trees :)

    htmltreexpather.pl

    $ perl htmltreexpather.pl junktitle.html _tag title
    HTML::Element=HASH(0xcded54)
    0.0.0
    four
    /html/head/title
    /html/head/title
    /html/head/title
    ------------------------------------------------------------------
    HTML::Element=HASH(0xcdea04)
    0.0.1
    one
    /html/head/title[2]
    /html/head/title[2]
    /html/head/title[2]
    ------------------------------------------------------------------
    HTML::Element=HASH(0xcde954)
    0.0.2
    two
    /html/head/title[3]
    /html/head/title[3]
    /html/head/title[3]
    ------------------------------------------------------------------
    HTML::Element=HASH(0xcde8e4)
    0.0.3
    three
    /html/head/title[4]
    /html/head/title[4]
    /html/head/title[4]
    ------------------------------------------------------------------
    ##################################################################
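Going by the paths in the output above, the title that was written inside <head> ends up as the second title node in the repaired tree. A hedged sketch of selecting it with a positional predicate follows. It depends on how the parser happens to reorder this particular input, so treat it as an illustration rather than a general fix. The `$second` variable and the `$tree->eof` call are assumptions added here, not something from the thread.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<html>
<title>four</title>
<head>
<title>one</title>
</head>
<body>
<title>two</title>
</body>
<title>three</title>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the parser can finish the tree

# Select one title by its position among the titles under <head>
my $second = $tree->findvalue('/html/head/title[2]');
print "$second\n";
```

For input where element placement matters, parsing it as XML (as in the XML::Twig reply) avoids the HTML repair step entirely.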
