HTML::TreeBuilder::XPath returns things I don't need?

by szabgab (Priest)
use 5.010;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
my $html = <<'HTML';
<html>
<title>four</title>
<head>
<title>one</title>
</head>
<body>
<title>two</title>
</body>
<title>three</title>
</html>
HTML

$tree->parse($html);
say $tree->findvalue('/html/head/title');
I was expecting this to print 'one', but instead it printed 'fouronetwothree'. Am I misunderstanding what XPath is supposed to do?

Should I use some other module?

Re: HTML::TreeBuilder::XPath returns things I don't need?
by choroba (Cardinal) on Oct 06, 2014 at 15:38 UTC
    In HTML, the <title> tag is valid only inside <head>. The parser tries to fix the misplaced titles for you; try dumping $tree to see how.
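    To make the "try dumping $tree" suggestion concrete, here is a minimal sketch. It assumes the same input as the original post; the call to eof and the loop that prints each title's parent are additions for illustration and were not part of the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<html>
<title>four</title>
<head>
<title>one</title>
</head>
<body>
<title>two</title>
</body>
<title>three</title>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the parser finishes building the tree

# Print the tree as the parser actually built it (inherited from HTML::Element)
$tree->dump;

# List every <title> together with the element it ended up under
my @titles = $tree->findnodes('//title');
for my $title (@titles) {
    printf "%s is under <%s>\n", $title->as_text, $title->parent->tag;
}
```

    If the parser repairs the markup as described, every <title> should be reported under <head>, which would explain why /html/head/title matches all four.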
      I see. I tried it inside the body with elements in different locations and it worked as I expected. Thanks.
Re: HTML::TreeBuilder::XPath returns things I don't need?
by toolic (Bishop) on Oct 06, 2014 at 15:46 UTC
    XML::Twig (by the same module author) can give you what you want:
    use warnings;
    use strict;
    use XML::Twig;

    my $xml = <<XML;
    <html>
    <title>four</title>
    <head>
    <title>one</title>
    </head>
    <body>
    <title>two</title>
    </body>
    <title>three</title>
    </html>
    XML

    my $twig = XML::Twig->new(
        twig_handlers => {
            'html/head/title' => sub { print $_->text(), "\n" }
        },
    );
    $twig->parse($xml);

    __END__
    one
Re: HTML::TreeBuilder::XPath returns things I don't need?
by Anonymous Monk on Oct 06, 2014 at 23:20 UTC

    Am I misunderstanding what XPath is supposed to do?

    Trees are trees :)

    $ perl junktitle.html
    _tag title
    HTML::Element=HASH(0xcded54)
    0.0.0
    four
    /html/head/title
    /html/head/title
    /html/head/title
    ------------------------------------------------------------------
    HTML::Element=HASH(0xcdea04)
    0.0.1
    one
    /html/head/title[2]
    /html/head/title[2]
    /html/head/title[2]
    ------------------------------------------------------------------
    HTML::Element=HASH(0xcde954)
    0.0.2
    two
    /html/head/title[3]
    /html/head/title[3]
    /html/head/title[3]
    ------------------------------------------------------------------
    HTML::Element=HASH(0xcde8e4)
    0.0.3
    three
    /html/head/title[4]
    /html/head/title[4]
    /html/head/title[4]
    ------------------------------------------------------------------
    ##################################################################
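
    Going by the paths in the output above, the title that was originally inside <head> ends up as the second <title> child of <head> after parsing, so a positional predicate should pick it out. A minimal sketch, assuming the parser rearranges this input the same way; the position depends on how the parser repairs the markup, so this is not a robust selector for real pages.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<html>
<title>four</title>
<head>
<title>one</title>
</head>
<body>
<title>two</title>
</body>
<title>three</title>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input

# Select only the second <title> under <head>, per the paths listed above
my $second = $tree->findvalue('/html/head/title[2]');
print "$second\n";
```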

