Some issues with WWW::Mechanize::Firefox->xpath() method

by dfaure (Chaplain)
on Apr 02, 2013 at 10:01 UTC ( #1026644=perlquestion )
dfaure has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm having an annoying issue with the xpath() method of WWW::Mechanize::Firefox. The following code...

#!perl -w
use strict;
use WWW::Mechanize::Firefox;
use Data::Dumper;

my $mech = WWW::Mechanize::Firefox->new(activate => 1);
$mech->autoclose_tab(0);

$mech->update_html(<<'HTML');
<html>
<head>
<title>Hello Firefox!</title>
</head>
<body>
<h1>Hello <b>World</b>!</h1>
<p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
</body>
</html>
HTML

test_xpath($mech, '//p', all => 1);
test_xpath($mech, '//p/text()', all => 1);
test_xpath($mech, 'substring(//p,1,4)', all => 1); # expected String: Hell
test_xpath($mech, 'string-length(//p)', all => 1); # expected Number: 38

sub test_xpath {
    my ($mech, $xpq, %opts) = @_;
    my @xpr;
    eval {
        @xpr = $mech->xpath($xpq, %opts);
    };
    my %results = (
        query       => $xpq,
        exception   => $@,
        innerHTML   => scalar(@xpr) ? [ map { $_->{innerHTML} } @xpr ] : undef,
        textContent => scalar(@xpr) ? [ map { $_->{textContent} } @xpr ] : undef,
        nodeValue   => scalar(@xpr) ? [ map { $_->{nodeValue} } @xpr ] : undef
    );
    print Data::Dumper->Dump([\%results], ['results']);
}

...shows that, sadly, as of v0.71 this otherwise amazing module does not handle every kind of XPath result.

$results = {
    'nodeValue' => [ undef ],
    'exception' => '',
    'query' => '//p',
    'textContent' => [ 'Hello WWW::Mechanize::Firefox Goob bye' ],
    'innerHTML' => [ 'Hello <b>WWW::Mechanize::Firefox</b> Goob bye' ]
};
$results = {
    'nodeValue' => [ 'Hello ', ' Goob bye' ],
    'exception' => '',
    'query' => '//p/text()',
    'textContent' => [ 'Hello ', ' Goob bye' ],
    'innerHTML' => [ undef, undef ]
};
$results = {
    'nodeValue' => undef,
    'exception' => 'MozRepl::RemoteObject: TypeError: The expression cannot be converted to return the specified type. at mech.pl line 28.
',
    'query' => 'substring(//p,1,4)',
    'textContent' => undef,
    'innerHTML' => undef
};
$results = {
    'nodeValue' => undef,
    'exception' => 'MozRepl::RemoteObject: TypeError: The expression cannot be converted to return the specified type. at mech.pl line 28.
',
    'query' => 'string-length(//p)',
    'textContent' => undef,
    'innerHTML' => undef
};

I would be very interested in any workaround.

____
HTH, Dominique
My two favorites:
If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
Bien faire, et le faire savoir...

Re: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0)
by Anonymous Monk on Apr 02, 2013 at 10:11 UTC

    test_xpath($mech, 'substring(//p,1,4)', all => 1); # expected String: Hell

    That is not valid xpath syntax, what documentation are you reading?

      That is not valid xpath syntax, what documentation are you reading?

      It all comes from the spec (http://www.w3.org/TR/xpath/#section-Expressions), which I need to follow.

      ____
      HTH, Dominique
      My two favorites:
      If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
      Bien faire, et le faire savoir...

      Works for me (using XML::XSH2):
      $ xsh
      $scratch/> insert element p into /scratch
      $scratch/> insert text 'Hello world' into /scratch/p
      $scratch/> echo substring(//p,1,4)
      Hell
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Works for me (using XML::XSH2):

        And you're sure that's not an xsh2 function/feature?

        Produces no output for me

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::LibXML;

        my $html = <<'__HTML__';
        <html>
        <head>
        <title>Hello Firefox!</title>
        </head>
        <body>
        <h1>Hello <b>World</b>!</h1>
        <p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
        </body>
        </html>
        __HTML__

        my $dom = XML::LibXML->new( qw/ recover 2 / )->load_html(
            #~ string => \$html, ## BUG! scalar... not like load_xml
            string => $html,
        );
        local $\ = $/;
        print for $dom->findnodes(q{ substring(//p,1,4) }); ## nada
        print for $dom->findnodes(q{ //p }); ## paragraph
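The empty output above is consistent with findnodes() only ever returning node lists. A minimal sketch, assuming XML::LibXML is installed: findvalue() gives the string value of any XPath expression, and find() gives a typed result object (an XML::LibXML::Number for string-length()). The variable names are illustrative only, and this says nothing about how Firefox itself behaves.

```perl
use strict;
use warnings;
use XML::LibXML;

# Same document as in the posts above.
my $html = <<'__HTML__';
<html>
<head><title>Hello Firefox!</title></head>
<body>
<h1>Hello <b>World</b>!</h1>
<p id='paragraph'>Hello <b>WWW::Mechanize::Firefox</b> Goob bye</p>
</body>
</html>
__HTML__

my $dom = XML::LibXML->new( recover => 2 )->load_html( string => $html );

# findvalue() stringifies whatever the expression evaluates to.
my $first4 = $dom->findvalue('substring(//p,1,4)');

# find() returns a typed object; value() unwraps the number.
my $length = $dom->find('string-length(//p)')->value;

# findnodes() is only meaningful for expressions that select nodes.
my @nodes = $dom->findnodes('//p');

print "substring: $first4\n";
print "string-length: $length\n";
print "nodes: ", scalar(@nodes), "\n";
```

So with XML::LibXML the expressions themselves are usable; what matters is which method is asked to evaluate them.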
Re: Some issues with WWW::Mechanize::Firefox->xpath() method
by Corion (Pope) on Apr 02, 2013 at 10:57 UTC

    I would assume that the substring() function wants a string and not a node, and thus one would need to use //p/text() to get at the node text if that is what's wanted.

    The code of WWW::Mechanize::Firefox basically passes XPath queries straight through to Firefox, so if there is a Javascript error raised by the XPath method, that error most likely comes directly from Firefox itself.

      FWIW, I tried with LibXML but I can't get any results with these queries either:
      print for $dom->findnodes(q{ substring( //p/text() , 0, 4 ) });
      print for $dom->findnodes(q{ substring(string(//title),1,4) });
      print for $dom->findnodes(q{ substring(string(//title/text()),1,4) });
      The code of WWW::Mechanize::Firefox basically passes XPath queries straight through to Firefox, so if there is a Javascript error raised by the XPath method, that error most likely comes directly from Firefox itself.

      Couldn't the issue come from the JS glue between Perl and Firefox being unable to deal with XPathResult types other than nodes?

      ____
      HTH, Dominique
      My two favorites:
      If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
      Bien faire, et le faire savoir...

        The JS glue just passes along whatever document.evaluate returns, after converting that array-like XPathResults list into a plain array:

        function(doc,q,ref,cont) {
            var xres = doc.evaluate(q,ref,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null );
            var map;
            if( cont ) {
                map = cont;
            } else {
                // Default is identity
                map = function(e){ return e };
            };
            var res = [];
            for ( var i=0 ; i < xres.snapshotLength; i++ ) {
                res.push( map(xres.snapshotItem(i)));
            };
            return res
        }

        I'm no expert on XPath and its semantics, but if somebody submits a bug report, preferably with a self-contained example, I can investigate more closely.

        I think it's your code: instead of dumping %results, dump @xpr and look for stringValue.

Re: Some issues with WWW::Mechanize::Firefox->xpath() method (ctrl+shift+k, stringValue)
by Anonymous Monk on Apr 02, 2013 at 11:51 UTC

      Ah hah!

      The difference in behaviour is caused by WWW::Mechanize::Firefox / MozRepl::RemoteObject using XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, while your code uses XPathResult.ANY_TYPE. I'm not certain about whether ANY_TYPE will guarantee an ordered snapshot, which I consider important, as I'd like the nodes to appear in "document order" in the result, and I'd like them to remain unchanged from the time the snapshot was taken, because there is transfer latency between Firefox and Perl.

      The documentation talks about nodes, so it seems that there is no way to get an ordered snapshot with strings...

      I don't see an easy way to automatically determine the "natural" result type of an expression, so in the medium term, MozRepl::RemoteObject::Methods::xpath needs to also take the result type as an (optional) parameter. Then, the Firefox ->xpath API can be extended to allow specifying the kind of result.
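To make the idea of an optional result-type parameter concrete, here is a rough sketch of how a caller-supplied type could select the right accessor. The numeric constants are the ones defined for XPathResult in DOM Level 3 XPath; everything else (%ACCESSOR_FOR, unwrap_xpath_result, and the plain hashes standing in for XPathResult objects) is hypothetical and not part of MozRepl::RemoteObject or WWW::Mechanize::Firefox.

```perl
use strict;
use warnings;

# XPathResult type constants as defined by DOM Level 3 XPath.
my %XPATH_RESULT = (
    ANY_TYPE                   => 0,
    NUMBER_TYPE                => 1,
    STRING_TYPE                => 2,
    BOOLEAN_TYPE               => 3,
    ORDERED_NODE_SNAPSHOT_TYPE => 7,
);

# Hypothetical mapping: which property carries the value for each scalar type.
my %ACCESSOR_FOR = (
    $XPATH_RESULT{NUMBER_TYPE}  => 'numberValue',
    $XPATH_RESULT{STRING_TYPE}  => 'stringValue',
    $XPATH_RESULT{BOOLEAN_TYPE} => 'booleanValue',
);

# Hypothetical helper. $result is a plain hash standing in for an XPathResult.
# Scalar results come back as a one-element list, snapshots as the node list.
sub unwrap_xpath_result {
    my ($result) = @_;
    my $accessor = $ACCESSOR_FOR{ $result->{resultType} };
    return $result->{$accessor} if defined $accessor;
    return @{ $result->{snapshot} || [] };
}

my @string = unwrap_xpath_result({ resultType => 2, stringValue => 'Hell' });
my @number = unwrap_xpath_result({ resultType => 1, numberValue => 38 });
my @nodes  = unwrap_xpath_result({ resultType => 7, snapshot => ['p#paragraph'] });

print "@string @number @nodes\n";
```

Keeping ORDERED_NODE_SNAPSHOT_TYPE as the fallback would preserve the current behaviour for callers that never pass a type.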

        The documentation talks about nodes, so it seems that there is no way to get an ordered snapshot with strings...

        If it helps: from earlier experience running C++ code that talks directly to the XPCOM layer, we found that, when using ANY_TYPE:

        • The resulting elements were always returned (as expected) in document order (i.e. a depth-first tree walk).
        • Even if it is not predictable at query time, the exact result type is driven by the elements of the query expression (btw, it would be nice to have it returned, to avoid recomputing it by analysing the query).

        ____
        HTH, Dominique
        My two favorites:
        If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
        Bien faire, et le faire savoir...

Re: Some issues with WWW::Mechanize::Firefox->xpath() method
by Loops (Hermit) on Apr 02, 2013 at 12:10 UTC
    Even:
    @ret = $mech->xpath('substring("hello",1,4)', { single => 1 } )
    Returns an empty array. It may be that you simply have to return the full text node to perl and then perform the substring in perl code.
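A minimal sketch of that Perl-side workaround, assuming the paragraph text has already been fetched as a plain string (for instance from the textContent property shown in the original post). The helpers xpath_round and xpath_substring are hypothetical names; they mimic the 1-based, rounding semantics of XPath 1.0 substring() for finite numeric arguments only.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# XPath 1.0 round(): nearest integer, halves rounded towards +infinity.
sub xpath_round { return floor( $_[0] + 0.5 ) }

# Hypothetical helper emulating XPath 1.0 substring(string, start[, length]).
# Positions are 1-based; NaN and infinite arguments are not handled.
sub xpath_substring {
    my ( $str, $start, $length ) = @_;
    my $first = xpath_round($start);                 # first position kept
    my $last  = defined $length
        ? $first + xpath_round($length)              # first position NOT kept
        : length($str) + 1;
    $first = 1 if $first < 1;
    $last  = length($str) + 1 if $last > length($str) + 1;
    return '' if $last <= $first;
    return substr( $str, $first - 1, $last - $first );
}

# Stand-in for the text returned from the browser.
my $text = 'Hello WWW::Mechanize::Firefox Goob bye';

print xpath_substring( $text, 1, 4 ), "\n";   # substring(//p,1,4)
print length($text), "\n";                    # string-length(//p)
```

For plain cases substr($text, 0, 4) is enough; the helper only matters when the start or length arguments are fractional or fall outside the string.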
