PerlMonks  

Mechanize to TreeBuilder-XPath

by Ricalsin (Initiate)
on Jan 26, 2012 at 19:01 UTC (#950196, perlquestion)
Ricalsin has asked for the wisdom of the Perl Monks concerning the following question:

I've read up on Mechanize, TreeBuilder, Element and XPath in order to grab content from web pages.

I've been able to log in and navigate effectively using Mechanize, eventually calling $mech->content(); on the page I want. However, that produces poorly formatted HTML (one long line). Yes, I can effectively use TreeBuilder and TreeBuilder-XPath on it to find what I'm looking for. I use the Firebug extension for Firefox to see the HTML more clearly so that I can write XPath expressions against it.

My question: Is using Firebug an effective approach? I would have thought I could display the HTML tree structure using something like $mech->content(format=>'html'), but that's not a supported format (only 'text' is). I'm just trying to lay out the tree so I can more effectively write the XPath to get what I want.

It seems I'm missing something painfully obvious. Can I get some forgiveness and compassion at the Pearly Gates? :)

Re: Mechanize to TreeBuilder-XPath
by Anonymous Monk on Jan 26, 2012 at 19:13 UTC
      Can you elaborate on how these would be used in my situation? It seems HTML::Tidy is used to correct your own HTML, not to format the output of Mechanize or TreeBuilder(??).
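One possible way to lay out the tree without leaving Perl is to let HTML::TreeBuilder print it. The following is only a minimal sketch: the `$html` string is a made-up stand-in for whatever `$mech->content()` returns, and the two-space indent is an arbitrary choice. `as_HTML` takes an indent string as its second argument, and `dump` prints one element per line, indented by depth.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the one-line string returned by $mech->content()
my $html = '<html><head><title>Demo</title></head>'
         . '<body><div id="main"><p class="msg">Hello</p></div></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# as_HTML(entities, indent, optional_end_tags):
#   '<>&' escapes only those characters,
#   '  '  requests indented output,
#   {}    means no end tags are treated as optional, so all are emitted.
my $pretty = $tree->as_HTML('<>&', '  ', {});
print $pretty;

# dump() writes the element hierarchy to STDOUT, one node per line
$tree->dump;

# Release the tree explicitly when done
$tree->delete;
```

The `dump` output tends to be the more useful of the two when writing XPath, since it shows nesting depth directly.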
Re: Mechanize to TreeBuilder-XPath
by kelchris (Novice) on Jan 27, 2012 at 03:30 UTC
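    I would suggest using Mojo::UserAgent:

    use strict;
    use warnings;
    use Mojo::UserAgent;

    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5);    # set before the request so redirects are followed
    my $tx = $ua->get('http://www.amazon.com');

    # Print the link text of each department entry (CSS selector lookup)
    $tx->res->dom->find('li.navSaChildItem')->each(sub {
        my $item = shift;
        my $link = $item->at('a') or return;
        print "[CSS3 Selector] Departments: " . $link->text . "\n";
    });

    print "Title: " . $tx->res->dom->at('title')->text . "\n";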
Re: Mechanize to TreeBuilder-XPath
by kelchris (Novice) on Jan 27, 2012 at 03:37 UTC
    If you want to stick with Mechanize, then use Mojo::DOM to parse the content you got from $mech and convert it to a DOM:

    use Mojo::DOM;

    my $dom = Mojo::DOM->new($mech->content());
    print "Title: " . $dom->at('title')->text . "\n";
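For comparison, staying with the TreeBuilder-XPath route from the original question might look like the sketch below. It assumes HTML::TreeBuilder::XPath is installed; the `$html` string and the `item` class name are invented for illustration, and in practice the string would come from `$mech->content()`.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for $mech->content()
my $html = '<html><head><title>Demo</title></head><body><ul>'
         . '<li class="item"><a href="/a">First</a></li>'
         . '<li class="item"><a href="/b">Second</a></li>'
         . '</ul></body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# findvalue() returns the string value of the match
my $title = $tree->findvalue('//title');

# findnodes() returns HTML::Element objects in list context
my @links = map { $_->as_text } $tree->findnodes('//li[@class="item"]/a');

print "Title: $title\n";
print "Link: $_\n" for @links;

$tree->delete;
```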
Re: Mechanize to TreeBuilder-XPath
by Anonymous Monk on Jan 28, 2012 at 00:39 UTC
