PerlMonks  

Mechanize to TreeBuilder-XPath

by Ricalsin (Initiate)
on Jan 26, 2012 at 19:01 UTC ( #950196=perlquestion )
Ricalsin has asked for the wisdom of the Perl Monks concerning the following question:

I've read the docs for Mechanize, TreeBuilder, Element and XPath in order to grab content from Web pages.

I've been able to log in and navigate effectively using Mechanize, eventually calling $mech->content(); on the page I want. However, that produces poorly formatted HTML (one long line). Yes, I can effectively use TreeBuilder and TreeBuilder-XPath on it to find what I'm looking for. I use Mozilla's Firebug to see the HTML more clearly so that I can go after the content in an XPath manner.

My Question: Is using Firebug an effective approach? I would have thought I could display the html tree structure using something like $mech->content(format=>'html') but that's not a supported format (only 'text'). I'm just trying to lay out the tree so I can more effectively write the XPath to get what I want.

It seems I'm missing something painfully obvious. Can I get some forgiveness and compassion at the Pearly Gates? :)

Re: Mechanize to TreeBuilder-XPath
by Anonymous Monk on Jan 26, 2012 at 19:13 UTC
      Can you elaborate on how these would be used in my situation? It seems HTML::Tidy is meant for correcting your own HTML, not for formatting the output of Mechanize or TreeBuilder(??).
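      A minimal sketch of one way to lay out the tree without leaving the TreeBuilder toolchain, offered as an illustration rather than the reply's own method. It assumes HTML::TreeBuilder::XPath is installed; the sample HTML string, the id "main" and the class "item" are made up for the example and stand in for whatever $mech->content() returns.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for $mech->content(): one long line of HTML.
my $html = '<html><head><title>Demo</title></head><body>'
         . '<div id="main"><ul><li class="item">One</li>'
         . '<li class="item">Two</li></ul></div></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Indented HTML: as_HTML(entities, indent string, optional-end-tags hash).
# Passing an empty hash asks for every end tag to be written out.
my $pretty = $tree->as_HTML(undef, '  ', {});
print $pretty;

# Outline view of the element tree, printed to STDOUT.
$tree->dump;

# With the structure visible, the XPath is easier to write.
my @items = map { $_->as_text }
            $tree->findnodes('//div[@id="main"]//li[@class="item"]');
print "$_\n" for @items;

# Free the tree explicitly once finished with it.
$tree->delete;
```

      In a real script the sample string would be replaced by $mech->content(). Firebug remains useful for pages that JavaScript modifies after loading, since the tree printed here reflects only the HTML that Mechanize received.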
Re: Mechanize to TreeBuilder-XPath
by kelchris (Novice) on Jan 27, 2012 at 03:30 UTC
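    I would suggest using Mojo::UserAgent:
    use strict;
    use Mojo::UserAgent;

    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5);    # set before the request so redirects are followed
    my $tx = $ua->get("http://www.amazon.com");

    # at('a') is used instead of the autoloaded ->a form, which newer
    # Mojolicious releases no longer provide
    $tx->res->dom('li.navSaChildItem')->each(sub {
        my $item = $_;
        print "[CSS3 Selector] Departments: " . $item->at('a')->text . "\n";
    });
    print "Title: " . $tx->res->dom->at('title')->text . "\n";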
Re: Mechanize to TreeBuilder-XPath
by kelchris (Novice) on Jan 27, 2012 at 03:37 UTC
    If you want to stick with Mechanize, then load Mojo::DOM to parse the content you got from $mech and convert it to a DOM:
    use Mojo::DOM;
    my $dom = Mojo::DOM->new($mech->content());
    # at('title') finds the first <title> element; ->text returns its text
    print "Title: " . $dom->at('title')->text;
Re: Mechanize to TreeBuilder-XPath
by Anonymous Monk on Jan 28, 2012 at 00:39 UTC
