Mechanize to TreeBuilder-XPath

by Ricalsin (Initiate)
on Jan 26, 2012 at 19:01 UTC (#950196)
Ricalsin has asked for the wisdom of the Perl Monks concerning the following question:

I've read the documentation for Mechanize, TreeBuilder, Element and XPath in order to grab content from web pages.

I've been able to log in and navigate effectively using Mechanize, eventually calling $mech->content() on the page I want. However, that returns poorly formatted HTML (one long line). Yes, I can use TreeBuilder and TreeBuilder-XPath on it to find what I'm looking for. I use Mozilla's Firebug to see the HTML more clearly so that I can go after the content with XPath.

My question: Is using Firebug an effective approach? I would have thought I could display the HTML tree structure using something like $mech->content(format=>'html'), but that's not a supported format (only 'text' is). I'm just trying to lay out the tree so I can write the XPath more effectively.

It seems I'm missing something painfully obvious. Can I get some forgiveness and compassion at the Pearly Gates? :)
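
For laying out the tree from within Perl, a minimal sketch along these lines may help. It assumes HTML::TreeBuilder::XPath is installed, and the one-line $html string is a hypothetical stand-in for whatever $mech->content() returns. It relies on two HTML::Element methods that the tree inherits: as_HTML with an indent string, and dump.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for $mech->content(): one long line of HTML.
my $html = '<html><head><title>Example</title></head><body><ul>'
         . '<li class="item"><a href="/a">First</a></li>'
         . '<li class="item"><a href="/b">Second</a></li>'
         . '</ul></body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Indented HTML: as_HTML(entities, indent string, optional-end-tags hashref).
# An empty hashref asks for every end tag to be written out.
my $pretty = $tree->as_HTML(undef, '  ', {});
print $pretty;

# Alternative view: an outline of the tree, one node per line.
$tree->dump;

# With the structure visible, the XPath expressions are easier to write.
my $title = $tree->findvalue('/html/head/title');
my @links = map { $_->as_text } $tree->findnodes('//li[@class="item"]/a');
print "Title: $title\n";
print "Link: $_\n" for @links;

# Free the tree explicitly when done with it.
$tree->delete;
```

In a real script the $html assignment would be replaced by my $html = $mech->content(); the rest stays the same. Firebug remains a reasonable way to explore a page interactively; this just keeps the inspection in the same place as the XPath code.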

Replies are listed 'Best First'.
Re: Mechanize to TreeBuilder-XPath
by kelchris (Novice) on Jan 27, 2012 at 03:30 UTC
    I would suggest using Mojo::UserAgent
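    use strict;
    use warnings;
    use Mojo::UserAgent;

    my $ua = Mojo::UserAgent->new;
    $ua->max_redirects(5);    # set before the request so redirects are followed
    my $tx = $ua->get("");    # URL left blank in the original post

    # Select elements with a CSS3 selector and print the text of each one's link
    $tx->res->dom('li.navSaChildItem')->each(sub {
        my $item = $_;
        # at() returns the first element matching the selector
        print "[CSS3 Selector] Departments: " . $item->at('a')->text . "\n";
    });
    print "Title: " . $tx->res->dom->at('title')->text . "\n";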
Re: Mechanize to TreeBuilder-XPath
by Anonymous Monk on Jan 26, 2012 at 19:13 UTC
      Can you elaborate on how these would be used in my situation? It seems HTML::Tidy is used to correct your own HTML, not to format the output of Mechanize or TreeBuilder(??).
Re: Mechanize to TreeBuilder-XPath
by kelchris (Novice) on Jan 27, 2012 at 03:37 UTC
    If you want to stick with Mechanize, use Mojo::DOM to parse the content you got from $mech into a DOM:

    use Mojo::DOM;

    my $dom = Mojo::DOM->new($mech->content());
    # at() returns the first matching element; text gives its text content
    print "Title: " . $dom->at('title')->text . "\n";
Re: Mechanize to TreeBuilder-XPath
by Anonymous Monk on Jan 28, 2012 at 00:39 UTC

Node Type: perlquestion [id://950196]
Approved by GrandFather