PerlMonks  

How to process each node in an HTML page

by nysus (Vicar)
on Apr 05, 2019 at 07:13 UTC (#1232178)

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to analyze an HTML document to determine which elements are visually the widest. I want to visit each node on the page and determine its width using the element_coordinates method. That method takes a CSS selector as its argument, so I'm looking for a way to generate a CSS selector for each node, similar to the way browser developer tools do.

First question: can this be done through WWW::Mechanize::Chrome over port 9222? I'm guessing I would need to learn how to send queries directly through the $mech object over the transport layer. If this is possible, any details would be appreciated.

If that won't work, how can I generate a unique CSS selector for each node? My initial thought process was:

  1. Take the HTML content and load it into a tree.
  2. Recurse over the tree and generate an XML element for each node in the tree.
  3. Use the XML elements to construct a unique XPath for each node (I'm not sure how to do this).
  4. Finally, convert each XPath to a CSS selector (I'm not sure how to do this either).
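The tree-walking steps above can be sketched in core Perl. This is a minimal sketch under stated assumptions: instead of a real HTML parser it uses a hand-built tree of hashrefs with `tag` and `children` keys, and the helper names `node_paths` and `all_paths` are hypothetical. Each element gets a positional XPath such as `/html[1]/body[1]/div[2]` together with the matching CSS selector `html > body:nth-of-type(1) > div:nth-of-type(2)`. Because a positional XPath child step `div[2]` and `div:nth-of-type(2)` under the child combinator `>` pick out the same element, step 4 becomes a plain string mapping, and the intermediate XML of step 2 is not strictly needed. With HTML::TreeBuilder you would read `$el->tag` and walk `$el->content_list` instead, skipping the plain-string text nodes.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a tree of { tag => ..., children => [...] }
# hashrefs and push one [ xpath, css_selector ] pair per element onto $out,
# in document order.
sub node_paths {
    my ($node, $xpath, $css, $out) = @_;
    push @$out, [ $xpath, $css ];

    my %count;    # same-tag siblings seen so far under this node
    for my $child (@{ $node->{children} || [] }) {
        my $tag = $child->{tag};
        my $n   = ++$count{$tag};    # 1-based position among same-tag siblings
        node_paths(
            $child,
            sprintf('%s/%s[%d]', $xpath, $tag, $n),               # XPath step, e.g. div[2]
            sprintf('%s > %s:nth-of-type(%d)', $css, $tag, $n),   # CSS step, e.g. div:nth-of-type(2)
            $out,
        );
    }
    return $out;
}

# Hypothetical entry point: the root element is /tag[1] in XPath, plain "tag" in CSS.
sub all_paths {
    my ($root) = @_;
    return node_paths($root, sprintf('/%s[1]', $root->{tag}), $root->{tag}, []);
}

# A hand-built stand-in for a parsed document.
my $doc = {
    tag      => 'html',
    children => [
        { tag => 'head', children => [ { tag => 'title' } ] },
        { tag => 'body', children => [
            { tag => 'div' },
            { tag => 'p' },
            { tag => 'div', children => [ { tag => 'span' } ] },
        ] },
    ],
};

# Print each element's XPath next to its equivalent CSS selector.
printf "%-36s %s\n", @$_ for @{ all_paths($doc) };
```

One caveat: these selectors only hit the intended element in the browser if the live DOM has the same shape as the parsed tree. Browsers insert implied elements such as `tbody`, and scripts can add or remove nodes after the page loads.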

Any help is appreciated. Thanks.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;

Re: How to process each node in an HTML page
by marto (Archbishop) on Apr 05, 2019 at 08:34 UTC

      Ah, good call. I had forgotten about that feature of WMC (WWW::Mechanize::Chrome). This would mean using the $mech->eval method.
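Running the whole job inside the browser through `$mech->eval` could look roughly like the sketch below. It is a minimal sketch under assumptions: the JavaScript string `$WIDTHS_JS` and the Perl helper `widest_elements` are hypothetical and not part of WWW::Mechanize::Chrome; the selectors use the same `nth-of-type` chain as a hand-rolled approach, not necessarily what the developer tools would produce; and it assumes `$mech->eval` returns the evaluated value as the first item in list context (check the module's documentation). Widths come from the standard DOM method `getBoundingClientRect()`, and the Perl side needs only core JSON::PP.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical JavaScript evaluated inside the page: builds an nth-of-type
# selector for every element and pairs it with its rendered width.
my $WIDTHS_JS = <<'JS';
(function () {
  function selectorFor(el) {
    var parts = [];
    while (el && el.nodeType === 1) {
      var tag = el.tagName.toLowerCase();
      if (el === document.documentElement) { parts.unshift(tag); break; }
      var n = 1;
      for (var s = el.previousElementSibling; s; s = s.previousElementSibling) {
        if (s.tagName === el.tagName) n++;
      }
      parts.unshift(tag + ':nth-of-type(' + n + ')');
      el = el.parentElement;
    }
    return parts.join(' > ');
  }
  var out = [], all = document.querySelectorAll('*');
  for (var i = 0; i < all.length; i++) {
    out.push([selectorFor(all[i]), all[i].getBoundingClientRect().width]);
  }
  return JSON.stringify(out);
})()
JS

# Hypothetical helper: decode the JSON string and return the $n widest
# [ selector, width ] pairs, widest first.
sub widest_elements {
    my ($json, $n) = @_;
    my @sorted = sort { $b->[1] <=> $a->[1] } @{ decode_json($json) };
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

# Usage with a live WWW::Mechanize::Chrome object $mech already on the page
# (assumes the evaluated value is the first item eval returns):
# my ($json) = $mech->eval($WIDTHS_JS);
# printf "%8.1f  %s\n", $_->[1], $_->[0] for widest_elements($json, 5);
```

Measuring in the page sidesteps any mismatch between a Perl-side parse and the live DOM. The returned selectors can still be passed to element_coordinates if the full coordinates are wanted.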

Approved by Corion