
Webpage Element Information

by artist (Parson)
on Nov 14, 2007 at 18:00 UTC ( #650811=perlquestion )

artist has asked for the wisdom of the Perl Monks concerning the following question:

There is a Firefox extension, WebDeveloper. One of its menu items, outline current element, shows the position of an element. Example:
"html > body #id-479 > center > table > tbody > tr > td .main_content > form > table > tbody > tr > td > ul"
for an element in this page. Now, I would like to write this type of information to a file when I highlight an element in the web browser. I would like to do this with Perl. Where should I begin?



Replies are listed 'Best First'.
Re: Webpage Element Information
by jZed (Prior) on Nov 14, 2007 at 18:52 UTC
    The name of what you are trying to capture is DOM (Document Object Model). You can access it in Perl via one of the HTML parsers, e.g. HTML::TreeBuilder. You can access it natively in JavaScript. So either grab the file and parse it with Perl or use JavaScript to send the DOM tree as a JSON string to a Perl script and use one of the CPAN JSON modules to turn it into a Perl data structure.
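A minimal sketch of the second route (JavaScript sends the DOM tree as a JSON string), assuming a browser with a native `JSON.stringify`. The function names `domToPlain` and `domToJson` and the choice of fields (`tag`, `id`, `className`, `children`) are illustrative assumptions, not something from the thread.

```javascript
// Convert a DOM node (or any object with the same fields) into plain data.
// Only element nodes (nodeType 1) are kept; text and comment nodes are skipped.
function domToPlain(node) {
  var children = [];
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    if (kids[i].nodeType === 1) {
      children.push(domToPlain(kids[i]));
    }
  }
  return {
    tag: node.nodeName.toLowerCase(),
    id: node.id || null,
    className: node.className || null,
    children: children
  };
}

// Serialize the subtree rooted at `root` as a JSON string.
function domToJson(root) {
  return JSON.stringify(domToPlain(root));
}
```

In a browser you would call something like `domToJson(document.documentElement)` and post the result to your script. On the Perl side, a CPAN JSON module (for example `decode_json` from `JSON`) should be able to turn that string into nested hashes and arrays.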
Re: Webpage Element Information
by erroneousBollock (Curate) on Nov 14, 2007 at 23:00 UTC
    There is a Firefox extension, WebDeveloper. It has one menu item, outline current element. It shows position of element. Example:
    "html > body #id-479 > center > table > tbody > tr > td .main_content > form > table > tbody > tr > td > ul[0]" 
    for an element in this page.
    Firstly, please note that that is not a unique position. If you want to be able to find what was selected/clicked later, you need to use an XPath-like notation; something like:

    "html > body#id-479 > center[0] > table[0] > tbody[0] > tr[1] > td.main_content > form[0] > table[0] > tbody[0] > tr[0] > td[0] > ul"

    Secondly, roughly the same question was asked by user2000 recently in Position on Web Page. The conclusions are:

    • You need to know JavaScript and the W3C DOM very well to figure this out.
    • There are many problems doing this across different browsers.
    • It's 100x harder if the user can click a text node (nodeType 3) rather than just element nodes (nodeType 1).
    • It's 1000x harder if the user can make a text selection spanning multiple DOM nodes.

    And that's just the content-targeting part. The remainder is communicating the "path" string to your Perl back-end via AJAX (XMLHttpRequest, etc).
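The communication step could look roughly like the sketch below. The helper names `encodeForm` and `sendPath`, the form field name `path`, and whatever URL you pass in are all assumptions for illustration; only `XMLHttpRequest` itself comes from the thread.

```javascript
// Build an application/x-www-form-urlencoded body from a plain object.
function encodeForm(fields) {
  var pairs = [];
  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]));
    }
  }
  return pairs.join("&");
}

// POST the path string to the back-end (browser only).
// The "path" field name is hypothetical; use whatever your Perl script expects.
function sendPath(url, path, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", url, true);
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.onreadystatechange = function () {
    // readyState 4 means the request has completed.
    if (xhr.readyState === 4 && onDone) {
      onDone(xhr.status, xhr.responseText);
    }
  };
  xhr.send(encodeForm({ path: path }));
}
```

With this encoding, a Perl CGI script could presumably read the value with `param('path')` from CGI.pm and append it to a file.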


    I see that, like Your Mother, I assumed you'd want to remember the XPath-like position in order to translate it to a (browser-relative) pixel-position.

    If that's not the case, then you can start with something like what Your Mother has written below (using jQuery, or some other library that smooths over browser differences).

    To be accurate, you will need to form an XPath-like expression, so why not use real XPath?

    There's a cross-platform library available here.

    To construct the XPath expression, you'll have to walk up the DOM from the target element to the document.body (root). At each level, you'll need to determine 'id', 'class' and sibling-rank (with that selection priority).
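The walk described above can be sketched as follows. This is one possible reading, not a tested cross-browser implementation: the function names are made up for illustration, the walk continues past `body` up to the `html` element, and a class-only step is not guaranteed to be unique among siblings (nor are ids containing quotes handled), so you may want to combine class and rank in practice.

```javascript
// Count preceding element siblings with the same tag name.
// The result is 1-based, which is what real XPath expects.
function siblingRank(el) {
  var rank = 1;
  var sib = el.previousSibling;
  while (sib) {
    if (sib.nodeType === 1 && sib.nodeName === el.nodeName) {
      rank++;
    }
    sib = sib.previousSibling;
  }
  return rank;
}

// Describe one element: prefer id, then class, then sibling rank.
function xpathStep(el) {
  var tag = el.nodeName.toLowerCase();
  if (el.id) {
    return tag + "[@id='" + el.id + "']";
  }
  if (el.className) {
    return tag + "[@class='" + el.className + "']";
  }
  return tag + "[" + siblingRank(el) + "]";
}

// Walk from the target up through its element ancestors,
// collecting one step per level, outermost first.
function xpathFor(el) {
  var steps = [];
  while (el && el.nodeType === 1) {
    steps.unshift(xpathStep(el));
    el = el.parentNode;
  }
  return "/" + steps.join("/");
}
```

In a click handler you would pass the clicked element, e.g. `xpathFor(evt.target)`, and send the resulting string to the server.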


Re: Webpage Element Information
by Your Mother (Archbishop) on Nov 15, 2007 at 03:36 UTC

    So, this is a bit tougher than I thought but I'm only giving up early b/c I only had an hour to look at it. I believe the following approach will get you there if you're willing to bang on it and solve the click depth problem. The missing parts aren't hard I think, I just don't know them or have time to figure them out tonight.

    First the JS.

      Second, clicks propagate to parents. If you click on a link it will also receive the click for the paragraph the link is in, the div the paragraph is in, the body the div is in, and the html the body is in. I don't know how to solve that but the jQuery list is very friendly, I'm sure someone does.
      The event will only have a single target attribute, so you can always test for equality in the handler. When you do handle the event at the target node, you can stop propagation of the event like so:

      [event].cancelBubble = true; if ([event].stopPropagation) [event].stopPropagation();
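A small sketch combining the two ideas above: compare the event's target with the node the handler is running on, and stop propagation once the target has handled it. The wrapper name `onlyAtTarget` is hypothetical, and the `srcElement` fallback is an assumption for older IE.

```javascript
// Wrap a handler so it only runs on the node the user actually clicked,
// then stops the event from bubbling up to the ancestors.
function onlyAtTarget(handler) {
  return function (evt) {
    var target = evt.target || evt.srcElement; // srcElement: older IE
    if (target !== this) {
      return; // the click is bubbling through an ancestor; ignore it
    }
    handler.call(this, evt);
    evt.cancelBubble = true;
    if (evt.stopPropagation) {
      evt.stopPropagation();
    }
  };
}
```

With jQuery this might be used as `$("*").bind("click", onlyAtTarget(function (evt) { /* ... */ }));`, since jQuery calls handlers with `this` set to the element the handler is bound to.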


        That's great and works on Safari and FF at least. Amended jQuery block.

        // <![CDATA[
        $(document).ready(function() {
          $("*").bind("click", function(evt){
            var info = $(this).position();
            info["height"] = $(this).height();
            info["width"] = $(this).width();
            info["lineage"] = $(this).parents()
              .map(function () { return this.tagName; })
              .get().reverse().join(" > ");
            $.ajax({url:"/cgi/"
                   ,data:info
                   ,dataType:"json"
                   ,type:"POST"
                   ,success: function(data){ $("#tag").append(data.status + " ") }
                   });
            evt.cancelBubble = true; // NEW!
            if (evt.stopPropagation) evt.stopPropagation();
          });
        });
        // ]]>
Re: Webpage Element Information
by Your Mother (Archbishop) on Nov 14, 2007 at 21:48 UTC

    If I understand you correctly, this is not currently possible with Perl (though I don't know the Win32 stuff well enough to be sure; it has some good hooks into IE and JS IIRC). What you probably want is to write your own Firefox extension in JS and XPCOM. They have lots of docs including this one for IO stuff.

      While the web-developer plugin is a nice tool, it seems a rather round-about way to get the DOM structure into a file or script. Why would someone want to do it that way instead of just using existing tools in JavaScript or Perl to get the same information?

        I took position (mistakenly) to mean X/Y, which is available only from the browser, and he wants to write it to a file in response to a mouse action. So I thought of chrome. Upon reflection though (thanks for calling me on it) there is obviously a Perl/JS solution. I'm playing around with one right now, using Ajax. If I can finish something concise and correct I'll follow up with it.

Re: Webpage Element Information
by Anonymous Monk on Feb 26, 2008 at 19:04 UTC

    This is a bit late to be back to this but I thought of it while playing with something else. I really like the jQuery chaining and the vertical-style notation makes it quite easy to read. stopPropagation() is normalized in jQuery and should work on any modern browser.

    <script type="text/javascript" src="/your/path/to/jquery.js"> </script>
    <script type="text/javascript">
    $(function(){
      $("*").click(function(evt){
        var lineage = $(this)
          .parents()
          .get()
          .reverse()
          .map(function(n){return n.nodeName.toLowerCase()})
          .join(" > ");
        evt.stopPropagation();
        alert(lineage);
      });
    });
    </script>
