PerlMonks  

Re: XPath to XML (xpath2html)

by Anonymous Monk
on May 17, 2013 at 18:42 UTC ( #1034008=note )


in reply to XPath to XML

Yup, see xsh. Search site:perlmonks.org "by choroba" xsh, or run the Super Search query ?node_id=3989;BIT=xml%3A%3Axsh, for example:

Re: XML::Simple usage question (steriods), Re^2: Extracting data-structure from HTML using Web::Scraper, Re: How to read onclick properties on row of a table using HTML::Table::Extractor ( DOM approach using xsh)

Found 50 nodes roughly between 2013-05-17 and 2011-09-27 (searched 10.24% of DB).


where any text contains "xml::xsh"

2013-04-04 choroba Re: LibXML: Change a node into a comment Re:SoPW
2013-04-02 Anonymous Monk Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) Re:SoPW
2013-04-02 choroba Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) Re:SoPW
2013-03-21 choroba Re: LibXML, XPath and Namespaces Re:SoPW
2013-01-11 choroba Re: Perl - Modify the nested XML tags Re:SoPW
2013-01-07 choroba Re: to get next line of pattern matched Re:SoPW
2013-01-07 choroba Re: regular expression Re:SoPW
2012-12-14 choroba Re: Creating nested elements in XML::Smart Re:SoPW
2012-12-14 choroba Re: Fetch field values from API output Re:SoPW
2012-12-06 choroba Re: From a given text Extract the root HTML element inner text Re:SoPW
2012-11-22 choroba Re: Finding max value from a unique tag from XML Re:SoPW
2012-11-21 choroba Re: Adding Elements to XML Re:SoPW
2012-11-20 choroba Re: hi i want to retrieve the element and values from xml document Re:SoPW
2012-11-16 choroba Re: XML Newbie Re:SoPW
2012-10-16 sundialsvc4 Re: Search Entire Excel Workbook For Text Re:SoPW
2012-10-16 choroba Re: Some questions from beginning user of XML::LibXML and XPath Re:SoPW
2012-10-12 choroba Re^2: XML::Simple XML / XMLin / XMLout? or something else? Re:SoPW
2012-10-05 choroba Re: Struggling with XML Re:SoPW
2012-05-28 choroba Re: XML - Escaping characters from database for XML Re:SoPW
2012-05-25 choroba Re: Need help for Xpath patterns Re:SoPW
2012-04-23 choroba Re: finding each and every node of a xml document Re:SoPW
2012-04-19 choroba Re: searching a empty XML tag or self enclosing tags Re:SoPW
2012-03-01 choroba Re: Remove level of elements (preserving their children) in XML::Twig? Re:SoPW
2012-02-14 choroba Re: libxml - insert node Re:SoPW
2012-02-10 choroba Re: conditional input field separator? Re:SoPW
2012-01-26 choroba Re^2: parsing multi level XML with XML::Simple Re:SoPW
2012-01-15 choroba Re: Is there any XML reader like this? Re:SoPW
2012-01-15 trwww Re^2: RFC iEngine Re:Med
2012-01-13 repellent Re: RFC iEngine Re:Med
2012-01-12 choroba Re: Building a tree 1 leaf at a time Re:SoPW
2011-12-19 choroba Re: XML::Twig - Using xpath with twig roots Re:SoPW
2011-12-09 choroba Re: how to get attribute values and store in a hash. Re:SoPW
2011-11-30 choroba Re: compacting XML? Re:SoPW
2011-11-18 choroba Re: Modify XML tags Re:SoPW
2011-11-18 choroba Re: Multiple XML files from Directory to One XML file using perl. Re:SoPW
2011-11-14 choroba Re: Match on line, read backwards to opening xml tag then forward to closing tag Re:SoPW
2011-11-04 choroba Re: help me with perl script that add xml attritutes Re:SoPW
2011-11-01 vagabonding electron Re^6: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 marto Re^5: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 marto Re^3: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 vagabonding electron Re^2: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 choroba Re: How to get paired values from the nested XML structure? Re:SoPW
2011-10-12 choroba Re: How to get empty tag value in XML::XPath Re:SoPW
2011-10-07 choroba Re: Problem in String Replacement Re:SoPW
2011-10-04 veerubiji Re^2: perl script to print xml data like this Re:SoPW
2011-10-04 choroba Re: perl script to print xml data like this Re:SoPW
2011-09-30 choroba Re: Hash Table genaration using perl Re:SoPW
2011-09-29 choroba Re: How can I replace a line (tag) in an XML file? Re:SoPW
2011-09-27 choroba Re: XML::Twig removing tags from content Re:SoPW
2011-09-27 choroba Re: Replacing XML Tag name with another Re:SoPW

Found 20 nodes roughly between 2011-09-27 and 2008-12-04 (searched 19.35% of DB).



2011-08-26 Anonymous Monk Re: Search and replace query Re:SoPW
2011-08-19 choroba Re: Question about XML::DOM::Lite Re:SoPW
2011-07-15 Anonymous Monk Re: XML::LibXML - WHAR HASH TREES WHAR?! Re:SoPW
2011-06-25 choroba Re: How do I ignore comments in an xml file when using win32::ole? Re:SoPW
2011-06-07 choroba Re: Group XML Re:SoPW
2011-05-10 choroba Re: Replace MathML content using Twig Re:SoPW
2011-05-03 choroba Re: Missing values in XML::Twig Output Re:SoPW
2011-04-21 choroba Re: How do I get LibXML to replace attribute values? Re:SoPW
2011-04-12 choroba Re: Changing XML Tag value in Perl Script Re:SoPW
2011-04-12 choroba Re: perl xpath extraction Re:SoPW
2011-04-05 choroba Re: Modify XML Re:SoPW
2011-03-25 choroba Re: regex replace using position loop Re:SoPW
2010-12-03 choroba Re: XPathing Up level Re:SoPW
2010-11-11 choroba Re: XML::XPath - node-to-xpath reverse lookup Re:SoPW
2010-10-29 choroba Re: XPath command line utility... Re:SoPW
2010-10-26 choroba Re: Perl and Lib::XML usage Re:SoPW
2010-10-22 choroba Re: Having problems accessing individual attributes in xml Re:SoPW
2010-09-14 choroba Re: Deleting XML element using XML::LibXML Re:SoPW
2010-08-12 choroba Re: Sort xml based on attribute Re:SoPW
2010-08-05 choroba Re: How I can change value in XML file ? Re:SoPW

Found 2 nodes roughly between 2008-12-04 and 2006-03-20 (searched 18.38% of DB).



2007-03-05 merlyn Re: XML gurus unite!! Re:SoPW
2007-02-13 merlyn Get most recently refreshed CPAN mirror in your country Snippet

Found 7 nodes roughly between 2006-03-20 and 2005-03-09 (searched 9.67% of DB).



2005-10-13 saintmike Re: html analysis tool via regex Re:SoPW
2005-08-22 merlyn Re: Looking for a XPATH-like tool for HTML documents Re:SoPW
2005-05-20 rg0now Re^3: xQuery functionality in Perl? Re:SoPW
2005-05-11 rg0now Re: XML Parsing Suggestions? Re:SoPW
2005-05-08 merlyn Re: Replacing everything in between using s///; Re:SoPW
2005-04-08 rg0now Re: Parsing XML/HTML Re:SoPW
2005-03-21 merlyn Re: XML::Simple "transforming data" Re:SoPW

Found 6 nodes roughly between 2005-03-09 and 2003-06-22 (searched 16.44% of DB).



2004-08-26 merlyn •Re: Delete from string through s/// Re:SoPW
2004-06-01 ambrus ambrus's scratchpad SPad
2004-04-12 merlyn •Re: Just use an XSLT stylesheet Re:Med
2003-10-27 princepawn HTML Templating as Tree Rewriting: Part I: "If Statements" Med
2003-10-22 merlyn Screen-scraping using XSH - O'Reilly Animal lister Code
2003-08-18 merlyn •Re: XML::XPath Re:SoPW

Found 1 node roughly between 2003-06-22 and 2001-06-13 (searched 17.41% of DB).



2002-12-01 larsen Re: tgrep - A grep for XML/HTML tags Re:Code

And see xpath2html below. I toyed with it a few years ago; I couldn't make a go of it with XML::XPathEngine, but this worked about as well as I needed. You can swap HTML::Element for a proper DOM API such as XML::LibXML::Element.

xpath2html

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::Element;

{
    my $root;
    my $current;

    # Walk the path one step at a time, building one element per step.
    for my $step (
        grep length, split '/',
        q!/html/body/div[@id='wrapper']/div[@id='outer']/div[@id='inner']/div[@id='center']/div[@id='main']/div[2]/table[@id='wrappedcontent']/tbody/tr/td/table/tbody/tr[2]/td[2]!,
      )
    {
        # Split "tag[predicate]" into the tag name and the predicate text.
        my ( $tag, $att ) = $step =~ /^([^\[]+)\[?(.*?)\]?$/;
        warn "step($step)tag($tag)att($att) \n";
        if ( $current and $root ) {
            if ( $att =~ /^\d+$/ ) {

                # Positional predicate [N]: create N siblings, descend into the last.
                my $new;
                for my $n ( 1 .. $att ) {
                    $new = HTML::Element->new( $tag, ncount => $n );
                    $current->push_content($new);
                }
                $current = $new;
            }
            elsif ( $att =~ /\@(\w+)(?:[^=]*=['"]*([^'"]+)['"]*)?$/ ) {

                # Attribute predicate [@name='value']: create one element with it.
                my $new = HTML::Element->new( $tag, $1 => $2 );
                $current->push_content($new);
                $current = $new;
            }
            else {
                # No predicate: create a plain child element.
                my $new = HTML::Element->new($tag);
                $current->push_content($new);
                $current = $new;
            }
        }
        else {
            # First step becomes the root element.
            $root    = HTML::Element->new($tag);
            $current = $root;
        }
    }
    undef $current;
    print $root->as_HTML( '><&' => " " );
    $root->delete;
    undef $root;
}
__END__
step(html)tag(html)att()
step(body)tag(body)att()
step(div[@id='wrapper'])tag(div)att(@id='wrapper')
step(div[@id='outer'])tag(div)att(@id='outer')
step(div[@id='inner'])tag(div)att(@id='inner')
step(div[@id='center'])tag(div)att(@id='center')
step(div[@id='main'])tag(div)att(@id='main')
step(div[2])tag(div)att(2)
step(table[@id='wrappedcontent'])tag(table)att(@id='wrappedcontent')
step(tbody)tag(tbody)att()
step(tr)tag(tr)att()
step(td)tag(td)att()
step(table)tag(table)att()
step(tbody)tag(tbody)att()
step(tr[2])tag(tr)att(2)
step(td[2])tag(td)att(2)
<html>
 <body>
  <div id="wrapper">
   <div id="outer">
    <div id="inner">
     <div id="center">
      <div id="main">
       <div ncount="1"> </div>
       <div ncount="2">
        <table id="wrappedcontent">
         <tbody>
          <tr>
           <td>
            <table>
             <tbody>
              <tr ncount="1"> </tr>
              <tr ncount="2">
               <td ncount="1"> </td>
               <td ncount="2"> </td>
              </tr>
             </tbody>
            </table>
           </td>
          </tr>
         </tbody>
        </table>
       </div>
      </div>
     </div>
    </div>
   </div>
  </div>
 </body>
</html>
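
For the XML::LibXML swap mentioned above, here is a minimal sketch of how it might look. It is not a tested drop-in replacement. The sub name xpath2dom is made up for illustration, and it understands only the same tiny XPath subset as the script above: child steps with an optional [N] or [@name='value'] predicate. Unlike the HTML::Element version, it adds no ncount attributes, and it also honours an attribute predicate on the root step.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: build a skeleton document that a simple XPath matches.
sub xpath2dom {
    my ($xpath) = @_;
    my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $current;
    for my $step ( grep length, split '/', $xpath ) {
        # Same step-splitting regex as the HTML::Element script.
        my ( $tag, $att ) = $step =~ /^([^\[]+)\[?(.*?)\]?$/;

        # A positional predicate [N] needs N siblings; the root can only be one.
        my $count = ( $current && $att =~ /^(\d+)$/ ) ? $1 : 1;
        $count = 1 if $count < 1;

        my $new;
        for my $n ( 1 .. $count ) {
            $new = $doc->createElement($tag);
            if ($current) { $current->appendChild($new) }
            else          { $doc->setDocumentElement($new) }
        }

        # An attribute predicate [@name='value'] becomes a real attribute.
        if ( $att =~ /\@(\w+)(?:[^=]*=['"]*([^'"]+)['"]*)?$/ ) {
            $new->setAttribute( $1, defined $2 ? $2 : '' );
        }
        $current = $new;    # descend into the last element created
    }
    return $doc;
}

my $xpath = q!/html/body/div[@id='main']/table/tr[2]/td[2]!;
my $doc   = xpath2dom($xpath);
print $doc->documentElement->toString(1), "\n";

# Sanity check: the generated tree should be matched by the XPath it came from.
my @hits = $doc->findnodes($xpath);
print scalar(@hits), " node(s) match\n";
```

A side benefit of a real DOM is that you can run the original XPath back against the result with findnodes, which gives a cheap round-trip check.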

