PerlMonks  

Re: XPath to XML (xpath2html)

by Anonymous Monk
on May 17, 2013 at 18:42 UTC (#1034008)


in reply to XPath to XML

Yup, see xsh. Try searching for site:perlmonks.org "by choroba" xsh, or ?node_id=3989;BIT=xml%3A%3Axsh, for example:

Re: XML::Simple usage question (steriods), Re^2: Extracting data-structure from HTML using Web::Scraper, Re: How to read onclick properties on row of a table using HTML::Table::Extractor ( DOM approach using xsh)

Found 50 nodes roughly between 2013-05-17 and 2011-09-27 (searched 10.24% of DB).


where any text contains "xml::xsh"

2013-04-04 choroba Re: LibXML: Change a node into a comment Re:SoPW
2013-04-02 Anonymous Monk Re^3: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) Re:SoPW
2013-04-02 choroba Re^2: Some issues with WWW::Mechanize::Firefox->xpath() method (xpath 1.0) Re:SoPW
2013-03-21 choroba Re: LibXML, XPath and Namespaces Re:SoPW
2013-01-11 choroba Re: Perl - Modify the nested XML tags Re:SoPW
2013-01-07 choroba Re: to get next line of pattern matched Re:SoPW
2013-01-07 choroba Re: regular expression Re:SoPW
2012-12-14 choroba Re: Creating nested elements in XML::Smart Re:SoPW
2012-12-14 choroba Re: Fetch field values from API output Re:SoPW
2012-12-06 choroba Re: From a given text Extract the root HTML element inner text Re:SoPW
2012-11-22 choroba Re: Finding max value from a unique tag from XML Re:SoPW
2012-11-21 choroba Re: Adding Elements to XML Re:SoPW
2012-11-20 choroba Re: hi i want to retrieve the element and values from xml document Re:SoPW
2012-11-16 choroba Re: XML Newbie Re:SoPW
2012-10-16 sundialsvc4 Re: Search Entire Excel Workbook For Text Re:SoPW
2012-10-16 choroba Re: Some questions from beginning user of XML::LibXML and XPath Re:SoPW
2012-10-12 choroba Re^2: XML::Simple XML / XMLin / XMLout? or something else? Re:SoPW
2012-10-05 choroba Re: Struggling with XML Re:SoPW
2012-05-28 choroba Re: XML - Escaping characters from database for XML Re:SoPW
2012-05-25 choroba Re: Need help for Xpath patterns Re:SoPW
2012-04-23 choroba Re: finding each and every node of a xml document Re:SoPW
2012-04-19 choroba Re: searching a empty XML tag or self enclosing tags Re:SoPW
2012-03-01 choroba Re: Remove level of elements (preserving their children) in XML::Twig? Re:SoPW
2012-02-14 choroba Re: libxml - insert node Re:SoPW
2012-02-10 choroba Re: conditional input field separator? Re:SoPW
2012-01-26 choroba Re^2: parsing multi level XML with XML::Simple Re:SoPW
2012-01-15 choroba Re: Is there any XML reader like this? Re:SoPW
2012-01-15 trwww Re^2: RFC iEngine Re:Med
2012-01-13 repellent Re: RFC iEngine Re:Med
2012-01-12 choroba Re: Building a tree 1 leaf at a time Re:SoPW
2011-12-19 choroba Re: XML::Twig - Using xpath with twig roots Re:SoPW
2011-12-09 choroba Re: how to get attribute values and store in a hash. Re:SoPW
2011-11-30 choroba Re: compacting XML? Re:SoPW
2011-11-18 choroba Re: Modify XML tags Re:SoPW
2011-11-18 choroba Re: Multiple XML files from Directory to One XML file using perl. Re:SoPW
2011-11-14 choroba Re: Match on line, read backwards to opening xml tag then forward to closing tag Re:SoPW
2011-11-04 choroba Re: help me with perl script that add xml attritutes Re:SoPW
2011-11-01 vagabonding electron Re^6: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 marto Re^5: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 marto Re^3: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 vagabonding electron Re^2: How to get paired values from the nested XML structure? Re:SoPW
2011-11-01 choroba Re: How to get paired values from the nested XML structure? Re:SoPW
2011-10-12 choroba Re: How to get empty tag value in XML::XPath Re:SoPW
2011-10-07 choroba Re: Problem in String Replacement Re:SoPW
2011-10-04 veerubiji Re^2: perl script to print xml data like this Re:SoPW
2011-10-04 choroba Re: perl script to print xml data like this Re:SoPW
2011-09-30 choroba Re: Hash Table genaration using perl Re:SoPW
2011-09-29 choroba Re: How can I replace a line (tag) in an XML file? Re:SoPW
2011-09-27 choroba Re: XML::Twig removing tags from content Re:SoPW
2011-09-27 choroba Re: Replacing XML Tag name with another Re:SoPW

Found 20 nodes roughly between 2011-09-27 and 2008-12-04 (searched 19.35% of DB).


where any text contains "xml::xsh"

2011-08-26 Anonymous Monk Re: Search and replace query Re:SoPW
2011-08-19 choroba Re: Question about XML::DOM::Lite Re:SoPW
2011-07-15 Anonymous Monk Re: XML::LibXML - WHAR HASH TREES WHAR?! Re:SoPW
2011-06-25 choroba Re: How do I ignore comments in an xml file when using win32::ole? Re:SoPW
2011-06-07 choroba Re: Group XML Re:SoPW
2011-05-10 choroba Re: Replace MathML content using Twig Re:SoPW
2011-05-03 choroba Re: Missing values in XML::Twig Output Re:SoPW
2011-04-21 choroba Re: How do I get LibXML to replace attribute values? Re:SoPW
2011-04-12 choroba Re: Changing XML Tag value in Perl Script Re:SoPW
2011-04-12 choroba Re: perl xpath extraction Re:SoPW
2011-04-05 choroba Re: Modify XML Re:SoPW
2011-03-25 choroba Re: regex replace using position loop Re:SoPW
2010-12-03 choroba Re: XPathing Up level Re:SoPW
2010-11-11 choroba Re: XML::XPath - node-to-xpath reverse lookup Re:SoPW
2010-10-29 choroba Re: XPath command line utility... Re:SoPW
2010-10-26 choroba Re: Perl and Lib::XML usage Re:SoPW
2010-10-22 choroba Re: Having problems accessing individual attributes in xml Re:SoPW
2010-09-14 choroba Re: Deleting XML element using XML::LibXML Re:SoPW
2010-08-12 choroba Re: Sort xml based on attribute Re:SoPW
2010-08-05 choroba Re: How I can change value in XML file ? Re:SoPW

Found 2 nodes roughly between 2008-12-04 and 2006-03-20 (searched 18.38% of DB).


where any text contains "xml::xsh"

2007-03-05 merlyn Re: XML gurus unite!! Re:SoPW
2007-02-13 merlyn Get most recently refreshed CPAN mirror in your country Snippet

Found 7 nodes roughly between 2006-03-20 and 2005-03-09 (searched 9.67% of DB).


where any text contains "xml::xsh"

2005-10-13 saintmike Re: html analysis tool via regex Re:SoPW
2005-08-22 merlyn Re: Looking for a XPATH-like tool for HTML documents Re:SoPW
2005-05-20 rg0now Re^3: xQuery functionality in Perl? Re:SoPW
2005-05-11 rg0now Re: XML Parsing Suggestions? Re:SoPW
2005-05-08 merlyn Re: Replacing everything in between using s///; Re:SoPW
2005-04-08 rg0now Re: Parsing XML/HTML Re:SoPW
2005-03-21 merlyn Re: XML::Simple "transforming data" Re:SoPW

Found 6 nodes roughly between 2005-03-09 and 2003-06-22 (searched 16.44% of DB).


where any text contains "xml::xsh"

2004-08-26 merlyn •Re: Delete from string through s/// Re:SoPW
2004-06-01 ambrus ambrus's scratchpad SPad
2004-04-12 merlyn •Re: Just use an XSLT stylesheet Re:Med
2003-10-27 princepawn HTML Templating as Tree Rewriting: Part I: "If Statements" Med
2003-10-22 merlyn Screen-scraping using XSH - O'Reilly Animal lister Code
2003-08-18 merlyn •Re: XML::XPath Re:SoPW

Found 1 node roughly between 2003-06-22 and 2001-06-13 (searched 17.41% of DB).


where any text contains "xml::xsh"

2002-12-01 larsen Re: tgrep - A grep for XML/HTML tags Re:Code

And see xpath2html below. I toyed with it a few years ago; I couldn't make a go of it with XML::XPathEngine, but this worked about as well as I needed. You can ditch HTML::Element for a proper DOM API such as XML::LibXML::Element.

xpath2html

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::Element;

{
    my $root;
    my $current;
    # Split the XPath into steps; each step is a tag plus an optional [predicate].
    for my $step (
        grep length, split '/',
        q!/html/body/div[@id='wrapper']/div[@id='outer']/div[@id='inner']/div[@id='center']/div[@id='main']/div[2]/table[@id='wrappedcontent']/tbody/tr/td/table/tbody/tr[2]/td[2]!,
    ) {
        my ( $tag, $att ) = $step =~ /^([^\[]+)\[?(.*?)\]?$/;
        warn "step($step)tag($tag)att($att) \n";
        if ( $current and $root ) {
            if ( $att =~ /^\d+$/ ) {
                # Positional predicate [N]: create N siblings, descend into the last.
                my $new;
                for my $n ( 1 .. $att ) {
                    $new = HTML::Element->new($tag, ncount => $n );
                    $current->push_content($new);
                }
                $current = $new;
            } elsif( $att =~/\@(\w+)(?:[^=]*=['"]*([^'"]+)['"]*)?$/ ) {
                # Attribute predicate [@name='value']: one element with that attribute.
                my $new = HTML::Element->new($tag, $1 => $2 );
                $current->push_content($new);
                $current = $new;
            } else {
                # No predicate: plain element.
                my $new = HTML::Element->new($tag);
                $current->push_content($new);
                $current = $new;
            }
        } else {
            # The first step becomes the root element.
            $root = HTML::Element->new( $tag );
            $current = $root;
        }
    }
    undef $current;
    print $root->as_HTML( '><&' => " " );
    $root->delete;
    undef $root;
}
__END__
step(html)tag(html)att()
step(body)tag(body)att()
step(div[@id='wrapper'])tag(div)att(@id='wrapper')
step(div[@id='outer'])tag(div)att(@id='outer')
step(div[@id='inner'])tag(div)att(@id='inner')
step(div[@id='center'])tag(div)att(@id='center')
step(div[@id='main'])tag(div)att(@id='main')
step(div[2])tag(div)att(2)
step(table[@id='wrappedcontent'])tag(table)att(@id='wrappedcontent')
step(tbody)tag(tbody)att()
step(tr)tag(tr)att()
step(td)tag(td)att()
step(table)tag(table)att()
step(tbody)tag(tbody)att()
step(tr[2])tag(tr)att(2)
step(td[2])tag(td)att(2)
<html>
 <body>
  <div id="wrapper">
   <div id="outer">
    <div id="inner">
     <div id="center">
      <div id="main">
       <div ncount="1"> </div>
       <div ncount="2">
        <table id="wrappedcontent">
         <tbody>
          <tr>
           <td>
            <table>
             <tbody>
              <tr ncount="1"> </tr>
              <tr ncount="2">
               <td ncount="1"> </td>
               <td ncount="2"> </td>
              </tr>
             </tbody>
            </table>
           </td>
          </tr>
         </tbody>
        </table>
       </div>
      </div>
     </div>
    </div>
   </div>
  </div>
 </body>
</html>
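
For the XML::LibXML route mentioned above, a minimal sketch might look like the following. It is not from the original post: the sub name xpath_to_doc and the shortened example path are made up for illustration, and it only understands plain tag, tag[N] and tag[@name='value'] steps. It also skips the ncount marker attribute and simply creates N bare siblings for a positional step.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (an assumption, not part of the original script):
# build a skeleton document from a simple XPath whose steps are
# tag, tag[N] or tag[@name='value'].
sub xpath_to_doc {
    my ($xpath) = @_;
    my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $current;
    for my $step ( grep length, split m{/}, $xpath ) {
        my ( $tag, $pred ) = $step =~ /^([^\[]+)(?:\[(.*)\])?$/
            or die "cannot parse step '$step'\n";
        $pred = '' unless defined $pred;

        # A positional predicate [N] needs N siblings; anything else needs one.
        my $count = $pred =~ /^(\d+)$/ ? $1 : 1;
        # A document has a single root, and [0] is treated as one element.
        $count = 1 if !defined $current or $count < 1;

        my $new;
        for my $n ( 1 .. $count ) {
            $new = $doc->createElement($tag);
            # Attribute predicate: copy name and value onto the new element.
            if ( $pred =~ /^\@([\w:-]+)\s*=\s*(['"])(.*)\2$/ ) {
                $new->setAttribute( $1, $3 );
            }
            if ( defined $current ) { $current->appendChild($new) }
            else                    { $doc->setDocumentElement($new) }
        }
        # Descend into the last sibling created for this step.
        $current = $new;
    }
    return $doc;
}

print xpath_to_doc(q{/html/body/div[@id='main']/table/tr[2]/td[2]})->toString(1);
```

Passing 1 to toString asks libxml2 to indent the serialized document. Because the result is a real DOM, you can check the skeleton by running the same XPath back through findnodes.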

