in reply to XPath to XML
Yup, see xsh (XML::XSH). Try searching site:perlmonks.org "by choroba" xsh, or ?node_id=3989;BIT=xml%3A%3Axsh, which turns up, for example:
Re: XML::Simple usage question (steriods), Re^2: Extracting data-structure from HTML using Web::Scraper, Re: How to read onclick properties on row of a table using HTML::Table::Extractor (DOM approach using xsh)
Found 50 nodes roughly between 2013-05-17 and 2011-09-27 (searched 10.24% of DB).
where any text contains "xml::xsh"
Found 20 nodes roughly between 2011-09-27 and 2008-12-04 (searched 19.35% of DB).
where any text contains "xml::xsh"
Found 2 nodes roughly between 2008-12-04 and 2006-03-20 (searched 18.38% of DB).
where any text contains "xml::xsh"
2007-03-05 | merlyn | Re: XML gurus unite!! | Re:SoPW |
2007-02-13 | merlyn | Get most recently refreshed CPAN mirror in your country | Snippet |
Found 7 nodes roughly between 2006-03-20 and 2005-03-09 (searched 9.67% of DB).
where any text contains "xml::xsh"
2005-10-13 | saintmike | Re: html analysis tool via regex | Re:SoPW |
2005-08-22 | merlyn | Re: Looking for a XPATH-like tool for HTML documents | Re:SoPW |
2005-05-20 | rg0now | Re^3: xQuery functionality in Perl? | Re:SoPW |
2005-05-11 | rg0now | Re: XML Parsing Suggestions? | Re:SoPW |
2005-05-08 | merlyn | Re: Replacing everything in between using s///; | Re:SoPW |
2005-04-08 | rg0now | Re: Parsing XML/HTML | Re:SoPW |
2005-03-21 | merlyn | Re: XML::Simple "transforming data" | Re:SoPW |
Found 6 nodes roughly between 2005-03-09 and 2003-06-22 (searched 16.44% of DB).
where any text contains "xml::xsh"
2004-08-26 | merlyn | •Re: Delete from string through s/// | Re:SoPW |
2004-06-01 | ambrus | ambrus's scratchpad | SPad |
2004-04-12 | merlyn | •Re: Just use an XSLT stylesheet | Re:Med |
2003-10-27 | princepawn | HTML Templating as Tree Rewriting: Part I: "If Statements" | Med |
2003-10-22 | merlyn | Screen-scraping using XSH - O'Reilly Animal lister | Code |
2003-08-18 | merlyn | •Re: XML::XPath | Re:SoPW |
Found 1 node roughly between 2003-06-22 and 2001-06-13 (searched 17.41% of DB).
where any text contains "xml::xsh"
2002-12-01 | larsen | Re: tgrep - A grep for XML/HTML tags | Re:Code |
Also see xpath2html below. I toyed with this a few years ago; I couldn't make a go of it with XML::XPathEngine, but this worked about as well as I needed. You can swap HTML::Element for a proper DOM API such as XML::LibXML::Element.
xpath2html
#!/usr/bin/perl --
use strict;
use warnings;
use HTML::Element;

{
    my $root;
    my $current;
    for my $step (
        grep length, split '/',
        q!/html/body/div[@id='wrapper']/div[@id='outer']/div[@id='inner']/div[@id='center']/div[@id='main']/div[2]/table[@id='wrappedcontent']/tbody/tr/td/table/tbody/tr[2]/td[2]!,
      )
    {
        # split each step into a tag name and an optional [predicate]
        my ( $tag, $att ) = $step =~ /^([^\[]+)\[?(.*?)\]?$/;
        warn "step($step)tag($tag)att($att) \n";
        if ( $current and $root ) {
            if ( $att =~ /^\d+$/ ) {
                # positional predicate: create N siblings, descend into the last
                my $new;
                for my $n ( 1 .. $att ) {
                    $new = HTML::Element->new( $tag, ncount => $n );
                    $current->push_content($new);
                }
                $current = $new;
            }
            elsif ( $att =~ /\@(\w+)(?:[^=]*=['"]*([^'"]+)['"]*)?$/ ) {
                # attribute predicate: create one element carrying that attribute
                my $new = HTML::Element->new( $tag, $1 => $2 );
                $current->push_content($new);
                $current = $new;
            }
            else {
                my $new = HTML::Element->new($tag);
                $current->push_content($new);
                $current = $new;
            }
        }
        else {
            $root    = HTML::Element->new($tag);
            $current = $root;
        }
    }
    undef $current;
    print $root->as_HTML( '><&' => " " );
    $root->delete;
    undef $root;
}
__END__
step(html)tag(html)att()
step(body)tag(body)att()
step(div[@id='wrapper'])tag(div)att(@id='wrapper')
step(div[@id='outer'])tag(div)att(@id='outer')
step(div[@id='inner'])tag(div)att(@id='inner')
step(div[@id='center'])tag(div)att(@id='center')
step(div[@id='main'])tag(div)att(@id='main')
step(div[2])tag(div)att(2)
step(table[@id='wrappedcontent'])tag(table)att(@id='wrappedcontent')
step(tbody)tag(tbody)att()
step(tr)tag(tr)att()
step(td)tag(td)att()
step(table)tag(table)att()
step(tbody)tag(tbody)att()
step(tr[2])tag(tr)att(2)
step(td[2])tag(td)att(2)
<html>
 <body>
  <div id="wrapper">
   <div id="outer">
    <div id="inner">
     <div id="center">
      <div id="main">
       <div ncount="1">
       </div>
       <div ncount="2">
        <table id="wrappedcontent">
         <tbody>
          <tr>
           <td>
            <table>
             <tbody>
              <tr ncount="1">
              </tr>
              <tr ncount="2">
               <td ncount="1">
               </td>
               <td ncount="2">
               </td>
              </tr>
             </tbody>
            </table>
           </td>
          </tr>
         </tbody>
        </table>
       </div>
      </div>
     </div>
    </div>
   </div>
  </div>
 </body>
</html>
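For the XML::LibXML route mentioned above, here is a rough, untested-in-anger sketch of the same idea. The sub name xpath_to_doc is made up for illustration, and it assumes the path uses only plain names, name[N], and name[@attr='value'] steps; anything fancier is not handled. It skips the ncount bookkeeping attribute, so the original XPath can be run against the result as a sanity check. Note that toString emits XML (with a declaration and self-closed empty elements), not HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: build a skeleton document from a simple absolute XPath.
# Assumption: steps are only name, name[N], or name[@attr='value'].
sub xpath_to_doc {
    my ($xpath) = @_;
    my $doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $current;
    for my $step ( grep { length } split m{/}, $xpath ) {
        my ( $tag, $pred ) = $step =~ /^([^\[]+)(?:\[(.*)\])?$/
          or die "cannot parse step '$step'\n";
        $pred = '' unless defined $pred;
        my $new;
        if ( !$current ) {
            # first step becomes the document element
            $new = $doc->createElement($tag);
            $doc->setDocumentElement($new);
        }
        elsif ( $pred =~ /^[1-9]\d*$/ ) {
            # positional predicate: create N siblings, descend into the last
            $new = $current->appendChild( $doc->createElement($tag) )
              for 1 .. $pred;
        }
        else {
            $new = $current->appendChild( $doc->createElement($tag) );
        }
        # attribute predicate: copy the attribute onto the new element
        if ( $pred =~ /^\@([\w:-]+)=(['"])(.*)\2$/ ) {
            $new->setAttribute( $1, $3 );
        }
        $current = $new;
    }
    return $doc;
}

my $xpath = q{/html/body/div[@id='main']/div[2]/table[@id='wrappedcontent']/tbody/tr[2]/td[2]};
my $doc   = xpath_to_doc($xpath);
print $doc->toString(1);

# Sanity check: the generated skeleton should satisfy the path it came from.
my @hits = $doc->findnodes($xpath);
print scalar(@hits), " node(s) match the original path\n";
```

Running the original expression back through findnodes is a cheap way to confirm the skeleton really has the shape the path describes.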