PerlMonks  

Re^2: XML::Twig parsing poorly structured content

by slugger415 (Monk)
on Jan 25, 2017 at 00:36 UTC ([id://1180255])


in reply to Re: XML::Twig parsing poorly structured content
in thread XML::Twig parsing poorly structured content

Looks very nice, thank you! And it works, at least for my sample XML.

I'm not familiar with this handler construction:

'div[@class="event"]'

It looks rather XSL-ish. Is there some explanation of how that works? The reason I ask (sheepishly) is that my pseudo-XML is simpler than the real stuff, which has sub-levels that I want to parse, e.g.:

<h3 class="current-day">Thursday, February 2</h3>
<div class="event">
  <div class="title">Event 1</div>
  <span class="time">7:30pm</span>
  <span class="location">Main Street</span>
</div>
<div class="event">
  <div class="title">Event 2</div>
  <span class="time">9pm</span>
  <span class="location">Green Street</span>
</div>

Sorry not to be more detailed in my original post. Much appreciated.

Re^3: XML::Twig parsing poorly structured content
by choroba (Cardinal) on Jan 25, 2017 at 08:26 UTC
    > rather XSL-ish

    It's called XPath. It's used and supported in a wider range of tools/languages/libraries than just XSL. This particular expression means "a div element whose class attribute has the value 'event'".
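To make the predicate concrete, here is a minimal sketch of how such an expression can be used both as an XML::Twig handler trigger and when navigating inside the matched element. The `<root>` wrapper and the extra `<div class="other">` element are assumptions added for illustration; they are not part of the XML posted in this thread.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

# Hypothetical sample: the <root> wrapper and the "other" div are assumptions.
my $xml = <<'XML';
<root>
  <div class="event">
    <div class="title">Event 1</div>
    <span class="time">7:30pm</span>
  </div>
  <div class="other">ignored</div>
</root>
XML

my @titles;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires only for <div> elements whose class attribute equals "event".
        'div[@class="event"]' => sub {
            my ($t, $event) = @_;
            # The same predicate syntax selects a child inside the event.
            push @titles, $event->first_child('div[@class="title"]')->text;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @titles;    # prints "Event 1"
```

The nested title div and the "other" div do not trigger the handler, because their class attribute is not "event".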

    > want to parse

    Then you can't use handlers, as you need access to more than just a subtree. The following shows how to do it. Using XML::LibXML would simplify the code in such a case, in my opinion.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use XML::Twig;

    my $twig = 'XML::Twig'->new;
    $twig->parsefile(shift);
    my $root = $twig->root;

    for my $header ($root->descendants('h3')) {
        my $date = $header->text;

        # Keep the following div siblings whose nearest preceding h3 is this header.
        my @events = $header->next_siblings(sub {
            my ($elt) = @_;
            'div' eq $elt->name && $elt->prev_sibling('h3') == $header
        });

        # One tab-separated line per event: the date, then the text of each child.
        say join "\t", $date, map $_->text, $_->children for @events;
    }
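For comparison, a rough sketch of what an XML::LibXML version might look like. It is not code from the thread: it assumes the fragment is wrapped in a single `<root>` element (an assumption, since the posted sample has no root), and it starts from each event and looks back to the nearest preceding h3 rather than starting from the header.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use XML::LibXML;

# Sample from the question, wrapped in an assumed <root> element.
my $xml = <<'XML';
<root>
  <h3 class="current-day">Thursday, February 2</h3>
  <div class="event">
    <div class="title">Event 1</div>
    <span class="time">7:30pm</span>
    <span class="location">Main Street</span>
  </div>
  <div class="event">
    <div class="title">Event 2</div>
    <span class="time">9pm</span>
    <span class="location">Green Street</span>
  </div>
</root>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

my @rows;
for my $event ($dom->findnodes('//div[@class="event"]')) {
    # On a reverse axis, [1] selects the closest preceding h3 sibling.
    my $date = $event->findvalue('preceding-sibling::h3[1]');
    my @fields = map { $event->findvalue($_) }
        'div[@class="title"]', 'span[@class="time"]', 'span[@class="location"]';
    push @rows, join "\t", $date, @fields;
}

say for @rows;
```

Because every XPath expression is evaluated relative to the event node, the date lookup and the sub-level fields need no callback or manual sibling comparison.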

Re^3: XML::Twig parsing poorly structured content
by kcott (Archbishop) on Jan 25, 2017 at 08:21 UTC
    "I'm not familiar with this handler construction: 'div[@class="event"]'"

    Here's the current W3C Recommendation: "XML Path Language (XPath) 2.0 (Second Edition)".

    In almost all cases, I find the "3.2.4 Abbreviated Syntax" section adequate for my needs. This has a description of 'div[@class="event"]' (as para[@type="warning"]); and lots more besides.

    — Ken

      Thank you, Ken! Very useful (and more to learn, as always).

      Scott
