PerlMonks  

How to simplify the datastructure returned by XML::Twig

by davido (Archbishop)
on Nov 06, 2005 at 07:23 UTC (#506086)
davido has asked for the wisdom of the Perl Monks concerning the following question:

I've been puzzling over this too long, and am sure I'm just not looking at it the right way. Hopefully someone with some experience with XML::Twig will have a suggestion.

I'm working on improving some of the datastructures returned by PerlMonks::Mechanized. One that should be among the easiest to work with is eluding me. Specifically, I'm updating the method that parses the Monastery's XML thread ticker so that it returns a concise but useful datastructure.

The problem is that the datastructure returned by XML::Twig (and previously XML::Simple) isn't as simple as it should be; it contains extra levels of indirection that are unneeded.

It wouldn't be all that hard to simply traverse the structure returned by XML::Twig, modifying the datastructure to remove extra indirection, but somehow I believe XML::Twig is capable of giving me what I want in the first place. But over the past few days I've convinced myself that I don't understand XML::Twig enough to get the most out of this highly flexible module.

Here is a simplified snippet of code that gives an example of what I'm doing.

use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my $xml;
{
    local $/;    # slurp the whole DATA section in one read
    $xml = <DATA>;
}
print $xml, "\n";

my $twig = XML::Twig->new();
$twig->safe_parse( $xml ) or die "XML parse failed: $@";

my $struct = $twig->simplify(
    forcearray => 1,
    keyattr    => [ qw/id/ ],
#   group_tags => { 'node id' => 'id' },
);
print Dumper $struct;

__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<thread id="504929">
  <node id="504941">
  </node>
  <node id="504942">
    <node id="504950">
      <node id="504964">
      </node>
    </node>
    <node id="504953">
    </node>
  </node>
  <node id="504944">
  </node>
</thread>

As you can see, I've commented out the group_tags option, because it wasn't gaining me anything, at least not the way I was using it.

The output I'm getting is:

$VAR1 = {
  'node' => {
    '504942' => {
      'node' => {
        '504953' => undef,
        '504950' => {
          'node' => {
            '504964' => undef
          }
        }
      }
    },
    '504941' => undef,
    '504944' => undef
  }
};

I'm close, but what I really want is:

$VAR1 = {
  '504942' => {
    '504953' => undef,
    '504950' => {
      '504964' => undef
    }
  },
  '504941' => undef,
  '504944' => undef
};

In other words, the node => {... is extra indirection that I don't need or want.
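The manual fallback mentioned earlier (walking the simplify() output and discarding each node layer) can be sketched in a few lines. This assumes that the only key at each wrapper level is 'node' (no other attributes or child tags would survive), that the input is the structure from the output shown above, and that strip_node_layer is a hypothetical helper name, not part of XML::Twig.

```perl
use strict;
use warnings;
use Data::Dumper;

# Recursively replace every { node => { id => ... } } wrapper with its inner
# hash, so each id maps straight to its children (or to undef for a leaf).
# Assumes 'node' is the only key at a wrapper level.
sub strip_node_layer {
    my ($h) = @_;
    # Leaves stay undef; "return undef" (not a bare "return") keeps the
    # key/value pairing intact when called inside the map below.
    return undef unless ref $h eq 'HASH';
    my $inner = exists $h->{node} ? $h->{node} : $h;
    return { map { $_ => strip_node_layer( $inner->{$_} ) } keys %$inner };
}

# The structure simplify() produced in the question.
my $struct = {
    'node' => {
        '504942' => {
            'node' => {
                '504953' => undef,
                '504950' => { 'node' => { '504964' => undef } },
            },
        },
        '504941' => undef,
        '504944' => undef,
    },
};

my $flat = strip_node_layer($struct);
print Dumper $flat;
```

The post-processing pass is cheap for a thread-sized structure, but it does not answer the real question of whether XML::Twig can emit this shape directly.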

Any tips on how to coax this out of XML::Twig?


Dave

Re: How to simplify the datastructure returned by XML::Twig
by Skeeve (Vicar) on Nov 06, 2005 at 09:09 UTC

    Why don't you create tag handlers for "thread" and "node" that build up your hierarchy?

    Here's something to get you started:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my %result;
my $thread = XML::Twig->new(
    twig_handlers => {
        thread => \&thread,
        node   => \&node,
    }
);
$thread->safe_parse( join( '', <DATA> ) );
print Dumper( \%result );

# Called once </thread> is reached: file the children's structures
# under the thread id.
sub thread {
    $result{ $_->att('id') } = childrenhash( $_->children('node') );
}

# Called as each </node> is reached (children before parents): stash
# { id => children } on the element so its parent can collect it.
sub node {
    $_->set_att( structure => { $_->att('id'), childrenhash( $_->children('node') ) } );
}

# Merge the stashed structures of the given child elements into one hash.
sub childrenhash {
    my %result;
    foreach my $child (@_) {
        my $ch = $child->att('structure');
        @result{ keys %$ch } = values %$ch;
    }
    return \%result;
}

__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<thread id="504929">
  <node id="504941">
  </node>
  <node id="504942">
    <node id="504950">
      <node id="504964">
      </node>
    </node>
    <node id="504953">
    </node>
  </node>
  <node id="504944">
  </node>
</thread>
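    A variation on the same idea, sketched under the assumption that the whole thread fits comfortably in memory: let XML::Twig build the tree, then recurse over each element's 'node' children once parsing is done, which avoids stashing hashrefs in a 'structure' attribute. The helper name kids_of is hypothetical; leaves map to undef to match the shape asked for in the question.

```perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my $xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<thread id="504929">
  <node id="504941"></node>
  <node id="504942">
    <node id="504950"><node id="504964"></node></node>
    <node id="504953"></node>
  </node>
  <node id="504944"></node>
</thread>
XML

# Map each <node> child of $elt to its id and recurse; a childless node
# becomes undef, as in the desired output.
sub kids_of {
    my ($elt) = @_;
    my @kids = $elt->children('node');
    return undef unless @kids;
    my %h;
    $h{ $_->att('id') } = kids_of($_) for @kids;
    return \%h;
}

my $twig = XML::Twig->new;
$twig->parse($xml);                 # parse() dies on malformed XML
my $tree = kids_of( $twig->root );  # root is the <thread> element
print Dumper $tree;
```

    Unlike the handler version, this keeps the whole tree in memory until the walk finishes; for a single thread that is probably a reasonable trade for the simpler code.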


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re: How to simplify the datastructure returned by XML::Twig
by demerphq (Chancellor) on Nov 06, 2005 at 13:17 UTC

    You could patch the ticker to return the data in its flat-record, parent-pointer form instead of as a recursive data structure. Building the tree from the flat structure is a lot easier than reading the nested XML.

    And now I've patched the ticker to do just this, and to return some additional information. Use 'flattree=1' to get the new output. The ticker also now responds to xmlstyle settings and includes the customary info tag with data about the ticker itself.

    You can build a fully connected tree structure from the "flattree" output quite easily. The algorithm is as follows:

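use XML::Simple qw(XMLin);

# $o is the client object used elsewhere (its construction is not shown)
my $res = $o->pm_get( node_id => 180684, id => 505536, flattree => 1 );

# ForceArray => ['node'] keeps $data->{node} an array even for a single
# record; KeyAttr => [] stops XML::Simple folding records into a hash
# if they carry a name, key or id attribute.
my $data = XMLin( $res->content, ForceArray => ['node'], KeyAttr => [] );

my %nodes;
foreach my $node ( @{ $data->{node} } ) {
    my $nid = $node->{node_id};

    # register this node so children can find it
    $nodes{ $nid } = $node;

    if ( my $pid = $node->{parent_node} ) {
        # tell the parent about this child
        $nodes{$pid}{kids}{$nid} = $node;

        # make the reference to the parent "hard" and not "soft"
        $node->{parent} = $nodes{$pid};
    }
}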

    You are guaranteed that this will work: you don't need to worry about the parent not existing, or anything like that.
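    If only the id-to-children shape from the question is wanted, the same parent-pointer idea can be applied without keeping the full node records. This is a sketch under stated assumptions: the records are plain hashes whose node_id/parent_node keys mirror the field names used above, the sample ids come from the XML in the question, top-level replies are assumed to point at the thread id 504929, and nest is a hypothetical helper. Collecting children in a first pass means the order of the records does not matter.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical flat records built from the XML in the question.
my @records = (
    { node_id => 504941, parent_node => 504929 },
    { node_id => 504964, parent_node => 504950 },    # child listed before its parent
    { node_id => 504942, parent_node => 504929 },
    { node_id => 504950, parent_node => 504942 },
    { node_id => 504953, parent_node => 504942 },
    { node_id => 504944, parent_node => 504929 },
);

# Pass 1: collect each parent's child ids.
my %children;
for my $r (@records) {
    push @{ $children{ $r->{parent_node} } }, $r->{node_id}
        if $r->{parent_node};
}

# Pass 2: recurse from a root id; a node with no children becomes undef.
sub nest {
    my ($id) = @_;
    my $kids = $children{$id} or return undef;
    return { map { $_ => nest($_) } @$kids };
}

my $tree = nest(504929);    # the thread id is the root
print Dumper $tree;
```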


Re: How to simplify the datastructure returned by XML::Twig
by sock (Monk) on Nov 07, 2005 at 14:52 UTC
    Maybe an alternative to XML::Twig would be to use XML::SAX to build your own internal hash and pass it back. That could be much simpler than loading the entire document and then picking it apart with the DOM.
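    A minimal sketch of that idea, assuming XML::SAX and XML::SAX::Base are installed: a handler keeps a stack of the currently open node levels, files each node id under its parent when the start tag arrives, and turns childless nodes into undef when the end tag arrives. NodeTreeHandler is a hypothetical class name, and the id is looked up under the '{}id' key that Perl SAX 2 uses for attributes without a namespace.

```perl
use strict;
use warnings;
use Data::Dumper;
use XML::SAX::ParserFactory;

package NodeTreeHandler;
use parent 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{tree}  = {};
    $self->{stack} = [ $self->{tree} ];    # hashes for the currently open levels
    $self->{ids}   = [];                    # ids of the currently open <node>s
}

sub start_element {
    my ( $self, $el ) = @_;
    return unless $el->{Name} eq 'node';
    my $id    = $el->{Attributes}{'{}id'}{Value};
    my $level = {};
    $self->{stack}[-1]{$id} = $level;       # file this node under its parent
    push @{ $self->{stack} }, $level;
    push @{ $self->{ids} },   $id;
}

sub end_element {
    my ( $self, $el ) = @_;
    return unless $el->{Name} eq 'node';
    my $level = pop @{ $self->{stack} };
    my $id    = pop @{ $self->{ids} };
    $self->{stack}[-1]{$id} = undef unless %$level;    # childless node -> undef
}

package main;

my $xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<thread id="504929">
  <node id="504941"></node>
  <node id="504942">
    <node id="504950"><node id="504964"></node></node>
    <node id="504953"></node>
  </node>
  <node id="504944"></node>
</thread>
XML

my $handler = NodeTreeHandler->new;
XML::SAX::ParserFactory->parser( Handler => $handler )->parse_string($xml);
my $tree = $handler->{tree};
print Dumper $tree;
```

    The handler never holds more than the hash being built plus a stack as deep as the thread, which is the main attraction of the streaming approach.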

Node Type: perlquestion [id://506086]
Approved by planetscape
Front-paged by bobf