
Re: XML::Simple parsing help

by Jenda (Abbot)
on Dec 09, 2012 at 13:41 UTC


in reply to XML::Simple parsing help

Don't use XML::Simple!

perl -e "use Data::Dumper; use XML::Rules; print Dumper(XML::Rules::inferRulesFromExample('c:\temp\inventors.xml'))"

prints:

$VAR1 = {
    'inventors' => 'no content',
    'number' => 'as is',
    'inventor' => 'as array no content',
    'city,country,name,upper-name' => 'content'
};

With rules like this XML::Rules would produce a data structure like this:

{
    'inventors' => {
        'inventor' => [
            {
                'country' => 'GB',
                'city' => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                'number' => { '_content' => '1', 'type' => 'integer' },
                'name' => 'Andy Barth'
            },
            {
                'country' => 'GB',
                'city' => 'Aylesbury',
                'upper-name' => 'DANIELE DALL\'ACQUA',
                'number' => { '_content' => '2', 'type' => 'integer' },
                'name' => 'Daniele Dall\'Acqua'
            },
            {
                'country' => 'GB',
                'city' => 'Calne',
                'upper-name' => 'NIGEL DREW',
                'number' => { '_content' => '3', 'type' => 'integer' },
                'name' => 'Nigel Drew'
            }
        ],
        'type' => 'array'
    }
};

Now, I do not care about the 'type' => 'integer'; I'd rather get just the content for the <number> as well, so let's change the rule for that tag to 'content'. This changes the structure to

{
    'inventors' => {
        'inventor' => [
            {
                'country' => 'GB',
                'city' => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                'number' => '1',
                'name' => 'Andy Barth'
            },
            {
                'country' => 'GB',
                'city' => 'Aylesbury',
                'upper-name' => 'DANIELE DALL\'ACQUA',
                'number' => '2',
                'name' => 'Daniele Dall\'Acqua'
            },
            {
                'country' => 'GB',
                'city' => 'Calne',
                'upper-name' => 'NIGEL DREW',
                'number' => '3',
                'name' => 'Nigel Drew'
            }
        ],
        'type' => 'array'
    }
};

Better, but I can do even better. If I know I want to get the inventors by number I can change the rule for the <inventor> tag to 'by number' and get a hash instead of an array:

{
    'inventors' => {
        '1' => {
            'country' => 'GB',
            'city' => 'Aston Clinton',
            'upper-name' => 'ANDY BARTH',
            'name' => 'Andy Barth'
        },
        '3' => {
            'country' => 'GB',
            'city' => 'Calne',
            'upper-name' => 'NIGEL DREW',
            'name' => 'Nigel Drew'
        },
        'type' => 'array',
        '2' => {
            'country' => 'GB',
            'city' => 'Aylesbury',
            'upper-name' => 'DANIELE DALL\'ACQUA',
            'name' => 'Daniele Dall\'Acqua'
        }
    }
};

In which case getting the name of inventor #1 would be just $data->{inventors}{1}{name}. If the XML contains just the inventors, I can get rid of the 'inventors' level by changing its rule to 'pass', and it'd be just $data->{1}{name}. I also do not want the 'type' => 'array', so let's add " remove(type)" to the rule for <inventors>.

use Data::Dumper;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        'inventors' => 'no content remove(type)',
        'inventor'  => 'by number',
        'number,city,country,name,upper-name' => 'content'
    }
);

my $data = $parser->parsefile('c:\temp\inventors.xml');
#print Dumper($data);
print "The 1st inventor was $data->{inventors}{1}{name}\n";
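
If you want to try the rules without the file on disk, something like the following should work as a rough sketch. It uses the same rules as above but feeds the parser an inline string through parse() instead of parsefile(). The XML here is my reconstruction from the dumped structures, not the real inventors.xml, so treat the element layout as an assumption.

```perl
use strict;
use warnings;
use XML::Rules;

# ASSUMPTION: this XML is reconstructed from the dumped data structures
# in this post; the real inventors.xml may be laid out differently.
my $xml = <<'XML';
<inventors type="array">
  <inventor>
    <number type="integer">1</number>
    <name>Andy Barth</name>
    <upper-name>ANDY BARTH</upper-name>
    <city>Aston Clinton</city>
    <country>GB</country>
  </inventor>
  <inventor>
    <number type="integer">2</number>
    <name>Daniele Dall'Acqua</name>
    <upper-name>DANIELE DALL'ACQUA</upper-name>
    <city>Aylesbury</city>
    <country>GB</country>
  </inventor>
  <inventor>
    <number type="integer">3</number>
    <name>Nigel Drew</name>
    <upper-name>NIGEL DREW</upper-name>
    <city>Calne</city>
    <country>GB</country>
  </inventor>
</inventors>
XML

# Same rules as in the post: key each <inventor> by its <number>,
# keep only the text of the leaf tags, drop the type attribute.
my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        'inventors' => 'no content remove(type)',
        'inventor'  => 'by number',
        'number,city,country,name,upper-name' => 'content',
    },
);

my $data = $parser->parse($xml);

# Walk the inventors in numeric order of their keys.
for my $number (sort { $a <=> $b } keys %{ $data->{inventors} }) {
    my $inventor = $data->{inventors}{$number};
    print "$number: $inventor->{name} ($inventor->{city}, $inventor->{country})\n";
}
```

Since the 'by number' rule turns the number into the hash key, you have to sort the keys yourself if the order matters.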

Jenda
Enoch was right!
Enjoy the last years of Rome.


Re^2: XML::Simple parsing help
by jhoop (Acolyte) on Dec 09, 2012 at 16:28 UTC

    This is brilliant. I had perused the docs for XML::Rules prior to settling on XML::Simple, but I found more examples for the latter and so went with that. I think this makes more sense, though. The actual XML in question is a much longer (and more convoluted) data structure, so this will allow me to focus on the content and not get caught up trying to manage all the extraneous tags etc. Thank you.

Re^2: XML::Simple parsing help
by Anonymous Monk on Dec 10, 2012 at 03:14 UTC

    What, you forgot you included xml2XMLRules.pl?

    It's because it contains no documentation :)
