http://www.perlmonks.org?node_id=1007979


in reply to XML::Simple parsing help

Don't use XML::Simple!

perl -e "use Data::Dumper; use XML::Rules; print Dumper(XML::Rules::inferRulesFromExample( 'c:\temp\inventors.xml'))"

prints:

$VAR1 = {
    'inventors' => 'no content',
    'number' => 'as is',
    'inventor' => 'as array no content',
    'city,country,name,upper-name' => 'content'
};

With rules like this XML::Rules would produce a data structure like this:

{
    'inventors' => {
        'inventor' => [
            {
                'country' => 'GB',
                'city' => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                'number' => { '_content' => '1', 'type' => 'integer' },
                'name' => 'Andy Barth'
            },
            {
                'country' => 'GB',
                'city' => 'Aylesbury',
                'upper-name' => 'DANIELE DALL\'ACQUA',
                'number' => { '_content' => '2', 'type' => 'integer' },
                'name' => 'Daniele Dall\'Acqua'
            },
            {
                'country' => 'GB',
                'city' => 'Calne',
                'upper-name' => 'NIGEL DREW',
                'number' => { '_content' => '3', 'type' => 'integer' },
                'name' => 'Nigel Drew'
            }
        ],
        'type' => 'array'
    }
};

Now, I do not care about the 'type' => 'integer'; I'd rather get just the content of <number> as well, so let's change the rule for that tag to 'content'. This changes the structure to:

{
    'inventors' => {
        'inventor' => [
            {
                'country' => 'GB',
                'city' => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                'number' => '1',
                'name' => 'Andy Barth'
            },
            {
                'country' => 'GB',
                'city' => 'Aylesbury',
                'upper-name' => 'DANIELE DALL\'ACQUA',
                'number' => '2',
                'name' => 'Daniele Dall\'Acqua'
            },
            {
                'country' => 'GB',
                'city' => 'Calne',
                'upper-name' => 'NIGEL DREW',
                'number' => '3',
                'name' => 'Nigel Drew'
            }
        ],
        'type' => 'array'
    }
};

Better, but I can do even better. If I know I want to get the inventors by number I can change the rule for the <inventor> tag to 'by number' and get a hash instead of an array:

{
    'inventors' => {
        '1' => {
            'country' => 'GB',
            'city' => 'Aston Clinton',
            'upper-name' => 'ANDY BARTH',
            'name' => 'Andy Barth'
        },
        '3' => {
            'country' => 'GB',
            'city' => 'Calne',
            'upper-name' => 'NIGEL DREW',
            'name' => 'Nigel Drew'
        },
        'type' => 'array',
        '2' => {
            'country' => 'GB',
            'city' => 'Aylesbury',
            'upper-name' => 'DANIELE DALL\'ACQUA',
            'name' => 'Daniele Dall\'Acqua'
        }
    }
};

In which case getting the name of inventor #1 would be just $data->{inventors}{1}{name}. If the XML contains just the inventors, I can get rid of the 'inventors' level by changing its rule to 'pass', and it'd be just $data->{1}{name}. I also do not want the 'type' => 'array', so let's add " remove(type)" to the rule for <inventors>:

use Data::Dumper;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,
    rules => {
        'inventors' => 'no content remove(type)',
        'inventor' => 'by number',
        'number,city,country,name,upper-name' => 'content'
    }
);

my $data = $parser->parsefile('c:\temp\inventors.xml');
#print Dumper($data);
print "The 1st inventor was $data->{inventors}{1}{name}\n";
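
Once the inventors are keyed by number, walking them in order is just a sorted loop over the hash keys. Here is a minimal sketch of that. It assumes $data looks like the by-number structure above with 'type' already removed, so the hash is hard-coded here as a stand-in for what parsefile() returns, and @lines is only a name made up for this illustration. It uses core Perl only, so it runs without XML::Rules or the XML file.

```perl
use strict;
use warnings;

# Hard-coded stand-in for the parsed result (an assumption for this sketch):
# the by-number structure from above, with 'type' removed.
my $data = {
    inventors => {
        1 => { name => 'Andy Barth',         city => 'Aston Clinton', country => 'GB' },
        2 => { name => "Daniele Dall'Acqua", city => 'Aylesbury',     country => 'GB' },
        3 => { name => 'Nigel Drew',         city => 'Calne',         country => 'GB' },
    },
};

# Hash keys come back in no particular order, so sort them numerically.
my @lines;
for my $number ( sort { $a <=> $b } keys %{ $data->{inventors} } ) {
    my $inventor = $data->{inventors}{$number};
    push @lines, "$number: $inventor->{name} ($inventor->{city}, $inventor->{country})";
}

print "$_\n" for @lines;
```

The numeric sort matters once there are ten or more inventors, because a plain string sort would put '10' before '2'.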

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^2: XML::Simple parsing help
by jhoop (Acolyte) on Dec 09, 2012 at 16:28 UTC

    This is brilliant. I had perused the docs for XML::Rules prior to settling on Simple, but I found more examples for the latter and so settled on that. But I think this makes more sense. The actual XML in question is a much longer (and more convoluted) data structure, so this will allow me to focus on the content and not get caught up trying to manage all the extraneous tags etc. Thank you.

Re^2: XML::Simple parsing help
by Anonymous Monk on Dec 10, 2012 at 03:14 UTC

    What, you forgot you included xml2XMLRules.pl?

    It's because it contains no documentation :)