In reply to: XML::Simple parsing help
Don't use XML::Simple!
perl -e "use Data::Dumper; use XML::Rules; print Dumper(XML::Rules::inferRulesFromExample('c:\temp\inventors.xml'))" prints:
$VAR1 = {
  'inventors' => 'no content',
  'number' => 'as is',
  'inventor' => 'as array no content',
  'city,country,name,upper-name' => 'content'
};
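A key in that hash may list several tags separated by commas; that is how the single 'content' rule covers city, country, name and upper-name at once. The short core-Perl sketch below (it does not need XML::Rules) copies the inferred rules as literal data and expands such keys into one rule per tag, which can help when checking what each tag will get.

use strict;
use warnings;

# The rules returned by inferRulesFromExample() above, copied as literal data.
my %inferred = (
    'inventors'                    => 'no content',
    'number'                       => 'as is',
    'inventor'                     => 'as array no content',
    'city,country,name,upper-name' => 'content',
);

# Expand comma-separated keys so every tag maps to its own rule.
my %rule_for;
for my $tags (keys %inferred) {
    $rule_for{$_} = $inferred{$tags} for split /,/, $tags;
}

printf "%-10s => %s\n", $_, $rule_for{$_} for sort keys %rule_for;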
With rules like these, XML::Rules would produce a data structure like this:
{
  'inventors' => {
    'inventor' => [
      {
        'country' => 'GB',
        'city' => 'Aston Clinton',
        'upper-name' => 'ANDY BARTH',
        'number' => {
          '_content' => '1',
          'type' => 'integer'
        },
        'name' => 'Andy Barth'
      },
      {
        'country' => 'GB',
        'city' => 'Aylesbury',
        'upper-name' => 'DANIELE DALL\'ACQUA',
        'number' => {
          '_content' => '2',
          'type' => 'integer'
        },
        'name' => 'Daniele Dall\'Acqua'
      },
      {
        'country' => 'GB',
        'city' => 'Calne',
        'upper-name' => 'NIGEL DREW',
        'number' => {
          '_content' => '3',
          'type' => 'integer'
        },
        'name' => 'Nigel Drew'
      }
    ],
    'type' => 'array'
  }
};
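In that structure the inventors are an array, and each number is a hash that keeps the text under '_content' next to the type attribute. A small sketch of reaching into it, using only the first record of the dump above as literal data:

use strict;
use warnings;

# First record of the structure above, copied as literal data.
my $data = {
    inventors => {
        type     => 'array',
        inventor => [
            {
                country      => 'GB',
                city         => 'Aston Clinton',
                'upper-name' => 'ANDY BARTH',
                number       => { _content => '1', type => 'integer' },
                name         => 'Andy Barth',
            },
        ],
    },
};

# The inventors are an array; the number text sits one level deeper.
my $first  = $data->{inventors}{inventor}[0];
my $number = $first->{number}{_content};
print "Inventor #$number is $first->{name}\n";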
Now, I do not care about the 'type' => 'integer'; I'd rather get just the content of the <number> tag as well, so let's change the rule for that tag to 'content'. This changes the structure to:
{
  'inventors' => {
    'inventor' => [
      {
        'country' => 'GB',
        'city' => 'Aston Clinton',
        'upper-name' => 'ANDY BARTH',
        'number' => '1',
        'name' => 'Andy Barth'
      },
      {
        'country' => 'GB',
        'city' => 'Aylesbury',
        'upper-name' => 'DANIELE DALL\'ACQUA',
        'number' => '2',
        'name' => 'Daniele Dall\'Acqua'
      },
      {
        'country' => 'GB',
        'city' => 'Calne',
        'upper-name' => 'NIGEL DREW',
        'number' => '3',
        'name' => 'Nigel Drew'
      }
    ],
    'type' => 'array'
  }
};
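For comparison, the same flattening could be done by hand after parsing with the original 'as is' rule. This is only a core-Perl sketch of the effect on two records from the first dump, not how XML::Rules implements 'content'; setting the rule gives you the flat value directly and avoids the extra pass.

use strict;
use warnings;

# Two records in the shape produced by the 'as is' rule for <number>.
my @inventors = (
    { name => 'Andy Barth',          number => { _content => '1', type => 'integer' } },
    { name => 'Daniele Dall\'Acqua', number => { _content => '2', type => 'integer' } },
);

# Replace each { _content => ..., type => ... } hash with just its text,
# which is the shape the 'content' rule produces while parsing.
for my $inv (@inventors) {
    $inv->{number} = $inv->{number}{_content} if ref $inv->{number} eq 'HASH';
}

print "$_->{number}: $_->{name}\n" for @inventors;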
Better, but I can do even better. If I know I want to get the inventors by number, I can change the rule for the <inventor> tag to 'by number' and get a hash instead of an array:
{
  'inventors' => {
    '1' => {
      'country' => 'GB',
      'city' => 'Aston Clinton',
      'upper-name' => 'ANDY BARTH',
      'name' => 'Andy Barth'
    },
    '3' => {
      'country' => 'GB',
      'city' => 'Calne',
      'upper-name' => 'NIGEL DREW',
      'name' => 'Nigel Drew'
    },
    'type' => 'array',
    '2' => {
      'country' => 'GB',
      'city' => 'Aylesbury',
      'upper-name' => 'DANIELE DALL\'ACQUA',
      'name' => 'Daniele Dall\'Acqua'
    }
  }
};
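In effect, 'by number' turns the array into a hash keyed by each inventor's number and removes number from the record itself, as the dump above shows. A hand-written sketch of the same reshaping on the array form from the previous step (an illustration of the result only, not the module's code):

use strict;
use warnings;

# Array shape produced when <number> uses the 'content' rule.
my @inventors = (
    { number => '1', name => 'Andy Barth',          city => 'Aston Clinton' },
    { number => '2', name => 'Daniele Dall\'Acqua', city => 'Aylesbury' },
    { number => '3', name => 'Nigel Drew',          city => 'Calne' },
);

# Key each record by its number; delete() returns the removed value,
# so the number moves from the record into the hash key.
my %by_number;
for my $inv (@inventors) {
    my %copy = %$inv;
    my $n    = delete $copy{number};
    $by_number{$n} = \%copy;
}

print "#1 is $by_number{1}{name}\n";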
In that case, getting the name of inventor #1 would be just $data->{inventors}{1}{name}. If the XML contains just the inventors, I can get rid of the 'inventors' level by changing its rule to 'pass', and it'd be just $data->{1}{name}. I also do not want the 'type' => 'array', so let's add " remove(type)" to the rule for <inventors>:
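use strict;
use warnings;
use Data::Dumper;
use XML::Rules;

my $parser = XML::Rules->new(
    stripspaces => 7,    # strip insignificant whitespace (see the stripspaces docs for the bit values)
    rules => {
        # keep <inventors> as a hash, but drop its type="array" attribute
        'inventors' => 'no content remove(type)',
        # store each <inventor> in the parent hash, keyed by its <number>
        'inventor' => 'by number',
        # these tags become plain strings
        'number,city,country,name,upper-name' => 'content',
    },
);

my $data = $parser->parsefile('c:\temp\inventors.xml');
#print Dumper($data);
print "The 1st inventor was $data->{inventors}{1}{name}\n";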
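Because the inventors now live in a hash, Dumper and keys return them in no particular order (note the 1, 3, 2 in the dump above). If you need them in order, sort the keys numerically. A sketch using the parsed result copied as literal data, with type already removed and country and upper-name omitted for brevity:

use strict;
use warnings;

# Result of the parser above, copied as literal data (some fields omitted).
my $data = {
    inventors => {
        '1' => { name => 'Andy Barth',          city => 'Aston Clinton' },
        '3' => { name => 'Nigel Drew',          city => 'Calne' },
        '2' => { name => 'Daniele Dall\'Acqua', city => 'Aylesbury' },
    },
};

# Hash keys come back in no particular order; sort them numerically.
my @order = sort { $a <=> $b } keys %{ $data->{inventors} };
for my $n (@order) {
    print "$n: $data->{inventors}{$n}{name} ($data->{inventors}{$n}{city})\n";
}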
Jenda
Enoch was right!
Enjoy the last years of Rome.