http://www.perlmonks.org?node_id=1007955

jhoop has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have some XML:

...
<inventors type="array">
  <inventor>
    <city>Aston Clinton</city>
    <name>Andy Barth</name>
    <number type="integer">1</number>
    <country>GB</country>
    <upper-name>ANDY BARTH</upper-name>
  </inventor>
  <inventor>
    <city>Aylesbury</city>
    <name>Daniele Dall'Acqua</name>
    <number type="integer">2</number>
    <country>GB</country>
    <upper-name>DANIELE DALL'ACQUA</upper-name>
  </inventor>
  <inventor>
    <city>Calne</city>
    <name>Nigel Drew</name>
    <number type="integer">3</number>
    <country>GB</country>
    <upper-name>NIGEL DREW</upper-name>
  </inventor>
</inventors>
...

I need to access the name of the inventor whose <number> is 1. Using XML::Simple as follows:

my $xml_to_hash = XMLin($xml_file,
    #ForceArray => 0,
    #KeyAttr => {},
);

I get this output:

$VAR1 = {
  'inventors' => {
    'inventor' => {
      'Nigel Drew' => {
        'country' => 'GB',
        'city' => 'Calne',
        'number' => { 'content' => '3', 'type' => 'integer' },
        'upper-name' => 'NIGEL DREW'
      },
      'Daniele Dall\'Acqua' => {
        'country' => 'GB',
        'city' => 'Aylesbury',
        'number' => { 'content' => '2', 'type' => 'integer' },
        'upper-name' => 'DANIELE DALL\'ACQUA'
      },
      'Andy Barth' => {
        'country' => 'GB',
        'city' => 'Aston Clinton',
        'number' => { 'content' => '1', 'type' => 'integer' },
        'upper-name' => 'ANDY BARTH'
      }
    },
    'type' => 'array'
  },

And with ForceArray enabled:

$VAR1 = {
  'inventors' => [
    {
      'inventor' => [
        {
          'country' => ['GB'],
          'city' => ['Aston Clinton'],
          'upper-name' => ['ANDY BARTH'],
          'number' => [{'content' => '1', 'type' => 'integer'}],
          'name' => ['Andy Barth']
        },
        {
          'country' => ['GB'],
          'city' => ['Aylesbury'],
          'upper-name' => ['DANIELE DALL\'ACQUA'],
          'number' => [{'content' => '2', 'type' => 'integer'}],
          'name' => ['Daniele Dall\'Acqua']
        },
        {
          'country' => ['GB'],
          'city' => ['Calne'],
          'upper-name' => ['NIGEL DREW'],
          'number' => [{'content' => '3', 'type' => 'integer'}],
          'name' => ['Nigel Drew']
        }
      ],
      'type' => 'array'
    }
  ],
  ...

In attempting to access Inventor #1 (with ForceArray disabled) I tried the following:

foreach my $inventors(%{$xml_to_hash->{inventors}}){
    if ($inventors->{inventor}->{number}->{content} == 1){
        print $inventors->{inventor};
    }
}

to which I get error: "Can't use string ("inventor") as a HASH ref.." I also tried:

foreach my $inventor(%{$xml_to_hash->{inventors}->{inventor}}){
    if ($inventor->{number}->{content} == 1){
        print $inventor;
    }
}

Which gives "Can't use string ("Nigel Drew") as a HASH ref.."

I'm not sure the output with ForceArray enabled simplifies anything, and I'm not sure how to begin accessing that structure. Maybe I should be using KeyAttr to organize the inventors under the <number> field, but my attempts have proved fruitless. I've been working from the manual and from http://interoperating.info/courses/perl4data/node/26 as references. Any help much appreciated. Thanks for your time.

Replies are listed 'Best First'.
Re: XML::Simple parsing help
by Kenosis (Priest) on Dec 09, 2012 at 07:54 UTC

    Perhaps the following will assist you:

    use strict;
    use warnings;
    use XML::Simple;

    my $xml = <<'END';
    <inventors type="array">
      <inventor>
        <city>Aston Clinton</city>
        <name>Andy Barth</name>
        <number type="integer">1</number>
        <country>GB</country>
        <upper-name>ANDY BARTH</upper-name>
      </inventor>
      <inventor>
        <city>Aylesbury</city>
        <name>Daniele Dall'Acqua</name>
        <number type="integer">2</number>
        <country>GB</country>
        <upper-name>DANIELE DALL'ACQUA</upper-name>
      </inventor>
      <inventor>
        <city>Calne</city>
        <name>Nigel Drew</name>
        <number type="integer">3</number>
        <country>GB</country>
        <upper-name>NIGEL DREW</upper-name>
      </inventor>
    </inventors>
    END

    # ForceArray => 1 turns every element into an array ref;
    # KeyAttr => {} stops XML::Simple from folding <name> into a hash key.
    my $xml_to_hash = XMLin(
        $xml,
        ForceArray => 1,
        KeyAttr    => {},
    );

    for my $inventor ( @{ $xml_to_hash->{inventor} } ) {
        if ( $inventor->{number}[0]{content} == 1 ) {
            while ( my ( $key, $value ) = each %{$inventor} ) {
                # each value is a one-element array ref, so take element 0
                print "$key => $value->[0]\n" if $key ne 'number';
            }
        }
    }

    Output:

    country => GB
    city => Aston Clinton
    upper-name => ANDY BARTH
    name => Andy Barth

    If you review the Dumper output of $xml_to_hash, you'll see the dereferencing used above to get the data. Remember:

    [ ] = array
    { } = hash
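The two errors in the question come from the same habit: `foreach my $x (%{$hashref})` flattens the hash into an alternating list of keys and values, so the first item seen is a plain string key ("inventor" or "Nigel Drew"), and treating that string as a hash ref fails. The sketch below walks the default (non-ForceArray) structure by its keys instead. It uses a hand-written, abbreviated copy of the Dumper output from the question rather than calling XMLin, and `find_name_by_number` is a hypothetical helper name, not part of XML::Simple.

```perl
use strict;
use warnings;

# Hand-written, abbreviated copy of the default XMLin() result from the
# question. XML::Simple's default KeyAttr folded each <name> into a hash key.
my $xml_to_hash = {
    inventors => {
        type     => 'array',
        inventor => {
            'Andy Barth' => {
                city   => 'Aston Clinton',
                number => { content => '1', type => 'integer' },
            },
            'Daniele Dall\'Acqua' => {
                city   => 'Aylesbury',
                number => { content => '2', type => 'integer' },
            },
            'Nigel Drew' => {
                city   => 'Calne',
                number => { content => '3', type => 'integer' },
            },
        },
    },
};

# Hypothetical helper: return the inventor name whose <number> matches.
sub find_name_by_number {
    my ( $data, $wanted ) = @_;
    my $by_name = $data->{inventors}{inventor};    # hash ref: name => record

    # Iterate over the keys only; each key leads to one record hash ref.
    for my $name ( sort keys %{$by_name} ) {
        return $name if $by_name->{$name}{number}{content} == $wanted;
    }
    return;    # no inventor with that number
}

my $name = find_name_by_number( $xml_to_hash, 1 );
print defined $name ? "$name\n" : "not found\n";
```

With the sample data this prints the name stored under number 1. If the records themselves are all that matter, `values %{$by_name}` gives the hash refs directly, but the name is then lost because in this structure it only exists as the key.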

      Alternatively, using XML::TreeBuilder:

      use strict;
      use warnings;
      use XML::TreeBuilder;

      my $xml = <<'END';
      <inventors type="array">
        <inventor>
          <city>Aston Clinton</city>
          <name>Andy Barth</name>
          <number type="integer">1</number>
          <country>GB</country>
          <upper-name>ANDY BARTH</upper-name>
        </inventor>
        <inventor>
          <city>Aylesbury</city>
          <name>Daniele Dall'Acqua</name>
          <number type="integer">2</number>
          <country>GB</country>
          <upper-name>DANIELE DALL'ACQUA</upper-name>
        </inventor>
        <inventor>
          <city>Calne</city>
          <name>Nigel Drew</name>
          <number type="integer">3</number>
          <country>GB</country>
          <upper-name>NIGEL DREW</upper-name>
        </inventor>
      </inventors>
      END

      my $tree=XML::TreeBuilder->new();
      $tree->parse($xml);
      my @inventornums=$tree->find_by_tag_name('number');
      foreach my $inventornum(@inventornums){
          if ($inventornum->as_text()==2){
              my $parent=$inventornum->parent();
              print $parent->as_XML;
          }
      }

        This is a very nice and simple solution. I'm going to play with this and with XML::Rules, as suggested by Jenda, to see which works best with my application. Thanks very much!

      Thank you. Dereferencing this made me dizzy.

        You're most welcome!

        Your question has sparked many great solutions. Here's one option that uses Mojo::DOM to parse the XML.

        The parsing's in a subroutine, so you just send a dom object and the number, and a hash is returned with the requested data--if found:

        use Modern::Perl;
        use Mojo::DOM;
        use Data::Dumper;

        my $xml = <<'END';
        <inventors type="array">
          <inventor>
            <city>Aston Clinton</city>
            <name>Andy Barth</name>
            <number type="integer">1</number>
            <country>GB</country>
            <upper-name>ANDY BARTH</upper-name>
          </inventor>
          <inventor>
            <city>Aylesbury</city>
            <name>Daniele Dall'Acqua</name>
            <number type="integer">2</number>
            <country>GB</country>
            <upper-name>DANIELE DALL'ACQUA</upper-name>
          </inventor>
          <inventor>
            <city>Calne</city>
            <name>Nigel Drew</name>
            <number type="integer">3</number>
            <country>GB</country>
            <upper-name>NIGEL DREW</upper-name>
          </inventor>
        </inventors>
        END

        my $dom = Mojo::DOM->new($xml);
        my %record = getInventorNum($dom, 1);
        print Dumper \%record;

        sub getInventorNum {
            my ( $dom, $num ) = @_;
            my %hash;

            $dom->find('number')->each(
                sub {
                    if ( $_->text == $num ) {
                        # Copy every sibling of the matching <number> into the hash.
                        # Note: ->type returned the tag name when this was written;
                        # newer Mojolicious releases (6.0 and later, as far as I
                        # know) renamed that accessor to ->tag.
                        for my $element ( @{ $_->parent->children } ) {
                            next if $element->type eq 'number';
                            $hash{ $element->type } = $element->text;
                        }
                    }
                }
            );

            return %hash;
        }

        Dumper output of %record:

        $VAR1 = {
          'country' => 'GB',
          'city' => 'Aston Clinton',
          'upper-name' => 'ANDY BARTH',
          'name' => 'Andy Barth'
        };
Re: XML::Simple parsing help
by Jenda (Abbot) on Dec 09, 2012 at 13:41 UTC

    Don't use XML::Simple!

    perl -e "use Data::Dumper; use XML::Rules; print Dumper(XML::Rules::inferRulesFromExample('c:\temp\inventors.xml'))"

    prints:

    $VAR1 = {
      'inventors' => 'no content',
      'number' => 'as is',
      'inventor' => 'as array no content',
      'city,country,name,upper-name' => 'content'
    };
    With rules like this XML::Rules would produce a data structure like this:
    {
      'inventors' => {
        'inventor' => [
          {
            'country' => 'GB',
            'city' => 'Aston Clinton',
            'upper-name' => 'ANDY BARTH',
            'number' => { '_content' => '1', 'type' => 'integer' },
            'name' => 'Andy Barth'
          },
          {
            'country' => 'GB',
            'city' => 'Aylesbury',
            'upper-name' => 'DANIELE DALL\'ACQUA',
            'number' => { '_content' => '2', 'type' => 'integer' },
            'name' => 'Daniele Dall\'Acqua'
          },
          {
            'country' => 'GB',
            'city' => 'Calne',
            'upper-name' => 'NIGEL DREW',
            'number' => { '_content' => '3', 'type' => 'integer' },
            'name' => 'Nigel Drew'
          }
        ],
        'type' => 'array'
      }
    };
    Now I do not care about the 'type' => 'integer', I'd rather get just the content for the <number> as well, so let's change the rule for the tag to 'content'. This changes the structure to
    {
      'inventors' => {
        'inventor' => [
          {
            'country' => 'GB',
            'city' => 'Aston Clinton',
            'upper-name' => 'ANDY BARTH',
            'number' => '1',
            'name' => 'Andy Barth'
          },
          {
            'country' => 'GB',
            'city' => 'Aylesbury',
            'upper-name' => 'DANIELE DALL\'ACQUA',
            'number' => '2',
            'name' => 'Daniele Dall\'Acqua'
          },
          {
            'country' => 'GB',
            'city' => 'Calne',
            'upper-name' => 'NIGEL DREW',
            'number' => '3',
            'name' => 'Nigel Drew'
          }
        ],
        'type' => 'array'
      }
    };
    Better, but I can do even better. If I know I want to get the inventors by number I can change the rule for the <inventor> tag to 'by number' and get a hash instead of an array:
    {
      'inventors' => {
        '1' => {
          'country' => 'GB',
          'city' => 'Aston Clinton',
          'upper-name' => 'ANDY BARTH',
          'name' => 'Andy Barth'
        },
        '3' => {
          'country' => 'GB',
          'city' => 'Calne',
          'upper-name' => 'NIGEL DREW',
          'name' => 'Nigel Drew'
        },
        'type' => 'array',
        '2' => {
          'country' => 'GB',
          'city' => 'Aylesbury',
          'upper-name' => 'DANIELE DALL\'ACQUA',
          'name' => 'Daniele Dall\'Acqua'
        }
      }
    };
    In that case, getting the name of inventor #1 would be just $data->{inventors}{1}{name}. If the XML contains just the inventors, I can get rid of the 'inventors' level by changing its rule to 'pass', and it'd be just $data->{1}{name}. I also do not want the 'type' => 'array', so let's add " remove(type)" to the rule for <inventors>.
    use Data::Dumper;
    use XML::Rules;

    my $parser = XML::Rules->new(
        stripspaces => 7,
        rules => {
            'inventors' => 'no content remove(type)',
            'inventor' => 'by number',
            'number,city,country,name,upper-name' => 'content'
        }
    );
    my $data = $parser->parsefile('c:\temp\inventors.xml');
    #print Dumper($data);
    print "The 1st inventor was $data->{inventors}{1}{name}\n";
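For readers who want to see what keying the inventors by number amounts to, here is a rough plain-Perl sketch of the same reshaping, starting from an array of already-parsed records. It only illustrates the resulting data structure and is not how XML::Rules implements the 'by number' rule. The sample records are typed in by hand and `index_by` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hand-written records in the array-of-hashes shape shown earlier
# (the one where <number> is reduced to its content).
my @inventors = (
    { number => '1', name => 'Andy Barth',          city => 'Aston Clinton', country => 'GB' },
    { number => '2', name => 'Daniele Dall\'Acqua', city => 'Aylesbury',     country => 'GB' },
    { number => '3', name => 'Nigel Drew',          city => 'Calne',         country => 'GB' },
);

# Hypothetical helper: build a hash keyed by one field of each record.
# The key field is removed from the stored copy, as in the hash shown above.
sub index_by {
    my ( $key, @records ) = @_;
    my %index;
    for my $record (@records) {
        my %copy = %{$record};            # shallow copy, input stays untouched
        my $id   = delete $copy{$key};
        next unless defined $id;          # skip records without the key field
        $index{$id} = \%copy;
    }
    return \%index;
}

my $by_number = index_by( 'number', @inventors );
print "The 1st inventor was $by_number->{1}{name}\n";
```

Once the records are indexed this way, a lookup is a single hash access instead of a loop over every inventor, which is the convenience the 'by number' rule provides at parse time.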

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      This is brilliant. I had perused the docs for XML::Rules before settling on XML::Simple, but I found more examples for the latter and so went with that. I think this makes more sense, though. The actual XML in question is a much longer (and more convoluted) data structure, so this will let me focus on the content and not get caught up managing all the extraneous tags. Thank you.

      What, you forgot you included xml2XMLRules.pl?

      It's because it contains no documentation :)

Re: XML::Simple parsing help
by karlgoethebier (Abbot) on Dec 09, 2012 at 15:02 UTC

    Or give XML::Twig a try:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $file = shift;
    my $twig = XML::Twig->new( twig_handlers => { inventor => \&inventor } );
    $twig->parsefile($file);

    sub inventor {
        my ( $twig, $inventor ) = @_;
        if ( $inventor->first_child('number')->text eq "1" ) {
            print $inventor->first_child('country')->text
              . qq(\n)
              . $inventor->first_child('city')->text
              . qq(\n)
              . $inventor->first_child('upper-name')->text
              . qq(\n)
              . $inventor->first_child('name')->text
              . qq(\n);
        }
    }
    __END__
    See also xmltwig.org

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»