http://www.perlmonks.org?node_id=984723

Smaug has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have been battling with this for two days, and I have now gone in so many circles that I don't know what's left to try.

I have the following XML file (This is just a small section) which is output from another system:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Import format="3.0" fileId="54630000-3141-1404-4310-7F0000014030" fileCreationDateTime="2012-07-23T06:04:31.415" catalogSource="TLCMz">
      <Catalog>
        <Vendors>
          <Vendor name="Allen Systems" vendorUniqueKey="ALLENGRP"/>
          <Vendor name="Aonix North America" vendorUniqueKey="AONIX"/>
          <Vendor name="Beta Systems Software" vendorUniqueKey="BETASYS"/>
          <Vendor name="BMC Software" vendorUniqueKey="BMC"/>
        </Vendors>
      </Catalog>
    </Import>

I have used XML::Simple and have reached the point where, just to try to get anything, I have reduced the code to:
    use strict;
    use warnings;
    use XML::Simple;     # provides XMLin()
    use Data::Dumper;    # provides Dumper()

    my $xmldoc = XMLin("C:\\dsys.xml");
    print Dumper($xmldoc);

At this point all is well, but what I'd like is the values in the <Vendor> tag in a hash with the vendorUniqueKey as the key and the name as the value.
I've tried:
my $vendors = "$xmldoc->{'Catalog'}->{'Vendors'}";
Which returns a reference to the hash (I think), but I can't seem to find a way to step through the data in the hash. When I try:
    my $vendors = "$xmldoc->{'Catalog'}->{'Vendors'}";
    foreach my $vendor (@{$vendors->{Vendor}}) {
        print $vendor->{name} . "\n";
    }

I get: Can't use string ("HASH(0x31d7078)") as a HASH ref while "strict refs" in use, and I get similar errors if I try a host of other ways. Clearly I'm lost and don't know where to go from here. I don't even know if the data I need is in the hash or if I've completely messed it up.
Any help would be appreciated.
Regards,
Smaug.
Peddle faster monkeys!! I need more power!!

Re: Help with attributes and XML::Simple
by Anonymous Monk on Aug 01, 2012 at 08:25 UTC

    Which returns a reference to the hash (I think),

    In Perl, "quotes create strings", so you have a string, not a reference -- ditch the quotes -- and you're off to review perlintro :)
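    A minimal sketch of the difference, assuming a hand-built nested hash as a stand-in for what XMLin() returns (the variable names $as_string and $as_ref are only illustrative):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the structure returned by XMLin().
my $xmldoc = { Catalog => { Vendors => { Vendor => {} } } };

# Quoted: the reference is interpolated into a plain string
# that looks like "HASH(0x...)" and can no longer be dereferenced.
my $as_string = "$xmldoc->{'Catalog'}->{'Vendors'}";

# Unquoted: still a real hash reference.
my $as_ref = $xmldoc->{'Catalog'}->{'Vendors'};

print ref($as_string) ? "reference" : "plain string", "\n";
print ref($as_ref), "\n";
```

    ref() returns an empty string for the quoted version and 'HASH' for the unquoted one, which is why only the second can be used with ->{...} under "strict refs".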

      Thanks so much for your help! Unfortunately, being able to read the documentation is certainly no guarantee of understanding it.
      Regards,
      Smaug.
Re: Help with attributes and XML::Simple
by ig (Vicar) on Aug 01, 2012 at 08:42 UTC

    In my experience, XML::Simple makes dealing with very simple XML simple but it makes dealing with more complex XML quite difficult. Had you spent two days learning one of the alternatives (I use XML::LibXML almost exclusively these days, but there are other very capable modules that give you good control over both parsing and generation of XML files) you could probably have solved your problem and learned a more powerful tool for future work.

    In summary, my suggestion is to use XML::Simple if it does what you need by default or you can tweak it in a few minutes. Otherwise, invest in learning a more general/powerful tool.

    Update:

    As an example of one way to get your hash with XML::LibXML:

        use strict;
        use warnings;
        use XML::LibXML;
        use Data::Dumper;

        my $xml = do { local $/; <DATA> };
        my $dom = XML::LibXML->load_xml(string => $xml);

        my %vendors = map {
            $_->getAttribute('vendorUniqueKey') => $_->getAttribute('name')
        } @{$dom->getElementsByTagName('Vendor')};

        print Dumper(\%vendors);

        __DATA__
        <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
        <Import format="3.0" fileId="54630000-3141-1404-4310-7F0000014030" fileCreationDateTime="2012-07-23T06:04:31.415" catalogSource="TLCMz">
          <Catalog>
            <Vendors>
              <Vendor name="Allen Systems" vendorUniqueKey="ALLENGRP"/>
              <Vendor name="Aonix North America" vendorUniqueKey="AONIX"/>
              <Vendor name="Beta Systems Software" vendorUniqueKey="BETASYS"/>
              <Vendor name="BMC Software" vendorUniqueKey="BMC"/>
            </Vendors>
          </Catalog>
        </Import>

    gives

        $VAR1 = {
                  'BETASYS' => 'Beta Systems Software',
                  'BMC' => 'BMC Software',
                  'ALLENGRP' => 'Allen Systems',
                  'AONIX' => 'Aonix North America'
                };

    That's simple enough. The problem is, reading all the XML::LibXML docs (there are many) to find out how to do this isn't so simple.

      If you've got a reasonably recent version of XML::LibXML, that can be made even simpler.

          use XML::LibXML 1.94;
          use Data::Dumper;

          my %hash = XML::LibXML
              ->load_xml(location => 'mydata.xml')
              ->getElementsByTagName('Vendor')
              ->map(sub { $_->{vendorUniqueKey} => $_->{name} });

          print Dumper \%hash;
      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Help with attributes and XML::Simple
by Athanasius (Archbishop) on Aug 01, 2012 at 09:08 UTC

    Building on the correction by Anonymous Monk above, the following code gives you a hash in which each entry has a vendorUniqueKey as the key and the corresponding name as the value:

        #! perl
        use strict;
        use warnings;
        use Data::Dumper;
        use XML::Simple;

        my $xmldoc  = XMLin("dsys.xml");
        my $vendors = $xmldoc->{'Catalog'}->{'Vendors'};
        my %vendors = %{ $vendors->{'Vendor'} };
        my %unique_vendors;

        while (my ($outer_key, $outer_value) = each %vendors) {
            if (ref($outer_value) eq 'HASH') {
                my ($inner_key, $inner_value) = each %$outer_value;

                if ($inner_key eq 'vendorUniqueKey') {
                    $unique_vendors{$inner_value} = $outer_key;
                }
                else {
                    warn "Inner key is not 'vendorUniqueKey'";
                }
            }
            else {
                warn "Outer value is not a hash reference";
            }
        }

        print Dumper(\%unique_vendors);

    Output:

        $VAR1 = {
                  'BETASYS' => 'Beta Systems Software',
                  'BMC' => 'BMC Software',
                  'ALLENGRP' => 'Allen Systems',
                  'AONIX' => 'Aonix North America'
                };

    HTH,

    Athanasius <°(((><contra mundum

      If you want to use XML::Simple, then this is also one possible solution

          my %resulting_vendors;
          my %vendors = %{ $xmldoc->{Catalog}{Vendors}{Vendor} };

          foreach my $vendor2 (keys %vendors) {
              print "vendorUniqueKey: [", $vendors{$vendor2}{vendorUniqueKey}, "], vendor2: [$vendor2]\n";
              $resulting_vendors{$vendors{$vendor2}{vendorUniqueKey}} = $vendor2;
          }

          print Data::Dumper->Dump( [\%resulting_vendors], [qw(resulting_vendors)] );

      The advantage of this code is that it only uses constructs you have already used yourself.

      The line

      my %vendors = %{ $xmldoc->{Catalog}{Vendors}{Vendor} };

      isn't really necessary; you can use the right-hand side directly in place of %vendors if you prefer.
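      The solutions above rely on XML::Simple's default array folding, which keys the <Vendor> elements by their name attribute; that is also why the original @{$vendors->{Vendor}} loop would not work even without the quotes. As a hedged sketch of an alternative, the documented KeyAttr and ForceArray options can switch the folding off so that Vendor stays a list. The XML here is a trimmed copy of the sample, passed in as a string purely for illustration:

```perl
use strict;
use warnings;
use XML::Simple;

# Trimmed copy of the sample data, for illustration only.
my $xml = <<'XML';
<Import format="3.0">
  <Catalog>
    <Vendors>
      <Vendor name="Allen Systems" vendorUniqueKey="ALLENGRP"/>
      <Vendor name="BMC Software" vendorUniqueKey="BMC"/>
    </Vendors>
  </Catalog>
</Import>
XML

# KeyAttr => [] disables folding on 'name'; ForceArray keeps <Vendor>
# an array reference even if the file holds only a single vendor.
my $xmldoc = XMLin($xml, KeyAttr => [], ForceArray => ['Vendor']);

# Each array element is a hash of that element's attributes.
my %vendors = map { $_->{vendorUniqueKey} => $_->{name} }
              @{ $xmldoc->{Catalog}{Vendors}{Vendor} };

print "$_ => $vendors{$_}\n" for sort keys %vendors;
```

      With these options the loop from the original question (minus the quotes) should also iterate over the vendors as intended.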

Re: Help with attributes and XML::Simple
by Jenda (Abbot) on Aug 01, 2012 at 16:11 UTC

    See Simpler than XML::Simple for some reasons why you should not use XML::Simple and what to use instead.

    Since we do not know what you need the data for, I can only guess at the format you would like it in, so here is just a tiny example. This code specifies that from each <Vendor> you want the vendorUniqueKey as the key and the name as the value, and then lets you execute some code to process this data once the </Vendors> tag is parsed. This style lets you process huge files without keeping everything in memory.

        use strict;
        use XML::Rules;

        my $parser = XML::Rules->new(
            stripspaces => 7,
            rules => {
                Vendor  => sub { return $_[1]->{vendorUniqueKey} => $_[1]->{name} },
                Vendors => sub {
                    my ($tag, $vendors) = @_;
                    print "We have " . scalar(keys %$vendors) . " vendors\n";
                    foreach (keys %$vendors) {
                        print " $_ => $vendors->{$_}\n";
                    }
                    return;
                },
                ':default:' => ''
            }
        );

        $parser->parse(\*DATA);

        __DATA__
        <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
        <Import format="3.0" fileId="54630000-3141-1404-4310-7F0000014030" fileCreationDateTime="2012-07-23T06:04:31.415" catalogSource="TLCMz">
          <Catalog>
            <Vendors>
              <Vendor name="Allen Systems" vendorUniqueKey="ALLENGRP"/>
              <Vendor name="Aonix North America" vendorUniqueKey="AONIX"/>
              <Vendor name="Beta Systems Software" vendorUniqueKey="BETASYS"/>
              <Vendor name="BMC Software" vendorUniqueKey="BMC"/>
            </Vendors>
          </Catalog>
        </Import>

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.