http://www.perlmonks.org?node_id=913361

rthawkcom has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to build my own XML parser and have encountered an interesting problem. I need a way of programmatically accessing a hash in order to transfer the XML data structure into that hash.
Example XML: <one> <two> <three> value </three> </two> </one>
Parsing the XML is easy enough, but how to put what I have captured into a hash? I'm thinking something along the lines of:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $DATA;
    my @keys=qq(one two three);      # Data we captured from parsing above XML
    my $newhash=&pfm(@keys,'value'); # Make a new hash?
    $DATA={%$DATA,%$newhash};        # Slap it back into the main data hash, repeat as necessary?

    sub pfm{
    }

    print Dumper $hash;

    $hash = {
        'one' => {
            'two' => {
                'three' => 'value'
            }
        }
    };
Ideas? (other than just use XML::Simple !!) Not much info out there about dynamic building of hashes.

Replies are listed 'Best First'.
Re: Programmatic access of hashes
by jethro (Monsignor) on Jul 08, 2011 at 15:35 UTC
        my $DATA;
        my @keys=qw(one two three);
        #        ^--bug in your script I assume

        # solution 1
        eval "\$DATA->{" . join('}{',@keys) . "}= 'value';";

        # solution 2
        my $d= 'value';
        foreach (reverse @keys) {
            my $x->{$_}=$d;
            $d= $x;
        }
        $DATA->{$keys[0]}= $d->{$keys[0]};

    Note that my solution 2 is somewhat senseless. If, for example, you parse some XML with 'three' and 'threex' below 'two', this simple solution won't work, because each pass rebuilds the whole branch and overwrites what was already stored under 'one'. By the same token, it doesn't make sense for your parser to represent all the data as a flat array of keys.
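One way around the sibling problem described above is to walk the existing structure by reference instead of rebuilding the branch each time. The following is a minimal sketch under that assumption; `set_path` is a hypothetical helper name, not something from the thread or from any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): store $value at the nested
# position named by @$path, creating intermediate hashes on demand.
sub set_path {
    my ($data, $path, $value) = @_;
    my $slot = \$data;                      # reference to the current slot
    $slot = \( $$slot->{$_} ) for @$path;   # descend, autovivifying each level
    $$slot = $value;                        # assign at the final slot
    return $data;
}

my $DATA = {};
set_path($DATA, [qw(one two three)],  'value');
set_path($DATA, [qw(one two threex)], 'other');   # sibling of 'three' is kept

print $DATA->{one}{two}{three},  "\n";
print $DATA->{one}{two}{threex}, "\n";
```

Because existing levels are reused rather than replaced, a second key under 'two' does not clobber the first. It also avoids the string eval of solution 1.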

    Usually a parser parses XML recursively and constructs the hash while descending and ascending the tree, using a method similar to the one in solution 2. I would suggest using XML::Twig if XML::Simple is too simple for you.
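To illustrate the recursive idea, here is a rough sketch that builds the hash while descending into opening tags and returning on closing tags. It assumes well-formed input with no attributes, comments, or mixed content, and repeated sibling tags simply overwrite each other. `parse_xml` and `parse_children` are hypothetical names; this is not how XML::Twig or XML::Simple work internally.

```perl
use strict;
use warnings;

# Hypothetical sketch: consume tokens for one element's content and
# return either a hashref of child elements or the trimmed text.
sub parse_children {
    my ($tokens) = @_;
    my (%node, $text);
    while (@$tokens) {
        my $tok = shift @$tokens;
        if ($tok =~ m{^</}) {                  # closing tag: ascend
            last;
        }
        elsif ($tok =~ m{^<([^>]+)>$}) {       # opening tag: descend
            my $name = $1;                     # copy before recursing
            $node{$name} = parse_children($tokens);
        }
        elsif ($tok =~ /\S/) {                 # non-blank text content
            ($text = $tok) =~ s/^\s+|\s+$//g;
        }
    }
    return %node ? \%node : $text;
}

sub parse_xml {
    my ($xml) = @_;
    # Split into tags and text, keeping the tags, dropping empty strings.
    my @tokens = grep { length } split /(<[^>]*>)/, $xml;
    return parse_children(\@tokens);
}

my $data = parse_xml('<one> <two> <three> value </three> </two> </one>');
print $data->{one}{two}{three}, "\n";
```

The nesting of the result mirrors the call stack, so no list of keys has to be carried around separately.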

Re: Programmatic access of hashes
by GrandFather (Saint) on Jul 08, 2011 at 23:19 UTC

    If you are doing this as a learning exercise then you should take a look at the code used in some of the XML modules in CPAN. Note that many of the XML:: modules sit on top of one of a small number of parser modules, so bearing that in mind you might start by looking at something like XML::Parser.

    The next level up are modules like XML::TreeBuilder, which maps fairly closely to what you have described so far. Beyond that are modules like XML::Twig, which add ways of manipulating XML as it is being parsed, thus reducing the need to store large parts of the file being processed in memory.

    A common theme with these modules is that they are object oriented and tend to represent interesting parts of the XML document as objects. XML documents tend to have a highly recursive structure and using OO is one good way of dealing with that.

    If, on the other hand, you simply need a better way of dealing with XML than using XML::Simple (which is not), then you would save about ten years of your life by using XML::Twig or one of the other CPAN XML munging offerings.

    True laziness is hard work
Re: Programmatic access of hashes
by metaperl (Curate) on Jul 09, 2011 at 04:28 UTC
    A hash is not an adequate structure for representing XML. XML can be order sensitive and Perl hashes are not. Also, the author of XML::Simple admits in his FAQ that even he doesn't process XML using his module; it was only intended for XML as config files. It can't handle mixed content, for instance.

    On the other hand HTML::Element has a new_from_lol method that takes nested arrayrefs to form any type of XML possible.
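To show why nested arrayrefs can carry what a hash cannot, here is a small core-Perl sketch that serializes such a structure. It is a hypothetical stand-in, not HTML::Element's implementation: each node is assumed to be `[ tag, optional { attributes }, children... ]`, a child is either a string or another node, and no escaping is done.

```perl
use strict;
use warnings;

# Hypothetical serializer for a nested-arrayref ("list of lists") tree.
sub lol_to_xml {
    my ($node) = @_;
    return $node unless ref $node eq 'ARRAY';   # plain text child
    my ($tag, @kids) = @$node;
    my $attr = '';
    if (@kids && ref $kids[0] eq 'HASH') {      # optional attribute hash
        my $h = shift @kids;
        $attr = join '', map { qq( $_="$h->{$_}") } sort keys %$h;
    }
    return "<$tag$attr>" . join('', map { lol_to_xml($_) } @kids) . "</$tag>";
}

my $doc = [ p => { class => 'note' },
            'Order ', [ b => 'and' ], ' mixed content ', [ i => 'survive' ], '.' ];

print lol_to_xml($doc), "\n";
# prints: <p class="note">Order <b>and</b> mixed content <i>survive</i>.</p>
```

Children stay in array order and text can sit between elements, which are the two things a plain hash loses.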

    Along those lines, I've developed code that compiles XML structure to the nested arrayref structure and uses Data::Diver to fill in the XML content with the hash values. Here is a small sample of a file representing XML:

        sub lol {
            my ($self) = @_;

            my $root = $self->data;

            [
                QBXML => DIVE( $root, qw() ),
                [
                    QBXMLMsgsRq => { 'onError' => 'stopOnError' } => DIVE( $root, qw() ),
                    [
                        CustomerAddRq => DIVE( $root, qw() ),
                        [
                            CustomerAdd => DIVE( $root, qw() ),
                            [ Name     => DIVE( $root, qw(Name) ) ],
                            [ IsActive => DIVE( $root, qw(IsActive) ) ],
                            [
                                ParentRef => DIVE( $root, qw(ParentRef) ),
                                [ ListID   => DIVE( $root, qw(ParentRef ListID) ) ],
                                [ FullName => DIVE( $root, qw(ParentRef FullName) ) ]
                            ],
                            [ CompanyName => DIVE( $root, qw(CompanyName) ) ],
                            [ Salutation  => DIVE( $root, qw(Salutation) ) ],
                            [ FirstName   => DIVE( $root, qw(FirstName) ) ],
                            [ MiddleName  => DIVE( $root, qw(MiddleName) ) ],
                            [ LastName    => DIVE( $root, qw(LastName) ) ],
                            [
                                BillAddress => DIVE( $root, qw(BillAddress) ),
                                [ Addr1      => DIVE( $root, qw(BillAddress Addr1) ) ],
                                [ Addr2      => DIVE( $root, qw(BillAddress Addr2) ) ],
                                [ Addr3      => DIVE( $root, qw(BillAddress Addr3) ) ],
                                [ Addr4      => DIVE( $root, qw(BillAddress Addr4) ) ],
                                [ Addr5      => DIVE( $root, qw(BillAddress Addr5) ) ],
                                [ City       => DIVE( $root, qw(BillAddress City) ) ],
                                [ State      => DIVE( $root, qw(BillAddress State) ) ],
                                [ PostalCode => DIVE( $root, qw(BillAddress PostalCode) ) ],
                                [ Country    => DIVE( $root, qw(BillAddress Country) ) ],
                                [ Note       => DIVE( $root, qw(BillAddress Note) ) ]
                            ],
                            [
                                ShipAddress => DIVE( $root, qw(ShipAddress) ),
                                [ Addr1      => DIVE( $root, qw(ShipAddress Addr1) ) ],
                                [ Addr2      => DIVE( $root, qw(ShipAddress Addr2) ) ],
                                [ Addr3      => DIVE( $root, qw(ShipAddress Addr3) ) ],
                                [ Addr4      => DIVE( $root, qw(ShipAddress Addr4) ) ],
                                [ Addr5      => DIVE( $root, qw(ShipAddress Addr5) ) ],
                                [ City       => DIVE( $root, qw(ShipAddress City) ) ],
                                [ State      => DIVE( $root, qw(ShipAddress State) ) ],
                                [ PostalCode => DIVE( $root, qw(ShipAddress PostalCode) ) ],
                                [ Country    => DIVE( $root, qw(ShipAddress Country) ) ],
                                [ Note       => DIVE( $root, qw(ShipAddress Note) ) ]
                            ],
                            [ Phone      => DIVE( $root, qw(Phone) ) ],
                            [ AltPhone   => DIVE( $root, qw(AltPhone) ) ],
                            [ Fax        => DIVE( $root, qw(Fax) ) ],
                            [ Email      => DIVE( $root, qw(Email) ) ],
                            [ Contact    => DIVE( $root, qw(Contact) ) ],
                            [ AltContact => DIVE( $root, qw(AltContact) ) ],
                            ...
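The sample relies on looking up a value by a list of keys. The DIVE wrapper itself is not shown in the post, so the following is only a guess at the general idea in core Perl: `dive` is a hypothetical function that follows a key path through nested hashes and returns undef when a step is missing, without creating anything along the way.

```perl
use strict;
use warnings;

# Hypothetical read-only lookup (not Data::Diver's code and not the
# poster's DIVE): walk @path through nested hashes, never autovivifying.
sub dive {
    my ($root, @path) = @_;
    my $node = $root;
    for my $key (@path) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $root = { Name => 'Acme', BillAddress => { City => 'Springfield' } };

my $city    = dive($root, qw(BillAddress City));
my $missing = dive($root, qw(ShipAddress City));

print "$city\n";
print defined($missing) ? "found\n" : "missing\n";
print exists($root->{ShipAddress}) ? "vivified\n" : "untouched\n";
```

Checking `exists` at each step matters here: writing `$root->{ShipAddress}{City}` directly would quietly create an empty `ShipAddress` hash in the source data.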




    The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

    -- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"

Re: Programmatic access of hashes
by rthawkcom (Novice) on May 04, 2012 at 20:02 UTC
    Thanks for everyone's input. I must admit I am a little bit dismayed by the negativity. Regardless, I continued development and made my own parser. This has been well tested and works great! Posting the code here so it may be of use to someone else:
        #!/usr/bin/perl
        package XMLcrude;
        use strict;
        use warnings;

        sub new{
            my $this = shift;
            my $class = ref($this) || $this;
            my $self={@_};
            bless $self, $class;
            return $self;
        }

        # If you do not have access to XML::Simple, and are not allowed to install it, this can replace it in most situations.

        #############################
        # XML IN - suck in the XML file
        #
        sub XMLin{
            my $self=shift;
            $self->{file}=shift||return;

            #load file into @file
            open my $FILE, '<', $self->{file} or die "Unable to open log file [$self->{file}], system said $!";
            if(<$FILE> !~ /<\?xml/){die 'Not an XML file! Header "<?xml" is missing!'}   # "?" escaped so it matches literally
            my @file=<$FILE>;
            close $FILE;

            my $data=join '',@file;
            $data=~s/<!--.*?-->//gs; # Strip out comments.

            my ($root,$value,@key);
            foreach my $tag(split(/(<.*?>)/,$data)){
                #found a tag, so create a hash element for it.
                if($tag=~/<((?:[^>'"]*|(['"]).*?\1)*)>/){
                    #start tag - add element depth
                    if($tag!~/^<\//){
                        $tag=~s/<|>//g;   # Remove brackets
                        push @key,$tag;   # <START> Starting tag, so remember this tag name. Used to make key for hash element.

                    #end tag - remove element depth
                    }else{
                        if(defined $value){
                            #record data
                            my $key=\$self;                    # Get reference to ourself. (A reference to the reference to the hash that stores our data)
                            $key=\($$key->{$_}) for @key;      # Build up hash element key using the tags we gathered in @key
                            $value=~s/\n|\t//g;                # Clean up the value.
                            if(length $value){$$key= $value;}  # Assign value, including zeros, to hash element reference if there is data
                            $value='';                         # Clear value for next round.
                        }
                        $root=pop @key;   # </END> Closing tag, so lower the element depth... also, remember the very last value removed.
                    }
                    next;   # We are done processing tags.
                }
                $value.=$tag;   # Record value
            }
            return $self->{$root};   # XML document should start with a "root" container element that holds everything.
        }
        1;
Re: Programmatic access of hashes
by Anonymous Monk on Jul 09, 2011 at 03:12 UTC