PerlMonks  

XML to Hash Truncating Keys Problem

by Wayne (Novice)
on Mar 28, 2019 at 14:21 UTC ( #1231805=perlquestion )

Wayne has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am attempting to parse valid XML data as a string and output the result as a complex data structure (a hash of hashes and arrays) which I can then use to reference individual data elements.

I have written a small function to perform this task, which almost works. It runs without error and it does return a hash that I can reference.

The problem is that a few of the hash keys have been truncated. That is, they are missing the last character. The majority of keys, however, appear fine.

I am using valid XML and the problem appears to be intermittent, although the fields affected are always the same (company_name, fee_lines, address_2, etc.).

Here is the code I'm using:

    #!/usr/bin/perl -w
    use strict;
    use 5.24.1;
    use Data::Dumper qw(Dumper);
    use XML::Mini::Document;
    $Data::Dumper::Terse=1;

    my($xml)='<insert valid xml string here>';
    my($hash_ref)=&convert_xml_to_hash($xml);
    my $data=eval($hash_ref);
    say Dumper($data);
    exit;

    sub convert_xml_to_hash {
        my($xml)=(@_);
        my($hash)={};
        if($xml) {
            my($xml_object)=XML::Mini::Document->new();
            $xml_object->parse($xml);
            my($output)=$xml_object->toHash();
            $hash=Dumper($output);
        }
        return($hash);
    }

Here is a sample of the output:

    ...
    'billing_address' => {
        'province' => 'New Brunswick',
        'city' => 'Moncton',
        'company_nam' => '',   # Should be company_name
        'phone' => '5065551212',
        'address_' => '',      # Should be address_2
        'country_code' => 'CA',
        'first_name' => 'Joe',
        'address_1' => '123 Somestreet',
        'email' => 'someone@somewhere.com',
        'country' => 'Canada',
        'postal_code' => 'E4E 4E4',
        'last_name' => 'Blow',
        'province_code' => 'NB'
    },
    'fee_line' => '',          # should be fee_lines
    'shipping_tax' => '0.00',
    ...

At first I thought that they were just some careless typos, but after some investigation I realized that was not the case.

Any insight into this problem or a nudge in the right direction would be greatly appreciated.

Thanks!

Replies are listed 'Best First'.
Re: XML to Hash Truncating Keys Problem
by roboticus (Chancellor) on Mar 28, 2019 at 14:45 UTC

    Wayne:

    It would've been easier had you actually included a bit of XML. I went ahead and installed XML::Mini::Document and ran it, but as you can see, I'm not experiencing the problem:

        $ cat pm_1231805_to_hash.pl
        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use 5.24.1;
        use Data::Dumper qw(Dumper);
        use XML::Mini::Document;
        $Data::Dumper::Terse=1;

        my $xml = <<EOXML;
        <shipping_tax>0.00</shipping_tax>
        <fee_lines></fee_lines>
        <billing_address>
          <city>Moncton</city>
          <province_code>NB</province_code>
          <country>Canada</country>
          <address_2></address_2>
          <email>someone\@somewhere.com</email>
          <country_code>CA</country_code>
          <province>New Brunswick</province>
          <phone>5065551212</phone>
          <first_name>Joe</first_name>
          <address_1>123 Somestreet</address_1>
          <postal_code>E4E 4E4</postal_code>
          <company_name></company_name>
          <last_name>Blow</last_name>
        </billing_address>
        EOXML

        my($hash_ref)=&convert_xml_to_hash($xml);
        my $data=eval($hash_ref);
        say Dumper($data);
        exit;

        sub convert_xml_to_hash {
            my($xml)=(@_);
            my($hash)={};
            if($xml) {
                my($xml_object)=XML::Mini::Document->new();
                $xml_object->parse($xml);
                my($output)=$xml_object->toHash();
                $hash=Dumper($output);
            }
            return($hash);
        }

        Roboticus@Waubli ~
        $ perl pm_1231805_to_hash.pl
        {
          'shipping_tax' => '0.00',
          'fee_lines' => '',
          'billing_address' => {
            'province' => 'New Brunswick',
            'country' => 'Canada',
            'address_2' => '',
            'last_name' => 'Blow',
            'company_name' => '',
            'province_code' => 'NB',
            'phone' => '5065551212',
            'country_code' => 'CA',
            'email' => 'someone@somewhere.com',
            'postal_code' => 'E4E 4E4',
            'city' => 'Moncton',
            'address_1' => '123 Somestreet',
            'first_name' => 'Joe'
          }
        }

    Perhaps you have either (a) some wonky data, or (b) a bit of code you've not shown us is frobnicating it somewhere.

    Edit: I forgot to mention that I didn't build the input by hand. Since you at least provided some output, I edited the output to make it what you wanted. I then used XML::Mini::Document to rebuild the input (code in the readmore tags below).

Re: XML to Hash Truncating Keys Problem
by Lotus1 (Vicar) on Mar 28, 2019 at 16:13 UTC

    It worked for me after I removed the '%' character from my test xml. With that character included in a text node as shown below the parse function would hang. Perhaps you have some other special character that the XML::Mini::Document module can't handle.

        use warnings;
        use strict;
        #use 5.24.1;
        use Data::Dumper qw(Dumper);
        use XML::Mini::Document;
        $Data::Dumper::Terse=1;

        my $xml_input;
        {
            local $/;
            $xml_input = <DATA>;
        }
        print "1********************************************\n";
        print $xml_input;

        my $xml_object =XML::Mini::Document->new();
        $xml_object->parse($xml_input);
        #goto END;
        my $hash_ref = $xml_object->toHash();
        print "2********************************************\n";
        print Dumper($hash_ref);

        my $dumped_hash=Dumper($hash_ref);
        my $evalled_hashdata=eval($dumped_hash);
        print "3********************************************\n";
        print Dumper($evalled_hashdata);
        END:
        #exit;

        ## This will cause ->parse() to hang.
        ## <BitmapPath>%TEST_ROOT%\bitmaps\</BitmapPath>
        __DATA__
        <?xml version="1.0" encoding="utf-8"?>
        <NetConfig>
          <UseServerTimer>false</UseServerTimer>
          <PhoneNumberRegex />
          <InitializeStatistics_Logging>false</InitializeStatistics_Logging>
          <InitializeStatistics_Logging>false_test</InitializeStatistics_Logging>
          <InitializeStatistics_Logging>false test</InitializeStatistics_Logging>
          <InitializeStatistics_Logging>false test </InitializeStatistics_Logging>
          <ViewerConfigs>
            <TileNamePatern>{0}\{1}\{2}.tile</TileNamePatern>
            <ViewerConfigWrap>
              <File>viewerConfigx.xml</File>
              <Name>FI_RT</Name>
            </ViewerConfigWrap>
          </ViewerConfigs>
        </NetConfig>

    Here is the output:

Re: XML to Hash Truncating Keys Problem
by tangent (Vicar) on Mar 29, 2019 at 00:21 UTC
    I'm sure there are many different ways to parse your XML, but, assuming you still want to avoid XML::Simple, here is a way to do it using XML::LibXML.

    It requires that you know a good bit about your incoming data structure but I think the results will be quite useful for further processing.

        use Data::Dumper;
        use XML::LibXML;

        my $xml = q|<?xml version="1.0"?>
        <root><order><id>359</id><order_number>359</order_number><created_at>2019-03-28 10:33:06</created_at>...etc...|;

        my @top_level = qw( order_number created_at total total_shipping total_discount customer_id currency );
        my @nested = qw( shipping_lines billing_address shipping_address fee_lines );

        my @orders;
        my $doc = XML::LibXML->load_xml(string => $xml);
        my @nodes = $doc->findnodes('//order');
        for my $node (@nodes) {
            my %order;
            # get the top level nodes
            for my $name ( @top_level ) {
                $order{$name} = $node->findvalue($name);
            }
            # get the nested nodes
            for my $name ( @nested ) {
                my ($elem) = $node->findnodes($name);
                $order{$name} = process_nested($elem);
            }
            # special case for line_items node
            my ($line_items) = $node->findnodes('line_items');
            $order{'line_items'} = process_line_items( $line_items );
            push( @orders, \%order );
        }
        print Dumper(\@orders);

        sub process_nested {
            my ($node) = @_;
            my %item;
            my $elem = $node->firstChild or return '';
            $item{ $elem->nodeName } = $elem->textContent;
            while ( my $next = $elem->nextSibling ) {
                $item{ $next->nodeName } = $next->textContent;
                $elem = $next;
            }
            return \%item;
        }

        sub process_line_items {
            my ($node) = @_;
            my @items;
            my @id_nodes = $node->findnodes('id') or return '';
            for my $id_node ( @id_nodes ) {
                my $elem = $id_node;
                my %item = ( id => $elem->textContent );
                while ( my $next = $elem->nextSibling ) {
                    my $name = $next->nodeName;
                    # exit if we have reached the next item
                    last if $name eq 'id';
                    $item{$name} = $next->textContent;
                    $elem = $next;
                }
                push(@items, \%item);
            }
            return \@items;
        }
    OUTPUT:
        [
          {
            'total' => '12.00',
            'currency' => 'CAD',
            'fee_lines' => '',
            'created_at' => '2019-03-28 10:33:06',
            'total_shipping' => '0.00',
            'customer_id' => '1',
            'order_number' => '359',
            'total_discount' => '0.00',
            'shipping_lines' => {
              'method_id' => 'local_pickup',
              'total' => '0.00',
              'id' => '309',
              'method_title' => 'Pickup'
            },
            'billing_address' => {
              'phone' => '5065551212',
              'city' => 'Moncton',
              'province_code' => 'NB',
              'country' => 'Canada',
              'address_2' => '',
              'last_name' => 'Blow',
              'email' => 'someone@somewhere.com',
              'country_code' => 'CA',
              'province' => 'New Brunswick',
              'postal_code' => 'E4E 4E4',
              'address_1' => '123 Somestreet',
              'first_name' => 'Joe',
              'company_name' => ''
            },
            'shipping_address' => {
              'province' => 'New Brunswick',
              'country_code' => 'CA',
              'postal_code' => 'E4E 4E4',
              'address_1' => '123 Somestreet',
              'first_name' => 'Joe',
              'company_name' => '',
              'city' => 'Moncton',
              'province_code' => 'NB',
              'country' => 'Canada',
              'address_2' => '',
              'last_name' => 'Blow'
            },
            'line_items' => [
              {
                'subtotal' => '10.00',
                'total_tax' => '0.00',
                'subtotal_tax' => '0.00',
                'id' => '307',
                'quantity' => '1',
                'variation_id' => '0',
                'tax_class' => '',
                'sku' => '',
                'total' => '10.00',
                'product_thumbnail_url' => 'https://www.mysite.com/wp-content/uploads/2019/03/pepperoni-pizza-150x150.jpg',
                'name' => 'Pepperoni Pizza',
                'meta' => '',
                'product_url' => 'https://www.mysite.com/product/pepperoni-pizza/',
                'product_id' => '15',
                'price' => '10.00'
              },
              {
                'total_tax' => '0.00',
                'subtotal' => '2.00',
                'quantity' => '2',
                'variation_id' => '0',
                'id' => '308',
                'subtotal_tax' => '0.00',
                'total' => '2.00',
                'product_thumbnail_url' => 'https://www.mysite.com/wp-content/uploads/2019/03/pepsi-can-150x150.jpg',
                'sku' => '',
                'tax_class' => '',
                'price' => '1.00',
                'product_id' => '222',
                'product_url' => 'https://www.mysite.com/product/pepsi/',
                'meta' => '',
                'name' => 'Pepsi'
              }
            ],
          }
        ]
Re: XML to Hash Truncating Keys Problem
by hdb (Monsignor) on Mar 28, 2019 at 15:22 UTC

    I am a bit confused by your logic here. First you apply the toHash function which returns a hash reference. But then you use Dumper to stringify it, this string is the output of your convert_xml_to_hash function. Upon return from your function, you eval it, which gives you (hopefully) a hash reference again. Then you use Dumper again to print it.

    Have you tried to just use the output of the toHash function directly? With so much back and forth between hash and string, it is no wonder that Perl loses a few characters...

      With so much back and forth between hash and string, it is no wonder that Perl loses a few characters...

      I agree with all of your post except for the last sentence. Data::Dumper wouldn't just lose characters like that....
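
      That claim can be checked without any XML involved. The following is a minimal sketch, assuming made-up sample data that reuses the key names from the question: it performs the same Dumper/eval round trip as the original script and compares every key before and after. The all_keys helper is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

$Data::Dumper::Terse = 1;    # emit a bare structure, without "$VAR1 ="

# Assumed sample data; only the key names are taken from the question.
my $original = {
    fee_lines       => '',
    billing_address => {
        company_name => '',
        address_2    => '',
        first_name   => 'Joe',
    },
};

# Same round trip as the original script: hash -> string -> hash.
my $dumped   = Dumper($original);
my $restored = eval $dumped;
die "eval failed: $@" if $@;

# Hypothetical helper: list every key at every nesting level, in sorted order.
sub all_keys {
    my ($node) = @_;
    return () unless ref $node eq 'HASH';
    return map { ( $_, all_keys( $node->{$_} ) ) } sort keys %$node;
}

my @before = all_keys($original);
my @after  = all_keys($restored);

print "before: @before\n";
print "after:  @after\n";
print "keys ", ( "@before" eq "@after" ? "survived" : "changed" ), " the round trip\n";
```

      If the two lists match, the truncation has to come from somewhere earlier in the pipeline than Data::Dumper.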

Re: XML to Hash Truncating Keys Problem
by hdb (Monsignor) on Mar 28, 2019 at 15:24 UTC

    While the usage of XML::Simple is discouraged even by its author, it should do exactly what you want here.

      It would. The catch is that the data structure would be inconsistent and fragile. See Simpler than XML::Simple.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

Re: XML to Hash Truncating Keys Problem
by Wayne (Novice) on Mar 29, 2019 at 13:20 UTC

    Update:

    The solution that I have found is much simpler and more elegant, and doesn't appear to produce any errors in the output. It uses the XML::Fast library:

        use XML::Fast;
        my($hash)=xml2hash($xml);

    Cheers!

Re: XML to Hash Truncating Keys Problem
by Wayne (Novice) on Mar 28, 2019 at 21:01 UTC

    Hello, and thank you all for the many prompt replies! :)

    Roboticus- "Perhaps you have either (a) some wonky data, or (b) a bit of code you've not shown us is frobnicating it somewhere."

    Yes! Perhaps both! I have a PHP function that generates XML data from a WooCommerce (WordPress) database. I validated the output using several online XML validation tools and they all reported valid XML output. Thus I assumed that the PHP function was working properly.

    hdb- "Have you tried to just use the output of the toHash function directly? With so much back and forth between hash and string, it is no wonder that Perl loses a few characters..."

    No not yet, but thanks for the suggestion!

    "While the usage of XML::Simple is discouraged even by its author, it should do exactly what you want here."

    I initially wrote it using XML::Simple but upon discovering that it wasn't recommended anymore I decided to look for a more robust alternative.

    "I am a bit confused by your logic here....Then you use Dumper again to print it."

    Sometimes my logic can be a little fuzzy but no worries - it's as clear as mud.... And using Dumper to print again was just for debugging. There are actually container variables storing hash values but they weren't relevant to the problem so I didn't include them.

    Haukex - "Data::Dumper wouldn't just lose characters like that...."

    I agree. I believe that the problem most likely exists between the keyboard and the chair (yours truly!) and not the module itself.

    Anonymous- "valid XML data - show it please"

    Yes, I should have posted the XML in the OP. Since it was validating elsewhere and was long and ugly I didn't bother. I've posted it below. The only changes are to some PII - IP address, domain name and email address. I also posted the output, below the XML.

    Thanks again!

        my($xml)='<?xml version="1.0"?>
        <root><order><id>359</id><order_number>359</order_number><created_at>2019-03-28 10:33:06</created_at><updated_at>2019-03-28 10:33:06</updated_at><completed_at/><status>pending</status><currency>CAD</currency><total>12.00</total><subtotal>12.00</subtotal><total_line_items_quantity>3</total_line_items_quantity><total_tax>0.00</total_tax><total_shipping>0.00</total_shipping><cart_tax>0.00</cart_tax><shipping_tax>0.00</shipping_tax><total_discount>0.00</total_discount><shipping_methods>Pickup</shipping_methods><order_key>wc_order_n13C7qFcbVcbI</order_key><payment_details><method_id>cop</method_id><method_title>Pay at Pickup</method_title><paid_at/></payment_details><billing_address><first_name>Joe</first_name><last_name>Blow</last_name><company_name/><address_1>123 Somestreet</address_1><address_2/><city>Moncton</city><province_code>NB</province_code><province>New Brunswick</province><postal_code>E4E 4E4</postal_code><country_code>CA</country_code><country>Canada</country><email>someone@somewhere.com</email><phone>5065551212</phone></billing_address><shipping_address><first_name>Joe</first_name><last_name>Blow</last_name><company_name/><address_1>123 Somestreet</address_1><address_2/><city>Moncton</city><province_code>NB</province_code><province>New Brunswick</province><postal_code>E4E 4E4</postal_code><country_code>CA</country_code><country>Canada</country></shipping_address><note/><customer_ip>127.0.0.1</customer_ip><customer_user_agent>Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.1 Safari/537.36</customer_user_agent><customer_id>1</customer_id><view_order_url>https://www.mysite.com/my-account/view-order/359/</view_order_url><line_items><id>307</id><subtotal>10.00</subtotal><subtotal_tax>0.00</subtotal_tax><total>10.00</total><total_tax>0.00</total_tax><price>10.00</price><quantity>1</quantity><tax_class/><name>Pepperoni Pizza</name><product_id>15</product_id><variation_id>0</variation_id><product_url>https://www.mysite.com/product/pepperoni-pizza/</product_url><product_thumbnail_url>https://www.mysite.com/wp-content/uploads/2019/03/pepperoni-pizza-150x150.jpg</product_thumbnail_url><sku/><meta/><id>308</id><subtotal>2.00</subtotal><subtotal_tax>0.00</subtotal_tax><total>2.00</total><total_tax>0.00</total_tax><price>1.00</price><quantity>2</quantity><tax_class/><name>Pepsi</name><product_id>222</product_id><variation_id>0</variation_id><product_url>https://www.mysite.com/product/pepsi/</product_url><product_thumbnail_url>https://www.mysite.com/wp-content/uploads/2019/03/pepsi-can-150x150.jpg</product_thumbnail_url><sku/><meta/></line_items><shipping_lines><id>309</id><method_id>local_pickup</method_id><method_title>Pickup</method_title><total>0.00</total></shipping_lines><tax_lines/><fee_lines/><coupon_lines/></order></root>';

        {
          'root' => {
            'order' => {
              'order_number' => '359',
              'created_at' => '2019-03-28 10:33:06',
              'total' => '12.00',
              'total_shipping' => '0.00',
              'tax_line' => '',
              'total_discount' => '0.00',
              'shipping_lines' => {
                'total' => '0.00',
                'method_title' => 'Pickup',
                'id' => '309',
                'method_id' => 'local_pickup'
              },
              'billing_address' => {
                'email' => 'someone@somewhere.com',
                'province_code' => 'NB',
                'city' => 'Moncton',
                'address_' => '',
                'postal_code' => 'E4E 4E4',
                'country' => 'Canada',
                'address_1' => '123 Somestreet',
                'company_nam' => '',
                'country_code' => 'CA',
                'last_name' => 'Blow',
                'phone' => '5065551212',
                'first_name' => 'Joe',
                'province' => 'New Brunswick'
              },
              'payment_details' => {
                'paid_a' => '',
                'method_id' => 'cop',
                'method_title' => 'Pay at Pickup'
              },
              'id' => '359',
              'cart_tax' => '0.00',
              'shipping_tax' => '0.00',
              'order_key' => 'wc_order_n13C7qFcbVcbI',
              'shipping_methods' => 'Pickup',
              'coupon_line' => '',
              'subtotal' => '12.00',
              'line_items' => {
                'product_url' => [
                  'https://www.mysite.com/product/pepperoni-pizza/',
                  'https://www.mysite.com/product/pepsi/'
                ],
                'name' => [
                  'Pepperoni Pizza',
                  'Pepsi'
                ],
                'tax_clas' => '',
                'product_thumbnail_url' => [
                  'https://www.mysite.com/wp-content/uploads/2019/03/pepperoni-pizza-150x150.jpg',
                  'https://www.mysite.com/wp-content/uploads/2019/03/pepsi-can-150x150.jpg'
                ],
                'subtotal_tax' => [
                  '0.00',
                  '0.00'
                ],
                'subtotal' => [
                  '10.00',
                  '2.00'
                ],
                'total' => [
                  '10.00',
                  '2.00'
                ],
                'price' => [
                  '10.00',
                  '1.00'
                ],
                'id' => [
                  '307',
                  '308'
                ],
                'met' => '',
                'variation_id' => '0',
                'quantity' => [
                  '1',
                  '2'
                ],
                'total_tax' => [
                  '0.00',
                  '0.00'
                ],
                'product_id' => [
                  '15',
                  '222'
                ],
                'sk' => ''
              },
              'completed_a' => '',
              'status' => 'pending',
              'updated_at' => '2019-03-28 10:33:06',
              'customer_ip' => '127.0.0.1',
              'not' => '',
              'customer_user_agent' => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.1 Safari/537.36',
              'shipping_address' => {
                'company_nam' => '',
                'postal_code' => 'E4E 4E4',
                'country' => 'Canada',
                'address_1' => '123 Somestreet',
                'first_name' => 'Joe',
                'province' => 'New Brunswick',
                'last_name' => 'Blow',
                'country_code' => 'CA',
                'province_code' => 'NB',
                'city' => 'Moncton',
                'address_' => ''
              },
              'total_tax' => '0.00',
              'fee_line' => '',
              'total_line_items_quantity' => '3',
              'customer_id' => '1',
              'currency' => 'CAD',
              'view_order_url' => 'https://www.mysite.com/my-account/view-order/359/'
            }
          },
          'xml' => {
            'version' => '1.0'
          }
        }

      Wayne:

      It looks like it may be a problem in XML::Mini::Document. When I changed one of my items in my XML document from <address_2></address_2> to <address_2/> as shown in yours, it promptly lost the "2" from the end of the name.
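
      If the parser cannot be swapped out, one possible workaround is to expand self-closing tags into explicit open/close pairs before parsing, since the paired form came through intact in the earlier test. The sketch below is only an illustration: expand_empty_tags is a hypothetical helper, not part of XML::Mini, and the regex assumes plain element names and no CDATA sections or comments containing "/>". It has not been verified against XML::Mini itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Mini): rewrite <name/> as <name></name>
# so the parser only ever sees explicit open/close pairs.
# Assumes no CDATA sections or comments that contain "/>".
sub expand_empty_tags {
    my ($xml) = @_;
    # $1 = element name, $2 = optional attribute text (may be empty)
    $xml =~ s{<([A-Za-z_][\w.\-]*)((?:\s[^<>]*?)?)\s*/>}{<$1$2></$1>}g;
    return $xml;
}

# Assumed sample input, using element names from the posted XML.
my $sample   = '<billing_address><company_name/><address_2/><city>Moncton</city></billing_address>';
my $expanded = expand_empty_tags($sample);
print "$expanded\n";
```

      The expanded string would then be passed to parse() in place of the raw input. A regex is a blunt tool for XML, so a proper parser remains the safer long-term choice.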

      Edit: I had some slashes in the wrong places.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

Re: XML to Hash Truncating Keys Problem
by Anonymous Monk on Mar 28, 2019 at 14:39 UTC
    valid XML data - show it please
Re: XML to Hash Truncating Keys Problem
by Wayne (Novice) on Mar 28, 2019 at 14:23 UTC

    Correction - The problem is not intermittent. It happens every time.
