PerlMonks  

Re: easy way of parsing XML

by BrowserUk (Pope)
on Jun 29, 2002 at 17:45 UTC (#178249)


in reply to easy way of parsing XML

This is a little program I wrote to help me A) get to grips with Perl references and B) understand the structures returned by XMLin.

It may give you a place to start.

#!e:/Perl/bin/Perl.exe -w
use strict;
use warnings;
use XML::Simple;

my $file = shift or die <<USAGE;
Usage: $0 <xmlfile> [label] [0|1]
Where;
    <xmlfile> is the path/name of the .xml file to parse.
    [label] is the base of the references as output (default:'\$xml->').
    [0|1] indicate whether to append the contents of the tags. (default:0 (no)).

Parses an XML file using XML::Simple and (if the doc is well-formed)
outputs the perl references to access the elements of the structure,
optionally appending the contents of the tags.
USAGE

my $xml = XMLin( $file, parseropts => [ ErrorContext => 1, ], forcearray => 0, );

sub walk {
    my ($label, $valuesFlag) = (shift||"xml->", shift||0);
    my ($output, $tab) = ( "", ". ");

    foreach my $thing (shift) {
        if ( ref($thing) eq "HASH" ) {
            $output .= ( !ref ${$thing}{$_} )
                ? "$label\{$_\}" . ( $valuesFlag ? " := ${$thing}{$_}\n" : "\n" )
                : walk( $label . "{$_}", $valuesFlag, ${$thing}{$_} )
                    foreach ( keys %$thing );
        }
        if ( ref($thing) eq "ARRAY" ) {
            $output .= ( !ref @{$thing}[$_] )
                ? "$label\[$_\] " . ( $valuesFlag ? " := @{$thing}[$_]\n" : "\n" )
                : walk( $label . "[$_]", $valuesFlag, @{$thing}[$_] )
                    foreach ( 0.. @{$thing} -1 );
        }
        $output .= " := $thing\n" if ( ref $thing eq "SCALAR" );
    }
    return $output;
}

(my $label= shift) =~ s/^(.+)+$/$1->/;
#print $label . "\n";
print walk $label, shift||0, $xml;

I ran this on your sample XML with the following command line:

xmlrefs.pl C:\test\items.xml "" 1 >itemlist.refs

Here items.xml is your sample XML; "" is a placeholder for the optional alternative base label (I know, Getopt::Std/Getopt::Long!); and the "1" indicates that I want to see the contents of the tags as well.

It gives the following output:

xml->{Category1} := General
xml->{Origin} := Canada
xml->{Description} := A wonderful table.
xml->{DlrItemNum} := 1
xml->{Category2} := Old Products
xml->{Length} := 5.18
xml->{Title} := Table
xml->{Width} := 3.56
xml->{Circa} := Early 20th Century
xml->{Dimensions} := 3.56 x 5.18m

I find that I tend to use a label (second parameter) of the form "#xml->" and append the output to the end of my Perl script for easy reference. Specifying 0 for (or omitting) the third parameter, so that you don't get the contents of the tags, makes cut and paste even easier.

Update: Corrected SCALER => SCALAR spelling (here, and in my copy of the prog :).


Re: Re: easy way of parsing XML
by grantm (Parson) on Jun 30, 2002 at 08:39 UTC

    XML::Simple tries to provide a simple interface but it does assume a knowledge of Perl references. I recommend perlreftut.
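As a rough illustration of the reference handling involved, here is a minimal sketch that uses a hand-built structure in place of real XMLin output. The Item wrapper and the second record ('Chair') are made up for the example; only the element names come from the output shown earlier in the thread.

```perl
use strict;
use warnings;

# Hand-built stand-in for what XMLin might return. The nesting under
# "Item" and the second record are assumptions for illustration only.
my $xml = {
    Item => [
        { DlrItemNum => 1, Title => 'Table', Origin => 'Canada' },
        { DlrItemNum => 2, Title => 'Chair', Origin => 'Canada' },
    ],
};

# Arrow syntax follows the reference one level at a time.
my $first_title = $xml->{Item}[0]{Title};

# Dereference the array ref to loop over every record.
my @titles = map { $_->{Title} } @{ $xml->{Item} };

# ref() reports what kind of thing a reference points at.
my $kind = ref $xml->{Item};

print "$first_title\n";
print join( ', ', @titles ), "\n";
print "$kind\n";
```

The same three idioms (arrow access, @{ ... } dereferencing, and ref) are what the walk routine in the parent post relies on.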

    The most common mistake with XML::Simple is to ignore the advice in the docs about the ForceArray and KeyAttr options. If you're not sure, always set ForceArray => 1; setting it to an array of element names is probably the best way.

    I also recommend setting KeyAttr => [] unless you know what you want. In the case of the original XML snippet, KeyAttr =>  'DlrItemNum' might be useful.
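To make the effect of these two options concrete, here is a small sketch. The XML below is a hypothetical stand-in, since the original poster's sample is not reproduced in this node; the Items and Item element names are assumptions, while DlrItemNum and Title come from the output above.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample document (element nesting is assumed).
my $doc = <<'XML';
<Items>
  <Item>
    <DlrItemNum>1</DlrItemNum>
    <Title>Table</Title>
  </Item>
</Items>
XML

# No forced arrays: a lone <Item> collapses to a plain hash ref,
# so the shape changes as soon as a second <Item> appears.
my $loose = XMLin( $doc, ForceArray => 0, KeyAttr => [] );

# Force <Item> to always be an array ref, with no key folding.
my $strict = XMLin( $doc, ForceArray => ['Item'], KeyAttr => [] );

# Fold the <Item> array into a hash keyed on the DlrItemNum value.
my $keyed = XMLin( $doc, ForceArray => ['Item'], KeyAttr => ['DlrItemNum'] );

print ref( $loose->{Item} ),  "\n";
print ref( $strict->{Item} ), "\n";
print $strict->{Item}[0]{Title}, "\n";
print $keyed->{Item}{1}{Title},  "\n";
```

With ForceArray naming Item, code that loops over @{ $strict->{Item} } keeps working whether the file holds one record or many, which is the point of the advice above.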

    I suspect the line that says if ( ref $thing eq "SCALER" ); will never get executed.

    If you're processing big XML files, XML::SAX might be a good answer. XML::SAX::ByRecord from Barrie Slaymaker's XML::SAX::Machines could be very handy once you have your head around SAX.

    But XML::Twig is possibly the best answer for simple record-oriented processing.

    Edit: Sorry, I typed 'always set ForceArray => 0 ...' when I meant 'ForceArray => 1 ...'
