Help with XML Parsing

by dbonneville (Acolyte)
on Jul 20, 2007 at 20:21 UTC (#627875=perlquestion)

dbonneville has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I'm brand new to Perl, but not to programming.

So far, I'm able to read in and parse an XML doc to the screen, but I'm currently at a loss as to what to do next. I've been searching for some help, but am a little directionless, as all things Perl are still too new to me. I probably passed by some good help already and didn't know it...

Here's what I'm trying to do:

- Load in an XML file
- Parse the file and look for content in it (recordsets)
- Take those records that match certain criteria and WRITE them to a new XML file on the server.
- Do this about 30 times, for different criteria, from the master XML file from the first step.


As of yet, I can't find a way to do this. I'm not looking for hand-holding, but some good info or tips or tutorials to get me going, so I can then formulate the next intelligent question.

I started from ground zero today, with utterly no knowledge of Perl. I got it running on my PC and got this far (which to me was a LOT):

#!/usr/local/bin/perl -w
use warnings;
use XML::Parser;

my $p1 = new XML::Parser(Style => 'Debug');
$p1->parsefile('test.xml');

Thanks,

Doug

Replies are listed 'Best First'.
Re: Help with XML Parsing
by GrandFather (Saint) on Jul 20, 2007 at 22:07 UTC

    Generally XML strongly implies "use XML::Twig". However, unless the XML you want to process is very large, in this case I'd suggest XML::TreeBuilder is most appropriate. In particular, take a look at the look_down method of HTML::Element. Consider:

    use strict;
    use warnings;
    use XML::TreeBuilder;

    my $xml = <<XML;
    <root>
        <a attr='1'>Contents of a - 1</a>
        <b>Contents of b</b>
        <a attr='2'>Contents of a - 2</a>
        <c>Contents of c</c>
        <d>Contents of d</d>
    </root>
    XML

    my $root = XML::TreeBuilder->new ();
    $root->parse ($xml);

    print "All the 'a' elements\n";
    for my $elt ($root->look_down ('_tag' => 'a')) {
        print $elt->as_XML (), "\n";
    }

    print "All the 'b' elements\n";
    for my $elt ($root->look_down ('_tag' => 'b')) {
        print $elt->as_XML (), "\n";
    }

    print "All the 'a' elements with attr='2'\n";
    for my $elt ($root->look_down ('_tag' => 'a', 'attr' => '2')) {
        print $elt->as_XML (), "\n";
    }

    Prints:

    All the 'a' elements
    <a attr="1">Contents of a - 1</a>
    <a attr="2">Contents of a - 2</a>
    All the 'b' elements
    <b>Contents of b</b>
    All the 'a' elements with attr='2'
    <a attr="2">Contents of a - 2</a>

    DWIM is Perl's answer to Gödel
Re: Help with XML Parsing
by un-chomp (Scribe) on Jul 20, 2007 at 21:26 UTC
    I'd use XML::LibXML, and select the nodes you want using XPath.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;

    my $parser = XML::LibXML->new;
    my $dom = $parser->parse_file( 'test.xml' );              # load file
    my @wanted_nodes = $dom->findnodes( './/an/xpath/here' ); # select
    print $_->toString for @wanted_nodes;                     # output
    merlyn has written a column on using XML::LibXML with HTML, which may be of interest.
      Thanks! I got XML::LibXML working just fine! I was able to use XPath to get the info out of the master XML, and then print new XML files. Works like a charm.
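A minimal sketch of the workflow described above (load the master XML once, select records with XPath, write each selection to its own file), assuming XML::LibXML is installed. The record layout, the output file names, and the `write_matches` helper are hypothetical, and a temporary directory stands in for the real target directory on the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempdir);

# Hypothetical master document. In practice this would be loaded with
# XML::LibXML->load_xml(location => 'test.xml').
my $master = XML::LibXML->load_xml(string => <<'XML');
<records>
  <record type="book"><title>Programming Perl</title></record>
  <record type="cd"><title>Kind of Blue</title></record>
  <record type="book"><title>Learning Perl</title></record>
</records>
XML

# One entry per output file: file name => XPath criterion (both made up).
my %criteria = (
    'books.xml' => '/records/record[@type="book"]',
    'cds.xml'   => '/records/record[@type="cd"]',
);

# Stand-in for the real output directory; removed when the script exits.
my $out_dir = tempdir(CLEANUP => 1);

# Copy every node matching $xpath into a fresh document, write that
# document to $path, and return the number of nodes copied.
sub write_matches {
    my ($source, $xpath, $path) = @_;

    my $out  = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $root = $out->createElement('records');
    $out->setDocumentElement($root);

    my @nodes = $source->findnodes($xpath);
    for my $node (@nodes) {
        # importNode copies the node, so the master document is untouched
        $root->appendChild($out->importNode($node));
    }

    $out->toFile($path, 1);    # second argument 1 asks for indented output
    return scalar @nodes;
}

for my $file (sort keys %criteria) {
    my $count = write_matches($master, $criteria{$file}, "$out_dir/$file");
    print "$file: $count record(s)\n";
}
```

With 30 criteria, only the `%criteria` table grows; the master file is still parsed once.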
Re: Help with XML Parsing
by runrig (Abbot) on Jul 20, 2007 at 20:39 UTC
    One thing I like about XML::Twig is the plethora of docs (see the SEE ALSO section) and how easy it is to use (ok, that's TWO things :-)
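For comparison, a minimal XML::Twig sketch, assuming the module is installed. It reuses part of the sample document from GrandFather's reply and collects the `a` elements whose `attr` is 2; the `@matches` array is just an illustrative name.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample data borrowed from the XML::TreeBuilder example in this thread.
my $xml = <<'XML';
<root>
  <a attr="1">Contents of a - 1</a>
  <b>Contents of b</b>
  <a attr="2">Contents of a - 2</a>
</root>
XML

my @matches;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each fully parsed <a> element with attr="2".
        'a[@attr="2"]' => sub {
            my ($t, $elt) = @_;
            push @matches, $elt->sprint;    # the element serialized as XML
        },
    },
);
$twig->parse($xml);

print "$_\n" for @matches;
```

Handlers fire while the document is being parsed, which is why XML::Twig tends to be suggested for large files.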
Re: Help with XML Parsing
by CountZero (Bishop) on Jul 20, 2007 at 21:10 UTC
    Transforming an XML-file into another XML-file is something that you would typically do through XSLT.

    Some Perl modules which can do this are XML::XSLT (does not yet fully implement all XSLT-functions), XML::Sablotron (needs a separate install of Sablotron) or XML::LibXSLT (needs access to the Gnome project libxslt library).

    XML::Sablotron is probably your best bet. I remember having installed it some years ago in an Apache webserver on Windows and it worked flawlessly.
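A rough sketch of the XSLT route, using XML::LibXSLT rather than XML::Sablotron and assuming both XML::LibXML and XML::LibXSLT are installed. The record layout and the `type` stylesheet parameter are made up for illustration; passing the criterion as a parameter lets one stylesheet serve several filters.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical master document.
my $source = XML::LibXML->load_xml(string => <<'XML');
<records>
  <record type="book"><title>Programming Perl</title></record>
  <record type="cd"><title>Kind of Blue</title></record>
</records>
XML

# Stylesheet that copies only the records whose type equals $type.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="type"/>
  <xsl:template match="/records">
    <records>
      <xsl:copy-of select="record[@type = $type]"/>
    </records>
  </xsl:template>
</xsl:stylesheet>
XSL

my $xslt       = XML::LibXSLT->new;
my $stylesheet = $xslt->parse_stylesheet($style_doc);

# xpath_to_string quotes the value so it is passed as a string parameter.
my $result = $stylesheet->transform(
    $source,
    XML::LibXSLT::xpath_to_string(type => 'book'),
);

my $output = $stylesheet->output_as_chars($result);
print $output;
```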

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      That is if you're a masochist. Fiddling with XSLT, especially if you need 30 different conditions for the filtering is definitely going to be lots of fun. Transforming XML into XML using XML containing commands in XML and snippets of XML to XML xml the xml in xml xmling the Xml xML xml xml mxml xmls xml xml xml xml </xml>

Re: Help with XML Parsing
by mirod (Canon) on Jul 20, 2007 at 21:58 UTC

    If your criteria can be expressed using XPath, then you might be able to use xml_grep2, a tool based on XML::LibXML that you can find at http://xmltwig.com/tool/.

    And of course XML::Twig comes with its own xml_grep.

Re: Help with XML Parsing
by rvosa (Curate) on Jul 21, 2007 at 13:44 UTC
    Welcome on board, hope you'll like perl! I'll have to agree with others recommending XML::Twig.
