PerlMonks  

Module for XML output

by rebugger (Acolyte)
on Jan 10, 2012 at 18:27 UTC ( [id://947212]=perlquestion )

rebugger has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm a module novice, for the most part. I'm looking through the plethora of XML modules on CPAN, and thought I might see if anyone had suggestions. I need a module for building XML files from scratch. It (probably) doesn't need to read or modify existing XML files. My input is flat text files, and my program will only read them, build some kind of data structure, and output 1 largish (~5-10MB) XML file. I am looking for a module where building the XML data structure is as headache-free as possible, and of course one that can handle the file size (if that is even an issue at this size). Any recommendations are appreciated, and meanwhile I'll keep looking through CPAN. Thanks.

~rebugger~

*Edit: Thanks everyone for your suggestions. I'm checking them all out. I'll add a little extra information to the mix so people still wanting to add anything don't have to do as much of the "if this is what you want, then..." game. The XML has to be a certain format. I have a fixed DTD file I need to conform to (no XSL unless I write one, and it'd probably only be used in my program). The XML is the input for catalog software. Order of elements is important (it is kind of HTML-like) so I don't want it re-sorting. I am taking CSV files containing product data and building them into a catalog. This only needs to be done once, as it will be loaded into our catalog software database and edited in the software. The only data is text and numeric, nothing binary or otherwise fancy.

Ex. XML:

<catalog media_no="123456AB">
  <title>Tool Catalog</title>
  <section id="s1000001" sort="1000001">
    <title>Section 1</title>
    <section id="s0000001" sort="1">
      <title>Hoses</title>
      <article id="a6543210">
        <title_1>Air Hose</title_1>
        <title_info_1>Warranty: Nine Months</title_info_1>
        <list style="bulleted">
          <title>Air Hose</title>
          <list_item>Rugged bend restrictor</list_item>
          <list_item>Max working pressure: 350 PSI</list_item>
        </list>
      </article>
    </section>
  </section>
</catalog>

Replies are listed 'Best First'.
Re: Module for XML output
by Lotus1 (Vicar) on Jan 10, 2012 at 18:41 UTC

    XML::Writer, Using Perl to create XML. Wow, that node is 11+ years old. XML::LibXML will do it too. Haven't tried XML::Twig or XML::Simple for this.
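
    A minimal sketch of the XML::Writer approach, using an abbreviated version of the catalog from the question. The sample strings are hypothetical stand-ins for rows parsed from the CSV files, and the nesting is shortened, so treat it as an illustration rather than DTD-conformant output. Elements are written in the order the calls are made, which matters when element order is significant.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical sample data standing in for values read from the input files.
my @items = ('Rugged bend restrictor', 'Max working pressure: 350 PSI');

my $xml = '';
my $writer = XML::Writer->new(
    OUTPUT      => \$xml,   # append output to this string; a filehandle also works
    DATA_MODE   => 1,       # start each element on its own line
    DATA_INDENT => 2,       # indent nested elements by two spaces
);

$writer->xmlDecl('UTF-8');
$writer->startTag('catalog', media_no => '123456AB');
$writer->dataElement(title => 'Tool Catalog');
$writer->startTag('article', id => 'a6543210');
$writer->dataElement(title_1 => 'Air Hose');
$writer->startTag('list', style => 'bulleted');
$writer->dataElement(list_item => $_) for @items;   # text is escaped by the writer
$writer->endTag('list');
$writer->endTag('article');
$writer->endTag('catalog');
$writer->end();             # checks that every opened tag has been closed

print $xml;
```

    Because the writer streams as it goes, passing a filehandle as OUTPUT instead of a string reference keeps memory use flat regardless of file size.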

Re: Module for XML output
by Jenda (Abbot) on Jan 10, 2012 at 19:48 UTC

    It depends on how twisted the XML has to be and how exactly it is defined at the moment.

    You might even just tweak the data structure a bit and have it output by XML::Rules. Whether that will be convenient or not depends on the XML schema.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re: Module for XML output
by saberworks (Curate) on Jan 10, 2012 at 20:19 UTC

    I wrote this originally on blogs.perl.org: Using XML::Compile to output XSD compliant XML. It may help if you decide to make an XSD file.

    As part of a recent project I was given an XSD file (xml schema definition) and asked to output compliant XML. CPAN to the rescue. I found XML::Compile::Schema which is a cool module that allowed me to do this with very little fuss. The documentation is really good but I think a tutorial-style post might be helpful.

    To do this you’ll need to install XML::Compile and XML::LibXML.

    You can use XML::Compile::Schema to read in your xsd file and output a perl hash template. Then you can use that example template to construct a hash of real data and then have XML::Compile::Schema output a valid XML file.

    For this tutorial, download a sample .xsd file from here. Then write a perl script like so to dump a perl hash template.

    #!/usr/local/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;
    use XML::Compile::Schema;
    use XML::LibXML::Reader;

    my $xsd = 'test.xsd';
    my $schema = XML::Compile::Schema->new($xsd);

    # This will print a very basic description of what the schema describes
    $schema->printIndex();

    # this will print a hash template that will show you how to construct a
    # hash that will be used to construct a valid XML file.
    #
    # Note: the second argument must match the root-level element of the XML
    # document. I'm not quite sure why it's required here.
    warn $schema->template('PERL', 'addresses');

    The relevant output looks like this:

    # is an unnamed complex
    { # sequence of address

      # is an unnamed complex
      # occurs 1 <= # <= unbounded times
      address =>
      [ { # sequence of name, street

          # is a xs:string
          # is optional
          name => "example",

          # is a xs:string
          # is optional
          street => "example", }, ], }

    The comments are helpful (and were provided by XML::Compile::Schema directly, not by me). It basically says your data structure should start as a hashref which should contain an entry called “address” which is a reference to an array. This array should be a list of hash references which each contain two elements, name and street.

    From this you can deduce that a valid hash will look something like this.

    my $data = {
        address => [
            {
                name   => 'name 1',
                street => 'street 1',
            },
            {
                name   => 'name 2',
                street => 'street 2',
            }
        ],
    };

    In order to output the XML, you have to do this:

    my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $write = $schema->compile(WRITER => 'addresses');
    my $xml   = $write->($doc, $data);
    $doc->setDocumentElement($xml);
    print $doc->toString(1); # 1 indicates "pretty print"

    My output looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <addresses>
      <address>
        <name>name 1</name>
        <street>street 1</street>
      </address>
      <address>
        <name>name 2</name>
        <street>street 2</street>
      </address>
    </addresses>

    The actual XSD and resulting XML files I was dealing with were much more complicated but I followed this process and had no trouble whatsoever.

Re: Module for XML output
by tobyink (Canon) on Jan 10, 2012 at 21:04 UTC

    There is rarely a reason to not use XML::LibXML.

    That said, if you're just writing XML, and the structure isn't especially complicated, it's often quite easy to do so with string concatenation, provided you have a function to entity-escape strings (and the numeric escape from HTML::Entities will do the trick just fine).
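
    A rough sketch of the string-concatenation approach. The xml_escape and list_xml helpers are hypothetical names written for this illustration; the hand-rolled escaper is a stand-in that uses only core Perl, whereas the reply above presumably has encode_entities_numeric from HTML::Entities in mind for the same job.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Build one <list> element from a title and a list of item strings.
sub list_xml {
    my ($title, @items) = @_;
    my $xml = qq{<list style="bulleted">\n};
    $xml .= '  <title>' . xml_escape($title) . "</title>\n";
    $xml .= '  <list_item>' . xml_escape($_) . "</list_item>\n" for @items;
    $xml .= "</list>\n";
    return $xml;
}

print list_xml('Air Hose', 'Rugged bend restrictor', 'Pressure < 350 PSI & rising');
```

    The important habit is that every piece of data passes through the escaper before it is concatenated; the markup itself is written literally.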

Re: Module for XML output
by ambrus (Abbot) on Jan 11, 2012 at 14:08 UTC

    You can use XML::Twig to create new XML files. There aren't many examples of this around, probably because XML::LibXML is for those who believe in XML, and XML::Twig is for those who don't but have to work with it, and obviously it's more frequently the former who want to create new XML. (Update 2011-01-13: on the other hand, code for building XML from scratch with XML::LibXML can look a bit ugly, because XML::LibXML is a straight wrapper over the C library, so it won't give you methods that accept a variable number and type of arguments like the constructors I'm using below.)

    Anyway, there are two styles you can use to create new XML with Twig (though you can mix them). Here.

    use warnings; use strict;
    use XML::Twig;

    { # from the inside
        my $para1text1 = "There are two ways to build XML with ";
        my $moduletext = "Twig";
        my $module = XML::Twig::Elt->new("a", {"href" => "http://mirod.org/"}, $moduletext);
        my $para1text2 = ": ";
        my $para1 = XML::Twig::Elt->new("p", $para1text1, $module, $para1text2);
        my $para2text = "from the inside and from outside.";
        my $para2 = XML::Twig::Elt->new("p", $para2text);
        my $root = XML::Twig::Elt->new("saying", $para1, $para2);
        my $twig = XML::Twig->new(pretty_print => "nice");
        $twig->set_root($root);
        $twig->flush(*STDOUT);
    }

    { # from the outside
        my $twig = XML::Twig->new(pretty_print => "nice");
        my $root = XML::Twig::Elt->new("saying");
        $twig->set_root($root);
        my $para1 = $root->insert_new_elt(last_child => "p");
        $para1->suffix("There are two ways to build XML with ");
        my $module = $para1->insert_new_elt(last_child => "a", {"href" => "http://mirod.org/"});
        # or this would work too:
        #$module = $para1->insert_new_elt(last_child => "a");
        #$module->set_att("href" => "http://mirod.org/");
        $module->suffix("Twig");
        $para1->suffix(": ");
        my $para2 = $root->insert_new_elt(last_child => "p");
        $para2->suffix("from the inside and from outside.");
        $twig->flush(*STDOUT);
    }

    If you don't wish to keep the whole XML structure in memory, you have to use the second method for at least the outer layers. After adding each larger chunk of the document (here after adding each paragraph), you call the flush method on that element which both outputs the XML document up to that part, and removes it from the document tree so it's no longer in the memory. Just don't forget to flush the twig at the very end so that the last closing tags are output. For example,

    { # from the outside, flushing after each paragraph
        my $twig = XML::Twig->new(pretty_print => "nice");
        my $root = XML::Twig::Elt->new("saying");
        $twig->set_root($root);
        my $para1 = $root->insert_new_elt(last_child => "p");
        $para1->suffix("There are two ways to build XML with ");
        my $module = $para1->insert_new_elt(last_child => "a", {"href" => "http://mirod.org/"});
        $module->suffix("Twig");
        $para1->suffix(": ");
        $para1->flush(*STDOUT);
        my $para2 = $root->insert_new_elt(last_child => "p");
        $para2->suffix("from the inside and from outside.");
        $para2->flush(*STDOUT);
        $twig->flush(*STDOUT);
    }

    { # combination, flushing after each paragraph
        my $twig = XML::Twig->new(pretty_print => "nice");
        my $root = XML::Twig::Elt->new("saying");
        $twig->set_root($root);
        my $para1text1 = "There are two ways to build XML with ";
        my $moduletext = "Twig";
        my $module = XML::Twig::Elt->new("a", {"href" => "http://mirod.org/"}, $moduletext);
        my $para1text2 = ": ";
        my $para1 = XML::Twig::Elt->new("p", $para1text1, $module, $para1text2);
        $para1->paste($root);
        $para1->flush;
        my $para2text = "from the inside and from outside.";
        my $para2 = XML::Twig::Elt->new("p", $para2text);
        $para2->paste($root);
        $para2->flush;
        $twig->flush(*STDOUT);
    }

    Update 2013-10-21: see the later question Best module for Creating [Writing out] XML.

      The hairiest, albeit perhaps most complete, expression of a problem like this one, appropriate perhaps only for fighting the biggest fires, is to use a parser and to regard the XML output step as akin to creating an abstract-syntax tree (AST) in a true compiler-like situation. Although the documentation for Parse::RecDescent was (to me, at least) rather baffling to read, it does specifically address this case. I am frankly not suggesting it here, with regard to this post, but in those cases when you are dealing with a truly nasty input file I can speak from personal experience that this approach does produce a dramatically good result. (I once used it to rip into a collection of about 6,000 fairly unpredictable SAS® files, korn-shell scripts, and Tivoli Workload Scheduler® scripts to construct a data-flow model of the entire system. It was quite the beast to do, and there would have been no other way, I think, to have done it.)

Re: Module for XML output
by sundialsvc4 (Abbot) on Jan 10, 2012 at 19:49 UTC

    A size of “5 to 10 megabytes,” if it can comfortably be projected never to grow unreasonably bigger than that, definitely qualifies as something that is not too interesting. You can simply do the whole task in memory. Perhaps your file-parsing code simply constructs a “hash of hashes” structure in memory, then you use any one of several available XML routines (in this case, I would say to keep it XML::Simple ...) to write out that structure as XML. Poof. Done.
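
    A minimal sketch of that in-memory approach with XML::Simple. The $data structure is a hypothetical example, not the asker's real catalog. One caveat to check against the XML::Simple documentation: as far as I know, XMLout writes hash keys in sorted order by default, so it may not reproduce an element order that a DTD requires.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLout);

# Hypothetical in-memory structure built while parsing the flat input files.
my $data = {
    title   => 'Tool Catalog',
    section => [
        { title => 'Hoses' },
        { title => 'Fittings' },
    ],
};

my $xml = XMLout(
    $data,
    RootName => 'catalog',  # name of the outermost element
    NoAttr   => 1,          # emit scalars as child elements, not attributes
    KeyAttr  => [],         # do not fold arrays into hashes keyed by name/id
    XMLDecl  => 1,          # prepend an XML declaration
);

print $xml;
```

    Array references become repeated elements and keep their order; it is only the keys within each hash whose order is outside your control.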

    If the processing of input data is “hairy and difficult,” a true parser such as Parse::RecDescent can (after(!) the learning-curve beast is slain...) take you anywhere that you want to go. But otherwise don’t borrow trouble.
