
How to create XML tree from non-XML source

by H4 (Acolyte)
on Sep 08, 2008 at 12:51 UTC (id://709747, perlquestion)

H4 has asked for the wisdom of the Perl Monks concerning the following question:

I want to process tree-structured data which is not XML but can easily be converted into XML. I want to make this conversion on the fly, creating XML nodes and adding them to the in-memory structure. Then, I want to use XPath expressions to access several nodes, modify them, and write the whole thing out into an XML file.

I have read several XML::* manpages, and even the 'Perl & XML' book, but cannot find a place to start. I was imagining something like my $tree = XML::some_class->new; my $new_node = $tree->new_node('node_tag', 'node_text'); $known_node->add_child($new_node), and then feeding $tree into an XML::XPath instance. Alternatively, I could generate SAX events while reading my data source and have them magically build the data structure in memory.

In fact, I have already written code that creates a tree-like structure in memory, so it would be easy to turn it into a data structure that XML::XPath understands, but I cannot find a specification of what XML::XPath requires.

Almost everything I've found so far concentrates on parsing XML files, or on writing handlers for SAX events. What I found on SAX drivers was not too instructive either. A code sample would be great, but a link to further reading will do.

Re: How to create XML tree from non-XML source
by dHarry (Abbot) on Sep 08, 2008 at 15:17 UTC

    There are many Perl modules available for generating/creating XML: Any::Renderer::XML (generates "element only" XML), XML::Generator or XML::Writer to name a few. It depends a bit on what you really need in terms of XML features and how far you wanna push it.

    For XPath you can use XML::XPath.

    I am using XML::Twig a lot lately. It is turning into a one-solution-for-all-xml-problems for me:-)

    Hope this helps

      Thank you for your suggestions. XML::Generator looks good. But how can I manipulate a tree created with XML::Generator?

        XML::Generator is really intended for converting existing data structures to XML. If you want to manipulate them a bit before outputting them, I'm going to second the recommendation for XML::Twig, which is easy to use and fairly well documented.

        Here's a quickie example for you, though there are better ones at the link above:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use XML::Twig;

        my $twig;
        my $root = "<nodetag />";
        my $element;
        my $firstelem;
        my $childcnt;

        $twig = XML::Twig->new(
            output_encoding => 'utf8',
            pretty_print    => 'record');

        # $root is a string containing the starting tag
        $twig->parse($root);
        $root = $twig->root;

        # $root is now the root twig element, and we can modify it
        $root->set_gi('nodetag_root');

        # We can add children to it
        foreach $childcnt (0 .. 10) {
            $element = XML::Twig::Elt->new('childtag' => 'child text');
            $element->set_att('index', $childcnt);
            $element->paste('last_child', $root);
        }

        # We can modify an arbitrary child
        $element = $root->first_child('childtag[@index="5"]');
        $element->set_text('Number Five, alive!');

        # And we can print it, to a filehandle if necessary
        $twig->print;
        It outputs:
        <?xml version="1.0" encoding="utf8"?>
        <nodetag_root>
          <childtag index="0">child text</childtag>
          <childtag index="1">child text</childtag>
          <childtag index="2">child text</childtag>
          <childtag index="3">child text</childtag>
          <childtag index="4">child text</childtag>
          <childtag index="5">Number Five, alive!</childtag>
          <childtag index="6">child text</childtag>
          <childtag index="7">child text</childtag>
          <childtag index="8">child text</childtag>
          <childtag index="9">child text</childtag>
          <childtag index="10">child text</childtag>
        </nodetag_root>
Re: How to create XML tree from non-XML source
by themage (Friar) on Sep 08, 2008 at 14:16 UTC
    Hi H4,

    You can try XML::Simple's XMLout, as long as you have a Perl hash representing the data you want to write to XML.
    use XML::Simple qw(XMLout);

    my $data = {
        book => [
            { name => "test",  author => "H3" },
            { name => "test2", author => "H4" },
        ],
    };
    print XMLout($data, NoAttr => 1, RootName => "books");
      Thanks for your input. Unfortunately, XML::Simple does not preserve the ordering of subnodes because it uses hashes rather than lists. In your example, there is no way of telling whether <name> or <author> should appear first in the resulting XML. Sorry, I forgot to mention that in my case order does matter.
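      A minimal core-Perl sketch of one way around the ordering problem: keep each node's children in an array rather than in hash keys, so the output order is simply the order in which the children were added. The node layout [ tag, text, [children] ] and the helpers to_xml and xml_escape are hypothetical, written for illustration rather than taken from any CPAN module; attributes and mixed content are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical node layout: [ $tag, $text, \@children ].
# Children live in an array, so their order is the order they were added in.

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Serialize a node and its children, indenting two spaces per level.
sub to_xml {
    my ($node, $depth) = @_;
    $depth ||= 0;
    my ($tag, $text, $children) = @$node;
    my $pad = '  ' x $depth;
    my $out = "$pad<$tag>";
    $out .= xml_escape($text) if defined $text;
    if (@$children) {
        $out .= "\n";
        $out .= to_xml($_, $depth + 1) for @$children;
        $out .= $pad;
    }
    return "$out</$tag>\n";
}

# The second book deliberately lists author before name.
my $books = [ 'books', undef, [
    [ 'book', undef, [ [ 'name',   'test', [] ], [ 'author', 'H3',    [] ] ] ],
    [ 'book', undef, [ [ 'author', 'H4',   [] ], [ 'name',   'test2', [] ] ] ],
] ];

print to_xml($books);
```

      In the printed XML the second <book> has its <author> before its <name>, exactly as it was built, which a hash-based structure cannot guarantee.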
Re: How to create XML tree from non-XML source
by GrandFather (Saint) on Sep 08, 2008 at 22:22 UTC

    This looks somewhat like the wrong question. XML is a file-based representation of some data. XPath is a query language for locating information in an XML document. Neither implies any particular internal representation during processing.

    So, what is the bigger picture? What input data have you and what do you want to achieve with it?

    Perl reduces RSI - it saves typing

      My original data is genealogical data in GEDCOM format. GEDCOM is a well-documented standard, yet every program that supports GEDCOM creates files that violate that standard in one way or another. My idea is to create an intermediate form which can be converted to and from all the third-party GEDCOM dialects involved. I chose XML because GEDCOM is a tree structure, and I thought it better to use existing tools for manipulating trees than to reinvent them.

      Yes, I know there is a Gedcom package on CPAN, but it cannot read 5 out of 9 test files, and does not handle character sets correctly.

      I want to use XPath expressions to locate the nodes that must be modified, modify them as required, and then save the tree to an XML file. I don't mind saving the unmodified XML tree to an intermediate file if I must. But once XPath has located a node, how do I make my modification? This may include renaming the node's type, changing its text, moving the node up in the tree, or creating subnodes. Are XML and XPath the wrong tools? Will I have to write my own code to locate nodes, rather than using XPath?
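      If full XPath turns out to be more than is needed, the operations listed above are short to write against a plain Perl tree. The following is a minimal sketch, assuming a hypothetical node layout of hashes with tag, value and an ordered children array. find_path only understands slash-separated tag names (no predicates, wildcards or //), so it covers a tiny subset of XPath; add_child, move_up and the tag FULLNAME are likewise illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical node layout: { tag => ..., value => ..., children => [ ... ] }

# Return all nodes matched by a slash-separated tag path such as 'INDI/NAME',
# starting from the children of $root. No predicates, wildcards or '//'.
sub find_path {
    my ($root, $path) = @_;
    my @current = ($root);
    for my $tag (split m{/}, $path) {
        @current = grep { $_->{tag} eq $tag }
                   map  { @{ $_->{children} } } @current;
    }
    return @current;
}

# Create a node and append it as the last child of $parent.
sub add_child {
    my ($parent, $tag, $value) = @_;
    my $child = { tag => $tag, value => $value, children => [] };
    push @{ $parent->{children} }, $child;
    return $child;
}

# Detach $node from $parent and re-insert it into $grandparent,
# directly after $parent.
sub move_up {
    my ($grandparent, $parent, $node) = @_;
    @{ $parent->{children} } = grep { $_ != $node } @{ $parent->{children} };
    my $kids = $grandparent->{children};
    my ($pos) = grep { $kids->[$_] == $parent } 0 .. $#$kids;
    die "parent is not a child of grandparent\n" unless defined $pos;
    splice @$kids, $pos + 1, 0, $node;
    return $node;
}

# Usage: build a tiny tree, then locate, edit, extend, move and rename.
my $root = { tag => 'ROOT', value => undef, children => [] };
my $indi = add_child($root, 'INDI', undef);
add_child($indi, 'NAME', 'Bob /Cox/');
add_child($indi, 'SEX', 'M');

my ($name) = find_path($root, 'INDI/NAME');     # locate
$name->{value} = 'Robert /Cox/';                # change the text
my $givn = add_child($name, 'GIVN', 'Robert');  # create a subnode
move_up($indi, $name, $givn);                   # GIVN becomes a sibling of NAME
$name->{tag} = 'FULLNAME';                      # rename (hypothetical tag)

print join(' ', map { $_->{tag} } @{ $indi->{children} }), "\n";
```

      Because children are kept in arrays, "moving up" is just removing a reference from one array and splicing it into another at the desired position.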

        XML is in essence a file format. It is not generally used as an in-memory representation of data from some other file format. Unless you want to store an intermediate form of the data on disk in some non-GEDCOM format, XML is not appropriate. Even then, you would probably be better off storing any intermediate form of the data on disk as a clean GEDCOM file (although, see below).

        There are many ways to handle trees in Perl (see tree), but you are probably better off writing a GEDCOM object hierarchy that directly addresses the structure you need to manipulate.
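        As a starting point for such a hierarchy, the level number at the start of each GEDCOM line is enough to rebuild the tree with a stack. A minimal sketch, assuming well-formed "LEVEL [@XREF@] TAG [VALUE]" lines; the node layout and the name parse_gedcom are hypothetical, and continuation lines (CONT/CONC) and character-set handling are not addressed. The sample lines are a subset of the GEDCOM data posted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a tree from GEDCOM text. Each node is a hash
#   { level, xref, tag, value, children => [ ... ] }
# and children are kept in an array, so the file's order is preserved.
sub parse_gedcom {
    my ($text) = @_;
    my $root   = { level => -1, tag => 'ROOT', children => [] };
    my @stack  = ($root);    # $stack[-1] is the most recently opened node
    my $lineno = 0;

    for my $line (split /\r?\n/, $text) {
        $lineno++;
        next if $line =~ /^\s*$/;
        my ($level, $xref, $tag, $value) =
            $line =~ /^\s*(\d+)\s+(?:(\@[^\@\s]+\@)\s+)?(\S+)(?:\s(.*))?$/
            or die "Malformed line $lineno: $line\n";

        # Pop nodes at the same or a deeper level; what remains on top is the parent.
        pop @stack while $stack[-1]{level} >= $level;
        die "Level jumps by more than one at line $lineno\n"
            if $level > $stack[-1]{level} + 1;

        my $node = { level => $level, xref => $xref, tag => $tag,
                     value => $value, children => [] };
        push @{ $stack[-1]{children} }, $node;
        push @stack, $node;
    }
    return $root;
}

my $sample = <<'GED';
0 HEAD
1 SOUR Reunion
2 VERS V8.0
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
0 TRLR
GED

my $tree = parse_gedcom($sample);
for my $record (@{ $tree->{children} }) {
    printf "%s%s has %d child line(s)\n",
        $record->{tag},
        (defined $record->{xref} ? " $record->{xref}" : ''),
        scalar @{ $record->{children} };
}
```

        Here the xref (@I1@) is kept separate from the tag (INDI), so each record's type is always in the same field, which makes later lookups by tag straightforward.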

        I note that GEDCOM 6.0 will be an XML-based format, but that needn't alter how you represent the data internally. In fact, whatever internal representation you choose now should be completely independent of the external representation, and should be chosen to make the data easy to build and manipulate in memory. Then it becomes fairly easy to handle different input file formats and generate different output file formats.
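        To illustrate that separation, here is a minimal sketch in which one in-memory tree is written out both as GEDCOM lines and as XML. The hash layout (tag, xref, value, ordered children) is hypothetical, and so is the XML mapping: the GEDCOM tag becomes the element name and the xref an id attribute. That mapping is an assumption for illustration, not the GEDCOM 6.0 format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical internal representation, independent of any file format:
#   { tag => ..., xref => ..., value => ..., children => [ ... ] }

# Escape characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Writer 1: GEDCOM lines, with level numbers recomputed from tree depth.
sub to_gedcom {
    my ($node, $level) = @_;
    my $line = join ' ', grep { defined } $level, $node->{xref}, $node->{tag}, $node->{value};
    return join '', "$line\n", map { to_gedcom($_, $level + 1) } @{ $node->{children} };
}

# Writer 2: XML, using the tag as element name and the xref as an id attribute.
sub to_xml {
    my ($node, $depth) = @_;
    my $pad = '  ' x $depth;
    my $out = "$pad<$node->{tag}";
    $out .= ' id="' . xml_escape($node->{xref}) . '"' if defined $node->{xref};
    $out .= '>';
    $out .= xml_escape($node->{value}) if defined $node->{value};
    my @kids = @{ $node->{children} };
    return "$out</$node->{tag}>\n" unless @kids;
    return join '', "$out\n", (map { to_xml($_, $depth + 1) } @kids), "$pad</$node->{tag}>\n";
}

# One record built directly in memory (values from the sample data in this thread).
my $indi = { tag => 'INDI', xref => '@I1@', children => [
    { tag => 'NAME', value => 'Bob /Cox/', children => [] },
    { tag => 'SEX',  value => 'M',         children => [] },
] };

print to_gedcom($indi, 0);
print to_xml($indi, 0);
```

        Adding another output format means adding another writer; the code that builds and edits the tree does not change.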

Re: How to create XML tree from non-XML source
by GrandFather (Saint) on Sep 10, 2008 at 02:22 UTC

    You may find the following interesting to ponder:

    use strict;
    use warnings;
    use Tree::DAG_Node;

    my $root       = Tree::DAG_Node->new ();
    my $level      = 0;
    my $currMother = $root;

    while (<DATA>) {
        chomp;
        s/^\s+//;
        my ($lineLevel, $tag, $tail) = split ' ', $_, 3;

        # Climb back up until $currMother is the parent for this level
        while ($lineLevel < $level) {
            $currMother = $currMother->mother ();
            --$level;
        }

        # Going one level deeper: the most recently added node is the parent
        if ($lineLevel > $level) {
            my @daughters = $currMother->daughters ();
            die "Line $. has no parent line" unless @daughters;
            $currMother = $daughters[-1];
            die "Adjacent lines differ by more than one level at line $."
                if ++$level != $lineLevel;
        }

        my $newDaughter = $currMother->new_daughter ();
        $newDaughter->name ($tag);
        $newDaughter->attribute ()->{data} = $tail;
    }

    print "<root>\n";
    $root->walk_down ({callback => \&enterNode, callbackback => \&exitNode, _depth => 0});
    print "</root>\n";

    sub enterNode {
        my ($node, $options) = @_;
        return 1 if ! defined $node->{name};    # skip the unnamed root
        print ' ' x ($options->{_depth} * 3);
        print "<$node->{name}>";
        print $node->attribute ()->{data} if defined $node->attribute ()->{data};
        print "\n";
        return 1;    # true: descend into the daughters
    }

    sub exitNode {
        my ($node, $options) = @_;
        return if ! defined $node->name ();
        print ' ' x ($options->{_depth} * 3);
        print "</$node->{name}>\n";
    }

    __DATA__
    0 HEAD
    1 SOUR Reunion
    2 VERS V8.0
    2 CORP Leister Productions
    1 DEST Reunion
    1 DATE 11 FEB 2006
    1 FILE test
    1 GEDC
    2 VERS 5.5
    1 CHAR MACINTOSH
    0 @I1@ INDI
    1 NAME Bob /Cox/
    1 SEX M
    1 FAMS @F1@
    1 CHAN
    2 DATE 11 FEB 2006
    0 @I2@ INDI
    1 NAME Joann /Para/
    1 SEX F
    1 FAMS @F1@
    1 CHAN
    2 DATE 11 FEB 2006
    0 @I3@ INDI
    1 NAME Bobby Jo /Cox/
    1 SEX M
    1 FAMC @F1@
    1 CHAN
    2 DATE 11 FEB 2006
    0 @F1@ FAM
    1 HUSB @I1@
    1 WIFE @I2@
    1 MARR
    1 CHIL @I3@
    0 TRLR

    Prints:

    <root>
       <HEAD>
          <SOUR>Reunion
             <VERS>V8.0
             </VERS>
             <CORP>Leister Productions
             </CORP>
          </SOUR>
          <DEST>Reunion
          </DEST>
          <DATE>11 FEB 2006
          </DATE>
          <FILE>test
          </FILE>
          <GEDC>
             <VERS>5.5
             </VERS>
          </GEDC>
          <CHAR>MACINTOSH
          </CHAR>
       </HEAD>
       <@I1@>INDI
          <NAME>Bob /Cox/
          </NAME>
          <SEX>M
          </SEX>
          <FAMS>@F1@
          </FAMS>
          <CHAN>
             <DATE>11 FEB 2006
             </DATE>
          </CHAN>
       </@I1@>
       <@I2@>INDI
          <NAME>Joann /Para/
          </NAME>
          <SEX>F
          </SEX>
          <FAMS>@F1@
          </FAMS>
          <CHAN>
             <DATE>11 FEB 2006
             </DATE>
          </CHAN>
       </@I2@>
       <@I3@>INDI
          <NAME>Bobby Jo /Cox/
          </NAME>
          <SEX>M
          </SEX>
          <FAMC>@F1@
          </FAMC>
          <CHAN>
             <DATE>11 FEB 2006
             </DATE>
          </CHAN>
       </@I3@>
       <@F1@>FAM
          <HUSB>@I1@
          </HUSB>
          <WIFE>@I2@
          </WIFE>
          <MARR>
          </MARR>
          <CHIL>@I3@
          </CHIL>
       </@F1@>
       <TRLR>
       </TRLR>
    </root>
