http://www.perlmonks.org?node_id=908451

ymchaitu has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have an XML file in this format:
<?xml version="1.0" encoding="UTF-8"?>
<group>
  <article-ref rid="doi-ANTI_1995-63-2_001-1"/>
  <group>
    <title>America</title>
    <group>
      <title>Pakistan</title>
      <group>
        <title>India</title>
        <article-ref rid="doi-ANTI_1995-63-2_001-2"/>
      </group>
    </group>
  </group>
  <article-ref rid="doi-ANTI_1995-63-2_001-3"/>
  <group>
    <title>SYMPOSIUM: POST-CHICAGO ECONOMICS</title>
    <article-ref rid="doi-ANTI_1995-63-2_001-4"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-5"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-6"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-7"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-8"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-9"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-10"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-11"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-12"/>
    <article-ref rid="doi-ANTI_1995-63-2_001-13"/>
  </group>
  <article-ref rid="doi-ANTI_1995-63-2_001-14"/>
</group>
This XML needs to be converted to the following output format:
</group>
<group>
  <group>
    <title><![CDATA[America]]></title>
    <group>
      <title><![CDATA[Pakistan]]></title>
      <group>
        <title><![CDATA[India]]></title>
        <article-ref rid="001-2"/>
      </group>
    </group>
  </group>
</group>
<group>
  <article-ref rid="001-3"/>
</group>
<group>
  <group>
    <title><![CDATA[SYMPOSIUM: POST-CHICAGO ECONOMICS]]></title>
    <article-ref rid="001-4"/>
    <article-ref rid="001-5"/>
    <article-ref rid="001-6"/>
    <article-ref rid="001-7"/>
    <article-ref rid="001-8"/>
    <article-ref rid="001-9"/>
    <article-ref rid="001-10"/>
    <article-ref rid="001-11"/>
    <article-ref rid="001-12"/>
    <article-ref rid="001-13"/>
  </group>
</group>
<group>
  <article-ref rid="001-14"/>
</group>
I have no idea where to start. Could anyone kindly suggest how to approach this?

Replies are listed 'Best First'.
Re: Group XML
by toolic (Bishop) on Jun 07, 2011 at 12:43 UTC
    Consider using an XML parser, such as XML::Twig. Read the documentation, work through the tutorial, write some code, and if you still have problems, post specific questions here.
      Talk about doing things the hard way ;p It's a simple XSLT transform.

        Oh, a candidate for a link to Just use an XSLT stylesheet! Would you care to elaborate and show us that "simple transformation"? Thanks.

Re: Group XML
by choroba (Cardinal) on Jun 07, 2011 at 15:24 UTC
    I do not understand exactly what should be wrapped into groups (the beginning of the output is missing, anyway). To insert CDATA and change rid attributes, you can use XML::XSH2 in this way:
    open 908451.xml ;
    for $t in //title/text() insert cdata $t replace $t ;
    for //article-ref/@rid insert text xsh:subst(., '.*-2_' , '') replace . ;
    The rest would also be possible if you can explain the algorithm.
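
For readers without XML::XSH2 at hand, the same two steps (wrapping title text in CDATA and trimming the rid prefix) can be sketched with a DOM parser. The following is only an illustration in Python's standard xml.dom.minidom, not the XSH2 method above; the function name convert and the shortened SAMPLE input are assumptions made for the example, and the regrouping of elements is deliberately left out because its algorithm is not specified in the thread.

```python
import re
from xml.dom import minidom

# Shortened, hypothetical sample in the same shape as the question's input.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<group>
  <article-ref rid="doi-ANTI_1995-63-2_001-1"/>
  <group>
    <title>America</title>
    <article-ref rid="doi-ANTI_1995-63-2_001-2"/>
  </group>
</group>"""


def convert(xml_text):
    """Apply the two per-node edits; the regrouping step is NOT attempted."""
    doc = minidom.parseString(xml_text)

    # Step 1: replace every text node directly under <title> with CDATA.
    for title in doc.getElementsByTagName("title"):
        for node in list(title.childNodes):
            if node.nodeType == minidom.Node.TEXT_NODE:
                cdata = doc.createCDATASection(node.data)
                title.replaceChild(cdata, node)

    # Step 2: drop everything up to and including "-2_" from each rid,
    # mirroring the '.*-2_' pattern used in the XSH2 snippet.
    for ref in doc.getElementsByTagName("article-ref"):
        rid = ref.getAttribute("rid")
        ref.setAttribute("rid", re.sub(r".*-2_", "", rid, count=1))

    # Serialize the root element only (no XML declaration).
    return doc.documentElement.toxml()


if __name__ == "__main__":
    print(convert(SAMPLE))
```

Note that the pattern is tied to the "-2_" fragment in these particular identifiers, just as in the XSH2 version; a different issue number would need a different pattern.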