Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Convert CSV file to XML file using Perl?

by laknarc (Initiate)
on Jul 04, 2015 at 01:10 UTC ( [id://1133118]=perlquestion: print w/replies, xml ) Need Help??

laknarc has asked for the wisdom of the Perl Monks concerning the following question:

I have already posted this question as Anonymous monk. Please delete my previous thread. I have registered and created this question again. we have a CSV file which need be to converted to XML using Perl or Unix shell scripting. I was able to build this XML using SQL and Korn shell script. However, SQL/shell script is running for long time. Hence, I'm considering to write a Perl to generate this XML file. Basically need to build this XML hierarchically by avoiding the duplicate values. Please take a look into the CSV and XML file below. But the both SQL and Korn shell script takes 15 minutes to create the XML for just 6000 records. Please let me know if there is any other efficient way to process this CSV file. this CSV file contains total 20 million records. Need to build this XML file within 60 to 120 minutes. Appreciate your help. Input data below:-

RLTP_ID,RLTP_NAME,PROD_ID,PROD_NAME,CUST_ID,CUST_NAME,ACC_ID,ACC_NAME, +ACC_BALANCE,TXN_ID,TXN_AMT,ACC_TYPE,ACC_CODE,TXN_CODE 10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,YY,11 10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,YY,12 10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,ZZ,11 10,Phil,1,Personal,2,Fixed,3,Savings,3000,4,500,X,ZZ,12 10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,AA,11 10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,AA,12 10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,AA,13 10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,BB,11 10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,BB,12 10,Phil,1,Personal,2,Fixed,7,Savings,3000,8,500,X,BB,13

Output XML

<transactiondetails> <rltp_id>10</rltp_id> <rltp_name>Phil</rltp_name> <product> <prod_id>1</prod_id> <prod_name>Personal</prod_name> <customer> <cust_id>2</cust_id> <cust_name>Fixed</cust_name> <account> <acc_id>3</acc_id> <acc_name>Savings</acc_name> <acc_balance>3000</acc_balance> <acc_type>X</acc_type> <acc_code>ZZ</acc_code> <acc_code>YY</acc_code> <transaction> <txntrack> <txn_id>4</txn_id> <txn_amt>500</txn_amt> <txn_code>11</txn_code> <txn_code>12</txn_code> </txntrack> </transaction> <transaction> <txntrack> <txn_id>5</txn_id> <txn_amt>500</txn_amt> <txn_code/> </txntrack> </transaction> <transaction> <txntrack> <txn_id>6</txn_id> <txn_amt>500</txn_amt> <txn_code/> </txntrack> </transaction> </account> <account> <acc_id>7</acc_id> <acc_name>Savings</acc_name> <acc_balance>3000</acc_balance> <acc_type>X</acc_type> <acc_code>AA</acc_code> <acc_code>BB</acc_code> <transaction> <txntrack> <txn_id>8</txn_id> <txn_amt>500</txn_amt> <txn_code>11</txn_code> <txn_code>12</txn_code> <txn_code>13</txn_code> </txntrack> </transaction> </account> </customer> </product> </transactiondetails>

Replies are listed 'Best First'.
Re: Convert CSV file to XML file using Perl?
by bitingduck (Chaplain) on Jul 04, 2015 at 02:50 UTC

    I'm too lazy right now to sort through your XML to figure out exactly how you want to do the structure-- rather than just turning each line into an XML record it looks like you want a bunch of nested loops following the columns almost from left to right, and terminating each loop when there's a change in the value. That should be pretty straightforward.

    You should be able to use Text::CSV and XML::LibXML to do something that runs pretty fast. You can probably get by with focusing on the XML::LibXML::Node and XML::LibXML::Element docs.

    And before you get too far- have you profiled your code at all to see where the hangups are? Is it the SQL being super slow because you don't have it indexed on the right columns and while the perl might turn the CSV into XML in a second or two, you're going to spend 3 days exporting the SQL to CSV in the sorted format you show here?

Re: Convert CSV file to XML file using Perl?
by marinersk (Priest) on Jul 04, 2015 at 02:29 UTC

    So, what have you tried? Showing code, with a good description of what you're trying to do (nicely done, above) and what you're getting that's different gets pretty fast answers around here, most of the time.

    Or are you at a point where you haven't really used Perl yet for much of anything, need a nudge in the right direction, and don't really have any code to show?

Re: Convert CSV file to XML file using Perl?
by Anonymous Monk on Jul 04, 2015 at 08:33 UTC
Re: Convert CSV file to XML file using Perl?
by poj (Abbot) on Jul 05, 2015 at 11:37 UTC
    Do the 20 million record all have the same rltp_id ?
    The csv seems to be missing txn_ids 5 and 6 ?
    poj

      Yes. Txn I'd's missing in the file. Because it does not have txn codes. Sorry I forget to mention these. This is the reason we have empty tags in XML. Txn codes are coming from different table. While exporting the data from database to csv I have used inner join. Please consider we have 4 and 5 txn I'd,s in the CSV file and txn codes as nulls. I mean I will use the outer join to export the data to csv.

        Ok, I think I understand you.

        I wrote some code to process the CSV and the results were encouraging, 3 million records in about 5 minutes. However, I think a DBI connection to the database might be cleaner so this is what I have so far. You need to take A_ off the table names in the sql

        Let me know if I am on the right track and I will finish it off.

        poj
Re: Convert CSV file to XML file using Perl?
by sundialsvc4 (Abbot) on Jul 04, 2015 at 12:53 UTC

    Also, consider all of your possibilities ... including, “writing no(!) computer program at all.”

    A CSV to XML Converter in XSLT 2.0.

    It can go the other way, too:   XML to CSV Using XSLT.

    In both cases, as you can see, the transformation is being accomplished using a declarative feature of industry-standard XML libraries such as libxslt, which of course are fully supported by Perl libraries such as XML::LibXML, but also by command-line tools such as Saxon.   You specify what transformations you want, using an XML-based format to do so.   It is the responsibility of the XSLT processor (and there are several of these) to obey those instructions.

    There are widespread applications for this idea.   For example, many documentation files are written in XML-based “semantic markup” languages such as DocBook, and then translated ... using XSLT ... into several different output formats.   (All of the O’Reilly books ... the ones with animals on the cover ... are done that way.)   No custom programming is involved.

    I’ll leave it to others ... and to you ... to decide what is the overall best way to accomplish this goal for your particular project.   With Perl, you have many, many choices.

      All of the O’Reilly books ... the ones with animals on the cover ... are done that way.

      Nope.

        Thanks all. Here is the SQL which generates the XML file. However it takes days to generate the XML file for 20 million records. I do have proper indexes on the tables. Could you please provide your ideas to generate this XML. Which one is the fast method to build this XML?. 1. Extracting the data to CSV file takes about 15 minutes using ETL and generate the XML using Perl. 2. Otherwise writing a SQL by joining all the normalized tables and generate the XML file by calling the SQL within the perl.

        select xmlserialize(document        XMLElement("transactiondetails",                   XMLElement("rltp_id", rltp_id),                   XMLElement("rltp_name", rltp_name),                      (SELECT XMLAGG(XMLElement("product",                                    XMLElement("prod_id", p.prod_id),                                    XMLElement("prod_name", p.prod_name) +,                          (SELECT XMLAGG(XMLElement("customer",                                        XMLElement("cust_id", cust_id),                                        XMLElement("cust_name", cust_nam +e),                                   (SELECT XMLAGG(XMLElement("account",                                              XMLElement("acc_id", acc_i +d),                                              XMLElement("acc_name", acc +_name),                                              XMLElement("acc_balance", +acc_balance),                                             nvl(                                               (                                                select xmlconcat(                                                         xmlelement("ac +c_type", at.acc_type),                                                         xmlagg(xmlelem +ent("acc_code", at.acc_code))                                                       )                                                from acctype at                                                where at.rltp_id = a.rl +tp_id                                                and at.prod_id = a.prod +_id                                                and at.cust_id = a.cust +_id                                                and at.acc_id = a.acc_i +d                                                group by at.acc_type                                                )                                              , xmlconcat(xmlelement("a +cc_type"),xmlelement("acc_code"))                                                ) ,                                         (SELECT XMLAGG(XMLElement("tra +nsaction",                                                 XMLElement("txntrack",                                                 XMLElement("txn_id", t. +txn_id),                                                 XMLElement("txn_amt", t +.txn_amt),                                                (                                                 select nvl(                                                          xmlagg(xmlele +ment("txn_code",tt.txn_code))                                                        , xmlelement("t +xn_code")                                                        )                                                 from txntype tt                                                 where tt.rltp_id = t.r +ltp_id                                                 and tt.prod_id = t.pro +d_id                                                 and tt.cust_id = t.cus +t_id                                                 and tt.acc_id = t.acc_ +id                                                 and tt.txn_id = t.txn_ +id                                                )                                                ))                                                ORDER BY t.txn_id                                                )                                          FROM transaction t                                          WHERE t.rltp_id  =a.rltp_id                                          AND t.prod_id   =a.prod_id                                          AND t.cust_id   =a.cust_id                                          AND t.acc_id=a.acc_id))                                         ORDER BY a.acc_id                                         )                              FROM account a                              WHERE c.rltp_id=a.rltp_id                              AND c.prod_id=a.prod_id                              AND c.cust_id=a.cust_id))                             ORDER BY c.cust_id                             )                          FROM cust c                          WHERE p.rltp_id = c.rltp_id                          AND p.prod_id = c.prod_id))                         ORDER BY p.prod_id                         )                       FROM product p                       WHERE p.rltp_id = r.rltp_id )                             ) indent) AS xml  FROM    rltp_mngr r    WHERE rltp_id='10';

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1133118]
Approved by marinersk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (7)
As of 2024-04-18 08:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found