Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^2: Convert CSV file to XML file using Perl?

by Anonymous Monk
on Jul 04, 2015 at 19:21 UTC ( [id://1133196]=note: print w/replies, xml ) Need Help??


in reply to Re: Convert CSV file to XML file using Perl?
in thread Convert CSV file to XML file using Perl?

All of the O’Reilly books ... the ones with animals on the cover ... are done that way.

Nope.

  • Comment on Re^2: Convert CSV file to XML file using Perl?

Replies are listed 'Best First'.
Re^3: Convert CSV file to XML file using Perl?
by laknarc (Initiate) on Jul 05, 2015 at 00:55 UTC

    Thanks all. Here is the SQL which generates the XML file. However it takes days to generate the XML file for 20 million records. I do have proper indexes on the tables. Could you please provide your ideas to generate this XML. Which one is the fast method to build this XML?. 1. Extracting the data to CSV file takes about 15 minutes using ETL and generate the XML using Perl. 2. Otherwise writing a SQL by joining all the normalized tables and generate the XML file by calling the SQL within the perl.

    select xmlserialize(document        XMLElement("transactiondetails",                   XMLElement("rltp_id", rltp_id),                   XMLElement("rltp_name", rltp_name),                      (SELECT XMLAGG(XMLElement("product",                                    XMLElement("prod_id", p.prod_id),                                    XMLElement("prod_name", p.prod_name) +,                          (SELECT XMLAGG(XMLElement("customer",                                        XMLElement("cust_id", cust_id),                                        XMLElement("cust_name", cust_nam +e),                                   (SELECT XMLAGG(XMLElement("account",                                              XMLElement("acc_id", acc_i +d),                                              XMLElement("acc_name", acc +_name),                                              XMLElement("acc_balance", +acc_balance),                                             nvl(                                               (                                                select xmlconcat(                                                         xmlelement("ac +c_type", at.acc_type),                                                         xmlagg(xmlelem +ent("acc_code", at.acc_code))                                                       )                                                from acctype at                                                where at.rltp_id = a.rl +tp_id                                                and at.prod_id = a.prod +_id                                                and at.cust_id = a.cust +_id                                                and at.acc_id = a.acc_i +d                                                group by at.acc_type                                                )                                              , xmlconcat(xmlelement("a +cc_type"),xmlelement("acc_code"))                                                ) ,                                         (SELECT XMLAGG(XMLElement("tra +nsaction",                                                 XMLElement("txntrack",                                                 XMLElement("txn_id", t. +txn_id),                                                 XMLElement("txn_amt", t +.txn_amt),                                                (                                                 select nvl(                                                          xmlagg(xmlele +ment("txn_code",tt.txn_code))                                                        , xmlelement("t +xn_code")                                                        )                                                 from txntype tt                                                 where tt.rltp_id = t.r +ltp_id                                                 and tt.prod_id = t.pro +d_id                                                 and tt.cust_id = t.cus +t_id                                                 and tt.acc_id = t.acc_ +id                                                 and tt.txn_id = t.txn_ +id                                                )                                                ))                                                ORDER BY t.txn_id                                                )                                          FROM transaction t                                          WHERE t.rltp_id  =a.rltp_id                                          AND t.prod_id   =a.prod_id                                          AND t.cust_id   =a.cust_id                                          AND t.acc_id=a.acc_id))                                         ORDER BY a.acc_id                                         )                              FROM account a                              WHERE c.rltp_id=a.rltp_id                              AND c.prod_id=a.prod_id                              AND c.cust_id=a.cust_id))                             ORDER BY c.cust_id                             )                          FROM cust c                          WHERE p.rltp_id = c.rltp_id                          AND p.prod_id = c.prod_id))                         ORDER BY p.prod_id                         )                       FROM product p                       WHERE p.rltp_id = r.rltp_id )                             ) indent) AS xml  FROM    rltp_mngr r    WHERE rltp_id='10';

      Assuming that your dump to CSV is ordered like you showed in the original post, I'd probably go with dump to CSV and then write some perl to read the CSV line by line and build up the XML using XML::LibXML. Perl is pretty quick at that sort of stuff. I have some code that's all perl calling MySQL and for stuff that's a few thousand records it generates a bunch of HTML files (not too different from building up XML) plus separate text audit files by the time I've lifted my finger off the return key. For stuff that's reading a few thousand lines of CSV and doing inserts into a database of millions of lines (including some lookups to decide how to cross reference things) it takes a few minutes.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1133196]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2024-04-24 04:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found