PerlMonks  

XML File Creation in Perl

by documents9900 (Initiate)
on Apr 16, 2013 at 13:02 UTC [id://1028909]

documents9900 has asked for the wisdom of the Perl Monks concerning the following question:

My first input file is

Root1 TBLA KEY1 COLA A B
Root1 TBLA KEY1 COLB D E
Root1 TBLA KEY3 COLX M N
Root2 TBLB KEY4 COLX M N
Root2 TBLB KEY4 COLD A B
Root3 TBLC KEY5 COLD A B

My second input file is

Root1 TBLA KEY6
Root2 TBLB KEY7
Root3 TBLC KEY8

My third input file is

Root1 TBLA KEY9
Root1 TBLA KEY10
Root3 TBLC KEY11

Basically, the files are laid out as follows: 1) The first file holds the old and new values. The first column is the root table, the second is the actual table in which the difference occurs, the third is the key value, the fourth is the column name, and the fifth and sixth are the old and new values.

2) The second file lists the primary keys which exist in db1 only and not in db2. The first column is the root table, the second is the actual table in which the key exists, and the third is the key value.

3) The third file lists the primary keys which exist in db2 only and not in db1. The first column is the root table, the second is the actual table in which the key exists, and the third is the key value.

The output to be created in xml format as
<Data>
  <Root1>
    <TBLA>
      <NEW1>
        <KEY>KEY6</KEY>
      <NEW1>
      <NEW2>
        <KEY>KEY9</KEY>
        <KEY>KEY10</KEY>
      <NEW2>
      <MODIFIED>
        <KEY name =KEY1>
          <COLA>
            <oldvalue>A</oldvalue>
            <newvalue>B</newvalue>
          </COLA>
          <COLB>
            <oldvalue>D</oldvalue>
            <newvalue>E</newvalue>
          </COLB>
        </KEY>
        <KEY name =KEY3>
          <COLX>
            <oldvalue>M</oldvalue>
            <newvalue>N</newvalue>
          </COLX>
        </KEY>
      </MODIFIED>
    </TBLA>
  </Root1>
<Data>
THIS IS NOT THE COMPLETE OUTPUT; ONLY PART OF IT IS SHOWN. Can anyone suggest the best way to do this? Should I convert these text files to a hash of hashes first and then try using pltoxml()? Does this make sense? Would XML::Simple or XML::Writer be sufficient for this?

This is the first time I am working with XML and I am not sure which approach will solve this efficiently. A small example based on my requirement would be appreciated.

*Input file will always be sorted on Root and then TBLNAME

Output format: the output contains every root, every table in that root, and for every table the keys which exist in the first db only and then the keys which exist in the second db only. These go in the NEW1 and NEW2 sections respectively. The third section, MODIFIED, is read from the first input file and lists each key value together with the columns modified for that key (their old and new values).

If I have to use XML::Simple, how do I create a hashref from these files which I can pass to XMLout? There is no key in any of these files.

I wrote a small piece of code which reads each root from file1, then for that root reads all the files and manually creates the tags. Basically I tried creating my own XML parser.
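
On the hashref question above: the files have no explicit keys, but each column position can serve as one level of hash key. The following is only a minimal sketch using core Perl; the @diff_lines array is a hypothetical stand-in for lines read from the first input file, and the nesting shown is one possible layout rather than the one XMLout necessarily needs.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample lines in the first file's layout:
# root, table, key, column, old value, new value.
my @diff_lines = (
    'Root1 TBLA KEY1 COLA A B',
    'Root1 TBLA KEY1 COLB D E',
    'Root1 TBLA KEY3 COLX M N',
);

my %data;
for my $line (@diff_lines) {
    my ($root, $table, $key, $col, $old, $new) = split ' ', $line;

    # Each column position supplies one level of hash key;
    # intermediate levels are autovivified by Perl.
    $data{$root}{$table}{MODIFIED}{$key}{$col} = {
        oldvalue => $old,
        newvalue => $new,
    };
}

# Sort keys so the dump is stable between runs.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%data);
```

Dumping the structure with Data::Dumper before handing it to any XML module makes it easier to see whether the nesting matches the intended output.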

Replies are listed 'Best First'.
Re: XML File Creation in Perl
by kcott (Archbishop) on Apr 16, 2013 at 15:48 UTC

    G'day documents9900,

    This is fairly straightforward in XML::Simple; however, be aware that for more complex work, this is often not the best choice.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Inline::Files;
    use XML::Simple qw{:strict};

    my %xml_hash = (Data => {});
    my $xml_data = $xml_hash{Data};

    my (%db1, %db2);

    while (<DB1>) {
        my ($root, $table, $key) = split;
        push @{$db1{$root}{$table}}, $key;
    }

    while (<DB2>) {
        my ($root, $table, $key) = split;
        push @{$db2{$root}{$table}}, $key;
    }

    while (<DIFF>) {
        my ($root, $table, $key, $col, $old, $new) = split;

        $xml_data->{$root}{$table}{NEW1}{KEY} = $db1{$root}{$table}
            if exists $db1{$root}{$table};
        $xml_data->{$root}{$table}{NEW2}{KEY} = $db2{$root}{$table}
            if exists $db2{$root}{$table};

        $xml_data->{$root}{$table}{MODIFIED}{KEY}{$key}{$col}{oldvalue} = +[$old];
        $xml_data->{$root}{$table}{MODIFIED}{KEY}{$key}{$col}{newvalue} = +[$new];
    }

    print XMLout(\%xml_hash, KeepRoot => 1, KeyAttr => {KEY => 'name'});

    __DIFF__
    Root1 TBLA KEY1 COLA A B
    Root1 TBLA KEY1 COLB D E
    Root1 TBLA KEY3 COLX M N
    Root2 TBLB KEY4 COLX M N
    Root2 TBLB KEY4 COLD A B
    Root3 TBLC KEY5 COLD A B
    __DB1__
    Root1 TBLA KEY6
    Root2 TBLB KEY7
    Root3 TBLC KEY8
    __DB2__
    Root1 TBLA KEY9
    Root1 TBLA KEY10
    Root3 TBLC KEY11

    Output:

    $ pm_xml_db_diff.pl
    <Data>
      <Root1>
        <TBLA>
          <MODIFIED>
            <KEY name="KEY1">
              <COLA>
                <newvalue>B</newvalue>
                <oldvalue>A</oldvalue>
              </COLA>
              <COLB>
                <newvalue>E</newvalue>
                <oldvalue>D</oldvalue>
              </COLB>
            </KEY>
            <KEY name="KEY3">
              <COLX>
                <newvalue>N</newvalue>
                <oldvalue>M</oldvalue>
              </COLX>
            </KEY>
          </MODIFIED>
          <NEW1>
            <KEY>KEY6</KEY>
          </NEW1>
          <NEW2>
            <KEY>KEY9</KEY>
            <KEY>KEY10</KEY>
          </NEW2>
        </TBLA>
      </Root1>
      <Root2>
        <TBLB>
          <MODIFIED>
            <KEY name="KEY4">
              <COLD>
                <newvalue>B</newvalue>
                <oldvalue>A</oldvalue>
              </COLD>
              <COLX>
                <newvalue>N</newvalue>
                <oldvalue>M</oldvalue>
              </COLX>
            </KEY>
          </MODIFIED>
          <NEW1>
            <KEY>KEY7</KEY>
          </NEW1>
        </TBLB>
      </Root2>
      <Root3>
        <TBLC>
          <MODIFIED>
            <KEY name="KEY5">
              <COLD>
                <newvalue>B</newvalue>
                <oldvalue>A</oldvalue>
              </COLD>
            </KEY>
          </MODIFIED>
          <NEW1>
            <KEY>KEY8</KEY>
          </NEW1>
          <NEW2>
            <KEY>KEY11</KEY>
          </NEW2>
        </TBLC>
      </Root3>
    </Data>

    -- Ken

      Thanks Ken for your help. This is working fine. However, while testing I found that the NEW1 and NEW2 entries in $xml_data are only populated inside the DIFF loop, guarded by an if clause which checks that the ($root, $table) combination exists in db1 (similarly db2). There are cases in which a (root, table) combination exists only in the DB1, DB2 or DIFF file, and in those cases the data does not come out as expected. I am thinking of populating data directly into xml_data from each DB1/DB2/DIFF loop. What do you suggest on this?
        "I am thinking of populating data directly into xml_data from each DB1/DB2/DIFF loop."

        I suspect that's probably the best option. You'll get some duplicate assignments but I don't imagine that will be a problem. Getting around that would likely involve setting and testing flags that indicate whether a particular assignment has occurred. Benchmark if you feel it's important.

        I see you've added some realistic data. Below, you'll see I've added some more to simulate these new scenarios you've described ("... only in DB1 or DB2 ..."). Where you have data like "A B C", I've plugged the spaces with underscores (i.e. "A_B_C"); this was just so I didn't have to rework how I was handling the data with Inline::Files and split: I'm assuming you're already getting your resultsets as array data. One thing I noticed was that I'm getting <KEY>A_B_C</KEY> while your output is showing <KEY>'A B C'</KEY>. The start and end tags act as delimiters so additional quotes aren't generally necessary in XML; they could potentially cause issues like the data being converted to &apos;A B C&apos;: my instinct would be to remove the quotes before populating the XML — you may have a valid reason for leaving them in.
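
If the underscores are only there to survive a plain split, one alternative for the DB1/DB2 layout is the three-argument form of split, which stops splitting after a given number of fields. This is a sketch under the assumption that the key is always the last field; the @lines array is hypothetical sample data. It does not help with the DIFF layout, where a value such as "California New York" sits next to other fields and cannot be told apart without a real delimiter.

```perl
use strict;
use warnings;

# Hypothetical sample lines in the DB1/DB2 layout: root, table, then a
# key that may itself contain spaces.
my @lines = (
    "EMPLOYEE EMPLOYEE NEW EMPLOYEE 1\n",
    "EMPLOYEE EMPLOYEE NEW EMPLOYEE 9\n",
    "EMPLOYEE EMPDETAILS NEW EMPLOYEE16-DETAILS\n",
);

my %keys_for;
for my $line (@lines) {
    # With a limit, trailing whitespace stays in the last field,
    # so remove the newline first.
    chomp $line;

    # A limit of 3 stops splitting after the second field; the rest of
    # the line (spaces included) becomes the key.
    my ($root, $table, $key) = split ' ', $line, 3;
    push @{ $keys_for{$root}{$table} }, $key;
}

print "$_\n" for @{ $keys_for{EMPLOYEE}{EMPLOYEE} };
```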

        Here's the updated code (includes handling the <null> field):

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use Inline::Files;
        use XML::Simple qw{:strict};

        my %xml_hash = (Data => {});
        my $xml_data = $xml_hash{Data};

        my (%db1, %db2);

        while (<DB1>) {
            my ($root, $table, $key) = split;
            push @{$db1{$root}{$table}}, $key;
            # Point at the accumulated list so that every key is kept,
            # not just the last one read for this root/table.
            $xml_data->{$root}{$table}{NEW1}{KEY} = $db1{$root}{$table};
        }

        while (<DB2>) {
            my ($root, $table, $key) = split;
            push @{$db2{$root}{$table}}, $key;
            $xml_data->{$root}{$table}{NEW2}{KEY} = $db2{$root}{$table};
        }

        while (<DIFF>) {
            # Treat the literal '<null>' marker as an empty value.
            my ($root, $table, $key, $col, $old, $new)
                = map { $_ eq '<null>' ? '' : $_ } split;

            $xml_data->{$root}{$table}{NEW1}{KEY} = $db1{$root}{$table}
                if exists $db1{$root}{$table};
            $xml_data->{$root}{$table}{NEW2}{KEY} = $db2{$root}{$table}
                if exists $db2{$root}{$table};

            $xml_data->{$root}{$table}{MODIFIED}{KEY}{$key}{$col}{oldvalue} = +[$old];
            $xml_data->{$root}{$table}{MODIFIED}{KEY}{$key}{$col}{newvalue} = +[$new];
        }

        print XMLout(\%xml_hash, KeepRoot => 1, KeyAttr => {KEY => 'name'});

        __DIFF__
        EMPLOYEE EMPLOYEE XYZ TITLE <null> Mr
        EMPLOYEE EMPDETAILS DEF CITY California New York
        CUSTOMER CUSTOMER ABC CAPTION Regular Premium
        __DB1__
        EMPLOYEE EMPLOYEE NEW_EMPLOYEE_1
        EMPLOYEE EMPLOYEE NEW_EMPLOYEE_9
        EMPLOYEE EMPDETAILS NEW_EMPLOYEE1-DETAILS
        EMPLOYEE EMPDETAILS NEW_EMPLOYEE9-DETAILS
        EMPLOYEE EMPDETAILS NEW_EMPLOYEE16-DETAILS
        IN_DB1_ONLY IN_DB1_ONLY IN_DB1_ONLY
        IN_DB1+DB2 IN_DB1+DB2 IN_DB1+DB2
        __DB2__
        EMPLOYEE EMPLOYEE NEW_EMPLOYEE_6
        EMPLOYEE EMPDETAILS NEW_EMPLOYEE6-DETAILS
        CUSTOMER CUSTOMER NEW_CUSTOMER
        IN_DB2_ONLY IN_DB2_ONLY IN_DB2_ONLY
        IN_DB1+DB2 IN_DB1+DB2 IN_DB1+DB2

        Output:

        $ pm_xml_db_diff2.pl
        <Data>
          <CUSTOMER>
            <CUSTOMER>
              <MODIFIED>
                <KEY name="ABC">
                  <CAPTION>
                    <newvalue>Premium</newvalue>
                    <oldvalue>Regular</oldvalue>
                  </CAPTION>
                </KEY>
              </MODIFIED>
              <NEW2>
                <KEY>NEW_CUSTOMER</KEY>
              </NEW2>
            </CUSTOMER>
          </CUSTOMER>
          <EMPLOYEE>
            <EMPDETAILS>
              <MODIFIED>
                <KEY name="DEF">
                  <CITY>
                    <newvalue>New</newvalue>
                    <oldvalue>California</oldvalue>
                  </CITY>
                </KEY>
              </MODIFIED>
              <NEW1>
                <KEY>NEW_EMPLOYEE1-DETAILS</KEY>
                <KEY>NEW_EMPLOYEE9-DETAILS</KEY>
                <KEY>NEW_EMPLOYEE16-DETAILS</KEY>
              </NEW1>
              <NEW2>
                <KEY>NEW_EMPLOYEE6-DETAILS</KEY>
              </NEW2>
            </EMPDETAILS>
            <EMPLOYEE>
              <MODIFIED>
                <KEY name="XYZ">
                  <TITLE>
                    <newvalue>Mr</newvalue>
                    <oldvalue></oldvalue>
                  </TITLE>
                </KEY>
              </MODIFIED>
              <NEW1>
                <KEY>NEW_EMPLOYEE_1</KEY>
                <KEY>NEW_EMPLOYEE_9</KEY>
              </NEW1>
              <NEW2>
                <KEY>NEW_EMPLOYEE_6</KEY>
              </NEW2>
            </EMPLOYEE>
          </EMPLOYEE>
          <IN_DB1+DB2>
            <IN_DB1+DB2>
              <NEW1>
                <KEY>IN_DB1+DB2</KEY>
              </NEW1>
              <NEW2>
                <KEY>IN_DB1+DB2</KEY>
              </NEW2>
            </IN_DB1+DB2>
          </IN_DB1+DB2>
          <IN_DB1_ONLY>
            <IN_DB1_ONLY>
              <NEW1>
                <KEY>IN_DB1_ONLY</KEY>
              </NEW1>
            </IN_DB1_ONLY>
          </IN_DB1_ONLY>
          <IN_DB2_ONLY>
            <IN_DB2_ONLY>
              <NEW2>
                <KEY>IN_DB2_ONLY</KEY>
              </NEW2>
            </IN_DB2_ONLY>
          </IN_DB2_ONLY>
        </Data>

        -- Ken
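
One caveat when root and table names are used directly as element names: the XML specification does not allow every character in a name, and "+" (as in the simulated IN_DB1+DB2 data) is not permitted, so a strict parser may reject that part of the output. The sketch below is a deliberately conservative, ASCII-only check; the function name is_safe_xml_name is hypothetical, and the real XML Name rule also admits a colon and many non-ASCII characters that this pattern rejects.

```perl
use strict;
use warnings;

# Conservative ASCII-only test (an assumption of this sketch): a letter
# or underscore first, then letters, digits, dots, hyphens, underscores.
sub is_safe_xml_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z_][A-Za-z0-9._-]*\z/ ? 1 : 0;
}

for my $name ('EMPLOYEE', 'IN_DB1_ONLY', 'IN_DB1+DB2') {
    printf "%-12s %s\n", $name,
        is_safe_xml_name($name) ? 'ok' : 'not usable as an element name';
}
```

Names that fail such a check could be carried in an attribute or as element text instead, where the characters are escaped rather than forbidden.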

Re: XML File Creation in Perl
by hdb (Monsignor) on Apr 16, 2013 at 14:16 UTC

    My proposal was: Read your desired XML with XML::Simple, use Data::Dumper to see what the resulting structure is, then create it from all your data, and use XMLout to write your xml file. Here is the script to do that.

    But then I thought: Will this work? All this effort. Let's test first and look at what XMLout does to what XMLin has created. And see: the output is different.

    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Simple;

    my $xml = <<XML;
    <Data>
      <Root1>
        <TBLA>
          <NEW1>
            <KEY>KEY6</KEY>
          </NEW1>
          <NEW2>
            <KEY>KEY9</KEY>
            <KEY>KEY10</KEY>
          </NEW2>
          <MODIFIED>
            <KEY name ="KEY1">
              <COLA>
                <oldvalue>A</oldvalue>
                <newvalue>B</newvalue>
              </COLA>
              <COLB>
                <oldvalue>D</oldvalue>
                <newvalue>E</newvalue>
              </COLB>
            </KEY>
            <KEY name ="KEY3">
              <COLX>
                <oldvalue>M</oldvalue>
                <newvalue>N</newvalue>
              </COLX>
            </KEY>
          </MODIFIED>
        </TBLA>
      </Root1>
    </Data>
    XML

    my $ref = XMLin( $xml );
    print Dumper($ref);
    print XMLout( $ref, RootName=>"Data" );

    which creates

    <Data>
      <Root1 name="TBLA">
        <MODIFIED name="KEY">
          <KEY1 name="COLA" newvalue="B" oldvalue="A" />
          <KEY1 name="COLB" newvalue="E" oldvalue="D" />
          <KEY3 name="COLX" newvalue="N" oldvalue="M" />
        </MODIFIED>
        <NEW1 KEY="KEY6" />
        <NEW2>
          <KEY>KEY9</KEY>
          <KEY>KEY10</KEY>
        </NEW2>
      </Root1>
    </Data>

    The question now is: Is that ok for you? Or do you need more control, in which case there are a number of options in XML::Simple you can play with or you have to take another route.

    PS: There were a number of inconsistencies in the sample xml you have provided.
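
One "other route" could be XML::Writer, which the question also mentions. The following is a minimal sketch rather than a solution for the full data set: the %modified hash and its layout are hypothetical stand-ins for data parsed from the first input file, the Root1/TBLA names are taken from the sample, and the constructor options (OUTPUT => 'self', DATA_MODE, DATA_INDENT) are used as I understand them from the XML::Writer documentation. Because every tag is emitted explicitly, oldvalue can be written before newvalue, whereas the XMLout results shown in this thread list newvalue first.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical parsed data: key => column => [old value, new value].
my %modified = (
    KEY1 => { COLA => [ 'A', 'B' ], COLB => [ 'D', 'E' ] },
    KEY3 => { COLX => [ 'M', 'N' ] },
);

# OUTPUT => 'self' collects the document in the writer object;
# DATA_MODE and DATA_INDENT ask for one element per line, indented.
my $writer = XML::Writer->new(
    OUTPUT      => 'self',
    DATA_MODE   => 1,
    DATA_INDENT => 2,
);

$writer->startTag('Data');
$writer->startTag('Root1');
$writer->startTag('TBLA');
$writer->startTag('MODIFIED');

for my $key (sort keys %modified) {
    $writer->startTag('KEY', name => $key);
    for my $col (sort keys %{ $modified{$key} }) {
        my ($old, $new) = @{ $modified{$key}{$col} };
        $writer->startTag($col);
        # Elements appear in exactly the order they are written.
        $writer->dataElement(oldvalue => $old);
        $writer->dataElement(newvalue => $new);
        $writer->endTag($col);
    }
    $writer->endTag('KEY');
}

$writer->endTag('MODIFIED');
$writer->endTag('TBLA');
$writer->endTag('Root1');
$writer->endTag('Data');
$writer->end;

my $xml = $writer->to_string;
print $xml;
```

The NEW1 and NEW2 sections would follow the same pattern, with one dataElement call per key inside the corresponding start and end tags.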

      Thanks for your help so far. Let me give you some actual data so that it will be easy to interpret.
      ROOT OBJECT KEY COLUMN OLD NEW
      EMPLOYEE EMPLOYEE XYZ TITLE <null> Mr
      EMPLOYEE EMPDETAILS DEF CITY California New York
      CUSTOMER CUSTOMER ABC CAPTION Regular Premium

      File 2

      EMPLOYEE EMPLOYEE NEW EMPLOYEE 1
      EMPLOYEE EMPLOYEE NEW EMPLOYEE 9
      EMPLOYEE EMPDETAILS NEW EMPLOYEE1-DETAILS
      EMPLOYEE EMPDETAILS NEW EMPLOYEE9-DETAILS
      EMPLOYEE EMPDETAILS NEW EMPLOYEE16-DETAILS

      File 3

      EMPLOYEE EMPLOYEE NEW EMPLOYEE 6
      EMPLOYEE EMPDETAILS NEW EMPLOYEE6-DETAILS
      CUSTOMER CUSTOMER NEW CUSTOMER
      So from these three files I wrote a small Perl program to generate the output in XML:
      <Data>
        <EMPLOYEE>
          <KEY name = 'XYZ'>
            <TITLE>
              <oldvalue></oldvalue>
              <newvalue>Mr</newvalue>
            </TITLE>
          </KEY>
        </EMPLOYEE>
        <EMPDETAILS>
          <KEY name = 'DEF'>
            <CITY>
              <oldvalue>California</oldvalue>
              <newvalue>New York</newvalue>
            </CITY>
          </KEY>
        </EMPDETAILS>
        <CUSTOMER>
          <KEY name = 'ABC'>
            <CAPTION>
              <oldvalue>Regular</oldvalue>
              <newvalue>Premium</newvalue>
            </CAPTION>
          </KEY>
        </CUSTOMER>
      </Data>

      Output 2

      <Data>
        <EMPLOYEE>
          <KEY>'NEW EMPLOYEE 1'</KEY>
          <KEY>'NEW EMPLOYEE 9'</KEY>
        </EMPLOYEE>
        <EMPDETAILS>
          <KEY>'NEW EMPLOYEE1-DETAILS'</KEY>
          <KEY>'NEW EMPLOYEE9-DETAILS'</KEY>
          <KEY>'NEW EMPLOYEE16-DETAILS'</KEY>
        </EMPDETAILS>
      </Data>

      Output 3

      <Data>
        <EMPLOYEE>
          <KEY>'NEW EMPLOYEE 6'</KEY>
        </EMPLOYEE>
        <EMPDETAILS>
          <KEY>'NEW EMPLOYEE6-DETAILS'</KEY>
        </EMPDETAILS>
        <CUSTOMER>
          <KEY>'NEW CUSTOMER'</KEY>
        </CUSTOMER>
      </Data>
      Now I need to combine all three into a single output file. The first field gives the root object. So for EMPLOYEE, I need to display the details of the EMPLOYEE and EMPDETAILS tables: what exists in one db only, what exists in the other db only, and, for keys that exist in both, the old and new values (all of these are produced by a Perl program and contain only delta information). Similarly, for the CUSTOMER root object I need to display all entities of this object and then what exists in one db, what exists in the other db, and the modified information. There can be many modified columns, i.e. for each key several columns can be modified.

      There are no headers in any file. I have added one in first input for clarification

Node Type: perlquestion [id://1028909]
Approved by marto