Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

LibXml - Convert XML to CSV from API Feed

by Devon (Acolyte)
on Apr 09, 2013 at 02:24 UTC ( #1027624=perlquestion: print w/ replies, xml ) Need Help??
Devon has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys,

I'm pulling XML data from a REST API feed and I need to convert said data to CSV for entry in a database. The API is for web analytics data which spits out slightly different XML files depending upon the report chosen. I want my conversion script to be able to handle slight variations in the XML schema.

First, here's the current state of my code. It works for reports which have only 2 dimensions. If a report has more dimensions, additional 'DataRow' nodes are added as children. As the Xpaths change, my script breaks, printing only the first 2 dimensions and no data. I'm a using a loop per level, so I'm hoping there's a cleaner way to go about it than having 5+ nested loops.
perl <<'EOF' use warnings; use strict; use XML::LibXML; my($data) = './work/data.xml'; my($file) = './out/data.csv'; open(my $out, '>', $file) or die "Could not open file '$file' $!"; my($parser) = XML::LibXML->new(); my $xmldata = $parser->parse_file($data); my @childarray; #The first 2 loop blocks print column headings #Dimensions ########################### #Fixed Date Heading ########################### push( @childarray, qq("WTDate")); for my $node ( $xmldata->findnodes( '/DimensionalReport/ReportDefiniti +on/list/Dimension' )) { for my $dimension ($node->findnodes('*[2]')) { my $child = $dimension->to_literal; if ($dimension->nodeName eq "string") { $child = qq("$child"); } push( @childarray, $child); } } #Measures for my $node ( $xmldata->findnodes( '/DimensionalReport/ReportDefiniti +on/list/Measure' )) { for my $measure ($node->findnodes('*[1]')) { my $child = $measure->to_literal; if ($measure->nodeName eq "string") { $child = qq("$child"); } push( @childarray, $child); } } print $out join(",", @childarray), "\n"; undef @childarray; #Print Data #Records for my $node ( $xmldata->findnodes( '/DimensionalReport/list/DataRow' +)) { for my $lvldim ($node->findnodes('./list/list/DataRow')) { my $dim1 = $node->getAttribute("name"); if ($node->nodeName eq "string" or $node->nodeName eq "list") +{ $dim1 = qq("$dim1"); } push( @childarray, $dim1); my $child = $lvldim->getAttribute("name"); if ($lvldim->nodeName eq "string" or $lvldim->nodeName eq "lis +t") { $child = qq("$child"); } push( @childarray, $child); for my $lvlmeas ($lvldim->findnodes('./list/*')) { my $child = $lvlmeas->to_literal; if ($lvlmeas->nodeName eq "string" or $lvlmeas->nodeName e +q "list") { $child = qq("$child"); } push( @childarray, $child); } print $out join(",", @childarray), "\n"; undef @childarray; } } close $out; EOF }
Now I will provide two data examples. Here is a simple one:

https://ezcrypt.it/jk6n#4qRm2gc3F7f0RnOMqL5bPaYl

Expected output should be something like this:
"Time Period","Active Visits","Page Views","Clickthroughs","Daily Visi +tors","Weekly Visitors","Monthly Visitors","Quarterly Visitors","Year +ly Visitors","Single Page View Visits","Entry Page Visits","Bounce Ra +te" 1/1/2013,9609.00,36456.00,604.00,9265.00,8948.00,9265.00,9265.00,9265. +00,4444.00,9608.00,46.25 1/2/2013,22088.00,76762.00,10291.00,21460.00,21088.00,21263.00,21263.0 +0,21263.00,12022.00,22054.00,54.51

And, something with a few more dimensions:

https://ezcrypt.it/kk6n#eVINakVb1teCRmxKvdbtirFi

And the goal output:
"WTDate","DMA","Most Recent Campaign Demand Channel","Most Recent Camp +aign Partner","Most Recent Campaign Marketing Program","Most Recent C +ampaign Marketing Activity","Most Recent Campaign Description","Most +Recent Campaign ID","Visits","Page Views","Clickthroughs","Orders","R +evenue","Average Revenue per Order","Units","Average Units per Order" +,"Average Visit Duration (Minutes)","Average Visit Page Views","Hits" +,"Daily Campaign Visitors","Weekly Campaign Visitors","Monthly Campai +gn Visitors","Quarterly Campaign Visitors","Yearly Campaign Visitors" +,"New Campaign Visitors" 2013-01-01,"500 (Portland-Auburn:ME-NH)","online","null","null","null" +,"null","ban_384938",384,4859,38,3,4958,50.00,4,2,2.3,3.1,3844,200,40 +0,500,593,2,203 2013-01-01,"500 (Portland-Auburn:ME-NH)","online","null","null","null" +,"null","ban_384950",390,4859,38,3,4958,50.00,4,2,2.3,3.1,3844,200,40 +0,500,593,2,203 2013-01-01,"501 (New York:CT-NJ-NY-PA)","online","null","null","null", +"null","ban_384950",35,200,38,3,4958,50.00,4,2,2.3,3.1,3844,200,400,5 +00,593,2,203

I apologize for my sloppy code; this is my first perl script. Any feedback to push me in the right direction would be much appreciated.

Thanks

Comment on LibXml - Convert XML to CSV from API Feed
Select or Download Code
Re: LibXml - Convert XML to CSV from API Feed
by Anonymous Monk on Apr 09, 2013 at 03:49 UTC
Re: LibXml - Convert XML to CSV from API Feed
by Anonymous Monk on Apr 10, 2013 at 11:03 UTC
      I haven't had a chance to work on this in a while, but I just wanted to say I appreciate the responses.

      I should have time to flesh it out this week and will post back with the final product.

      Thanks again

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1027624]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (11)
As of 2015-07-07 13:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (88 votes), past polls