Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Using XML::XSLT to convert XML to CSV

by Photius (Sexton)
on Aug 21, 2007 at 18:52 UTC ( #634160=perlquestion: print w/ replies, xml ) Need Help??
Photius has asked for the wisdom of the Perl Monks concerning the following question:

I have a modest amount of Perl experience but I have never used XSLT. My forehead is getting pretty sore from frequent encounters with the keyboard. I need to convert an XML file into CSV for input to another system. Various articles on the internet made XML::XSLT sound like just the ticket, but I haven't been able to make it work.

My XML file looks like this (only larger):

<?xml version="1.0" encoding="UTF-8"?> <WEATHER_DATA> <HOUR_DATA> <LOCATION_NAME>KCNU</LOCATION_NAME> <CLC>20</CLC> <DPT>73</DPT> <HUM>82</HUM> <HIX>84</HIX> <DBT>79</DBT> <WCH>79</WCH> <WDR>210</WDR> <WSP>17</WSP> <EFFECTIVE_DATE>08/20/2007</EFFECTIVE_DATE> <HOUR>14:00</HOUR> <TIME_GENERATED>08/20/2007 09:45:00</TIME_GENERATED> </HOUR_DATA> <HOUR_DATA> <LOCATION_NAME>KCOU</LOCATION_NAME> <CLC>100</CLC> <DPT>71</DPT> <HUM>87</HUM> <HIX>79</HIX> <DBT>75</DBT> <WCH>75</WCH> <WDR>190</WDR> <WSP>13</WSP> <EFFECTIVE_DATE>08/20/2007</EFFECTIVE_DATE> <HOUR>14:00</HOUR> <TIME_GENERATED>08/20/2007 09:45:00</TIME_GENERATED> </HOUR_DATA> </WEATHER_DATA>

I need to transform it to CSV like this:

# relation, date, time, CloudCoverPct, DewPointF, Humidity, TempF, Win +dDirection, WindMPH, HeatIndexF, WindChillF KCNU.CHANUTE.KS,08/20/2007,14:00,20,73,82,79,210,17,84,79 KCOU.COLUMBIA.MO,08/20/2007,14:00,100,71,87,75,190,13,79,75

Note that the .csv columns are not in the same order as the tags in the .xml file and the site (a.k.a. relation) name will need to be expanded later.
I realize I will have to enhance the .xls file and add more logic in the script, but at this point, I have not even been able to get XML::XSLT->new() to work.

The skeletal beginning of my .xsl file (realizing it doesn't do everything yet) is:

<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Tr +ansform"> <xsl:output method="text"/> <xsl:template match="WEATHER_DATA"> <xsl:apply-templates select="HOUR_DATA"/> </xsl:template <xsl:template match="HOUR_DATA"> <xsl:for-each select="*"> <xsl:value-of select="."/> <xsl:if test="position() != last()"> <xsl:value-of select="','"/> </xsl:if> </xsl:for-each> <xsl:text>&#10;</xsl:text> </xsl:template> </xsl:stylesheet>

The skeletal beginning of my .pl script (realizing it doesn't do everything yet) is:

use strict; use XML::XSLT; my $xmlFilename = 'C:/Perldev/xslt/WEATHER_ACTUAL_20070820144516.XML'; my $xslFilename = 'C:/Perldev/xslt/StyleSheet2.xsl'; (my $outFilename = $xmlFilename) =~ s/\.xml$/\.csv/i; my $xslt = XML::XSLT->new($xslFilename, warnings => 1); my $result = $xslt->transform(XMLFile => $xmlFilename, XSLFile => $xslFilename, OutFile => $outFilename );

The error it returns is:
Error while parsing: no element found at line 1, column 0, byte -1 at C:/Perl/lib/XML/Parse +r.pm line 187

What am I doing wrong?

Comment on Using XML::XSLT to convert XML to CSV
Select or Download Code
Re: Using XML::XSLT to convert XML to CSV
by Photius (Sexton) on Aug 21, 2007 at 19:04 UTC
    Sorry, the full error is:
    Error while parsing: no element found at line 1, column 0, byte -1 at C:/Perl/lib/XML/Parse +r.pm line 187 C:/Perldev/xslt/WEATHER_ACTUAL.xsl at C:/Perl/site/lib/XML/XSLT.pm lin +e 1507.
Re: Using XML::XSLT to convert XML to CSV
by jZed (Prior) on Aug 21, 2007 at 19:25 UTC
    Another possiblility is to use DBD::AnyData (with DBI interface) or AnyData (tied-hash interface) which support reading and writing both XML and CSV.
Re: Using XML::XSLT to convert XML to CSV
by Joost (Canon) on Aug 21, 2007 at 19:46 UTC
    I'm using XML::Parser 2.34 and XML::XSLT 0.48 (that's the current version). And my error message is a little more specific (note that I changed the names of the input files):

    perl test.pl Error while parsing: not well-formed (invalid token) at line 9, column 0, byte 244 at /usr/ +local/lib/perl5/site_perl/5.8.5/i686-linux-thread-multi/XML/Parser.pm + line 187 input.xsl at /usr/local/lib/perl5/site_perl/5.8.8/XML/XSLT.pm line 150 +7.
    the line just before line 9 in the xsl file is:
    </xsl:template
    If I fix that tag I get
    Argument syntax of call to XML::XSLT::transform deprecated. See the d +ocumentation for XML::XSLT::transform at test.pl line 9 Error while parsing: syntax error at line 1, column 0, byte 0 at /usr/local/lib/perl5/site_ +perl/5.8.5/i686-linux-thread-multi/XML/Parser.pm line 187 XMLFile at /usr/local/lib/perl5/site_perl/5.8.8/XML/XSLT.pm line 1507.
    That lead me to fix the code like this:
    use strict; use XML::XSLT; my $xmlFilename = 'input.xml'; my $xslFilename = 'input.xsl'; (my $outFilename = $xmlFilename) =~ s/\.xml$/\.csv/i; my $xslt = XML::XSLT->new($xslFilename, warnings => 1); $xslt->transform($xmlFilename); print $xslt->toString;
    Which comes kind of close to the desired output:

    KCNU2073828479792101708/20/200714:0008/20/2007 09:45:00 KCOU10071877975751901308/20/200714:0008/20/2007 09:45:00
    Personally, I would use XML::Twig and Text::xSV / Text::CSV for this kind of thing.

    update: in any case you should probably make sure you've got the latest stable versions of both XML::XSTL and XML::Parser.

      Joost, thanks for spotting the missing > and for your advice.

      I was still getting the same error after fixing that, though.
      I found the cause. It works fine as long as the input.xml filename and input.xsl filename do *not* have a path in front of them. I am on a Windows platform and have tried the formats:
      'C:/mypath/input.xml'
      'C:\mypath\input.xml'
      "C:/mypath/input.xml"
      "C:\\mypath\\input.xml"

      It seems it can only accept files in the current directory??
        I'm not sure what the problem could be (and I don't have a windows perl available at the moment) but this works for me on linux:
        use strict; use XML::XSLT; my $xmlFilename = '/home/joost/input.xml'; my $xslFilename = '/home/joost/input.xsl'; (my $outFilename = $xmlFilename) =~ s/\.xml$/\.csv/i; my $xslt = XML::XSLT->new($xslFilename, warnings => 1); $xslt->transform($xmlFilename); print $xslt->toString;
        Can't help you much further than that, though.

        update: did you update the XML modules, or where they already at the latest version?

        You can check by doing

        perl -MXML::XSLT -e'print "$XML::XSLT::VERSION\n"'
        and similar for XML::Parser
        On the one hand, the documentation says you should use "base" as an argument to the transform call.

        On the other hand, I tried that and I still get that error.



        Nobody says perl looks like line-noise any more
        kids today don't know what line-noise IS ...

        hey,

        i had the same problem using parser.pm on windows xp machine for absolute paths.

        solved by adding "file:///" on the beginning of the path

        ex: XML::XSLT->new ("file:///C:/mypath/to/something.xsl" );

Re: Using XML::XSLT to convert XML to CSV
by Cody Pendant (Prior) on Aug 22, 2007 at 00:58 UTC
    In the nicest possible way, it seems you're getting a bit X-Y in that you're trying to fix the wrapper code around the actual code. If all you need to do is transform the file, then you don't actually need perl or a perl module at all.

    If I were you, I'd, download the MSXML application and get transforming on the command line.

    msxsl C:/filename.xml C:/filename.xsl -o C:/output.csv
    is all you're really trying to do.

    And if you really want to do it from perl, you can always do system or backticks or whatever around that, creating your own wrapper.

    It sucks, but hell, you're stuck transforming XML on Windows (as am I, can you tell?) you might as well bite the bullet.

    Let me know if I can help.



    Nobody says perl looks like line-noise any more
    kids today don't know what line-noise IS ...
      when try doing what you mentioned i see the following error: 'msxsl' is not recognized as an internal or external command, operable program or batch file. i have installed msxml6... i m using windows 7 64 bit -AMAR
        use a full path, like "C:\blah\blah\blah....exe"
Re: Using XML::XSLT to convert XML to CSV
by Jenda (Abbot) on Aug 22, 2007 at 22:27 UTC

    Why is your skeletal beginning longer than a complete solution?

    use strict; use XML::Rules; use Text::CSV_XS; use FileHandle; my $csv = Text::CSV_XS->new({eol => "\n"}); my $parser = XML::Rules->new( rules => [ _default => 'content', HOUR_DATA => sub { $csv->print( $_[4]->{parameters}, [ map {$_[1]->{$_}} qw( LOCATION_NAME EFFECTIVE_DATE HOUR CLC DPT HUM DBT WDR WSP HIX WCH ) ]); return; } ] ); open my $FH, '>&STDOUT'; print $FH "# relation, date, time, CloudCoverPct, DewPointF, Humidity, + TempF, WindDirection, WindMPH, HeatIndexF, WindChillF\n"; $parser->parse( \*DATA, $FH); __DATA__ <?xml version="1.0" encoding="UTF-8"?> <WEATHER_DATA> <HOUR_DATA> <LOCATION_NAME>KCNU</LOCATION_NAME> <CLC>20</CLC> <DPT>73</DPT> <HUM>82</HUM> <HIX>84</HIX> <DBT>79</DBT> <WCH>79</WCH> <WDR>210</WDR> <WSP>17</WSP> <EFFECTIVE_DATE>08/20/2007</EFFECTIVE_DATE> <HOUR>14:00</HOUR> <TIME_GENERATED>08/20/2007 09:45:00</TIME_GENERATED> </HOUR_DATA> <HOUR_DATA> <LOCATION_NAME>KCOU</LOCATION_NAME> <CLC>100</CLC> <DPT>71</DPT> <HUM>87</HUM> <HIX>79</HIX> <DBT>75</DBT> <WCH>75</WCH> <WDR>190</WDR> <WSP>13</WSP> <EFFECTIVE_DATE>08/20/2007</EFFECTIVE_DATE> <HOUR>14:00</HOUR> <TIME_GENERATED>08/20/2007 09:45:00</TIME_GENERATED> </HOUR_DATA> </WEATHER_DATA>

      Hi - can anyone explain what the previous bit of code is doing - specifically, how is $_[4]->{parameters} an output filehandle?

      edit to add - ach, my bad. Just noticed that this is coming from XML::Rules rather than CSV_XS... still, any insights always welcome...

      map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
      Tom Melly, pm (at) cursingmaggot (stop) co (stop) uk

        XML::Rules lets you specify "rules" to be evaluated whenever a tag (including the content and subtags if any) is parsed and processed. The rule may be either one of the builtins ("keep only the content", "use this attribute as the key and that attribute as the value", ...") or a subroutine to be called.

        If you specify a subroutine, then that subroutine is called with five parameters, the tag name, the hash of attributes (containing also the content and whatever values were produced by the rules for subtags), the array of currently opened tags enclosing the one being processed, the array of attribute hashes for those tags and the parser object. You may either assign the parameters to local variables or access the parameter array @_ directly. So $_[4] is the parser object.

        The parser object is a blessed hash and there are several keys that may be used by the subroutine rules. One of them is {pad} ($_[4]->{pad} inside the subroutine) ... it's reserved for the rules and you can put anything you want there in case you need to keep some state information between the subroutine calls.

        Another is {parameters}. This is again reserved for data specific to your rules, but may be assigned by the ->parse(), ->parsefile(), ->filter() and other such methods. So if you need to pass some data to your rules and do not want to use global variables you pass that data to the ->parse() method as the second parameter and then find it in $_[4]->{parameters}.

        In that script, the second parameter to $parser->parse() was the filehandle so that's what you then find in $_[4]->{parameters}.

        I might have declared the $FH on top of the script and then access it directly from the subroutine, but I do not like to do that. I believe the rules should be as "selfcontained" as possible so that (in this case) if you wanted to use the parser twice and print the results to two different files, you could just pass the filehandle to print to to the ->parse() instead of depending on some global variable.

        Do I make sense? I tried to explain the XML::Rules in its docs and in several nodes here, but it's still a bit hard to understand if you are not used to callbacks or functional programming.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://634160]
Approved by Joost
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (13)
As of 2014-08-22 12:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (157 votes), past polls