

by jms53 (Monk)
on Jan 18, 2012 at 22:45 UTC (#948642, perlquestion)
jms53 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have (almost) finished a project that computes statistics on input texts. Since I want to view the results somewhere other than a shell, I am exporting most of them to a CSV file. Currently, the numbers in my export end up prefixed with a single quote (').


I am not sure whether this is due to my formatting or just OOo acting weird... (or if I should export to HTML instead...) My output code is the following:

    open OUTPUT, ">$output";
    # print stats to csv
    print OUTPUT "Word; count; Unique words;", $stats->count(),
        "; Total words;", $stats->sum(),
        "; mean;", $stats->mean(),
        "; variance ;", $stats->variance(),
        "; sigma;", $stats->standard_deviation(), " \n";
    foreach my $word (sort keys %words) {
        # print "$word \t $words{$word} \n";
        print OUTPUT "$word; $words{$word} \n";
    }
    close OUTPUT;

full code here and one of my input texts here

Thank you!

Replies are listed 'Best First'.
Re: CSV or HTML?
by furry_marmot (Pilgrim) on Jan 19, 2012 at 00:48 UTC
    For CSV, use Text::CSV_XS. It will take care of all the CSV creation. I assume you have Excel or OpenOffice, so you can load it in and print it as you like. If you have something else in mind, you might consider saying it instead of leaving us to guess.

      That's interesting. All the CSV modules I found were meant for reading a CSV in Perl, as opposed to writing one.

      I am using OpenOffice, and the issue was that it was adding the single quote to numbers, therefore making it impossible to format the column as "number".

      I'll be sure to look up that module, thanks!

        I've seen that complaint before, which is odd because the docs clearly tell you how to write a CSV, as well. The first sentence of the description says so specifically. The functions you want are combine (into a CSV string) or, more likely, print (combine and then print to a filehandle).

        Here's a simple example:

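        use strict;
        use warnings;
        use Text::CSV_XS;

        my @headers = ( "First", "Second", "Third", "Fourth" );
        my @cols    = (
            "I'm text",
            12,                  # <-- Number
            "",
            "That was a blank",
        );

        # eol => "\n" makes print() end each record with a newline
        my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" })
            or die Text::CSV_XS->error_diag;
        open my $fh, ">", "Output.csv" or die $!;
        $csv->print($fh, \@headers);   # print() takes an array reference
        $csv->print($fh, \@cols);
        # Use something like
        #   $csv->print($fh, $_) for @rows;
        # for an array of array references.
        close $fh or die $!;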

        Output.csv should look like this (with the default settings, only fields that need quoting, such as those containing spaces, are quoted):

        First,Second,Third,Fourth
        "I'm text",12,,"That was a blank"


        --marmot UPDATED: for clarity and to correct a typo.
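Applied to the word-count export from the original question, the same approach might look like the sketch below. It assumes Text::CSV_XS is installed and keeps the semicolon separator via sep_char. The helper name words_to_csv and the sample %words data are made up for illustration and are not from the thread. Unlike the original print statements, it emits no padding spaces around the fields.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: turn a word => count hash into semicolon-separated CSV.
sub words_to_csv {
    my ($words) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ";", eol => "\n" })
        or die Text::CSV_XS->error_diag;
    my $out = "";
    for my $word (sort keys %$words) {
        # combine() builds one record; string() returns it, including eol
        $csv->combine($word, $words->{$word})
            or die "Cannot combine fields for '$word'";
        $out .= $csv->string;
    }
    return $out;
}

# Sample data, made up for illustration
my %words = ( a => 5, about => 1, and => 13 );
print words_to_csv(\%words);
```

To write straight to a file instead of building a string, the print method shown earlier in this reply does the combine and the output in one step.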
Re: CSV or HTML?
by Tux (Abbot) on Jan 19, 2012 at 07:21 UTC

    When - as suggested - you produce correct (and valid) CSV *), there are utilities available that can convert CSV to XLS or HTML.

    e.g. Spreadsheet::Read comes with xlscat that offers the -H option.

    $ cat test.csv
    header,line,1
    a,34,12
    b,42,
    c, ,mars
    $ xlscat -H test.csv >test.html
    3 x 4
    $

    The HTML includes a valid header with CSS, so you can alter the output appearance (with alternating even/odd line coloring)

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" " +R/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="" xml:lang="en">
    <head>
    <title>test.csv</title>
    <meta name="Author" content="xlscat 2.1" />
    <style type="text/css"><!--
      body, h2, td, th { font-family: "Nimbus Sans L", "DejaVu Sans", Helvetica, Arial, sans; }
      table { border-spacing: 2px; border-collapse: collapse; }
      td, th { vertical-align: top; padding: 4px; }
      table > tbody > tr > th, table > tr > th { background: #e0e0e0; }
      table > tbody > tr > td:not([class]), table > tr > td:not([class]) { background: #f0f0f0; }
      .odd { background: #e0e0e0; }
      --></style>
    </head>



    *) Producing valid CSV with Text::CSV_XS or Text::CSV is easy: use print as demonstrated in this example.

    Enjoy, Have FUN! H.Merijn
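For readers who would rather generate the HTML directly from Perl, a rough core-Perl sketch of the same idea (rows in, a table with alternating row classes out) follows. It is not how xlscat works internally. The function names, the "odd"/"even" class names and the sample rows are assumptions for illustration.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: the first row becomes <th> cells, and rows
# alternate between class "even" and class "odd".
sub rows_to_html_table {
    my ($rows) = @_;
    my $html = "<table>\n";
    for my $i (0 .. $#$rows) {
        my $tag   = $i == 0 ? "th" : "td";
        my $class = $i % 2  ? "odd" : "even";
        my $cells = "";
        for my $value (@{ $rows->[$i] }) {
            my $text = defined $value ? $value : "";
            $cells .= "<$tag>" . escape_html($text) . "</$tag>";
        }
        $html .= qq{<tr class="$class">$cells</tr>\n};
    }
    return $html . "</table>\n";
}

# Sample rows, loosely based on test.csv above
my @rows = ( [ "header", "line", 1 ], [ "a", 34, 12 ], [ "b", 42, "" ] );
print rows_to_html_table(\@rows);
```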

      Shame it wraps the CSS in that cargo cult <!-- --> nonsense. This was kinda useful for a few browsers in the mid-1990s, but is actively harmful today.

      In XML DTDs (including the DTD for XHTML 1.1), there is no mechanism to specify that the content model for an element is CDATA. Thus the <!-- --> acts as a genuine XML comment, and causes the style sheet contained within it to be commented out and totally ignored.

      If a browser is in HTML mode (i.e. you serve the page using Content-Type: text/html) you won't notice this, but as soon as you switch to Content-Type: application/xhtml+xml all your styles will disappear.


      <html xmlns="">
      <head>
      <title>CSS demo</title>
      <style type="text/css">
        p { color: green }
      </style>
      <style type="text/css"><!--
        p.mine { color: red }
      --></style>
      </head>
      <body>
      <p class="mine">This should be green.</p>
      </body>
      </html>

      The above, viewed in a standards compliant browser, in XHTML mode, will show a green paragraph. The CSS which sets it to red is commented out and ignored.
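If the generated file has to be served as application/xhtml+xml, one possible workaround is to strip the comment wrapper after the fact. The sketch below is a minimal regex-based approach, with a hypothetical helper name. It assumes the wrapper directly encloses the whole content of each style element, as in the xlscat output above. A real HTML/XML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a <!-- ... --> wrapper that directly
# encloses the contents of a <style> element, leaving the CSS itself.
sub unwrap_style_comments {
    my ($html) = @_;
    $html =~ s{(<style\b[^>]*>)\s*<!--(.*?)-->\s*(</style>)}{$1$2$3}gs;
    return $html;
}

my $page = qq{<style type="text/css"><!-- p.mine { color: red } --></style>};
print unwrap_style_comments($page), "\n";
```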

        ...and the name of "a standards compliant browser" (to which you refer) is?

        I ask because I don't really believe there is such a thing.

        Tangent: Using the phrase "standards compliant" (yeah, I'm guilty too) tends to obscure the issue -- IMO -- which arises from the fact that "standards compliant" is not the same as "implements all of whichever relevant standard one might select."

        There is, for example, a "standard" (and right now I've forgotten whether it's CSS or HTML 4.01) providing a way to align a mixed-length set of decimal numbers*1 in a column of <td>s in a <table> (it applies to other things too, but that's easy to grasp).

        That would be a very handy standard to follow... except that no browser (of which I'm aware) actually implements it. Yes, there are workarounds, but as is often the case, those workarounds are often a PITA.

        *1 Example of the mixed length set of:


        One common workaround, aligning the column rightward, doesn't work there, but some means of aligning the decimal points sure would make it easier to read.
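One of the possible workarounds is to do the alignment on the Perl side before the values reach the page, by padding each number so the decimal points line up in a monospaced context such as a pre element. The sketch below assumes plain decimal strings. The helper name and the sample values are made up, since the example set from the post is not available here.

```perl
use strict;
use warnings;

# Hypothetical helper: pad numbers so their decimal points share a column.
sub align_decimals {
    my @nums  = @_;
    my @parts = map { [ split /\./, $_, 2 ] } @nums;

    # Find the widest integer part and the widest fractional part
    my ($int_w, $frac_w) = (0, 0);
    for my $p (@parts) {
        my $int  = $p->[0];
        my $frac = defined $p->[1] ? $p->[1] : '';
        $int_w  = length($int)  if length($int)  > $int_w;
        $frac_w = length($frac) if length($frac) > $frac_w;
    }

    # Right-align the integer part, left-align the point and fraction
    return map {
        my $tail = defined $_->[1] ? ".$_->[1]" : '';
        sprintf '%*s%-*s', $int_w, $_->[0], $frac_w + 1, $tail;
    } @parts;
}

# Sample values, made up for illustration
print "[$_]\n" for align_decimals("1.5", "12.25", "100");
```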

Re: CSV or HTML?
by InfiniteSilence (Curate) on Jan 19, 2012 at 04:29 UTC

    I ran the code with your output file and I didn't see any single quote before the numbers.

    cat input_output.txt
    Word; count; Unique words;204; Total words;390; mean;1.91176470588235; variance ;17.0069545059403; sigma;4.12394889710582
    ; 45
    a; 5
    about; 1
    acts; 1
    all; 1
    amid; 1
    and; 13
    as; 2
    ...

    Maybe I'm just confused about your problem? BTW, that output isn't any kind of valid CSV I've ever seen. The first line doesn't match any of the other lines.

    Celebrate Intellectual Diversity

      The first line is slightly different: it gives the column names and then all the other relevant data (not sure if that invalidates my output).

      I tweaked the code to print the variables as '$words{$word}', and then replaced the single quotes with nothing in OpenOffice, which then gave correct formatting.

      It was just bizarre because previously, sorting Z-A gave me 918 coming before 16000.

      Thank you very much!
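The 918-before-16000 ordering is what a text (character by character) comparison produces, which fits the cells having been imported as text rather than numbers. The same effect can be reproduced in Perl with the two sort comparators. The third value in the sample list is made up.

```perl
use strict;
use warnings;

my @counts = (918, 16000, 45);

# String comparison (cmp) goes character by character, so "918" sorts above "16000"
my @as_text    = sort { $b cmp $a } @counts;
# Numeric comparison (<=>) orders by value
my @as_numbers = sort { $b <=> $a } @counts;

print "as text:    @as_text\n";      # 918 45 16000
print "as numbers: @as_numbers\n";   # 16000 918 45
```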

Re: CSV or HTML?
by Ralesk (Pilgrim) on Jan 20, 2012 at 21:52 UTC

    Might be a weird thing to suggest, but if you do indeed want to read your data in OOo, why not use XLS or XLSX? Both of those have excellent readers and writers on CPAN, in particular, Spreadsheet::XLSX and Spreadsheet::WriteExcel. You have a way easier time differentiating between numbers and text, and there's no awkward conversions to be done in either Office or OOo/LO.

    Update: Excel::Writer::XLSX is another one from the same author as WriteExcel. Spreadsheet::XLSX is the reader for XLSX files. Sorry about the mixup.

      This might be the best solution.

      Looking it up now, because the spreadsheet certainly consumed more memory doing a simple "sort", whereas Perl handled a larger data sample and more instructions in about 1 second.
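For the XLSX route, a minimal sketch with Excel::Writer::XLSX might look like the following. It assumes the module is installed. The helper name, the output file name and the sample data are made up for illustration. Using write_string and write_number makes the cell type explicit, so the spreadsheet does not have to guess.

```perl
use strict;
use warnings;
use Excel::Writer::XLSX;

# Hypothetical helper: write word counts to an .xlsx file with
# explicitly typed cells.
sub write_counts_xlsx {
    my ($file, $words) = @_;
    my $workbook = Excel::Writer::XLSX->new($file)
        or die "Cannot create $file: $!";
    my $worksheet = $workbook->add_worksheet();

    # Header row
    $worksheet->write_string(0, 0, "Word");
    $worksheet->write_string(0, 1, "count");

    my $row = 1;
    for my $word (sort keys %$words) {
        $worksheet->write_string($row, 0, $word);             # always text
        $worksheet->write_number($row, 1, $words->{$word});   # always a number
        $row++;
    }
    $workbook->close();
    return $row - 1;   # number of data rows written
}

my %words = ( a => 5, about => 1, and => 13 );   # made-up sample data
write_counts_xlsx("wordcount.xlsx", \%words);
```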
