
UTF8 support for CSV, PDF formats

by vishi83 (Pilgrim)
on Jan 06, 2012 at 11:54 UTC (#946586, perlquestion)
vishi83 has asked for the wisdom of the Perl Monks concerning the following question:

Hello great minds,

I'm having trouble writing Unicode characters from a data file to CSV or PDF formats in Perl.

What I'm trying to do is this:
I have a file that contains non-English characters; for example, I'm using Japanese characters. I'm trying to write these Japanese characters to various formats. I was able to view the file as HTML in a browser, or as Excel. All I had to do was have my application declare charset: utf-8 encoding.

But to create a CSV or a PDF file, I'm writing the Japanese content using open(). Here, I tried the following:

open (FILE, ">:encoding(UTF-8)", 'output.csv');
(or) open (FILE, ">:encoding(UTF-8)", 'output.pdf');

I also tried binmode(FH, ":utf8");

I also tried using
Text::CSV->new( { encoding => "utf8" } )

With any of these methods, I'm able to see the output correctly in Notepad or a browser, but when I open the CSV file in Excel, I don't see the Japanese characters properly.

From your expertise, can you please suggest how to go about this and offer some ideas?


Re: UTF8 support for CSV, PDF formats
by afoken (Abbot) on Jan 06, 2012 at 12:15 UTC
    but when I open the CSV file in Excel, I don't see the Japanese characters properly

    I think you need to tell Excel about the UTF-8 encoding, perhaps in the File-Open-dialog. Why don't you use Spreadsheet::WriteExcel or Excel::Writer::XLSX to generate "real" Excel files?


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Thanks. I didn't have issues with Excel when using the Spreadsheet::WriteExcel module.
      However, I also want to make sure I can handle delimited formats like CSV.
Re: UTF8 support for CSV, PDF formats
by Eliya (Vicar) on Jan 06, 2012 at 14:11 UTC

    As for the CSV files, perhaps adding a BOM would help?  AFAIK, it is widely used on the Windows platform (even with UTF-8) to both indicate that the files do have Unicode content, and to specify the particular encoding being used (UTF-8, UTF-16le, etc.).

    For this, the first thing you write to the file should be the BOM (\x{feff}):

    my $fname = 'output.csv';
    open my $fh, ">:utf8", $fname or die "couldn't open '$fname': $!";
    print $fh "\x{feff}";
    ...
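
    A slightly fuller sketch of the same idea, using only core Perl. The sample rows and the csv_field helper are hypothetical, added just for illustration; for real data a CSV module such as Text::CSV would be the safer choice for quoting. It assumes the script itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # assumption: this script is saved as UTF-8, so literals are decoded

# Hypothetical helper: wrap a field in double quotes and double any
# embedded quotes, which is the usual CSV convention.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

my $fname = 'output.csv';

# Sample data, invented for this sketch.
my @rows = (
    [ 'id', 'name' ],
    [ 1,    '日本語' ],
    [ 2,    'テスト, "quoted"' ],
);

open my $fh, '>:encoding(UTF-8)', $fname
    or die "couldn't open '$fname': $!";

# The BOM must be the very first thing written to the file.
print {$fh} "\x{feff}";

for my $row (@rows) {
    print {$fh} join( ',', map { csv_field($_) } @$row ), "\n";
}

close $fh or die "couldn't close '$fname': $!";
```

    The character U+FEFF is turned into the three bytes EF BB BF by the encoding layer, so there is no need to write those bytes by hand.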
      Thanks for your response. Using a BOM did work for the CSV format. I'm able to see the data properly now!
      However, it didn't work for PDFs. I'm still trying to figure out a way to do that.

      Any thoughts?
      A perl Script without 'strict' is like a House without Roof; Both are not Safe;

        Yes: use a PDF library. open (FILE, ">:encoding(UTF-8)", 'output.pdf'); is a mistake 99% of the time, because PDF is a structured binary format, not text with an encoding applied to it.

Re: UTF8 support for CSV, PDF formats
by Taulmarill (Deacon) on Jan 06, 2012 at 12:19 UTC
    Excel has no way of guessing which encoding your CSV file uses. Mine (German Excel 2003) seems to assume Latin-1.
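
    To make that concrete, here is a small sketch (core Encode module only; the sample string is an assumption chosen for illustration) of what happens when UTF-8 bytes are read as Latin-1, and of the bytes a BOM adds at the start of the data:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Three Japanese characters, written as escapes so the script's own
# encoding does not matter.
my $text = "\x{65e5}\x{672c}\x{8a9e}";

# What actually ends up in a UTF-8 file: three bytes per character here.
my $bytes = encode( 'UTF-8', $text );

# What a reader assuming Latin-1 makes of those bytes: every byte
# becomes its own (wrong) character.
my $misread = decode( 'ISO-8859-1', $bytes );

# The same text with a BOM in front, as a reader would find it on disk.
my $with_bom = encode( 'UTF-8', "\x{feff}" . $text );
my $bom_hex  = join ' ', map { sprintf '%02X', ord } split //, substr( $with_bom, 0, 3 );

printf "characters: %d, UTF-8 bytes: %d, characters when misread: %d\n",
    length($text), length($bytes), length($misread);
print "first bytes with BOM: $bom_hex\n";
```

    Without some hint such as the BOM, a program that assumes a single-byte encoding has nothing in the file telling it to do otherwise.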
