http://www.perlmonks.org?node_id=1226699

shreedara75 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying insert a German Text with German Alphabet in an Excel Cell. however what i have stored in a variable and what is printed in XL Cell are different. i assume it is something to do with character encoding. i tried few things but could not figure out a way to insert proper German string in XL. below is my code.
#!/usr/bin/perl use warnings; #use strict; use Encode qw(encode decode); my $enc = 'utf-8'; my $germanText = 'Fräsen und ndk (Kamera - Fräsaufnahme)'; my $germ_str = decode($enc, $germanText); my $g8txt = encode($enc, $germanText); print encode($enc, $germanText); print encode($enc, $germ_str); $xl_form = "C:/tmp/german.xls"; use Win32::OLE qw(in with); use Win32::OLE::Const 'Microsoft Excel'; $Excel = Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new('Excel.Application', 'Quit'); $Excel->{DisplayAlerts} = 1; $Book = $Excel->Workbooks->Open($xl_form); $Sheet = $Book->Worksheets("DRC"); $Sheet->Range("B2")->{Value} = $germanText; $Sheet->Range("B3")->{Value} = $germ_str; $Sheet->Range("B4")->{Value} = $g8txt;
but in XL B2-B4 cells has the following string

B2 = Fräsen und ndk (Kamera - Fräsaufnahme)

B3 = Fräsen und ndk (Kamera - Fräsaufnahme)

B4 = Fräsen und ndk (Kamera - Fräsaufnahme)

how do i insert German text properly in XL?

Replies are listed 'Best First'.
Re: german Alphabet
by ikegami (Patriarch) on Dec 04, 2018 at 06:30 UTC

    Looks like the string should be encoded as per the Active Code Page (though it could be a coincidence).

    #!/usr/bin/perl use warnings; use strict; use feature qw( say ); use utf8; use Win32 qw( ); my ($active_cp, $input_cp, $output_cp); BEGIN { $active_cp = "cp" . Win32::GetACP(); $input_cp = "cp" . Win32::GetConsoleCP(); $output_cp = "cp" . Win32::GetConsoleOutputCP(); } use open ':encoding(UTF-8)'; BEGIN { binmode(STDIN, ":encoding($input_cp)"); binmode(STDOUT, ":encoding($output_cp)"); binmode(STDERR, ":encoding($output_cp)"); } use Encode qw( encode decode ); use Win32::OLE qw( in with ); use Win32::OLE::Const 'Microsoft Excel'; { # Decoded string aka string of Unicode Code Points my $german_text = 'Fräsen und ndk (Kamera - Fräsaufnahme)'; my $xl_form = "C:/tmp/german.xls"; my $Excel = Win32::OLE->GetActiveObject('Excel.Application') or do { Win32::OLE->new('Excel.Application', 'Quit'); die("Error"); }; $Excel->{DisplayAlerts} = 1; my $Book = $Excel->Workbooks->Open($xl_form); my $Sheet = $Book->Worksheets("DRC"); $Sheet->Range("B2")->{Value} = encode($active_cp, $german_text); }

    PS - Why did you comment out use strict;?!? You have a number of scoping errors it would have identified.

Re: german Alphabet
by thanos1983 (Parson) on Dec 04, 2018 at 12:27 UTC

    Hello shreedara75,

    Fellow Monk ikegami provided you with a solution to your problem. Just to add another possible solution that I just tested on my OpenOffice (LinuxOS) in case that someone is not working with WindowsOS:

    #!/usr/bin/perl use utf8; use strict; use warnings; use Excel::Writer::XLSX; my $workbook = Excel::Writer::XLSX->new( 'simple.xlsx' ); my $worksheet = $workbook->add_worksheet(); $worksheet->write( "A1", "Fräsen und ndk (Kamera - Fräsaufnahme)" ); $workbook->close;

    For further information see Excel::Writer::XLSX/UNICODE-IN-EXCEL.

    Hope this helps.

    BR / Thanos

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: german Alphabet
by 1nickt (Canon) on Dec 04, 2018 at 12:52 UTC

    Hi, as ikegami showed, but did not point out, if your source code contains strings encoded in UTF-8, you need to tell Perl about that fact with use utf8;. Oh, sure, you can omit it, and between your editor and your terminal you might be able to type and read characters in "the German alphabet," but that's just a happy accident. See the documentation.

    This is not the complete solution! But in order to have Perl handle your unicode characters properly, you need to start out right.

    Hope this helps!


    The way forward always starts with a minimal test.

      Well, it's not required, but it allows

      use Encode qw( decode ); my $enc = 'utf-8'; my $germanText = 'Fräsen und ndk (Kamera - Fräsaufnahme)'; my $germ_str = decode($enc, $germanText);

      or even

      use Encode qw( decode_utf8 ); my $germ_str = decode_utf8('Fräsen und ndk (Kamera - Fräsaufnahme)');

      to be reduced to

      use utf8; my $germ_str = 'Fräsen und ndk (Kamera - Fräsaufnahme)';

      Imagine manually decoding every string literal in the program like the OP did? Nonsense! Similarly, it's not necessary to add an encoding layer to STDOUT, but encoding everything passed to print would be ludicrous.

      I don't see in ikegami's script the need for use utf8;.

        I don't see in ikegami's script the need for use utf8;.

        The OP as well as ikegami's script contain the string 'Fräsen und ndk (Kamera - Fräsaufnahme)'. From utf8: "The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. ... Do not use this pragma for anything else than telling Perl that your script is written in UTF-8. ... Because it is not possible to reliably tell UTF-8 from native 8 bit encodings, you need either a Byte Order Mark at the beginning of your source code, or use utf8;, to instruct perl."

        Although the "ä" may happen appear to work because it's part of the Latin1 character set, which Perl typically uses internally, it will most likely not do what you want on any Unicode characters outside of that set. As you can see below, the only version of the code in which the UTF8 is flag properly set on the string is the one where the source is encoded as UTF-8 and use utf8; is used. The rule of thumb I always use is to either work entirely in ASCII (using escapes such as \N{} to specify Unicode characters), or otherwise use a UTF-8 encoding on the source code and use utf8;. See also perluniintro and perlunicode.

        $ cat with_utf8.pl use warnings; use strict; use utf8; use Devel::Peek; my $string = 'Fräsen und ndk (Kamera - Fräsaufnahme)'; Dump($string); $ perl -pe 's/^(?=.*utf8)/#/' with_utf8.pl | tee without_utf8.pl use warnings; use strict; #use utf8; use Devel::Peek; my $string = 'Fräsen und ndk (Kamera - Fräsaufnahme)'; Dump($string); $ iconv -f UTF-8 -t Latin1 without_utf8.pl -o latin1.pl $ file -i *.pl latin1.pl: text/plain; charset=iso-8859-1 without_utf8.pl: text/plain; charset=utf-8 with_utf8.pl: text/plain; charset=utf-8 $ perl latin1.pl SV = PV(0x1365d70) at 0x13855c0 REFCNT = 1 FLAGS = (POK,IsCOW,pPOK) PV = 0x13d7160 "Fr\344sen und ndk (Kamera - Fr\344saufnahme)"\0 CUR = 38 LEN = 40 COW_REFCNT = 1 $ perl without_utf8.pl SV = PV(0xa15d70) at 0xa355c0 REFCNT = 1 FLAGS = (POK,IsCOW,pPOK) PV = 0xa87190 "Fr\303\244sen und ndk (Kamera - Fr\303\244saufnahme)" +\0 CUR = 40 LEN = 42 COW_REFCNT = 1 $ perl with_utf8.pl SV = PV(0x18d5d70) at 0x18f55d8 REFCNT = 1 FLAGS = (POK,IsCOW,pPOK,UTF8) PV = 0x19384a0 "Fr\303\244sen und ndk (Kamera - Fr\303\244saufnahme) +"\0 [UTF8 "Fr\x{e4}sen und ndk (Kamera - Fr\x{e4}saufnahme)"] CUR = 40 LEN = 42 COW_REFCNT = 1

        Updated as per ikegami's reply.