
Strange interaction between command line and encoding

by VSarkiss (Monsignor)
on Jun 02, 2006 at 16:57 UTC (#553335, perlquestion)
VSarkiss has asked for the wisdom of the Perl Monks concerning the following question:

This might be blatantly obvious, but nobody in the CB seemed to spot the problem, so I thought I'd ask here.

I have some UTF-16 files I need to convert to UTF-8, so I wrote this tiny program:

#! perl
use encoding "utf16", STDOUT => "utf8";
while (<>) { print }
It works fine. Then I realized I could just do it all from the command line:
perl -Mencoding=utf16,STDOUT,utf8 -p -e 1 < in > out
Much to my surprise the output isn't UTF-8, it's some strange UTF-16-ish thing (the file is hosed, basically).
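One way to pin down what "UTF-16-ish" means is to look at the first few bytes of the output file. The following is only a sketch: the sub names `guess_encoding_from_bom` and `leading_bytes` are made up for illustration, and it only checks for a byte order mark, so a file without one is reported as "no BOM" rather than identified. It also does not distinguish UTF-32LE, whose BOM begins with the same two bytes as UTF-16LE.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a byte string by its byte order mark only.
sub guess_encoding_from_bom {
    my ($bytes) = @_;
    return 'UTF-8 (with BOM)' if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE'         if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE'         if $bytes =~ /\A\xFE\xFF/;
    return 'no BOM';
}

# Hypothetical helper: read the first four bytes of a file with no
# layers applied, so neither CRLF translation nor decoding gets in the way.
sub leading_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buf = '';
    defined read($fh, $buf, 4) or die "Cannot read $path: $!";
    close $fh;
    return $buf;
}

1;
```

Calling `guess_encoding_from_bom(leading_bytes('out'))` on the converted file would then show whether a UTF-16 BOM survived the trip.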

Just for grins, I even tried

perl -Mencoding=utf16,STDOUT,utf8 -n -e print < in > out
But it had the same results.

Anybody see what's going on here? Isn't that command line identical to the little program?

For reference, I'm using ActiveState on Windows XP. Perl -V output is below.

$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
    libpth=\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
  Locally applied patches:
    ActivePerl Build 817 [257965]
    Iin_load_module moved for compatibility with build 806
    PerlEx support in CGI::Carp
    Less verbose ExtUtils::Install and Pod::Find
    Patch for CAN-2005-0448 from Debian with modifications
    Partly reverted 24733 to preserve binary compatibilty
    27528 win32_pclose() error exit doesn't unlock mutex
    27527 win32_async_check() can loop indefinitely
    27515 ignore directories when searching @INC
    27359 Fix -d:Foo=bar syntax
    27210 Fix quote typo in c2ph
    27203 Allow compiling swigged C++ code
    27200 Make stat() on Windows handle trailing slashes correctly
    27194 Get perl_fini() running on HP-UX again
    27133 Initialise lastparen in the regexp structure
    27034 Avoid "Prototype mismatch" warnings with autouse
    26970 Make Passive mode the default for Net::FTP
    26921 Avoid getprotobyname/number calls in IO::Socket::INET
    26897,26903 Make common IPPROTO_* constants always available
    26670 Make '-s' on the shebang line parse -foo=bar switches
    26379 Fix alarm() for Windows 2003
    26087 Storable 0.1 compatibility
    25861 IO::File performace issue
    25084 long groups entry could cause memory exhaustion
    24699 ICMP_UNREACHABLE handling in Net::Ping
  Built under MSWin32
  Compiled at Mar 20 2006 17:54:25
  @INC:
    c:/Perl/lib
    c:/Perl/site/lib
    .

Somehow it's related to using the Cygwin bash shell, because both the command line and the tiny program work OK under cmd.exe. It's getting too complicated for my tiny brain, so I'm just going to stop worrying about it and use what works.

Replies are listed 'Best First'.
Re: Strange interaction between command line and encoding
by sgifford (Prior) on Jun 02, 2006 at 18:01 UTC
    Strange. It works for me on Linux with Perl 5.8.3. Some quick tests show that the arguments to a class's import function are identical for both the command-line version and the use statement. Maybe it's time to fire up the debugger or add some print statements to your

    Update: Here are the small test programs I used. They can probably tell you what's different between cygwin and cmd.exe.

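    package T56;
    use base 'Exporter';

    # Report exactly what import() receives, then try to print to the
    # first argument as if it were a filehandle.
    sub import {
        my $class = shift;
        print "$class import @{[scalar(@_)]} items: @_\n";
        my $fh = $_[0];
        print $fh "Output to first param\n";
    }

    1;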


    use T56 STDOUT, bar => 'baz';

    Compare the output of these to see what's different:

    perl t56
    perl -MT56=STDOUT,bar,baz -e 1

    End of Update

    Here's my Perl info:

Re: Strange interaction between command line and encoding
by graff (Chancellor) on Jun 02, 2006 at 19:55 UTC
    perl -Mencoding=utf16,STDOUT,utf8 -p -e 1
    works for me, and so does:
    perl -Mencoding=utf8,STDIN,utf16 -p -e 1
    using both 5.8.7 and 5.8.8 on freebsd (-V output below, FWIW). My first inclination for this sort of thing would be to use binmode:
    perl -pe 'BEGIN{binmode STDIN,":encoding(utf16)";binmode STDOUT,":utf8"} 1;' < in > out
    But that involves a bit more typing.
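For anyone who would rather keep this in a script than on the command line, here is a rough sketch of the same binmode idea. The sub name `utf16_to_utf8` is made up for illustration. The `:raw` in front of each encoding layer is an assumption on my part: it is meant to keep any platform CRLF layer from touching the UTF-16 bytes, and it has not been tested against the ActiveState/cygwin setup in the question. The input is expected to start with a BOM, since a bare "UTF-16" decoder needs one to pick the byte order.

```perl
use strict;
use warnings;

# Copy everything from $in to $out, decoding UTF-16 (BOM required)
# on the way in and encoding UTF-8 on the way out.
sub utf16_to_utf8 {
    my ($in, $out) = @_;

    # Set the layers explicitly instead of relying on the encoding pragma.
    binmode($in,  ':raw:encoding(UTF-16)') or die "binmode on input failed: $!";
    binmode($out, ':raw:encoding(UTF-8)')  or die "binmode on output failed: $!";

    while (my $line = <$in>) {
        print {$out} $line or die "write failed: $!";
    }
    return;
}

1;
```

Adding `utf16_to_utf8(\*STDIN, \*STDOUT);` at the bottom turns it into a filter that can be run as `perl script < in > out`. Line endings pass through unchanged because no CRLF translation is applied on either side.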
Re: Strange interaction between command line and encoding
by bpphillips (Friar) on Jun 02, 2006 at 20:28 UTC
    Update: OK, I think I am way off base... The original usage has a "fat-comma" on the right side of STDOUT which makes it a string anyway... I can't even claim no coffee since I don't drink it! Ah well...

    OK, I may be way off base, but using B::Deparse to see how perl treats the -M command line switch shows it's processed as:
    use encoding (split(/,/, 'utf16,STDOUT,utf8', 0));
    That seems to indicate that the problem is that STDOUT is just a string rather than the actual filehandle when it's passed into encoding::import via the -M command line parameter.


    -- Brian
      But STDOUT => "utf8" returns strings too! It's equivalent to 'STDOUT', "utf8".
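The point made in this sub-thread can be checked directly. The sketch below builds the argument list both ways, once as written in the `use encoding` line and once with the `split` that B::Deparse reported for `-M`, and compares them element by element. The helper name `same_list` is made up for illustration.

```perl
use strict;
use warnings;

# As written in the script: the fat comma auto-quotes the bareword STDOUT.
my @from_script = ("utf16", STDOUT => "utf8");

# As B::Deparse shows the -M switch building its arguments.
my @from_switch = split(/,/, 'utf16,STDOUT,utf8', 0);

# True if both array refs hold the same plain strings in the same order.
sub same_list {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        return 0 if ref $x->[$i] or ref $y->[$i];
        return 0 if $x->[$i] ne $y->[$i];
    }
    return 1;
}

print same_list(\@from_script, \@from_switch) ? "identical\n" : "different\n";
```

Both lists come out as three plain strings, so the arguments reaching `encoding::import` are not where the script and the one-liner differ.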
