PerlMonks  

How to convert non-ASCII output from qx

by freonpsandoz (Beadle)
on Jun 18, 2012 at 00:09 UTC ( #976696=perlquestion )

freonpsandoz has asked for the wisdom of the Perl Monks concerning the following question:

How should the output of a command run with qx or backticks, which may contain non-ASCII characters, be converted to a correct internal representation for processing and then written to a UTF-8 file? A user on nisus.com recommended utf8::decode. On my Windows system, that only seems to work when the active console code page is 1252. The active code page seems to change between 437 and 1252 at times, so I can't rely on it. I can set the code page to 437 (which I believe is what it should be) with Win32::Console::OutputCP and then convert from cp437 with Encode::decode, but I'm not sure this is the right way. Related question: why doesn't Perl handle this conversion itself by determining which command output encoding is in effect and converting as needed? Are there platforms on which the character encoding of command output cannot be determined? Thanks.

Replies are listed 'Best First'.
Re: How to convert non-ASCII output from qx
by ikegami (Patriarch) on Jun 18, 2012 at 04:49 UTC
    On my system, the active CP seems to change between 437 and 1252 at times

    1252 is your ANSI CP. 437 is your OEM CP. Don't ask me which is used for what, but it's easy to find out by testing. (I think you'll find the ACP is used for system calls, whereas the OEMCP is used for console IO.)

    Encode's encode and decode functions can handle Windows code pages. Just prepend "cp" to the number (e.g. cp1252, cp437).
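A minimal sketch of that advice, assuming the command output really is in code page 437. The sample bytes are made up for illustration: 0x82 is "é" in cp437, while the same byte decoded as cp1252 is a different character, which is why picking the right code page matters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the bytes a console program would emit for "cafe"
# with an acute accent on the last letter, under code page 437.
my $oem_bytes = "caf\x82";

# Decode the raw bytes into Perl's internal character string.
my $text = decode('cp437', $oem_bytes);

# The same byte read as cp1252 decodes to a different code point.
my $misread = decode('cp1252', "\x82");

# Encode the characters as UTF-8 bytes, ready to be written to a file.
my $utf8_bytes = encode('UTF-8', $text);

printf "cp437:  %d characters, last one is U+%04X\n",
    length($text), ord(substr($text, -1));
printf "cp1252: byte 0x82 would be U+%04X\n", ord($misread);
printf "UTF-8:  %d bytes\n", length($utf8_bytes);
```

Writing `$text` through a filehandle opened with an `:encoding(UTF-8)` layer would do the final encoding step implicitly.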

Re: How to convert non-ASCII output from qx
by tobyink (Canon) on Jun 18, 2012 at 07:07 UTC

    "why doesn't Perl handle this conversion itself by determining what command output encoding is in effect and converting as needed?"

    How could it even reliably determine whether the output was text? A command launched via qx could easily be outputting binary data (e.g. an image; some compressed data; etc). "Converting encoding" on binary data is very likely to corrupt it.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      How about something like qx/command/t to indicate text output?
Re: How to convert non-ASCII output from qx
by bulk88 (Priest) on Jun 18, 2012 at 06:22 UTC
    Try posting code. The console traditionally uses a legacy code page with one byte per character. Type "chcp" at your console to see which one is active. Programs normally print legacy code page data to the console, not UTF-8; UTF-8 printed to the console would look like gibberish. You can also run into truncation/substitution problems, where your non-Latin letters become real one-byte "?" characters. Technically a program can print binary data to the console, as Unix-ish tools often do. You could also try marking STDIN/STDOUT as UTF-8; I'm not sure how successful that is with Perl on Windows (worst case, the console spits out legacy code page data and Perl converts all the invalid UTF-8 sequences to filler characters).
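The "worst case" described above can be sketched with Encode directly. This assumes legacy code page bytes (a made-up cp437 sample) are decoded as if they were UTF-8; it illustrates Encode's documented fallback behaviour, not what a particular console does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: cp437 bytes, where 0x82 is not a valid UTF-8 sequence.
my $oem_bytes = "caf\x82";

# Lenient decoding (the default): the invalid byte is replaced by U+FFFD,
# the Unicode replacement character.
my $lenient = decode('UTF-8', $oem_bytes);

# Strict decoding: FB_CROAK makes decode die on invalid input.
# With a CHECK value set, decode may modify its argument, so pass a copy.
my $copy   = $oem_bytes;
my $strict = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };

printf "lenient: last character is U+%04X\n", ord(substr($lenient, -1));
print  "strict:  ", (defined $strict ? "decoded" : "rejected"), "\n";
```

The strict form is useful as a check: if it fails, the data is not UTF-8 and should be decoded from the console code page instead.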
      By "at your console" I assume you mean "in a cmd.exe window." Right after a reboot, chcp reports the active console code page as 437. Later on, some undetermined process changes it. After that, chcp reports the code page as 1252. My script does set STDOUT to UTF8 with binmode and the output, redirected to a file, is correct. I tried setting STDIN to UTF8, but this seems to have no effect on the behavior of qx.
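qx reads the child's output through its own pipe, so a binmode on STDIN or STDOUT does not touch it; the bytes it returns can be decoded explicitly. The sketch below assumes the child writes cp437 and uses a second perl process as a stand-in for the real command; both are assumptions for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for the real command: a child perl that prints "caf" followed by
# byte 0x82, which is an accented "e" in code page 437 (assumed encoding).
my $perl = $^X;
my $raw  = qx{"$perl" -e "binmode STDOUT; print 'caf', chr(0x82)"};

# qx hands back undecoded bytes; convert them to characters explicitly.
my $text = decode('cp437', $raw);

# Report code points rather than printing the character itself, so the
# output does not depend on the encoding of the current terminal.
printf "%d bytes read, decoded to %d characters, last one U+%04X\n",
    length($raw), length($text), ord(substr($text, -1));
```

If the active code page can change, as described above, the encoding name would have to be chosen at run time rather than hard-coded.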

Node Type: perlquestion [id://976696]
Approved by ikegami