
Re^3: Tk and Non-ASCII File Names

by graff (Chancellor)
on Sep 30, 2010 at 02:34 UTC


in reply to Re^2: Tk and Non-ASCII File Names
in thread Tk and Non-ASCII File Names

I don't have any way to test in an environment that matches yours, but based on what you've posted so far, it would appear that your locale settings and non-ascii file names are "consistent" -- both involve a single-byte-per-character encoding for "vanilla" European (8859-1, i.e. Latin-1).

So your Encode::decode call should specify that the string being passed to it needs to be decoded from that encoding:

$file = decode( 'iso-8859-1', $file );
Try that and see if it helps. The return value should be a valid character string (utf8 internally) with the accented "e" rendered as intended, because the value being passed in $file is a valid 8859-1 string.

When you passed 'utf-8' as the first arg to decode(), perl was being told to expect utf8 data in $file, but the single non-ascii byte there was not parsable as utf8. What you got in place of it was the Unicode "REPLACEMENT CHARACTER" (U+FFFD), which, when rendered as utf8 data, is the three-byte sequence "0xef 0xbf 0xbd". That sequence, when played through a Latin-1 display window, yields the three goofy characters that you got.
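A minimal sketch of the difference, assuming a short made-up Latin-1 byte string ("t\xEAte") rather than your real file name. It only uses decode() and encode() from Encode, with no CHECK argument, so malformed input gets substituted rather than dying:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: an illustrative Latin-1 byte string; 0xEA is "e with circumflex".
my $latin1_bytes = "t\xEAte";

# Right encoding: the single byte 0xEA becomes the character U+00EA.
my $good = decode('iso-8859-1', $latin1_bytes);

# Wrong encoding: 0xEA is not valid utf8 here, so decode() substitutes U+FFFD.
my $bad = decode('UTF-8', $latin1_bytes);

# U+FFFD written out as utf8 is three bytes; show them in hex.
my $fffd_hex = join ' ',
    map { sprintf '%02x', ord } split //, encode('UTF-8', "\x{FFFD}");

printf "good: U+%04X\n", ord substr($good, 1, 1);
print  "bad contains U+FFFD: ", ($bad =~ /\x{FFFD}/ ? "yes" : "no"), "\n";
print  "U+FFFD as utf8 bytes: $fffd_hex\n";
```

Those three bytes, shown one at a time by a Latin-1 display, would account for the three odd characters.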

Replies are listed 'Best First'.
Re^4: Tk and Non-ASCII File Names
by eff_i_g (Curate) on Oct 06, 2010 at 17:22 UTC
    graff,

    Thanks for your input. I tried decode but I'm still error-ridden; I've included the updates below. Any other ideas or encoding tricks where I can see what's going on under the hood?

    Command Line Output:
    06_Protection_de_la_tête.xml: No such file or directory
    No such file or directory
    Assuming 'require Tk::ExecuteCommand;' at ./tmp.pl line 24
    Tk Output:
    06_Protection_de_la_tête.xml: No such file or directory
    Code:
    #!/usr/local/bin/perl

    use warnings;
    use strict;
    use Tk;
    use File::Find::Rule;
    use Encode qw(decode);

    #my $file = '06_Protection_de_la_tête.xml';
    my @files = File::Find::Rule->file->name('*.xml')->in('.');
    my $file = shift @files;
    $file = decode('iso-8859-1', $file);
    my $cmd = "ls -l $file";

    ### Try ls.
    print qx($cmd);

    ### Try reading the first line.
    open my $F, '<', $file;
    print $! ? "$!\n" : scalar <$F> ;

    ### Try ls via Tk.
    my $mw = MainWindow->new;
    my $exec = $mw->ExecuteCommand(
        -command => $cmd,
    )->pack;
    $exec->execute_command;
    $exec->update;

    MainLoop;
      In case you didn't know, the strings you just posted as the "Command Line Output" and "Tk Output" both contain a valid utf8 character ("e with circumflex"); it looks bad because both your shell terminal window and your Tk window are using iso-8859-1 encoding to display the data; the two "wrong" characters that you see are actually the two bytes of the utf8-encoded character, being interpreted as separate 8859-1 characters instead.

      Having the string in utf8 encoding probably explains why the open() fails as well -- you've actually "changed" the file name (the value of $file), by changing the encoding.
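One way to see this under the hood is to dump the bytes at each step. This is a rough sketch, not a fix: hex_bytes is a hypothetical helper, and "t\xEAte.xml" is a made-up stand-in for the name as it sits on an ISO-8859-1 file system.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render an octet string as space-separated hex bytes.
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, $_[0] }

# Assumption: the name as stored on disk in ISO-8859-1 (0xEA = e-circumflex).
my $on_disk = "t\xEAte.xml";

my $chars   = decode('iso-8859-1', $on_disk);  # character string
my $as_utf8 = encode('UTF-8', $chars);         # bytes a utf8 consumer would emit
my $back    = encode('iso-8859-1', $chars);    # bytes the file system expects

print "on disk: ", hex_bytes($on_disk), "\n";
print "as utf8: ", hex_bytes($as_utf8), "\n";
print "round trip matches: ", ($back eq $on_disk ? "yes" : "no"), "\n";
```

The utf8 form is one byte longer (c3 aa in place of ea), so to the OS it is a different name; encoding back to iso-8859-1 recovers the original bytes.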

      So, I'm a little confused about the nature of the original problem... What happens if you take this most recent instance of your code (using File::Find::Rule), and comment out this line?

      $file = decode('iso-8859-1', $file );
        graff,

        Aye, the UTF-8 is valid, but my locale is en_US.ISO8859-1 (or at least some of it is; see my original post).

        If I comment out the encoding line the ls works as well as reading a line from the file, but Tk still reports:

        06_Protection_de_la_tête.xml: No such file or directory

        Back to the original problem: without any fancy stuff (decoding, encoding, etc.), commands with non-ASCII characters work fine except when they are run through Tk::ExecuteCommand, which is what I'm trying to fix because it's a Tk application; the others are just examples. I posted a solution for this in an earlier reply, but it involves changing the module's code and I'm not sure how wise that is: I can't tell whether I'm fixing a valid bug or creating a potential problem for the future.

        Thanks for sticking with me on this one!
