
Re^2: Tk and Non-ASCII File Names

by eff_i_g (Curate)
on Sep 28, 2010 at 15:43 UTC ( [id://862453]=note )


in reply to Re: Tk and Non-ASCII File Names
in thread Tk and Non-ASCII File Names

zentara,

use utf8; works with the code given; however, when I incorporate it into the larger program it does not work. I'm mimicking that in the posted script by replacing

my $file = '06_Protection_de_la_tête.xml';
with
use File::Find::Rule;
my @files = File::Find::Rule->file->name('*.xml')->in('.');
my $file = shift @files;

If I follow this with

use Encode;
my $file = decode('utf8', $file);
then all of the non-Tk lines break:
06_Protection_de_la_tï¿½te.xml: No such file or directory
Argh!

Do you suspect this to be Tk-related since the other commands work fine and there is an open bug related to this matter? Tk::ExecuteCommand builds the command by appending to the command I pass:

$self->{-command} . ' 2>&1 |'

Could this concatenation be changing the internal encoding of the string no matter what I send to the module? As I noted in my other reply, I can get all of this working if I modify a copy of the module and use utf8::downgrade, but I don't know if it's wise to change the one in production.
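To make the concatenation question concrete, here is a minimal sketch of how Perl's internal UTF8 flag behaves when strings are joined. It does not use Tk at all; the file name, the `ls -l` prefix and the variable names are illustrative assumptions, and whether this is what actually happens inside Tk::ExecuteCommand is not confirmed by anything in this thread.

```perl
use strict;
use warnings;
use Encode qw(decode);
use bytes ();    # loaded only for bytes::length(); the pragma itself is not enabled

# Assumption: the name arrives from the directory as Latin-1 bytes.
my $raw  = "t\xEAte.xml";
# Same characters, but now stored internally as UTF-8 (UTF8 flag on).
my $text = decode('iso-8859-1', $raw);

# Appending a plain literal keeps whatever representation the operand had.
my $cmd_raw  = "ls -l $raw"  . ' 2>&1 |';
my $cmd_text = "ls -l $text" . ' 2>&1 |';

# utf8::downgrade converts a copy back to one byte per character.
my $fixed = $cmd_text;
utf8::downgrade($fixed);

for my $pair ( [ raw => $cmd_raw ], [ text => $cmd_text ], [ fixed => $fixed ] ) {
    my ( $label, $str ) = @$pair;
    printf "%-5s flag=%d chars=%d stored_bytes=%d\n",
        $label, ( utf8::is_utf8($str) ? 1 : 0 ), length($str), bytes::length($str);
}
```

All three strings compare equal with `eq`; only the internal storage differs, which matters once the string is handed to the shell or the filesystem as raw bytes.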

Re^3: Tk and Non-ASCII File Names
by zentara (Archbishop) on Sep 28, 2010 at 16:47 UTC
    Do you suspect this to be Tk-related since the other commands work fine and there is an open bug related to this matter?

    That sounds plausible, but hopefully a Unicode expert like graff will weigh in. ( Maybe private msg graff and ask him to look at it? ) I'm a provincial American, who seldom deals with non-ascii filenames. :-)

    I would first try reading the directories and printing the list to a Tk text box, and see if there are any name changes.
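A rough sketch of that check, with the Tk text box swapped for plain terminal output so it runs without a display. The helper name `escape_bytes` is hypothetical; it prints every byte outside printable ASCII as `\xNN`, so the real bytes of each name are visible no matter how the terminal is configured.

```perl
use strict;
use warnings;

# Hypothetical helper: show each byte outside printable ASCII as \xNN.
sub escape_bytes {
    my ($name) = @_;
    ( my $out = $name ) =~ s/([^\x20-\x7E])/sprintf '\\x%02X', ord $1/ge;
    return $out;
}

# readdir returns file names as raw bytes, exactly as stored on disk.
opendir my $dh, '.' or die "Cannot open current directory: $!";
for my $name ( sort grep { /\.xml\z/ } readdir $dh ) {
    print escape_bytes($name), "\n";
}
closedir $dh;
```

A Latin-1 name would show a single `\xEA` for the accented "e", while a UTF-8 name would show `\xC3\xAA`.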


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re^3: Tk and Non-ASCII File Names
by graff (Chancellor) on Sep 30, 2010 at 02:34 UTC
    I don't have any way to test in an environment that matches yours, but based on what you've posted so far, it would appear that your locale settings and non-ascii file names are "consistent" -- both involve a single-byte-per-character encoding for "vanilla" European (8859-1, i.e. Latin-1).

    So your Encode::decode call should specify that the string being passed to it needs to be decoded from that encoding:

    my $file = decode( 'iso-8859-1', $file );
    Try that and see if it helps. The return value should be a valid utf8 string with the accented "e" rendered as intended, because the value being passed in $file is a valid 8859-1 string.

    When you passed 'utf8' as the first arg to decode(), perl was told to expect utf8 data in $file, but the single non-ascii byte there was not parsable as utf8. What you got in its place was the unicode "REPLACEMENT CHARACTER" (U+FFFD). Rendered as utf8 data, that character is the three-byte sequence "0xef 0xbf 0xbd", and that sequence, played through a Latin-1 display window, yields the three goofy characters that you got.
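The effect described above can be reproduced without any files. This is a small sketch assuming the name holds the Latin-1 byte 0xEA for the accented "e"; the variable names are made up for illustration, and the output is printed as hex so it does not depend on the terminal's encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $latin1_bytes = "t\xEAte";    # assumed on-disk bytes (ISO-8859-1)

# Wrong source encoding: 0xEA followed by "t" is not valid UTF-8, so Encode
# substitutes U+FFFD when no CHECK argument is given.
my $wrong = decode( 'utf8', $latin1_bytes );

# Right source encoding: every byte maps to the character with the same number.
my $right = decode( 'iso-8859-1', $latin1_bytes );

# U+FFFD written out as UTF-8 bytes, then misread as three Latin-1 characters.
my $fffd_utf8  = encode( 'UTF-8', "\x{FFFD}" );
my $misreading = decode( 'iso-8859-1', $fffd_utf8 );

print 'wrong:   ', join( ' ', map { sprintf 'U+%04X', ord } split //, $wrong ), "\n";
print 'right:   ', join( ' ', map { sprintf 'U+%04X', ord } split //, $right ), "\n";
print 'U+FFFD as UTF-8 bytes: ', unpack( 'H*', $fffd_utf8 ), "\n";
print 'seen through Latin-1:  ',
    join( ' ', map { sprintf 'U+%04X', ord } split //, $misreading ), "\n";
```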

      graff,

      Thanks for your input. I tried decode but I'm still error-ridden; I've included the updates below. Any other ideas or encoding tricks where I can see what's going on under the hood?

      Command Line Output:
      06_Protection_de_la_tÃªte.xml: No such file or directory
      No such file or directory
      Assuming 'require Tk::ExecuteCommand;' at ./tmp.pl line 24
      Tk Output:
      06_Protection_de_la_tÃªte.xml: No such file or directory
      Code:
      #!/usr/local/bin/perl
      use warnings;
      use strict;
      use Tk;
      use File::Find::Rule;
      use Encode qw(decode);

      #my $file = '06_Protection_de_la_tête.xml';
      my @files = File::Find::Rule->file->name('*.xml')->in('.');
      my $file = shift @files;
      $file = decode('iso-8859-1', $file);
      my $cmd = "ls -l $file";

      ### Try ls.
      print qx($cmd);

      ### Try reading the first line.
      open my $F, '<', $file;
      print $! ? "$!\n" : scalar <$F> ;

      ### Try ls via Tk.
      my $mw = MainWindow->new;
      my $exec = $mw->ExecuteCommand(
          -command => $cmd,
      )->pack;
      $exec->execute_command;
      $exec->update;

      MainLoop;
        In case you didn't know, the strings you just posted as the "Command Line Output" and "Tk Output" both contain a valid utf8 character ("e with circumflex"); it looks bad because both your shell terminal window and your Tk window are using iso-8859-1 encoding to display the data; the two "wrong" characters that you see are actually the two bytes of the utf8-encoded character, being interpreted as separate 8859-1 characters instead.

        Having the string in utf8 encoding probably explains why the open() fails as well -- you've actually "changed" the file name (the value of $file), by changing the encoding.
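One common way around this is to keep two copies of each name: the untouched bytes for anything that reaches the operating system, and a decoded copy for display only. The sketch below assumes the filesystem uses iso-8859-1, as guessed earlier in the thread; `$FS_ENCODING`, `display_name` and `fs_name` are hypothetical names, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $FS_ENCODING = 'iso-8859-1';    # assumption about this particular system

# Bytes from the filesystem -> characters for display.
sub display_name { my ($bytes) = @_; return decode( $FS_ENCODING, $bytes ) }

# Characters -> the bytes the filesystem actually stores.
sub fs_name { my ($chars) = @_; return encode( $FS_ENCODING, $chars ) }

my $on_disk = "06_Protection_de_la_t\xEAte.xml";    # as a directory read would return it
my $shown   = display_name($on_disk);

# What the operating system would effectively receive if the decoded string
# were passed to open() or the shell directly: its UTF-8 form.
my $as_utf8 = encode( 'UTF-8', $shown );

print 'on disk:    ', unpack( 'H*', $on_disk ), "\n";
print 'as UTF-8:   ', unpack( 'H*', $as_utf8 ), "\n";
print 'round trip: ', unpack( 'H*', fs_name($shown) ), "\n";
```

The first and third lines of output are identical, while the second carries two bytes (c3 aa) where the file name on disk has one (ea), which is a different name as far as the filesystem is concerned.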

        So, I'm a little confused about the nature of the original problem... What happens if you take this most recent instance of your code (using File::Find::Rule), and comment out this line?

        $file = decode('iso-8859-1', $file );
