Re^5: utf8 in directory and filenames

I have read the documents in question, and understand what they say. Really. I have been handling these character set issues for a while, including the Unicode/ISO conversions back and forth in Perl (and iso-8859-x, and "modified utf-7 for IMAP", etc.). I just thought I'd insist on this so that you wouldn't think that I don't understand the basic issues at hand.

Do you understand the difference between a Perl unicode string, and a UTF-8 encoded string? That's a bit more complicated than converting between encodings back and forth, and it's the key issue at hand.
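
To make the distinction concrete, here is a minimal sketch. It assumes the name starts out as iso-8859-1 bytes, as in this thread; the variable names are only illustrative. Decoding gives a string of characters, encoding gives a string of bytes, and the two have different lengths as soon as a non-ASCII character is involved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Presentación.txt" as iso-8859-1 bytes: the o-acute is the single byte 0xF3.
my $latin1_bytes = "Presentaci\xF3n.txt";

# Decoding turns bytes into a Perl character (Unicode) string.
my $chars = decode('iso-8859-1', $latin1_bytes);

# Encoding turns characters back into bytes, here as UTF-8.
my $utf8_bytes = encode('UTF-8', $chars);

printf "characters:  %d\n", length $chars;       # 16
printf "utf-8 bytes: %d\n", length $utf8_bytes;  # 17, o-acute takes two bytes
```

The character string is what you work with inside the program; the byte strings are what goes to and comes from the outside world, the filesystem included.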

What I did learn from you was that I should apparently not blindly convert my filenames to utf8

Or anything else. A filename, once converted or encoded, is no longer the same filename.
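
A small sketch of what that means in practice, assuming a Unix-like filesystem that treats names as plain byte sequences (the temporary directory and variable names are just for illustration): the file is created under the UTF-8 bytes of the name, and looking it up under the iso-8859-1 bytes of the "same" name finds nothing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempdir);

# Work in a throwaway directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);

# The same visible name, as two different byte sequences.
my $latin1_name = "Presentaci\xF3n.txt";
my $utf8_name   = encode('UTF-8', decode('iso-8859-1', $latin1_name));

# Create the file under the UTF-8 bytes only.
open(my $fh, '>', "$dir/$utf8_name") or die "cannot create $dir/$utf8_name: $!";
close $fh or die "cannot close $dir/$utf8_name: $!";

my $found_utf8   = -e "$dir/$utf8_name"   ? 1 : 0;
my $found_latin1 = -e "$dir/$latin1_name" ? 1 : 0;

print "utf-8 name exists:   $found_utf8\n";
print "latin-1 name exists: $found_latin1\n";
```

To the filesystem these are simply two unrelated names, so a -f or open() on the re-encoded one has nothing to find.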

failed the "-f" test and an open() test, and I was, and still am, trying to figure out why.

You really, really need to have the error message. If you don't want to output it to STDERR or STDOUT, you can open a log file and write it there. Without the error message, you can only guess what's wrong. Guessing absolutely sucks, because it takes too much time.
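
Something along these lines would do, as a rough sketch. The log file name and the missing file are made up for the example, and a real script would log to a fixed path rather than a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $logfile = "$dir/debug.log";            # hypothetical log location

open(my $log, '>>', $logfile) or die "cannot open $logfile: $!";

my $missing = "$dir/does-not-exist.txt";   # deliberately absent
my $error   = '';

if (open(my $in, '<', $missing)) {
    close $in;
} else {
    # Copy $! right away, before another system call can overwrite it.
    $error = "$!";
    print {$log} "open('$missing') failed: $error\n";
}

close $log or die "cannot close $logfile: $!";
```

With the message in the log you know whether it was a missing file, a permission problem, or something else entirely.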

I now am close to believing that there are gremlins at play.

If you're on Linux, use strace(1) to find where the gremlins are.

Do I just pick up your last message and hit reply, or start a new question?

You can continue with the old thread, but it's harder to notice the new message then. I hate to say this, but you're better off starting a new thread. Don't forget to refer to the old one.

Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Re^6: utf8 in directory and filenames
by soliplaya (Beadle) on Nov 14, 2006 at 01:53 UTC

#!/usr/bin/perl
use strict;
use warnings;
use Encode;

# At the Beginning, there was an iso-8859-1 name string..
my $testname = "Presentación.txt";
print "starting string [$testname] " . (Encode::is_utf8($testname) ? "(utf8)" : "(bytes)") . "\n";

my $fname_iso = $testname; # simply copying leaves it iso bytes
print "  creating file1 [$fname_iso] " . (Encode::is_utf8($fname_iso) ? "(utf8)" : "(bytes)") . "\n";
open(F1,'>:raw',$fname_iso) or die "cannot open F1 : $!";
print F1 "Hello 1\n";
close F1;

my $fname_utf8 = decode('iso-8859-1',$testname); # force internal utf8
print " creating file2 [$fname_utf8] " . (Encode::is_utf8($fname_utf8) ? "(utf8)" : "(bytes)") . "\n";
open(F2,'>:raw',$fname_utf8) or die "cannot open F2 : $!";
print F2 "Hello 2\n";
close F2;

my $dir = "."; # that's iso bytes too by default

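opendir(DIR,$dir) or die "cannot open directory $dir : $!";
my @entries = readdir DIR;
closedir DIR; # directory handles are closed with closedir, not close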
foreach (@entries) {
    next if $_ =~ /^\./;
    next if $_ =~ /\.pl$/; # skip myself too
    print "entry [$_] " . (Encode::is_utf8($_) ? "(utf8)" : "(bytes)") . "\n";

    print "  first try :\n";
    if (-f "$dir/$_") { # like this, leaves it as bytes
        print "    passes the -f test,";
        unless (open(F1,'<',"$dir/$_") ) {
            print "  but cannot be opened : $!\n";
        } else {
            print "  and can be opened !\n";
            close F1;
        }
    } else {
        print "    fails the -f test\n";
    }

    print "  2d try :\n";
    my $fullpath = "${dir}/${_}"; # leaves it as bytes also
    print "  trying [$fullpath] " . (Encode::is_utf8($fullpath) ? "(utf8)" : "(bytes)") . "\n";
    if (-f $fullpath) {
        print "    passes the -f test,";
        unless (open(F1,'<',$fullpath) ) {
            print "  but cannot be opened : $!\n";
        } else {
            print "  and can be opened !\n";
            close F1;
        }
    } else {
        print "    fails the -f test\n";
    }

    print "  3d try :\n";
    my $dir_utf = decode('iso-8859-1',$dir); # force internal utf8
    my $fullpath2 = "${dir_utf}/${_}"; # concatenate forces utf8 flag on the whole
    print "  trying [$fullpath2] " . (Encode::is_utf8($fullpath2) ? "(utf8)" : "(bytes)") . "\n";
    if (-f $fullpath2) {
        print "    passes the -f test,";
        unless (open(F1,'<',$fullpath2) ) {
            print "  but cannot be opened : $!\n";
        } else {
            print "  and can be opened !\n";
            close F1;
        }
    } else {
        print "    fails the -f test,";
        unless (open(F1,'<',$fullpath2) ) {
            print "  and fails the open() : $!\n";
        }

    }

}

exit 0;