En/Decode a unicode path

by exilepanda (Friar)
on Jan 22, 2023 at 08:18 UTC ( [id://11149755] : perlquestion )

exilepanda has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Is there any simple way to encode/decode a Unicode path so that it can be understood by any sub that takes a path as a parameter, just as an ANSI path would be? (I am working on Windows.)

If I am reading or writing the target directory/file directly, Win32::Unicode does most of the job, but modules and features that take a path as a parameter, e.g. Storable or flock()-style functions, mostly won't recognize my Unicode paths.

And my real case is even more complex. Sometimes I read a list of paths from text files, sometimes I readdir() in a recursive loop, or even use `dir /s/b`. So I am looking for a path translation that satisfies any kind of sub argument spec, not just one particular situation.

Any magic here? Thanks in advance.


Update: some code for an SSCCE. Again, I am asking whether there is any way to make a path recognizable by all other (CPAN) modules when they don't use a Unicode-friendly method to handle paths.

    use strict;
    use utf8;
    use Encode qw/encode/;
    use Storable;
    use Win32::Unicode::Util;
    use Win32::Unicode::Native;

    opendir my $D, encode ( "cp1252" => "./Test/我/XX" ) or die $!; #OK
    print $/;
    my $path = "./Test/我/";
    opendir my $D, $path or die $!; # OK
    my @files = readdir $D ;
    close $D;
    print "$_$/" foreach @files;
    print $/;
    $path .= "This.sto";
    store { A => { Key => 'A' } }, $path or die $!;
    __END__

    Outputs

    .
    ..
    XX
    們
    地
    can't create ./Test/我/This.sto: Invalid argument at C:\....\ope line 18.


Update 2: I came up with this package for myself.

    package Win32::MakePathANSI;
    require Exporter;
    our @ISA = qw/Exporter/;
    our @EXPORT = qw/ansi_path/ ;
    use strict;
    use utf8;
    use Win32;
    use Win32::Unicode::Native;

    sub _make_dir {
        my $root = shift;
        $root = Win32::GetFullPathName($root);
        foreach my $part ( @_ ) {
            $root .= "/$part";
            mkdir $root;
            die "Unable to assert dir for '$root' $!" unless Win32::GetShortPathName ($root);
        }
        return $root;
    }

    sub ansi_path {
        my $path = shift;
        my $assert = shift; # expects 'Dir' or 'File', where 'File' implied 'Dir'
        my $isDir = $path =~ /[\\\/]$/ ? 1 : 0;
        $path =~ s/[\\\/]{2,}/\//g;
        my $acpp = Win32::GetShortPathName($path);
        return $acpp if $acpp;
        return $path unless $assert ;
        my @parts = split /\//, $path;
        if ( $assert eq "File" ) {
            my $file = pop @parts;
            my $root = _make_dir ( @parts );
            open my $f, ">", "$root/$file" or die $!;
            close $f;
        }
        elsif ( $assert eq "Dir" ) {
            _make_dir ( @parts );
        }
        else { die "Expects 'Dir' or 'File'" }
        return Win32::GetShortPathName($path)||$path; #in case anything
    }
    1;
    __END__

    =Description

    To make a Unicode path becomes ANSI on Win32, thus modules not supporting Unicode can still access the path ( I guess );

    =cut

    =Synopsis

    use strict;
    use Win32::MakePathANSI;
    my $ansi_path = ansi_path ( $unicode_path, $assert ) ;
    # $assert will create the path when the path not exists
    # $assert expects 'Dir' or 'File', so this pkg knows what to assert

    =cut

    =Note

    1. Will die if unable to assert
    2. Without $assert, return origin path if not found
    3. Origin path will be returned if unable to convert

    =cut

Replies are listed 'Best First'.
Re: En/Decode a unicode path
by LanX (Saint) on Jan 22, 2023 at 11:24 UTC
    Please provide an SSCCE because there are too many ways to (mis)understand your question.

    In general, any Perl function should be able to deal with decoded strings (i.e. strings converted to the internal Unicode character format, so that the string has the so-called UTF8 flag set).

    Some (like JSON::decode_json) don't and expect encoded octet strings, so you need to check the documentation.


    see Encode for details
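
The difference between a decoded character string and an encoded octet string can be sketched as follows. This is a minimal illustration using the core Encode module; the sample bytes are the UTF-8 encoding of the character 我 from the original post, and the variable names are purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three raw bytes as they might come from a file or a directory listing:
# the UTF-8 encoding of U+6211.
my $octets = "\xE6\x88\x91";

# Decoding turns the octets into one Perl character.
my $chars = decode('UTF-8', $octets);

# Encoding goes the other way and yields octets again.
my $round_trip = encode('UTF-8', $chars);

printf "octets: %d, characters: %d\n", length $octets, length $chars;
```

Which of the two forms a given function expects has to be checked in its documentation, as noted above.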

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      for example
      use Storable;
      store { }, "Some/Unicode/PathFileName.sto"; # fail
      retrieve "Some/Unicode/PathFileName.sto"; # also fail
      This is because I can't make Storable use Win32::Unicode.

      Encode and Win32::Unicode::* only work when I access the dir/file directly; they don't help when other modules are not using them.

Re: En/Decode a unicode path
by kcott (Archbishop) on Jan 22, 2023 at 20:54 UTC

    G'day exilepanda,

    Earlier this month I wrote "Re: Unicode file names". This involves MSWin, Unicode filenames and uses readdir(). It's probably not a solution to your problem but may provide some direction. Please present something like this when you write your SSCCE.

    See also File::Spec (a core module) for platform-independent filename/pathname handling.

    — Ken

      Hello! I have added some code to my OP. Any further ideas?
Re: En/Decode a unicode path (Win32::Unicode::Native?)
by Anonymous Monk on Jan 22, 2023 at 09:50 UTC
      Win32::Unicode::Native is really helpful when I am accessing the dir/file directly (or, say, in my $main:: scope), because this module overrides a bunch of CORE::* functions so that I can open a Unicode path just like an ANSI path. However, other modules I load, such as Storable in my example, don't benefit from Win32::Unicode::Native, so my Unicode file path won't work in that module's scope.
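
The scoping effect described above can be reproduced with a small sketch of a per-package builtin override. Demo::Shim and Demo::ThirdParty are hypothetical stand-ins (for an overriding module and for a module such as Storable); whether Win32::Unicode::Native installs its replacements in exactly this way is an assumption.

```perl
use strict;
use warnings;

BEGIN {
    package Demo::Shim;    # hypothetical stand-in for an overriding module
    no strict 'refs';
    # Assigning to another package's glob at compile time marks the sub as
    # imported, which lets it replace the builtin inside package main only.
    *{'main::rmdir'} = sub { return "shim saw $_[0]" };
}

package Demo::ThirdParty;  # hypothetical stand-in for a CPAN module
# No override was imported into this package, so this is the real builtin.
sub remove { return rmdir($_[0]) }

package main;

my $mine   = rmdir('no-such-dir-12345');                     # calls the shim
my $theirs = Demo::ThirdParty::remove('no-such-dir-12345');  # real rmdir, fails

print "main: $mine\n";
print "third party: ", ($theirs ? "removed" : "builtin failed: $!"), "\n";
```

Code compiled in another package keeps calling the original builtin, which matches the behaviour reported for Storable here.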
Re: En/Decode a unicode path
by jwkrahn (Abbot) on Jan 22, 2023 at 11:12 UTC
      American National Standards Institute?

      From Windows-1252 (emphasis mine):

      This article is about the character encoding commonly mislabeled as "ANSI". For the actual ANSI character encoding, see ASCII. For the actual "ANSI extended Latin" encoding, see ANSEL.

      For every Windows API function that deals with a string, there's an (A)NSI and a (W)ide version of it.

      The ANSI version uses the ANSI/Active Code Page as the encoding, while the wide version uses UTF-16le.

      Perl builtins use the (A)NSI version of API functions, so interactions with the system via Perl builtins are limited to the character set of the ACP.

      For Windows for the American market, the system's ACP is 1252.
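
Whether a given path can be passed through the ANSI functions at all depends on whether every character exists in the active code page. A minimal sketch of that check with the core Encode module follows; fits_code_page is a hypothetical helper, and cp1252 is hard-coded as an assumption (the actual ACP on a given machine may differ).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character of $string can be
# represented in $code_page, false otherwise.
sub fits_code_page {
    my ($string, $code_page) = @_;
    # FB_CROAK makes encode() die on an unmappable character instead of
    # silently substituting one; LEAVE_SRC keeps the input untouched.
    my $ok = eval {
        encode($code_page, $string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $latin_path = "./Test/caf\x{E9}/XX";    # U+00E9 exists in cp1252
my $cjk_path   = "./Test/\x{6211}/XX";     # U+6211 does not

for my $path ($latin_path, $cjk_path) {
    printf "%s cp1252\n", fits_code_page($path, 'cp1252') ? 'fits' : 'does not fit';
}
```

A path that does not fit is one that the Perl builtins cannot address by its long name, which is where the short (8.3) name comes in.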

        Perl builtins use the (A)NSI version of API functions
        Thank you! This is somehow inspiring. Now I can do this:
        use Win32;
        use Storable;
        my $file = Win32::GetShortPathName('X:\Some\Unicode\Path\\'); # it gives me a 8.3 path
        print -e $file; # Got it!
        store {}, "$file/Storable.sto"; # Done!
        The only drawback for now is that the path/file must already exist, so that the FS can "assign" the 8.3 location.
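
One way to soften that drawback is to shorten only the directory part, which usually exists already, and append the file name unchanged. The sketch below assumes the file name itself fits the code page; short_dir_path is a hypothetical helper, and the shortening function is passed in as a code ref so the logic can be tried without Windows.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper (not from the thread): shorten only the directory part
# of $path, which must already exist, and keep the file name unchanged.
# $shorten is a code ref such as \&Win32::GetShortPathName.
sub short_dir_path {
    my ($path, $shorten) = @_;
    my ($name, $dir) = fileparse($path);
    my $short_dir = $shorten->($dir);
    return unless defined $short_dir && length $short_dir;  # directory missing
    $short_dir =~ s{[\\/]+\z}{};                             # drop trailing separators
    return "$short_dir/$name";
}

if ($^O eq 'MSWin32') {
    require Win32;
    my $target = short_dir_path('X:/Some/Unicode/Path/Storable.sto',
                                \&Win32::GetShortPathName);
    print defined $target ? "$target\n" : "directory does not exist yet\n";
}
```

If the file name also contains characters outside the code page, the file still has to be created first, as the package in the original post does.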
      Sorry for my poor English/terminology. I mean that the path can be handled as if it were in the ANSI code page.
Re: En/Decode a unicode path
by harangzsolt33 (Chaplain) on Jan 22, 2023 at 23:10 UTC
    Working with Unicode file names in Perl is a pain. I am actually still more comfortable programming in JavaScript than Perl, so if I have to read a directory list, I write a JavaScript program in Windows. Unfortunately, you run into limitations in JavaScript when it comes to reading/writing large binary files, so there is no perfect language (or at least none that I know of). lol