PerlMonks  

how to make a filename in unicode characters

by srikrishnan (Beadle)
on Jul 02, 2011 at 07:04 UTC (#912442)
srikrishnan has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I want to create files with Unicode filenames (non-English characters, e.g. Tamil or Hindi).

Is there a way to do this in Perl?

FYI: I am using Perl 5.12.

Thanks,

Srikrishnan

Re: how to make a filename in unicode characters
by Anonymous Monk on Jul 02, 2011 at 07:16 UTC
Re: how to make a filename in unicode characters
by Corion (Pope) on Jul 02, 2011 at 07:20 UTC

    There are "almost no" problems. What have you tried and where did you encounter problems?

    There is one problem with non-ASCII filenames: Perl treats all filenames as opaque strings when it passes them to open. Depending on your combination of filesystem and operating system, you might get lucky and find that they also treat filenames as opaque strings. Then you can create files with any name you like and will encounter few problems out of the ordinary when reading filenames from readdir, glob, or text files.

    On Windows with NTFS, for example, the situation is different. NTFS stores filenames as UTF-16LE, but Perl does not (yet) use the proper APIs to access such files by their given name. As a result, readdir can return filenames that do not match the names you find in text files or a filename hardcoded in your source code.

    Also see (found via site:perlmonks.org filename encoding)
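For the "lucky" case described above, where the operating system and filesystem treat names as opaque bytes, a minimal sketch might look like the following. It assumes a Linux-style system whose filenames are conventionally UTF-8, and it uses a temporary directory (via File::Temp) purely for illustration; neither assumption comes from the thread. The name is encoded to bytes before open, and the bytes returned by readdir are decoded before comparison.

```perl
use strict;
use warnings;
use utf8;                            # assumption: this source file is saved as UTF-8
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Scratch directory for the demonstration; removed automatically at exit.
my $dir  = tempdir(CLEANUP => 1);
my $name = "கோப்புபெயர்.xml";         # character string, not bytes

# Perl passes the name to the OS as-is, so encode it to bytes explicitly.
my $path = "$dir/" . encode('UTF-8', $name);
open my $fh, '>:encoding(UTF-8)', $path
    or die "Couldn't open '$path': $!";
print {$fh} "<doc/>\n";
close $fh or die "Couldn't close '$path': $!";

# readdir returns bytes; decode them before comparing with character strings.
opendir my $dh, $dir or die "Couldn't open directory '$dir': $!";
my @found = map { decode('UTF-8', $_) } grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

my $round_trip_ok = (@found == 1 && $found[0] eq $name) ? 1 : 0;
```

On filesystems that normalize or re-encode names, the round trip may not compare equal, which is the kind of mismatch described for Windows.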

      I initially tried the code below:

          my $title = "கோப்புபெயர்";
          open XXX, ">:encoding(UTF-8)", "c:/$title\.xml";
          close (XXX);

      Then I tried something like this:

          $title = encode("UTF-16LE", "கோப்புபெயர்");
          open XXX, ">:encoding(UTF-8)", "c:/$title\.xml";
          close (XXX);

      In my source I typed the characters directly rather than as numeric entities; this site converted some of them to entities when I posted.

        Ah, so you need to tell Perl what encoding your source code is in. If you are certain that your source code is UTF-8, the utf8 pragma might help there. You should also check the return value of open, either by using autodie or by doing the following:

        open ... or die "Couldn't open '$filename': $!";

        Also, you're using the backslash ("\") as filename separator - I guess that you are on Windows then. See the reply by Anonymous Monk about using Win32::Unicode then.
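To illustrate what the utf8 pragma changes, here is a small sketch that measures the same literal with and without it. It assumes the script itself is saved as UTF-8; the variable names are only for illustration. With the pragma the literal is a string of characters, and without it the literal is the raw UTF-8 bytes, three per Tamil code point.

```perl
use strict;
use warnings;

my ($chars, $bytes);
{
    use utf8;     # literals in this block are decoded from UTF-8 into characters
    $chars = length "கோப்பு";
}
{
    no utf8;      # literals in this block remain the raw bytes found in the file
    $bytes = length "கோப்பு";
}

# Each Tamil code point occupies three bytes in UTF-8.
my $ratio_ok = ($bytes == 3 * $chars) ? 1 : 0;
```

The pragma only affects how Perl reads the source text. It does not encode the name when it is later passed to open.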

Re: how to make a filename in unicode characters
by ikegami (Pope) on Jul 02, 2011 at 08:19 UTC

    Perl treats file names as opaque strings of bytes. That means that file names need to be encoded as per your "locale"'s encoding (ANSI code page).

    In Windows, code page 1252 is commonly used, and thus the encoding is usually cp1252.* However, cp1252 doesn't support Tamil and Hindi characters.

    Windows also provides a "Unicode" aka "Wide" interface, but Perl doesn't provide access to it using builtins**. You can use Win32API::File's CreateFileW, though. IIRC, you still need to encode the file name yourself. If so, you'd use UTF-16le as the encoding.

    The aforementioned Win32::Unicode appears to handle some of the dirty work of using Win32API::File for you. I'd recommend starting with that.

    * — The code page is returned (as a number) by the GetACP system call. Prepend "cp" to get the encoding.

    ** — Perl's support for Windows sucks in some respects.
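The two encoding points above can be checked without touching the filesystem. The sketch below uses only the core Encode module: it shows that a strict cp1252 encode of a Tamil name fails, and it builds the UTF-16LE byte string that a wide-character call would be given. Appending a terminating NUL is an assumption about what such a call expects; check the Win32API::File documentation before relying on it. No Windows API is actually called here.

```perl
use strict;
use warnings;
use utf8;                       # assumption: this source file is saved as UTF-8
use Encode qw(encode);

my $name = "கோப்புபெயர்.xml";

# cp1252 has no Tamil characters, so a strict encode dies; eval catches that.
my $ansi = eval {
    encode('cp1252', $name, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $ansi_ok = defined $ansi ? 1 : 0;    # 0 means the name cannot be represented

# UTF-16LE bytes for a wide API, with a terminating NUL (assumed to be required).
my $wide = encode('UTF-16LE', "$name\0");

# Every character here is in the Basic Multilingual Plane: two bytes each.
my $wide_len_ok = (length($wide) == 2 * (length($name) + 1)) ? 1 : 0;
```

Passing $wide to a byte-oriented builtin such as open would not help, since open hands the bytes to the ANSI interface unchanged.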
