PerlMonks  

Writing UTF8 Filename

by amiribarksdale (Acolyte)
on Nov 16, 2007 at 22:31 UTC (#651331=perlquestion)
amiribarksdale has asked for the wisdom of the Perl Monks concerning the following question:

Hey folks: I am trying to write a utf8 filename with mv, and I don't seem to be able to do it. Here is the command:

    system( "/bin/mv", "/tmp/$stitle", "home/web/htdocs/music/$musicdir/$title");

$title is a utf8 string with diacritical marks. Any tips?

Amiri

Re: Writing UTF8 Filename
by Juerd (Abbot) on Nov 16, 2007 at 22:44 UTC

    Have you properly encoded $stitle, $musicdir, and $title?

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

      Well, I am unsure about what you mean by properly encoding them. The variables write fine into MySQL and into the web pages I am producing, just not to the actual file. For instance, further up in the subroutine I call utf_escape($title), because I do some processing on the file before I save it, and the program I use doesn't like funny names. But I figured there should be no problem writing the filename, because my filesystem can certainly handle utf8. Yet perl/mv seem unable to write it directly.

      Amiri

        Well, I am unsure about what you mean by properly encoding them.

        Then you very probably haven't. Please read the Perl Unicode Tutorial at http://tnx.nl/perlunitut and update your program. Perl's unicode support goes much deeper than just allowing byte strings, and typically means you have to change your input and output routines.

        Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }
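As a minimal sketch of what "changing your input and output routines" can look like, the snippet below decodes bytes into a character string at the input boundary and encodes back to bytes at the output boundary, using the core Encode module. The sample value and variable names are assumptions made for illustration; they are not taken from the original program.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "Édith", as they might arrive
# from a web form or a database handle that does no decoding itself.
my $raw_bytes = "\xC3\x89dith";

# Input boundary: turn bytes into a Perl character string.
my $title = decode('UTF-8', $raw_bytes);

# Inside the program, lengths and regexes now operate on characters.
my $char_count = length $title;       # counts characters
my $byte_count = length $raw_bytes;   # counts bytes

# Output boundary: turn characters back into bytes before they leave Perl.
my $out_bytes = encode('UTF-8', $title);
```

The difference between the two length values shows that Perl treats the decoded string and the raw bytes as different things, even though both represent the same text.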

Re: Writing UTF8 Filename
by graff (Chancellor) on Nov 17, 2007 at 00:05 UTC
    I am trying to write a utf8 filename with mv...

    Why? Let me suggest that unless you have a really, seriously, inescapably unavoidable and compulsory reason for doing this, you would be much better off not doing it. Just don't.

    The current state of support for non-ASCII characters in file names is not what I would call "stable" (or "sane" or "worth the hassle"). It is likely to vary in significant and perplexing ways across various (versions of) operating systems and file/text transfer protocols. Even on a single system where non-ASCII file names seem to "work", you are likely to discover a crippling amount of "variability" among various applications currently running on that system in terms of how (or whether) they deal with non-ASCII characters in file names.

    If you are just trying to spruce up the appearance of your music collection, use a database or XML structure that relates sensible (ASCII-only) file names to whatever sort of strings you want to see as the list of files.

    If you actually do believe there is an unavoidable, compulsory need for this, try to think of a work-around that involves using ASCII-only strings. If you can't... well, perhaps other replies here will help, but the solution may be OS dependent, and you might regret it later. Good luck.

    (BTW, you might find it easier to use the perl built-in function "rename" -- it saves you from worrying about what happens to non-ASCII data being passed as command-line args to a sub-shell.)
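One way to picture the ASCII-only work-around is a lookup table that maps a generated ASCII file name to the title you want to display. The sketch below derives the file name from an MD5 digest of the title's UTF-8 bytes; the digest scheme, the sample title, and the in-memory hash are all assumptions for illustration (a numeric ID from a database would serve equally well).

```perl
use strict;
use warnings;
use Encode qw(encode);
use Digest::MD5 qw(md5_hex);

# Hypothetical display title containing a non-ASCII character.
my $title = "\x{C9}dith Piaf - La Vie en rose";

# Derive an ASCII-only file name. md5_hex works on bytes, so the
# character string is encoded to UTF-8 first.
my $filename = md5_hex(encode('UTF-8', $title)) . '.mp3';

# The lookup table; in practice this would live in a database table
# or an XML file rather than an in-memory hash.
my %title_for = ($filename => $title);
```

The file on disk then carries only hex digits and an extension, while the web pages look up and display the real title.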

      (BTW, you might find it easier to use the perl built-in function "rename" -- it saves you from worrying about what happens to non-ASCII data being passed as command-line args to a sub-shell.)

      Yeah, instead of worrying about what happens if you do a system call, you now have to worry about what happens if you do a system call. The same thing happens: latin1 or utf8 encoding may be used, depending on the circumstances. Thus: encode explicitly.

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }
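A minimal sketch of "encode explicitly" for a file name, assuming the file system expects UTF-8 bytes and that $title already holds a decoded character string: the name is passed through Encode::encode before it reaches rename. The paths, the sample title, and the scratch directory are hypothetical and exist only so the sketch can run without touching real data.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical character string with a diacritical mark ("Édith.mp3").
my $title = "\x{C9}dith.mp3";

# Scratch directory, removed automatically when the program exits.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'upload.tmp');
open my $fh, '>', $src or die "Cannot create $src: $!";
close $fh;

# Encode explicitly: the file system receives exactly these UTF-8 bytes,
# rather than whatever Perl's internal representation happens to be.
my $dst = File::Spec->catfile($dir, encode('UTF-8', $title));

rename $src, $dst or die "Cannot rename $src to $dst: $!";
```

The same encoded byte strings can be handed to the list form of system, for example system('/bin/mv', $src, $dst), since the arguments are passed to the operating system as bytes in either case.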

        Thank you all for your responses.

        Juerd, I am wondering what you mean when you keep saying "encode explicitly":

        The Unicode strings I am trying to set as the filename arrive encoded as utf8 from a web form. They go into a MySQL database that uses the utf8 character set. They display properly on the web form. What step for encoding explicitly could I be missing? It seems to me that they begin life as utf8 and they stay that way. How do I get more explicit?

      The current state of support for non-ASCII characters in file names is not what I would call "stable" (or "sane" or "worth the hassle").
      Simply name it: it's non-existent.

      Maybe we'll have encoded filenames support in perl 5.12?

        Maybe we'll have encoded filenames support in perl 5.12?

        My "current state of support" comment was a reference to OS-level issues (on whatever OS). I would not expect such support from perl any time soon, given that there is no consistent form of OS support.
