PerlMonks  

help needed in opendir

by uva (Sexton)
on Mar 13, 2006 at 06:24 UTC ( [id://536198]=perlquestion )

uva has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, can you help me open a directory in UTF-8 format? Is it possible to open a directory handle with UTF-8 format? I tried  opendir DIR,"<:utf8","C:\\direcory"; but it gives an error about too many parameters. Please help me work out how to open it.

Replies are listed 'Best First'.
Re: help needed in opendir
by davido (Cardinal) on Mar 13, 2006 at 06:47 UTC

    Is it possible to open a directory handle with UTF-8 format?

    I think the answer is "No, not really, natively." I base this (possibly wrong) conclusion on the following paragraph, found in perlunicode:

    When Unicode Does Not Happen

    While Perl does have extensive ways to input and output in Unicode, and few other 'entry points' like the @ARGV which can be interpreted as Unicode (UTF-8), there still are many places where Unicode (in some encoding or another) could be given as arguments or received as results, or both, but it is not.

    The following are such interfaces. For all of these interfaces Perl currently (as of 5.8.3) simply assumes byte strings both as arguments and results, or UTF-8 strings if the encoding pragma has been used.

    One reason why Perl does not attempt to resolve the role of Unicode in this cases is that the answers are highly dependent on the operating system and the file system(s). For example, whether filenames can be in Unicode, and in exactly what kind of encoding, is not exactly a portable concept. Similarly for the qx and system: how well will the 'command line interface' (and which of them?) handle Unicode?

    • chdir, chmod, chown, chroot, exec, link, lstat, mkdir, rename, rmdir, stat, symlink, truncate, unlink, utime, -X
    • %ENV
    • glob (aka the <*>)
    • open, opendir, sysopen
    • qx (aka the backtick operator), system
    • readdir, readlink

    Addition:

    See also Perl's documentation for the 'open' pragma. This pragma allows you to set default layers for input and output (ie, character encoding support for IO). There is one sentence which is telling:

    Directory handles may also support PerlIO layers in the future.

    ...which means that, at least as of 5.8.8, directory handles do not support PerlIO layers.
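Since a layer cannot be attached to the directory handle itself, one possible workaround is to decode each name after readdir returns it. The following is only a minimal sketch: read_dir_decoded is a hypothetical helper name, and the default of UTF-8 is an assumption about how the filesystem stores names, which may not hold on every platform (Windows in particular).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read a directory and decode each entry name.
# The encoding is an ASSUMPTION about how the filesystem stores names.
sub read_dir_decoded {
    my ($path, $encoding) = @_;
    $encoding ||= 'UTF-8';

    opendir my $dh, $path or die "opendir $path: $!";
    # readdir returns byte strings; turn them into character strings.
    my @names = map { decode($encoding, $_) } readdir $dh;
    closedir $dh;

    return @names;
}

# Layers do work on file handles, so encode again on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for read_dir_decoded('.');
```

The decoding happens on the strings, not on the handle, so opendir is still called with its usual two arguments.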


    Dave

Re: help needed in opendir
by Marza (Vicar) on Mar 13, 2006 at 06:51 UTC

    Hmmm sounds like homework?

    I would suggest you read up on opendir and take a look at utf8 on cpan.

    -edit- But for the sake of fun, you could experiment with this (not tested):

    #! /usr/bin/perl
    use strict;
    use warnings;
    use bytes;
    opendir DIR, "/path/to/dir" or die "opendir: $!";
    my @files = readdir DIR;
    open HANDLE, ">filelist" or die "open filelist: $!";
    foreach (@files) {
        print HANDLE "$_\n";
    }

    Get it working, then switch to use utf8; and compare the results.

      I actually created a directory with Chinese characters in its name and tried to read it. I also tried the above solution, but it just prints "???" for that Chinese directory. How can I read that Chinese directory?
        Works for me.
        ~/536198> ls
        536198.pl  中文
        ~/536198> cat 536198.pl
        #!/usr/bin/perl
        use strict;
        use diagnostics;
        opendir my $dirhandle, "." or die "could not opendir current dir: $!";
        print join "\n", readdir $dirhandle;
        closedir $dirhandle;
        ~/536198> perl 536198.pl
        .
        ..
        中文
        536198.pl
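When one machine prints "???" and another prints the real name, it may help to look at the raw bytes readdir returns, so that the terminal's own encoding cannot distort the picture. This is a rough diagnostic sketch, not a fix; hex_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

opendir my $dh, '.' or die "opendir: $!";
for my $name (readdir $dh) {
    # Print only the hex form: it is plain ASCII whatever the name contains.
    print hex_bytes($name), "\n";
}
closedir $dh;
```

A name stored as UTF-8 would show multi-byte sequences (中文 would be E4 B8 AD E6 96 87). If the name shows up as a run of 3F bytes instead, the question marks were presumably already in the string before Perl printed anything, which would point at the platform's file API rather than at the output stage.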
Re: help needed in opendir
by jonadab (Parson) on Mar 13, 2006 at 13:19 UTC

    Yeesh, I thought it was bad enough that filenames are allowed to contain spaces, shell metacharacters, and punctuation. Now they can have high-bit and even multibyte characters as well? What next, paragraph breaks, different fonts and sizes, and bold, italic, and underlined characters? Please, can we embed tables, frames, and images into our filenames? How about OLE objects and active scripts? I want to put AJAX code into the filename of an icon on my desktop so it can show me an RSS feed...

    Bah.

    The only sane solution to this nonsense is to split the role of the file identifier, which is used by software accessing the file, apart from the file description, which is shown to end users. Having the same filename fill both roles is nothing but trouble.

      How dashed inconsiderate of Johnny Foreigner to want to name files using esoteric, descriptive terms like:

      • 数.文
      • デ.タファイル
      • 자료.파일
      • архив.данных
      • αρχείο.στοιχείω

      Why can't they make do with something simple like data.file?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        How dashed inconsiderate of Johnny Foreigner to want to name files using esoteric, descriptive terms

        That was exactly my point: the problem arises because software has to identify the files with the same filenames that the users have to use. The demands on what can be put in a filename are not going to stop with relatively sane things like a larger character set. The filename, because it is used directly by users as an expression of the file's contents, will eventually be expected to be able to include, among other things, symbols that are not included in Unicode, including custom symbols that the user just made up and drew. Users will want files to be represented also with animations. And on it goes. All of that would be relatively reasonable, if it were just the user interface. On the Mac, ordinary data files can have their own icons attached to them, but that is pretty limited, because you've only got so many pixels to play with. On MS Windows and most Unices, you don't even have that.

        The problem with putting arbitrary things in the filename is *not* a problem of allowing the user to describe the file with arbitrary information. The problem is that the filename is not just the user-side file description medium: it's also the program's interface. That means every time any ability gets added to the interface, every single application -- even behind-the-scenes apps like the ones in server space -- has to be reworked to support it. Unicode characters are only the beginning.

        Much of this pain could be avoided if the representation shown in the file manager and the file selection dialog boxes were separate from the file identifier used by programs to identify the file. Many command-line users would probably choose to specify files by their identifiers rather than by their representation, because the identifier would usually be easier to type, but there's no reason they couldn't be given the option to specify them by the representation if they so choose, provided it's something they can find a way to type. But why should GUI users be limited to only using file representations that can be typed? That would be an awful lot like insisting that the Chinese stick to filenames that only contain ASCII characters. Either way, you're imposing a limitation on users because of an implementation detail that is unimportant to the users, and when the users decide they're not willing to put up with it anymore, you make chaos for all the programmers, who have to fix all the software to stop making assumptions about how file identifiers are structured.

        Allowing more things to be put in filenames isn't going to solve the problem. Allowing spaces to be put in filenames didn't solve the problem; allowing Unicode characters is just more of the same band-aid. The only real solution is to separate the concept of a file identifier, which programs use to identify the file, from the concept of a representation that the user specifies and uses to keep track of what the file is about. Having the two be the same stopped making sense when people who weren't programmers started using computers.


        Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
