PerlMonks  

Perl / FileFind or ...

by Festus Hagen (Novice)
on Nov 27, 2012 at 20:31 UTC ( #1005912=perlquestion )
Festus Hagen has asked for the wisdom of the Perl Monks concerning the following question:

Hiya all,

Poor title excuse: I'm completely flabbergasted by this ...

Such a simple thing ... ??

#!/usr/bin/perl

use strict;
use warnings;

# **
# * It is what it is, you can do with it as you please. [with respect, leave the credits]
# *
# * Just don't blame me if it teaches your computer to smoke!
# *
# * -Enjoy
# * fh :)_~
# **

use File::Find;

my $Directory = 'C:/Tmp';
my @flist;

sub cbFileFind {
    print $File::Find::name, "\n";
}

find(\&cbFileFind, $Directory);

What's up here??

C:\Documents and Settings\fh\My Documents\Scripts\Perl\Audio\CleanFiles>example.pl
C:/Tmp
C:/Tmp/M÷tley Crⁿe
C:/Tmp/M÷tley Crⁿe/Dr. Feelgood (Bonus Track Version)
C:/Tmp/M÷tley Crⁿe/Dr. Feelgood (Bonus Track Version)/07 - Same Ol' Situation (S.O.S).mp3
C:/Tmp/M÷tley Crⁿe/Saints of Los Angeles
C:/Tmp/M÷tley Crⁿe/Saints of Los Angeles/05 - Saints of Los Angeles (Gang Vocal).mp3

C:\Documents and Settings\fh\My Documents\Scripts\Perl\Audio\CleanFiles>dir /s /b C:\Tmp
C:\Tmp\Mötley Crüe
C:\Tmp\Mötley Crüe\Dr. Feelgood (Bonus Track Version)
C:\Tmp\Mötley Crüe\Saints of Los Angeles
C:\Tmp\Mötley Crüe\Dr. Feelgood (Bonus Track Version)\07 - Same Ol' Situation (S.O.S).mp3
C:\Tmp\Mötley Crüe\Saints of Los Angeles\05 - Saints of Los Angeles (Gang Vocal).mp3

C:\Documents and Settings\fh\My Documents\Scripts\Perl\Audio\CleanFiles>

-Enjoy
fh : )_~

Re: Perl / FileFind or ...
by Anonymous Monk on Nov 27, 2012 at 20:37 UTC
Re: Perl / FileFind or ...
by blue_cowdawg (Monsignor) on Nov 27, 2012 at 20:46 UTC

    As the mystery monk implies: what's the question?

    Looks like it is working as designed...


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Perl / FileFind or ...
by Festus Hagen (Novice) on Nov 27, 2012 at 21:10 UTC
    Y'all are kidding right ??

    -Enjoy
    fh : )_~

      Y'all are kidding right ??

      No, are you kidding?

      Between the PerlMonks Latin-1 limitation, the variability of Win32 filesystems (FAT/NTFS/...), and whatever you're dealing with, I don't know what you're complaining about.

      Either the problem is what you see in the console, in which case binmode the output handle, use Text::Unidecode ... whatever you want.

      Or the problem is the ANSI filenames you get on Win32 (see "When Unicode Does Not Happen" in perlunicode), in which case you need GetLongPathName or Win32::Unicode::Native.

      I know what I mean. Why don't you?, How do I post a question effectively?

      When you're asked for clarification, it probably isn't a joke.

Re: Perl / FileFind or ...
by TomDLux (Vicar) on Nov 27, 2012 at 21:20 UTC

    I think you're complaining about getting the divide symbol (÷), or 'ⁿ' (&#8319;), instead of accented characters.

    Try utf8 instead of USASCII.

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re: Perl / FileFind or ...
by runrig (Abbot) on Nov 27, 2012 at 21:46 UTC
    You were expecting maybe:
    C:/Tmp/Justin Bieber
    C:/Tmp/The Archies
    C:/Tmp/Debby Boone
    ???

      find / -name "*Bieber*" -exec rm -rf {} \;

      -Enjoy
      fh : )_~

        You destroyed my research on the bieberite mineral =(

Re: Perl / FileFind or ...
by Festus Hagen (Novice) on Nov 27, 2012 at 21:56 UTC
    Yea Tom, that be exactly the issue.

    Right or wrong, I have tried utf8, unicode and many other things found while searching, to no avail.

    Guess I just don't get it, Why such a simple thing is so difficult.

    -Enjoy
    fh : )_~

      Yea Tom, that be exactly the issue.

      Since PerlMonks offers threaded discussions, it is important to reply to the correct node by clicking the [reply] alongside the node of interest

      Guess I just don't get it, Why such a simple thing is so difficult.

      Decode the input, encode the output, read the "I/O flow (the actual 5 minute tutorial)" section of perlunitut: Unicode in Perl, and learn about your shell:

      $ chcp
      Active code page: 437

      $ echo > "da-MötleyCrüe"

      $ dir /b "da-*"
      da-MötleyCrüe

      $ dir /b "da-*" | perl -MData::Dump -e " dd[<>] "
      ["da-M\x94tleyCr\x81e\n"]

      $ perl -MData::Dump -e " dd[ glob q/da-*/ ] "
      ["da-M\xF6tleyCr\xFCe"]
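
The two dumps above look different because they are the same name in two different single-byte code pages. The following is only a sketch of that reading, using the core Encode module. It assumes the piped `dir` output is in the console's cp437 and that `glob` returns bytes in cp1252 (a guess about the ANSI code page, not something the session states outright). The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte strings copied from the Data::Dump output in the session above.
my $from_dir  = "da-M\x94tleyCr\x81e";   # piped from dir (console code page, cp437)
my $from_glob = "da-M\xF6tleyCr\xFCe";   # returned by glob (assumed to be cp1252)

# Decoding each with its own code page yields the same character string.
my $name_dir  = decode('cp437',  $from_dir);
my $name_glob = decode('cp1252', $from_glob);
print "same name\n" if $name_dir eq $name_glob;

# Reading the cp1252 bytes as if they were cp437 reproduces the mojibake:
# 0xF6 is U+00F7 (division sign) and 0xFC is U+207F (superscript n) in cp437.
my $mojibake = decode('cp437', $from_glob);
printf "U+%04X U+%04X\n", map { ord($_) } $mojibake =~ /([^\x00-\x7F])/g;
```

That would be why the File::Find output shows "M÷tley Crⁿe": the console interprets ANSI bytes using its own OEM code page.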

      Single byte encoding can be hard to guess

      $ perl -MEncode::Detective=detect -le " die detect( glob q/da-*/ ) "
      windows-1252 at -e line 1.

      $ perl -MEncode::Guess -e " die guess_encoding( glob q/da-*/ ) "
      No appropriate encodings found! at -e line 1.

      $ dir /b "da-*" | perl -MEncode::Detective=detect -e " $f = <>; die detect($f ) "
      Died at -e line 1, <> line 1.

      $ dir /b "da-*" | perl -MEncode::Guess -e " $f = <>; die guess_encoding($f ) "
      No appropriate encodings found! at -e line 1, <> line 1.

      $ dir /b "da-*" | perl -MEncode::Guess -e " $f = <>; die guess_encoding($f , q/cp437/) "
      Encode::XS=SCALAR(0x9a622c)

      $ dir /b "da-*" | perl -MEncode::Guess -e " $f = <>; die guess_encoding($f , q/cp437/)->name "
      cp437 at -e line 1, <> line 1.

      But once you know, just binmode
      $ perl -le " print for glob q/da-*/ "
      da-M÷tleyCrⁿe

      $ perl -le " binmode STDOUT , q/:encoding(cp437)/; print for glob q/da-*/ "
      da-MötleyCrüe

      $ perl -Mopen=:std,encoding(cp437) -le " print for glob q/da-*/ "
      da-MötleyCrüe

      $ perl -MEncode::Locale -le " binmode STDOUT, q{encoding(console_out)}; print for glob q/da-*/ "
      da-MötleyCrüe
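
Applied to the File::Find script from the question, the same "decode the input, encode the output" idea might look like the sketch below. It is not a definitive fix. It assumes the file system hands back cp1252 bytes and that the console displays cp437, so adjust both for your own system. The names `decode_name` and `list_names` are hypothetical helpers, not part of File::Find.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Find;

# Assumptions: names come back from the file system as ANSI (cp1252) bytes,
# and the console displays cp437. Change both to match your own setup.
my $FS_ENCODING      = 'cp1252';
my $CONSOLE_ENCODING = 'cp437';

# Turn raw file-name bytes into a Perl character string.
sub decode_name {
    my ($bytes) = @_;
    return decode($FS_ENCODING, $bytes);
}

# Walk $dir and return the decoded name of everything found.
sub list_names {
    my ($dir) = @_;
    my @names;
    find(sub { push @names, decode_name($File::Find::name) }, $dir);
    return @names;
}

# Only traverse when a directory is given on the command line.
if (@ARGV) {
    # Encode on output, so the console receives bytes in its own code page.
    binmode STDOUT, ":encoding($CONSOLE_ENCODING)";
    print "$_\n" for list_names($ARGV[0]);
}
```

Run it as, for example, `perl example.pl C:/Tmp`. Decoding explicitly, instead of relying on the bytes happening to line up with Latin-1, keeps the script correct for characters where cp1252 and Latin-1 differ.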

Re: Perl / FileFind or ...
by graff (Chancellor) on Nov 28, 2012 at 04:53 UTC
    What makes you think that handling non-ASCII characters in path/file names should be simple? I suppose that if you have intimate knowledge about the OS you're using, and about the file system installed on the specific disk volume you're using, and about the capabilities of the particular terminal/browser/other application that is trying to display file name strings on your monitor, and about the environment/configuration settings that control the behavior of that application, and about the process(es) that created the file names on that specific disk volume in the first place, then you might know enough for the handling of non-ASCII file names to seem "simple."

    But if you lack intimate knowledge on any of those topics, your first resort should be to get a hex-dump view of the byte sequences being used in any given file name string. That way, all you need is a general knowledge of the possible non-ASCII character encodings, and perhaps some presupposition about the (human) language being used by the person who assigned the file name (or at least, some sense of the alphabet being used - Cyrillic? Greek? Latin? Arabic? ... - including the range of diacritic marks, odd-ball punctuation and/or special symbols that are likely to show up). Not that this in itself is "simple", but at least there are fewer moving parts.

    Obviously, getting a hex-dump style output just gets in the way when file paths contain nothing outside the printable ASCII range, so a useful elaboration of your File::Find callback might go something like this:

    sub cbFileFind {
        my $printable_name = $File::Find::name;
        $printable_name =~ s/([^ -~])/sprintf("\\x{%02x}",ord($1))/eg;
        print $printable_name, "\n";
    }

    If you happen to already know (or if the approach just shown makes it clear) what the particular character encoding is for the non-ASCII portions of your file names, you can use Encode to convert (decode) the strings as read from the file system into perl-internal (utf8) encoding, and then the "ord()" function will return Unicode code-point numbers, which you can look up in case the particular characters are unfamiliar to you (check out Re: Regular expressions and accents and tlu -- TransLiterate Unicode).
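
As a rough illustration of that last step, the sketch below decodes a byte string and reports the code points of its non-ASCII characters. The byte values are the ones `glob` returned earlier in this thread, cp1252 is an assumed encoding, and `code_points` is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return "U+XXXX" labels for every non-ASCII character in a file name,
# given the raw bytes and the encoding they are assumed to be in.
sub code_points {
    my ($bytes, $encoding) = @_;
    my $chars = decode($encoding, $bytes);
    return map { sprintf 'U+%04X', ord($_) } $chars =~ /([^\x00-\x7F])/g;
}

# cp1252 is an assumption about the file system's ANSI code page.
print join(' ', code_points("M\xF6tley Cr\xFCe", 'cp1252')), "\n";
```

This prints `U+00F6 U+00FC`, the code points for ö and ü, which can then be looked up in any Unicode chart.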
Re: Perl / FileFind or ...
by Festus Hagen (Novice) on Nov 28, 2012 at 15:30 UTC
    First, Thanks to Anonymous Monk for an excellent and informative post.

    Simple ... Yea, it should be!

    Why?
    Because it's a high-level language (or supposed to be), and it should be smart enough to handle basic OS configuration.
    All Perl has to do is ask the OS and set itself accordingly!

    As is pointed out in this thread.

    Now if it was a string created from user data, that would be a different story.

    It's not, it's OS data ... The OS knows what it is, Perl should as well!

    -Enjoy
    fh : )_~

      There is no standard way to communicate the encoding of a file system.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Know what?

      Your comments are very confusing, especially because you're [reply]ing to yourself again.

      If I assume you're talking about the console code page, perl doesn't assume you're writing a terminal program.
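
For what it's worth, a script that does want to behave like a terminal program can ask Windows for its code pages itself. The sketch below assumes the Win32 module in your Perl build provides `Win32::GetACP()` and `Win32::GetConsoleOutputCP()`. The helper `encoding_for_code_page` is a hypothetical name, and the mapping from code page number to Encode name is a simplification.

```perl
use strict;
use warnings;
use Encode ();

# Map a Windows code page number to an Encode name, or undef if Encode
# does not know it. 65001 is the Windows identifier for UTF-8.
sub encoding_for_code_page {
    my ($cp) = @_;
    return undef unless defined $cp && $cp =~ /^\d+$/;
    my $name = $cp == 65001 ? 'UTF-8' : "cp$cp";
    return Encode::find_encoding($name) ? $name : undef;
}

if ($^O eq 'MSWin32') {
    require Win32;    # assumed to be installed, as on typical Windows Perls
    my $console = encoding_for_code_page(Win32::GetConsoleOutputCP());
    my $ansi    = encoding_for_code_page(Win32::GetACP());
    print "ANSI code page maps to: ", $ansi // 'unknown', "\n";
    # Encode everything printed from here on for the console's code page.
    binmode STDOUT, ":encoding($console)" if defined $console;
}
```

On other systems the block does nothing, and when output is redirected to a file the console code page may not be meaningful, which is one reason perl leaves this choice to the script.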
