Re: Unicode file names

by kcott (Archbishop)
on Jan 04, 2023 at 04:48 UTC ([id://11149354])


in reply to Unicode file names

G'day BernieC,

The character that you show as ️ is "U+FE0F VARIATION SELECTOR-16"; it indicates that the preceding emoji character should be rendered in its graphical form. Its complement is "U+FE0E VARIATION SELECTOR-15"; it indicates that the preceding emoji character should be rendered in its textual form. See Unicode PDF code chart: "Variation Selectors Range: FE00–FE0F".

The character that you show as ✈ is "U+2708 AIRPLANE". As part of the demo code below, I've also used "U+2709 ENVELOPE". Find both of those in Unicode PDF code chart: "Dingbats Range: 2700–27BF".
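If you want to check what an unfamiliar character in a filename actually is, Perl can look up the name for you. This is a minimal sketch using the core charnames module; the helper name describe_char is my own invention for illustration and is not part of the demo scripts.

```perl
#!perl

use strict;
use warnings;

use charnames ();

# Hypothetical helper: return "U+XXXX NAME" for a single character.
# charnames::viacode() returns undef for unnamed/unassigned code points.
sub describe_char {
    my ($char) = @_;
    my $code_point = ord $char;
    my $name = charnames::viacode($code_point) // '<no name>';
    return sprintf 'U+%04X %s', $code_point, $name;
}

# The emoji sequence from the question: AIRPLANE followed by VS-16.
print describe_char($_), "\n" for split //, "\x{2708}\x{FE0F}";
```

That prints one line per character: "U+2708 AIRPLANE" and "U+FE0F VARIATION SELECTOR-16".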

The following two short scripts create some files with and without Unicode characters in their filenames, then identify the filenames with Unicode characters and rename them.

First, create the files for the demo.

C:\Users\ken\tmp\pm_11149351_unicode_filenames>dir
 Volume in drive C is Primary Drive
 Volume Serial Number is 5A0C-01CD

 Directory of C:\Users\ken\tmp\pm_11149351_unicode_filenames

04-Jan-23  15:06    <DIR>          .
04-Jan-23  15:06    <DIR>          ..
04-Jan-23  14:18               337 mkfiles.pl
04-Jan-23  15:03               271 mvfiles.pl
               2 File(s)            608 bytes
               2 Dir(s)  1,533,002,100,736 bytes free

C:\Users\ken\tmp\pm_11149351_unicode_filenames>more mkfiles.pl
#!perl

use strict;
use warnings;
use autodie;

my $emoji_airplane = "\x{2708}\x{FE0F}";
my $emoji_envelope = "\x{2709}\x{FE0F}";

my @fnames = (
    'AIR_2708_FE0F', "___ $emoji_airplane $emoji_airplane",
    'ENV_2709_FE0F', "___ $emoji_envelope $emoji_envelope",
);

for my $fname (@fnames) {
    open my $fh, '>', $fname;
}

C:\Users\ken\tmp\pm_11149351_unicode_filenames>perl mkfiles.pl

Use <pre> block to show Unicode characters:

C:\Users\ken\tmp\pm_11149351_unicode_filenames>dir
 Volume in drive C is Primary Drive
 Volume Serial Number is 5A0C-01CD

 Directory of C:\Users\ken\tmp\pm_11149351_unicode_filenames

04-Jan-23  15:32    <DIR>          .
04-Jan-23  15:32    <DIR>          ..
04-Jan-23  15:32                 0 AIR_2708_FE0F
04-Jan-23  15:32                 0 ENV_2709_FE0F
04-Jan-23  14:18               337 mkfiles.pl
04-Jan-23  15:03               271 mvfiles.pl
04-Jan-23  15:32                 0 ___ ✈️ ✈️
04-Jan-23  15:32                 0 ___ ✉️ ✉️
               6 File(s)            608 bytes
               2 Dir(s)  1,533,000,298,496 bytes free

C:\Users\ken\tmp\pm_11149351_unicode_filenames>

Now rename the filenames with Unicode characters.

C:\Users\ken\tmp\pm_11149351_unicode_filenames>more mvfiles.pl
#!perl

use strict;
use warnings;
use autodie;

use File::Copy 'move';

opendir(my $dh, '.');

for my $fname (readdir $dh) {
    next if $fname =~ /^[\x00-\x7f]+$/;
    (my $new_name = $fname) =~ s/([^\x00-\x7f])/'+U' . ord($1) . 'U+'/eg;
    move($fname, $new_name);
}

C:\Users\ken\tmp\pm_11149351_unicode_filenames>perl mvfiles.pl

C:\Users\ken\tmp\pm_11149351_unicode_filenames>dir
 Volume in drive C is Primary Drive
 Volume Serial Number is 5A0C-01CD

 Directory of C:\Users\ken\tmp\pm_11149351_unicode_filenames

04-Jan-23  15:35    <DIR>          .
04-Jan-23  15:35    <DIR>          ..
04-Jan-23  15:32                 0 AIR_2708_FE0F
04-Jan-23  15:32                 0 ENV_2709_FE0F
04-Jan-23  14:18               337 mkfiles.pl
04-Jan-23  15:03               271 mvfiles.pl
04-Jan-23  15:32                 0 ___ +U226U++U156U++U136U++U239U++U184U++U143U+ +U226U++U156U++U136U++U239U++U184U++U143U+
04-Jan-23  15:32                 0 ___ +U226U++U156U++U137U++U239U++U184U++U143U+ +U226U++U156U++U137U++U239U++U184U++U143U+
               6 File(s)            608 bytes
               2 Dir(s)  1,532,999,979,008 bytes free
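A note on the numbers in the renamed files: they are decimal byte values, not code points. Each name contains six values per emoji, which is what you would expect if readdir handed back the names as UTF-8 bytes (three bytes for U+2708 or U+2709, three for U+FE0F). The following sketch reproduces those values; utf8_byte_values is a hypothetical helper name used only here.

```perl
#!perl

use strict;
use warnings;

use Encode 'encode';

# Hypothetical helper: return the decimal value of each byte in the
# UTF-8 encoding of a character string.
sub utf8_byte_values {
    my ($string) = @_;
    return unpack 'C*', encode('UTF-8', $string);
}

# AIRPLANE + VS-16, as used in mkfiles.pl
print join(' ', utf8_byte_values("\x{2708}\x{FE0F}")), "\n";
```

That prints "226 156 136 239 184 143", matching the sequence in the first renamed file.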

So, that's very much skeleton code to demonstrate a technique. You may want to offer an option to type a new filename; you may want something other than the default character conversion to "+U...U+". Perhaps you need to perform this recursively through a directory hierarchy. Depending on how you alter this to suit your preferences, validation, exception handling, and similar checks may be appropriate.
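As one possible variation, here is a rough sketch of a different conversion plus a basic safety check. It assumes the names come back from readdir as UTF-8 bytes (as the output above suggests, but that may differ on your system); it decodes them first so each character becomes one hexadecimal code point rather than several decimal bytes. The names ascii_name and safe_rename are my own and purely illustrative.

```perl
#!perl

use strict;
use warnings;

use Encode ();

# Hypothetical helper: convert a filename to pure ASCII, writing each
# non-ASCII character as "+U<hex code point>U+".
# ASSUMPTION: $raw holds UTF-8 bytes. If it does not decode cleanly,
# fall back to converting byte by byte.
sub ascii_name {
    my ($raw) = @_;
    my $bytes = $raw;    # copy: decode() with a CHECK value may modify its argument
    my $chars = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK) };
    $chars = $raw unless defined $chars;
    (my $new_name = $chars) =~ s/([^\x00-\x7f])/sprintf('+U%04XU+', ord $1)/eg;
    return $new_name;
}

# Hypothetical helper: rename, but never clobber an existing file.
# Returns 1 if a rename happened, 0 if there was nothing to do.
sub safe_rename {
    my ($old_name, $new_name) = @_;
    return 0 if $old_name eq $new_name;
    die "Refusing to overwrite existing '$new_name'\n" if -e $new_name;
    rename $old_name, $new_name
        or die "Cannot rename '$old_name' to '$new_name': $!\n";
    return 1;
}

# Conversion only; no files are touched here.
print ascii_name("___ \xE2\x9C\x88\xEF\xB8\x8F"), "\n";
```

That prints "___ +U2708U++UFE0FU+". In the loop from mvfiles.pl you could then call safe_rename($fname, ascii_name($fname)) in place of move(). For a directory hierarchy, the core File::Find module is one option, though renaming while walking the tree needs some care.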

The ball's in your court. Take it from here ...

— Ken

Re^2: Unicode file names
by BernieC (Pilgrim) on Jan 04, 2023 at 12:21 UTC
    Looks good.. thanks!
