PerlMonks  

Re: Curious about why some characters cause issues with mkdir/print

by ikegami (Patriarch)
on Mar 19, 2018 at 17:50 UTC


in reply to Curious about why some characters cause issues with mkdir/print

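Perl's built-in operators that deal with paths (mkdir, open, opendir, stat and so on) suffer from The Unicode Bug: the bytes handed to the OS depend on how the string happens to be stored internally, not just on its value. The path actually used is what the following sub returns: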

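    sub path_actually_used {
        if (utf8::is_utf8($_[0])) {
            # Upgraded string: Perl uses its internal UTF-8 buffer.
            my $s = $_[0];
            utf8::encode($s);
            return $s;
        } else {
            # Downgraded string: the characters are used as bytes.
            return $_[0];
        }
    }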

That means that if you have encoded bytes (e.g. UTF-8) in an upgraded string, Perl will get it wrong: the bytes are encoded a second time, so the OS receives a different path.

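    my $s = chr(9734);    # U+2606 WHITE STAR, decoded text
    mkdir($s);            # ok: uses the UTF-8 bytes E2 98 86
    utf8::encode($s);
    mkdir($s);            # ok: same bytes, in a downgraded string
    utf8::upgrade($s);
    mkdir($s);            # not ok: the encoded bytes get encoded again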
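To see the three cases side by side without creating any directories, here is a small sketch that repeats the path_actually_used logic and prints the bytes each mkdir call above would hand to the OS. The hex_bytes helper is made up for this illustration; the filesystem is never touched, so the output depends only on Perl's string handling.

```perl
use strict;
use warnings;

# Model of the bytes Perl passes to the OS for a path.
sub path_actually_used {
    if (utf8::is_utf8($_[0])) {
        my $s = $_[0];
        utf8::encode($s);    # upgraded: the internal UTF-8 buffer is used
        return $s;
    }
    return $_[0];            # downgraded: characters are used as bytes
}

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0];
}

my $s = chr(9734);                           # decoded text: U+2606
my $decoded = hex_bytes(path_actually_used($s));

utf8::encode($s);                            # encoded bytes, downgraded
my $encoded = hex_bytes(path_actually_used($s));

utf8::upgrade($s);                           # same bytes, now upgraded
my $upgraded = hex_bytes(path_actually_used($s));

print "decoded:  $decoded\n";     # E2 98 86
print "encoded:  $encoded\n";     # E2 98 86
print "upgraded: $upgraded\n";    # C3 A2 C2 98 C2 86
```

The last line is the double-encoded name: each of the bytes E2, 98 and 86 was treated as a character and encoded to UTF-8 again.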

It's virtually impossible to get into that situation without a bug in your code, because the encoding functions (e.g. utf8::encode and Encode::encode) always return a downgraded string.
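A quick way to check that claim: the sketch below encodes the same decoded string with Encode::encode from the core Encode module and with utf8::encode, then asks utf8::is_utf8 whether each result is upgraded.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = chr(9734);    # decoded text; stored upgraded since it's above 0xFF

# Encode::encode returns a new byte string.
my $via_encode = encode('UTF-8', $text);

# utf8::encode works in place on a copy.
my $via_utf8 = $text;
utf8::encode($via_utf8);

printf "Encode::encode result upgraded? %s\n", utf8::is_utf8($via_encode) ? 'yes' : 'no';
printf "utf8::encode result upgraded?   %s\n", utf8::is_utf8($via_utf8)   ? 'yes' : 'no';
```

Both lines report "no", and both results hold the same three bytes, so passing either straight to mkdir uses the intended path.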

You've already identified the solution:

  • If the path is a string of encoded text (e.g. UTF-8 bytes), passing it through utf8::downgrade($s) will ensure it's used correctly.
  • If the path is a string of decoded text (i.e. Unicode code points), encoding it (e.g. using utf8::encode($s)) will ensure it's used correctly.
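Both fixes can be wrapped in small helpers. The sketch below is one way to do it; fix_encoded_path and fix_decoded_path are hypothetical names, and it starts from the problematic case of encoded bytes sitting in an upgraded string.

```perl
use strict;
use warnings;

# Hypothetical helper for the first case: the path holds encoded bytes
# (e.g. UTF-8), possibly in an upgraded string. utf8::downgrade switches
# it back to one byte per character; it dies if any character is above
# 0xFF, which would mean the string wasn't encoded bytes after all.
sub fix_encoded_path {
    my ($path) = @_;
    utf8::downgrade($path);
    return $path;
}

# Hypothetical helper for the second case: the path holds decoded text
# (Unicode code points). Encoding it yields UTF-8 bytes, downgraded.
sub fix_decoded_path {
    my ($path) = @_;
    utf8::encode($path);
    return $path;
}

my $bytes = "\xE2\x98\x86";    # UTF-8 encoding of U+2606
utf8::upgrade($bytes);         # encoded bytes in an upgraded string

my $from_bytes = fix_encoded_path($bytes);
my $from_text  = fix_decoded_path(chr(9734));

print utf8::is_utf8($from_bytes) ? "upgraded\n" : "downgraded\n";    # downgraded
print $from_bytes eq $from_text ? "same bytes\n" : "different\n";    # same bytes
```

The sketch stops short of calling mkdir so it has no side effects; on a system whose file names are UTF-8 (typical on Linux), passing either result to mkdir would create the directory named ☆.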
