Re^2: Removing unwanted chars from filename.

by kcott (Archbishop)
on Oct 06, 2022 at 22:40 UTC

in reply to Re: Removing unwanted chars from filename.
in thread Removing unwanted chars from filename.

G'day haukex,

"... (Update: and though tr/A-Za-z0-9._-//cd should be faster, the above module handles Unicode well, so that's why I'd still recommend that)

I wasn't aware that transliteration would have a problem with Unicode. Here's a quick test:

$ perl -Mutf8 -E '
    my $s = " abc \t ︎ αβ гдж سشص ᚠᚢᚸ ⎈ ☂  .png";
    $s =~ tr/A-Za-z0-9._-//cd;
    say $s;
'
abc.png

I'm using Perl v5.36; are there issues with earlier versions?

I tested with a fair selection of Unicode characters but, obviously, I can't reasonably test them all. Are there problems with Unicode characters I didn't test?

— Ken
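On the question of earlier Perls: the following is a minimal sketch, assuming only core Perl plus the core Encode module (`ascii_whitelist` is a hypothetical helper name, not from the thread). It suggests why this particular whitelist behaves the same whether or not the string was decoded. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so none of them can match the all-ASCII set, and `tr///cd` deletes them whether it sees characters or raw bytes.

```perl
use strict;
use warnings;
use utf8;                 # the string literal below is UTF-8 source text
use Encode qw(encode);

# Hypothetical helper: keep only A-Z a-z 0-9 . _ - and delete everything else.
sub ascii_whitelist {
    my ($s) = @_;
    $s =~ tr/A-Za-z0-9._-//cd;
    return $s;
}

my $chars = " abc \t αβ гдж ☂  .png";      # decoded characters
my $bytes = encode('UTF-8', $chars);       # the same text as raw UTF-8 bytes

# Multi-byte UTF-8 sequences consist only of bytes >= 0x80, none of which
# are in the ASCII whitelist, so both forms reduce to the same result.
print ascii_whitelist($chars), "\n";
print ascii_whitelist($bytes), "\n";
```

Both lines print `abc.png`. The difference only shows up if the whitelist itself contains non-ASCII characters, which this one does not.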

Re^3: Removing unwanted chars from filename.
by haukex (Archbishop) on Oct 07, 2022 at 06:02 UTC

    I was referring to the fact that the tr simply clobbers all Unicode characters, while Text::CleanFragment uses Text::Unidecode to try to turn them into ASCII:

    use warnings;
    use strict;
    use utf8;
    use Text::CleanFragment;
    my $s = "Ｈｅｌｌｏ．ｔｘｔ";  # full-width look-alike of "Hello.txt"
    print clean_fragment($s), "\n";  # prints "Hello.txt"
    $s =~ tr/A-Za-z0-9._-//cd;
    print "<$s>\n";  # prints "<>" !

    (I've actually encountered filenames similar to the above in the wild)
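For comparison, here is a core-only sketch of the same fold-then-filter idea. It swaps Text::Unidecode for NFKD decomposition from the core Unicode::Normalize module. This is a different and narrower technique, not what Text::CleanFragment does, and `fold_then_whitelist` is a hypothetical name. NFKD folds accented letters and full-width forms to ASCII base letters, but letters with no decomposition (Greek, Cyrillic, and so on) are still deleted, which is the gap Text::Unidecode fills.

```perl
use strict;
use warnings;
use utf8;                         # literals below contain non-ASCII characters
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: fold what NFKD can fold to ASCII, then apply the whitelist.
sub fold_then_whitelist {
    my ($s) = @_;
    $s = NFKD($s);                # "é" -> "e" + U+0301; full-width "Ｈ" -> "H"
    $s =~ s/\p{Mn}+//g;           # drop combining marks left by decomposition
    $s =~ tr/A-Za-z0-9._-//cd;    # same ASCII whitelist as in the thread
    return $s;
}

print fold_then_whitelist("Ｈｅｌｌｏ．ｔｘｔ"), "\n";
print fold_then_whitelist("café-crème.txt"), "\n";
print fold_then_whitelist("αβγ.txt"), "\n";
```

The three calls print `Hello.txt`, `cafe-creme.txt` and `.txt`. The last one shows what a dedicated transliterator such as Text::Unidecode improves on.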

      Thanks for the clarification.

      — Ken
