Re: Removing unwanted chars from filename.
by haukex (Archbishop) on Oct 06, 2022 at 17:39 UTC
|
I would strongly recommend Corion's Text::CleanFragment.
As for your regex, note that [:ascii:] is defined as "Any character in the ASCII character set", and the string you've shown here is entirely ASCII, so your code is "working". Perhaps you meant s/[^[:alnum:]]//g or e.g. s/[^[:alnum:]._-]//g instead? (Update: although tr/A-Za-z0-9._-//cd should be faster, the above module handles Unicode well, which is why I'd still recommend it.)
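To make the difference concrete, here is a minimal sketch comparing the filters on a made-up, all-ASCII file name. The sample string and variable names are my own, and the `[^[:ascii:]]` substitution is only an assumption about what the original attempt looked like:

```perl
use strict;
use warnings;

# Hypothetical sample name; every character in it is plain ASCII.
my $name = q{My File [1] 'copy'.png};

# Assumed shape of the original attempt: nothing here is non-ASCII,
# so nothing is removed.
(my $ascii_only = $name) =~ s/[^[:ascii:]]//g;

# Keep only letters and digits.
(my $alnum = $name) =~ s/[^[:alnum:]]//g;

# Keep letters, digits, dot, underscore and hyphen.
(my $alnum_plus = $name) =~ s/[^[:alnum:]._-]//g;

# The tr/// counterpart of the previous line for ASCII input.
(my $tr = $name) =~ tr/A-Za-z0-9._-//cd;

print "$_\n" for $ascii_only, $alnum, $alnum_plus, $tr;
# My File [1] 'copy'.png
# MyFile1copypng
# MyFile1copy.png
# MyFile1copy.png
```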
|
$ perl -Mutf8 -E '
my $s = " abc \t ©︎ αβ гдж سشص ᚠᚢᚸ ⎈ ☂ .png";
$s =~ tr/A-Za-z0-9._-//cd;
say $s;
'
abc.png
I'm using Perl v5.36; are there issues with earlier versions?
I tested with a fair selection of Unicode characters but, obviously, I can't reasonably test them all.
Are there problems with Unicode characters I didn't test?
|
use warnings;
use strict;
use utf8;
use Text::CleanFragment;
my $s = "Ｈｅｌｌｏ．ｔｘｔ"; # fullwidth look-alikes, not ASCII (exact original characters assumed)
print clean_fragment($s), "\n"; # prints "Hello.txt"
$s =~ tr/A-Za-z0-9._-//cd;
print "<$s>\n"; # prints "<>" !
(I've actually encountered filenames similar to the above in the wild)
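Where a CPAN module is not an option, compatibility normalization from the core Unicode::Normalize module might cover some of these cases. This is only a sketch of a different technique, not what Text::CleanFragment does, and the fullwidth input below is a hypothetical example chosen for illustration:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

# Hypothetical input: "Hello.txt" spelled with fullwidth characters
# (U+FF28, U+FF45, ... and the fullwidth full stop U+FF0E).
my $s = "\x{FF28}\x{FF45}\x{FF4C}\x{FF4C}\x{FF4F}\x{FF0E}\x{FF54}\x{FF58}\x{FF54}";

# Plain tr/// finds no ASCII characters at all and deletes everything.
(my $plain = $s) =~ tr/A-Za-z0-9._-//cd;

# Folding compatibility characters first lets the ASCII equivalents survive.
(my $folded = NFKC($s)) =~ tr/A-Za-z0-9._-//cd;

print "<$plain>\n";   # <>
print "<$folded>\n";  # <Hello.txt>
```

NFKC only helps for characters that have a compatibility mapping to ASCII; it does not transliterate other scripts, so it is not a general replacement for the module.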
|
Re: Removing unwanted chars from filename.
by hippo (Bishop) on Oct 06, 2022 at 18:40 UTC
|
If you are stripping out all characters from a known set then tr is the way to go, for two reasons. Firstly, it's lightning fast. Secondly, you cannot accidentally construct a pattern of more than a single character. Here is a test to demonstrate.
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More tests => 1;
my $in = q/xTest-1 [ ] 'copy'.png /;
my $want = 'xTest-1copy.png';
my $have = filter ($in);
is $have, $want;
sub filter {
    my $str = shift or return '';
    return $str =~ tr/A-Za-z0-9.-//cdr;
}
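To illustrate the second point with a made-up example (the input string and variable names are my own): a regex without brackets matches a sequence of characters, whereas tr always treats its list as individual characters.

```perl
use strict;
use warnings;

my $in = 'foo_bar.tar.gz';   # hypothetical input

# Intended: delete every '.' and every '_'.
# Forgetting the brackets makes the regex look for the two-character
# sequence "._", which never occurs here, so nothing is removed.
(my $regex_oops = $in) =~ s/\._//g;

# With a bracketed class the regex works per character.
(my $regex_ok = $in) =~ s/[._]//g;

# tr has no notion of sequences: each listed character stands alone.
(my $tr = $in) =~ tr/._//d;

print "$regex_oops\n";  # foo_bar.tar.gz
print "$regex_ok\n";    # foobartargz
print "$tr\n";          # foobartargz
```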
|
sub filter {
    my $str = shift or return '';
    return $str =~ tr/A-Za-z0-9.-//cdr;
}
The my $str = shift or return ''; statement will cause a file name of '0' to be converted to the empty string.
An alternative to avoid this problem is my ($str) = @_ or return '';
While such a file name seems unlikely to be encountered in the wild, it's best to be prepared. :)
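A short sketch of the difference, using two hypothetical sub names (filter_shift and filter_list) so that both variants can sit side by side:

```perl
use strict;
use warnings;

# Original version: a false value such as '0' triggers the early return.
sub filter_shift {
    my $str = shift or return '';
    return $str =~ tr/A-Za-z0-9.-//cdr;
}

# List-assignment version: in boolean context the assignment yields the
# number of arguments, so it returns early only when none were passed.
sub filter_list {
    my ($str) = @_ or return '';
    return $str =~ tr/A-Za-z0-9.-//cdr;
}

printf "shift: <%s>  list: <%s>\n", filter_shift('0'), filter_list('0');
# shift: <>  list: <0>
```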
Give a man a fish: <%-{-{-{-<
|
my $str = shift // '';
but the idea of doing the rest of the processing (however swift) against the empty string rankled so I opted for the short-circuit instead. Should have just left it well alone :-)
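For comparison, a sketch of the defined-or variant (the sub name filter_defined is made up here). It keeps '0' and substitutes the empty string only for a missing or undefined argument, at the cost of running tr on an empty string:

```perl
use strict;
use warnings;
use feature 'say';

sub filter_defined {
    my $str = shift // '';   # defined-or, available since Perl 5.10
    return $str =~ tr/A-Za-z0-9.-//cdr;
}

say '<', filter_defined('0'), '>';     # <0>
say '<', filter_defined(undef), '>';   # <>
say '<', filter_defined(), '>';        # <>
```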
|
Re: Removing unwanted chars from filename.
by harangzsolt33 (Hermit) on Oct 08, 2022 at 00:40 UTC
|
But you do want to keep the plain ASCII characters that are legal in a filename, don't you?
$FILENAME =~ tr| A-Za-z0-9\!\$\#\%\&\^\`\@\_\-\+\=\~\.\,\;\(\)\[\]\{\}\/\\||cd;
|
No. Our anonymous friend clearly wants to remove all the whitespace and the square brackets, both of which your operation keeps.
What's the story with all those backslashes, BTW?
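For what it's worth, a quick sketch suggests most of them are redundant. tr does not interpolate variables, so as far as I know only the delimiter, the backslash itself and a hyphen in the middle of the list need escaping. The test input and variable names below are my own:

```perl
use strict;
use warnings;

# Every printable ASCII character, as a hypothetical test input.
my $ascii = join '', map { chr } 32 .. 126;

# The transliteration as posted, with every symbol escaped.
(my $escaped = $ascii) =~ tr| A-Za-z0-9\!\$\#\%\&\^\`\@\_\-\+\=\~\.\,\;\(\)\[\]\{\}\/\\||cd;

# The same list with only the backslash escaped and the hyphen moved to the end.
(my $plain = $ascii) =~ tr| A-Za-z0-9!$#%&^`@_+=~.,;()[]{}/\\-||cd;

print $escaped eq $plain ? "same result\n" : "different result\n";

# The file name from the test earlier in the thread keeps its spaces and brackets.
(my $name = q/xTest-1 [ ] 'copy'.png /) =~ tr| A-Za-z0-9!$#%&^`@_+=~.,;()[]{}/\\-||cd;
print "<$name>\n";   # <xTest-1 [ ] copy.png >
```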