If you want to use tr with dynamic strings (which is NOT the case here), you need to use string eval. Be sure to only use it for validated strings, never a random user input!
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use utf8;
use open OUT => ':encoding(UTF-8)', ':std';
my $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
my $boldset = '𝐚𝐛𝐜𝐝𝐞𝐟𝐠𝐡𝐢𝐣𝐤𝐥𝐦𝐧𝐨𝐩𝐪𝐫𝐬𝐭𝐮𝐯𝐰𝐱𝐲𝐳𝐀𝐁𝐂𝐃𝐄𝐅𝐆𝐇𝐈𝐉𝐊𝐋𝐌𝐍𝐎𝐏𝐐𝐑𝐒𝐓𝐔𝐕𝐖𝐗𝐘𝐙𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗';
my $source = 'The quick brown fox jumps over the lazy dog 1234567890 times.';
my $target = $source;
eval "\$target =~ tr/$charset/$boldset/";
say for $source, $target;
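To make the "validated strings only" advice concrete, here is a minimal sketch of a wrapper that refuses anything suspicious before it reaches string eval. The name tr_dynamic and the word-characters-only whitelist are my assumptions, not part of the code above; the whitelist also rules out ranges such as a-z.

```perl
use strict;
use warnings;

# Hypothetical helper: transliterate with character sets known only at run time.
sub tr_dynamic {
    my ($string, $from, $to) = @_;

    # Assumed whitelist: word characters only, so no delimiter, backslash,
    # or range operator can reach the generated code.
    for my $set ($from, $to) {
        die "unsafe character set\n" unless $set =~ /\A\w+\z/;
    }

    # tr///r (Perl 5.14+) returns the changed copy instead of a count.
    my $result = eval "\$string =~ tr/$from/$to/r";
    die $@ unless defined $result;
    return $result;
}

print tr_dynamic('hello', 'el', 'ip'), "\n";    # prints "hippo"
```

A stricter or looser whitelist may suit other inputs; the point is only that validation happens before the eval, not after.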
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
> Be sure to only use it for validated strings, never a random user input!
Here is a generic routine to escape only selected metacharacters.
Escaping any / (or other delimiter) in the input should make it safe to apply
eval "\$target =~ tr/$charset/$boldset/";
use v5.12;
use warnings;
use Data::Dump qw/pp dd/;
use Test::More;
sub escape_metas {
    my ( $meta, $e ) = @_;
    $e //= '\\';        # default backslash
    my $ee = "\Q$e";    # don't mess my regex
    s[ (?|
           $ee($ee)     # ignore double escapes
       |
           $ee($meta)   # keep single escapes
       |
           ($meta)      # escape meta
       )
     ]
     [$e$1]xgr;
}
my $e = '\\'; # escape code
my $m = '/'; # to be escaped
for ("$m", "$e$e$m", "$e$e$e$e$m" ) {
    my $got = escape_metas($m,$e);
    is( $got, "$e$_" , "escaping $_ -> $got");
}
for ("$e$m", "$e$e$e$m" ) {
    my $got = escape_metas($m,$e);
    is( $got, $_ , "ignoring $_ eq $got");
}
done_testing;
C:/Strawberry/perl/bin\perl.exe -w d:/tmp/pm/escapism.pl
ok 1 - escaping / -> \/
ok 2 - escaping \\/ -> \\\/
ok 3 - escaping \\\\/ -> \\\\\/
ok 4 - ignoring \/ eq \/
ok 5 - ignoring \\\/ eq \\\/
1..5
Please tell me if I missed a case; I tried to write it as generically as possible.
EDIT
More or better tests are welcome too. =)
I'm probably too busy today to understand. We wanted to escape the strings so they can be used in a transliteration, right? Why not test it directly, then?
sub use_it {
    my ($string, $search, $replace) = @_;
    my ($s, $r);
    $s = escape_metas('/', '\\') for $search;
    $r = escape_metas('/', '\\') for $replace;
    return eval "\$string =~ tr/$s/$r/r"
}
sub cheat {
    my ($string, $search, $replace) = @_;
    return eval "\$string =~ tr|\Q$search\E|\Q$replace\E|r"
}
sub simulate {
    my ($string, $search, $replace) = @_;
    my $result = $string;
    for my $i (0 .. length($search) - 1) {
        my $from = substr $search, $i, 1;
        my $to = substr $replace, $i, 1;
        $result =~ s/\Q$from/$to/g;
    }
    return $result
}
for my $case (
    # String       search   replace expect
    ['a/b'      => 'a/b',   'xyz',  'xyz'],
    ['a\\b'     => 'a\\b',  'xyz',  'xyz'],
    ['a/b'      => '\\/',   'xy',   'ayb'],
    ['a\\/b'    => '\\/',   'xy',   'axyb'],
    ['a/\\b'    => '\\/',   'xy',   'ayxb'],
    ['a\\\\b'   => '\\/',   'xy',   'axxb'],
    ['a\\\\/b'  => '\\/',   'xy',   'axxyb'],
) {
    is simulate(@$case), $case->[-1], 'simulate';
    is cheat(@$case), simulate(@$case), 'cheat';
    is use_it(@$case), simulate(@$case), 'use';
}
I'm not sure I got the "expect" right, but both "simulate" and "cheat" give the same results. "use", on the other hand, doesn't. I based it on your escape_metas - what did I do wrong?
From the documentation of tr:
Characters may be literals, or (if the delimiters aren't single quotes) any of the escape sequences accepted in double-quoted strings. But there is never any variable interpolation, so "$" and "@" are always treated as literals.
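Since tr never interpolates, one eval-free way to transliterate with run-time character sets is a lookup table. This is only a sketch under my own assumptions: translit is a hypothetical name, and both sets are required to have the same length (real tr is more lenient about a short replacement list).

```perl
use v5.14;
use warnings;

# Hypothetical helper: map characters through a lookup table built at run time.
sub translit {
    my ($string, $from, $to) = @_;
    my @from = split //, $from;
    my @to   = split //, $to;
    die "character sets differ in length\n" unless @from == @to;

    my %map;
    # Like tr///, the first occurrence of a repeated search character wins.
    $map{ $from[$_] } //= $to[$_] for 0 .. $#from;

    # Replace every mapped character; leave everything else untouched.
    return $string =~ s{(.)}{ $map{$1} // $1 }gser;
}

say translit('a/\\b', '\\/', 'xy');    # prints "ayxb"
```

Because no code is generated, slashes and backslashes in either set need no escaping at all.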
use strict;
use warnings;
use Encode;
binmode *STDOUT, 'utf8'; # Suppress "wide character" warnings
my $CharSet = 'a'; # ASCII
my $BoldSet = pack('U', 119834); # Unicode bold 'a'
my $Source = 'a';
my $trTarget = $Source;
my $reTarget = $Source;
$trTarget =~ tr/$CharSet/$BoldSet/;
$reTarget =~ s/$CharSet/$BoldSet/;
print "$Source\n$trTarget\n$reTarget\n";
print $BoldSet;
Prints:
a
l
𝐚
𝐚
It seems tr/// isn't the right tool for the job. :-(
Update: PerlMonks is screwing up the Unicode characters. They render correctly when I paste them into the edit window, but are shown as code points when I submit the edit. Bugger.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
G'day Perlian,
Here's a generic technique for dealing with this type of problem which doesn't require listing every character.
$ perl -Mutf8 -C -E '
my ($offset_0, $offset_A, $offset_a)
= (ord("𝟎")-ord("0"), ord("𝐀")-ord("A"), ord("𝐚")-ord("a"));
say "The quick brown fox jumps over the lazy dog 1234567890 times."
=~ s/([0-9])/chr(ord($1)+$offset_0)/egr
=~ s/([A-Z])/chr(ord($1)+$offset_A)/egr
=~ s/([a-z])/chr(ord($1)+$offset_a)/egr;
'
𝐓𝐡𝐞 𝐪𝐮𝐢𝐜𝐤 𝐛𝐫𝐨𝐰𝐧 𝐟𝐨𝐱 𝐣𝐮𝐦𝐩𝐬 𝐨𝐯𝐞𝐫 𝐭𝐡𝐞 𝐥𝐚𝐳𝐲 𝐝𝐨𝐠 𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗𝟎 𝐭𝐢𝐦𝐞𝐬.
This should work fine with your 5.26.3 (I'm using 5.32.0).
As general information: say requires 5.10 and /r requires 5.14.
Two caveats:
- Different Perl versions support different Unicode® versions: check you have a sufficiently high version of Perl to handle the Unicode characters you want to output (if in doubt, check the deltas).
- Some alphabetical sequences in "Mathematical Alphanumeric Symbols" have missing characters because they were defined in earlier versions. The first example in that block is U+1D44E (𝑎) to U+1D467 (𝑧), which has a gap at U+1D455 (<reserved>) because U+210E (ℎ) was already defined in "Letterlike Symbols" as PLANCK CONSTANT.
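To illustrate the second caveat, here is a rough sketch of the same offset technique applied to the italic letters, with the reserved slot handled separately. The function name math_italic is hypothetical, and I am assuming the italic capitals start at U+1D434.

```perl
use v5.14;
use warnings;

# Hypothetical helper: ASCII letters to Mathematical Italic.
sub math_italic {
    my ($text) = @_;
    my $offset_A = 0x1D434 - ord('A');    # MATHEMATICAL ITALIC CAPITAL A
    my $offset_a = 0x1D44E - ord('a');    # MATHEMATICAL ITALIC SMALL A
    return $text
        =~ s/([A-Z])/chr(ord($1)+$offset_A)/egr
        =~ s/([a-gi-z])/chr(ord($1)+$offset_a)/egr    # skip h: U+1D455 is reserved
        =~ s/h/\x{210E}/gr;                           # PLANCK CONSTANT fills the gap
}

binmode STDOUT, ':encoding(UTF-8)';
say math_italic('The quick brown fox');
```

Other alphabets in that block have their own gaps, so each would need its own list of exceptions.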
Here's another example to show the generality of the solution.
Only three characters were changed in the code to produce completely different output.
$ perl -Mutf8 -C -E '
my ($offset_0, $offset_A, $offset_a)
= (ord("𝟘")-ord("0"), ord("𝕬")-ord("A"), ord("𝖆")-ord("a"));
say "The quick brown fox jumps over the lazy dog 1234567890 times."
=~ s/([0-9])/chr(ord($1)+$offset_0)/egr
=~ s/([A-Z])/chr(ord($1)+$offset_A)/egr
=~ s/([a-z])/chr(ord($1)+$offset_a)/egr;
'
𝕿𝖍𝖊 𝖖𝖚𝖎𝖈𝖐 𝖇𝖗𝖔𝖜𝖓 𝖋𝖔𝖝 𝖏𝖚𝖒𝖕𝖘 𝖔𝖛𝖊𝖗 𝖙𝖍𝖊 𝖑𝖆𝖟𝖞 𝖉𝖔𝖌 𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡𝟘 𝖙𝖎𝖒𝖊𝖘.
Thank you very much for all your answers, @choroba had the correct point: tr takes only literals for both character sets.
Yes there are ways around that by using the `evil' eval, but that is just not necessary in my case:
I just want to write a little function that accepts an ASCII string and returns a "bold" version of it.
And yes, my terminal (MobaXterm) is capable of displaying a pretty good chunk of the Unicode charset, including the pseudo-bold or -italic block.
Again, thank you all for guiding me back to the path of truth! 😋
Best regards from Charleston (WV),
Perlian
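For the "little function" described above, a minimal sketch with a literal tr might look like this. The name bold is hypothetical; the code point ranges are written as \x{...} escapes so the source file does not need to contain the bold characters itself.

```perl
use v5.14;
use warnings;

# Hypothetical helper: ASCII letters and digits to Mathematical Bold.
sub bold {
    my ($text) = @_;
    # Ranges: a-z -> U+1D41A..U+1D433, A-Z -> U+1D400..U+1D419,
    #         0-9 -> U+1D7CE..U+1D7D7. Everything else passes through.
    return $text =~ tr/a-zA-Z0-9/\x{1D41A}-\x{1D433}\x{1D400}-\x{1D419}\x{1D7CE}-\x{1D7D7}/r;
}

binmode STDOUT, ':encoding(UTF-8)';
say bold('The quick brown fox jumps over the lazy dog 1234567890 times.');
```

Both character sets are fixed at compile time here, so no eval is involved.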
use utf8;
use Encode qw(encode decode);
my $CharSet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'; # ASCII
my $BoldSet = encode('utf8','𝐚𝐛𝐜𝐝𝐞𝐟𝐠𝐡𝐢𝐣𝐤𝐥𝐦𝐧𝐨𝐩𝐪𝐫𝐬𝐭𝐮𝐯𝐰𝐱𝐲𝐳𝐀𝐁𝐂𝐃𝐄𝐅𝐆𝐇𝐈𝐉𝐊𝐋𝐌𝐍𝐎𝐏𝐐𝐑𝐒𝐓𝐔𝐕𝐖𝐗𝐘𝐙𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗');
my $Source = 'The quick brown fox jumps over the lazy dog 1234567890 times.';
my $Target = $Source;
$Target =~ tr/abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789/𝐚𝐛𝐜𝐝𝐞𝐟𝐠𝐡𝐢𝐣𝐤𝐥𝐦𝐧𝐨𝐩𝐪𝐫𝐬𝐭𝐮𝐯𝐰𝐱𝐲𝐳𝐀𝐁𝐂𝐃𝐄𝐅𝐆𝐇𝐈𝐉𝐊𝐋𝐌𝐍𝐎𝐏𝐐𝐑𝐒𝐓𝐔𝐕𝐖𝐗𝐘𝐙𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗/;
print "$Source\n$Target\n";
#The quick brown fox jumps over the lazy dog 1234567890 times.
#𝐓𝐡𝐞 𝐪𝐮𝐢𝐜𝐤 𝐛𝐫𝐨𝐰𝐧 𝐟𝐨𝐱 𝐣𝐮𝐦𝐩𝐬 𝐨𝐯𝐞𝐫 𝐭𝐡𝐞 𝐥𝐚𝐳𝐲 𝐝𝐨𝐠 𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗𝟎 𝐭𝐢𝐦𝐞𝐬.