Re: uparse - Parse Unicode strings
by Tux (Canon) on Nov 18, 2023 at 10:05 UTC
tux 🐧 uchar --help
usage: uchar -v [-m base:count [ -m base:count ] ...
uchar -v -f char ...
perl 5.38.0 with Unicode 15.0.0
-m show maps
-v verbosity
-l list GBA characters
-f find
-F find (only chars supported in current font)
-s splash all characters found into a single string
-k show matching key combo(s)
-d apply random diacricals
-e show character encodings (uchar -e -f u_BREVE)
-o also show octal version of encoding
-E show character decodings (uchar -E fc)
-b strip to base
-D show codepoints in decimal
-c copy found string(s) to clipboard
-h also show html entity if available
tux 🐧 uchar -v X🩼X
X U00058 \N{LATIN CAPITAL LETTER X}
🩼 U1fa7c \N{CRUTCH}
X U00058 \N{LATIN CAPITAL LETTER X}
tux 🐧 uchar -v U+1f427
🐧 U1f427 \N{PENGUIN}
tux 🐧 uchar -e U+1f427
🐧 U1f427 \N{PENGUIN}
cp1026 6f
cp1047 6f
cp37 6f
cp424 6f
cp500 6f
cp875 6f
gb12345-raw 22
gb2312-raw 22
hz 22
iso-2022-kr 1b2429435c787b31663432377d
iso-ir-165 22
jis0208-raw 20
jis0212-raw 22
ksc5601-raw 22
posix-bc 6f
UCS-2BE fffd
UCS-2LE fdff
UTF-16 feffd83ddc27
UTF-16BE d83ddc27
UTF-16LE 3dd827dc
UTF-32 0000feff0001f427
UTF-32BE 0001f427
UTF-32LE 27f40100
UTF-7 2b324433634a772d
utf-8-strict f09f90a7
utf8 f09f90a7
tux 🐧 uchar -E f09f90a7 | grep utf
utf-8-strict 🐧
utf8 🐧 (U+1F427)
tux 🐧 uchar -Fk "L WITH STROKE"
Searching for (?^u:\bL WITH STROKE\b)
000141 Ł LSTROKE_IDX LATIN CAPITAL LETTER L WITH STROKE
#<Multi_key> <L> <minus>
#<Multi_key> <minus> <L>
<Multi_key> <L> <slash>
<Multi_key> <L> <underscore>
<Multi_key> <slash> <L>
<Multi_key> <underscore> <L>
000142 ł lSTROKE_IDX LATIN SMALL LETTER L WITH STROKE
#<Multi_key> <l> <minus>
#<Multi_key> <minus> <l>
<Multi_key> <l> <slash>
<Multi_key> <l> <underscore>
<Multi_key> <slash> <l>
<Multi_key> <underscore> <l>
tux $ perl -CEO -wE'say "\x{1F468}\x{1F3FD}\x{200D}\x{2708}\x{FE0F}"'
👨🏽✈️
tux $ raku -e'"\x[1F468]\x[1F3FD]\x[200D]\x[2708]\x[FE0F]".say'
👨🏽✈️
tux $ raku -e'"\x[1F468]\x[1F3FD]\x[200D]\x[2708]\x[FE0F]".say' | xargs uchar -v
👨 U1f468 \N{MAN}
🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
U0200d \N{ZERO WIDTH JOINER}
✈ U02708 \N{AIRPLANE}
️ U0fe0f \N{VARIATION SELECTOR-16}
Enjoy, Have FUN! H.Merijn
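The UTF-* rows of the uchar -e output above can be cross-checked with the core Encode module. This is only a minimal sketch, not how uchar itself works; hex_bytes is a hypothetical helper name introduced here for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the bytes of $char in $encoding, as lowercase hex.
sub hex_bytes {
    my ($encoding, $char) = @_;
    return unpack 'H*', encode($encoding, $char);
}

my $penguin = "\x{1F427}";    # PENGUIN, the character used in the session above

# Compare each row against the matching line of the uchar -e listing.
for my $encoding (qw(UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE)) {
    printf "%-9s %s\n", $encoding, hex_bytes($encoding, $penguin);
}
```

The BE/LE variants are used on purpose: the plain UTF-16 and UTF-32 encodings prepend a byte order mark, which is why those rows in the listing start with feff.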
Wow, very impressive! ... agree with kcott that it deserves its own CUFP page.
I played briefly with your command on Ubuntu using perl v5.38:
~/pm/Tux$ perl -CEO -wE'say "\x{1F468}\x{1F3FD}\x{200D}\x{2708}\x{FE0F}"'
👨🏽✈️
~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F'
👨🏽✈️
AFAICT, the output from the perl -CEO and the bash echo -e commands above is identical, namely:
👨🏽‍✈️
Running this command produced useful output (that seems to match yours), despite the error messages:
~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F' | xargs uchar -v
Can't exec "locate": No such file or directory at ~/pm/Tux/uchar line 103.
👨 U1f468 \N{MAN}
🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
U0200d \N{ZERO WIDTH JOINER}
✈ U02708 \N{AIRPLANE}
️ U0fe0f \N{VARIATION SELECTOR-16}
Using CODE blocks instead of pre:
~/pm/Tux$ echo -e '\U1F468\U1F3FD\U200D\U2708\UFE0F' | xargs uchar -v
Can't exec "locate": No such file or directory at ~/pm/Tux/uchar line 103.
👨 U1f468 \N{MAN}
🏽 U1f3fd \N{EMOJI MODIFIER FITZPATRICK TYPE-4}
‍ U0200d \N{ZERO WIDTH JOINER}
✈ U02708 \N{AIRPLANE}
️ U0fe0f \N{VARIATION SELECTOR-16}
Re: uparse - Parse Unicode strings
by ikegami (Patriarch) on Nov 19, 2023 at 04:43 UTC
$ unichars '\p{Emoji}' | wc -l
178
$ unichars '\p{Emoji}' | head -n 30 | tail -n 20
8 U+0038 DIGIT EIGHT
9 U+0039 DIGIT NINE
© U+00A9 COPYRIGHT SIGN
® U+00AE REGISTERED SIGN
‼ U+203C DOUBLE EXCLAMATION MARK
⁉ U+2049 EXCLAMATION QUESTION MARK
™ U+2122 TRADE MARK SIGN
ℹ U+2139 INFORMATION SOURCE
↔ U+2194 LEFT RIGHT ARROW
↕ U+2195 UP DOWN ARROW
↖ U+2196 NORTH WEST ARROW
↗ U+2197 NORTH EAST ARROW
↘ U+2198 SOUTH EAST ARROW
↙ U+2199 SOUTH WEST ARROW
↩ U+21A9 LEFTWARDS ARROW WITH HOOK
↪ U+21AA RIGHTWARDS ARROW WITH HOOK
⌚ U+231A WATCH
⌛ U+231B HOURGLASS
⌨ U+2328 KEYBOARD
⏏ U+23CF EJECT SYMBOL
$ uniprops 🧑
U+1F9D1 ‹🧑› \N{ADULT}
\pS \p{So}
All Any Assigned Common Zyyy EBase Emoji_Modifier_Base Emoji Emoji_Presentation EPres Extended_Pictographic ExtPict
So S Gr_Base Grapheme_Base Graph X_POSIX_Graph GrBase Other_Symbol Print X_POSIX_Print Symbol
Sup_Symbols_And_Pictographs Supplemental_Symbols_And_Pictographs InSupSymbolsAndPictographs Unicode
$ uniprops U+1F9D1
U+1F9D1 ‹🧑› \N{ADULT}
\pS \p{So}
All Any Assigned Common Zyyy EBase Emoji_Modifier_Base Emoji Emoji_Presentation EPres Extended_Pictographic ExtPict
So S Gr_Base Grapheme_Base Graph X_POSIX_Graph GrBase Other_Symbol Print X_POSIX_Print Symbol
Sup_Symbols_And_Pictographs Supplemental_Symbols_And_Pictographs InSupSymbolsAndPictographs Unicode
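The property names that uniprops prints can be tested directly in a regex with \p{...}. Below is a small sketch, assuming a reasonably recent Perl; has_prop is a hypothetical helper, and the eval is there because a property such as Emoji is not known to older Perls' Unicode tables, in which case the helper simply reports no match.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: 1 if $char matches \p{$prop}, else 0.
# An unknown property name makes the match die; eval turns that into 0.
sub has_prop {
    my ($char, $prop) = @_;
    my $pattern = '\A\p{' . $prop . '}\z';
    my $matched = eval { $char =~ /$pattern/ };
    return $matched ? 1 : 0;
}

my @checks = (
    [ 'ADULT',       "\x{1F9D1}", 'So'    ],
    [ 'ADULT',       "\x{1F9D1}", 'Emoji' ],
    [ 'DIGIT EIGHT', '8',         'Emoji' ],
    [ 'DIGIT EIGHT', '8',         'So'    ],
);

for my $check (@checks) {
    my ($label, $char, $prop) = @$check;
    printf "%-12s \\p{%s}: %s\n", $label, $prop,
        has_prop($char, $prop) ? 'yes' : 'no';
}
```

A "no" for Emoji on an old Perl therefore means "not matched or not supported", so the Perl version should be checked before drawing conclusions from it.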
See also: unichars and uniprops from Unicode::Tussle
Thanks!
Finally got around to installing Unicode::Tussle on Ubuntu perl v5.38 and am
pleased to report all your examples worked fine for me, albeit with a
harmless looking
"charnames: some short character names may clash in [GREEK, LATIN], for example GAMMA"
warning written to stderr.
I'm now spoilt for choice, with three different working Unicode tools to choose from:
Re: uparse - Parse Unicode strings
by eyepopslikeamosquito (Archbishop) on Nov 18, 2023 at 10:15 UTC
Brilliant work kcott!
Everything I've tested so far works like a charm on my Ubuntu Linux VM (running perl v5.38.0
built from source as described here).
A lot more convenient than the crude hack I was using,
namely to click on the little xml link on a post to
see the decimal values of the Unicode emojis.
For example, clicking on the xml link on your post now allows me to see:
... difficult to tell them apart; e.g. <tt>&#129489;</tt> & <tt>&#128104;</tt>.
which I can then crudely translate back and forth between hex and decimal via one liners such as:
C:\> perl -e "printf q{%X}, 129489"
1F9D1
C:\> perl -e "printf q{%d}, 0x1F9D1"
129489
That was working fine
until Discipulus posted an emoji to me in the Chatterbox the other day ...
and, oops, there was no xml link to click on! :)
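The decimal-to-hex step above can also be combined with a name lookup. This is just a sketch using the core charnames module; describe_decimal is a hypothetical helper name, and the names returned depend on the Unicode version of the Perl in use.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: turn a decimal code point (as seen in the xml view)
# into "U+HEX NAME"; unnamed code points are reported as <unknown>.
sub describe_decimal {
    my ($decimal) = @_;
    my $name = charnames::viacode($decimal);
    return sprintf 'U+%04X %s', $decimal, defined $name ? $name : '<unknown>';
}

print describe_decimal(129489), "\n";    # the decimal value from the one-liner above
print describe_decimal(88),     "\n";

# And back again: hex string to decimal.
printf "%d\n", hex '1F9D1';
```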
$ uparse 👔
============================================================
String: '👔'
============================================================
👔 U+1F454 NECKTIE
------------------------------------------------------------
The emoji for gellyfish didn't even render for me;
but I was still able to get information about it.
$ uparse 🪼
============================================================
String: '🪼'
============================================================
🪼 U+1FABC JELLYFISH
------------------------------------------------------------
There are also things like the emoji for GrandFather,
which I can only select as a single entity,
but would benefit from some analysis.
$ uparse 👨🦳👧👦
============================================================
String: '👨🦳👧👦'
============================================================
👨 U+1F468 MAN
U+200D ZERO WIDTH JOINER
🦳 U+1F9B3 EMOJI COMPONENT WHITE HAIR
U+200D ZERO WIDTH JOINER
👧 U+1F467 GIRL
U+200D ZERO WIDTH JOINER
👦 U+1F466 BOY
------------------------------------------------------------
Maybe at some future point we can add the white hair to this family setting:
$ uparse 👨👧👦
============================================================
String: '👨👧👦'
============================================================
👨 U+1F468 MAN
U+200D ZERO WIDTH JOINER
👧 U+1F467 GIRL
U+200D ZERO WIDTH JOINER
👦 U+1F466 BOY
------------------------------------------------------------
Although, maybe you can already do this with your Win11 Segoe UI Emoji font. Can you?
Maybe at some future point we can add the white hair to this family setting ...
maybe you can already do this with your Win11 Segoe UI Emoji font. Can you?
You read me like a book, that's exactly what I was trying to do! :) ...
and was bitterly disappointed when it didn't work.
For completeness, I ran a simple standalone test using Windows 11 PowerShell.
PS C:\> $joiner = [char]::ConvertFromUtf32(0x200D)
PS C:\> $man = [char]::ConvertFromUtf32(0x1F468)
PS C:\> $girl = [char]::ConvertFromUtf32(0x1F467)
PS C:\> $boy = [char]::ConvertFromUtf32(0x1F466)
PS C:\> $whitehair = [char]::ConvertFromUtf32(0x1F9B3)
PS C:\> "$man$joiner$girl$joiner$boy"
👨👧👦
PS C:\> "$man$joiner$whitehair$joiner$girl$joiner$boy"
👨🦳👧👦
Running the equivalent test on Ubuntu bash with echo -e produced the same depressing result.
It seems you can enjoy a family emoji with a default man, but not a man with white hair.
Maybe a Unicode emoji expert knows how to do it, but I don't.
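For reference, the same experiment can be written in Perl. This is a minimal sketch: zwj_sequence is a hypothetical helper, and the script only builds the strings and lists their code points. Whether a terminal or browser draws a sequence as a single glyph is up to the font and renderer, which this code cannot show.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ();

binmode STDOUT, ':encoding(UTF-8)';

my $ZWJ = chr 0x200D;    # ZERO WIDTH JOINER

# Hypothetical helper: join the given code points with ZWJ.
sub zwj_sequence {
    my @code_points = @_;
    return join $ZWJ, map { chr $_ } @code_points;
}

my $family     = zwj_sequence(0x1F468, 0x1F467, 0x1F466);             # man, girl, boy
my $white_hair = zwj_sequence(0x1F468, 0x1F9B3, 0x1F467, 0x1F466);   # plus white hair

for my $str ($family, $white_hair) {
    printf "%s (%d code points)\n", $str, length $str;
    for my $char (split //, $str) {
        my $name = charnames::viacode(ord($char));
        printf "    U+%04X %s\n", ord($char), defined $name ? $name : '<unknown>';
    }
}
```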
Re: uparse - Parse Unicode strings
by hippo (Archbishop) on Nov 18, 2023 at 10:19 UTC
BEGIN {
die "$0 requires Perl v5.7.3 or later.\n" if $] < 5.007003;
die "Usage: $0 string [string ...]\n" unless @ARGV;
}
Please enlighten me?
$ uparse
/home/ken/local/bin/uparse requires Perl v5.7.3 or later.
$ uparse
Usage: /home/ken/local/bin/uparse string [string ...]
With your suggestion, the messages look like this:
$ uparse
/home/ken/local/bin/uparse requires Perl v5.7.3 or later.
BEGIN failed--compilation aborted at /home/ken/local/bin/uparse line 15.
$ uparse
Usage: /home/ken/local/bin/uparse string [string ...]
BEGIN failed--compilation aborted at /home/ken/local/bin/uparse line 15.
I didn't want the "BEGIN failed--compilation aborted at ..." lines.
Decoding @ARGV [Was: uparse - Parse Unicode strings]
by jo37 (Deacon) on Nov 22, 2023 at 20:38 UTC
Hi Ken!
Tried to find a general solution to the problem reported in Re: uparse - Parse Unicode strings.
Short explanation of the problem:
There are two basic ways to get correct UNICODE input from the elements in @ARGV:
- implicit decoding with a runtime option -CA or an environment setting PERL_UNICODE=A
- explicit decoding using Encode::decode
Either may be used, but not both.
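To illustrate why the two must not be combined, here is a small sketch that simulates both situations without touching @ARGV. decode_always and decode_once are hypothetical helper names: the first decodes unconditionally, the second skips decoding when the UTF-8 flag is already on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# Decode unconditionally.
sub decode_always {
    my ($raw) = @_;
    return decode('UTF-8', $raw);
}

# Decode only if the string is not already flagged as decoded.
sub decode_once {
    my ($raw) = @_;
    return is_utf8($raw) ? $raw : decode('UTF-8', $raw);
}

# Show a string as a list of code points, so no wide characters are printed.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord $_ } split //, $str;
}

# The UTF-8 bytes the shell would pass for the three umlauts.
my $bytes = encode('UTF-8', "\x{E4}\x{F6}\x{FC}");

print 'decoded once:  ', code_points(decode_always($bytes)), "\n";
print 'decoded twice: ', code_points(decode_always(decode_always($bytes))), "\n";
print 'guarded twice: ', code_points(decode_once(decode_once($bytes))), "\n";
```

The "decoded twice" line corresponds to a script that calls decode on arguments perl has already decoded because of -CA or PERL_UNICODE=A.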
A script that expects UNICODE data from @ARGV cannot easily detect if the implicit decoding is in effect,
especially because -CAL makes the behaviour locale-dependent.
The best solution I could find is to check if the data in question is already marked to be in UTF-8.
Encode::is_utf8 (or the equivalent utf8::is_utf8) may be used to check this flag, which results in
a small modification to your script:
diff --git a/uparse b/uparse
index f5edb92..b05e12a 100755
--- a/uparse
+++ b/uparse
@@ -23,11 +23,11 @@ use constant {
NO_PRINT => "\N{REPLACEMENT CHARACTER}",
};
-use Encode 'decode';
+use Encode qw(decode is_utf8);
use Unicode::UCD 'charinfo';
for my $raw_str (@ARGV) {
- my $str = decode('UTF-8', $raw_str);
+ my $str = is_utf8($raw_str) ? $raw_str : decode('UTF-8', $raw_str);
print "\n", SEP1;
print "String: '$str'\n";
print SEP1;
What do you think about this?
Greetings, -jo
$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
++
Thanks for your analysis and patch.
I had planned, assuming I had sufficient time to spare, to have a further look at uparse this weekend
and get it to work on different platforms, and in various environments.
What you've provided is a good start and helps a lot.
[A follow-up to "Re: Decoding @ARGV [Was: uparse - Parse Unicode strings]".]
I'm not going to have sufficient spare time to do all that I wanted this weekend.
I have managed to incorporate your changes and do a couple of other minor things.
Before these changes, when prefixing the uparse command with PERL_UNICODE=A or PERL_UNICODE=SDAL,
I just got "Wide character at ..." and no other output.
I made these changes:
- Added changes from your patch.
- Changed "use open IO ..." to "use open OUT ...".
- Modified the code layout (mostly to avoid wrapping in PM).
Here's the new code:
#!/usr/bin/env perl
BEGIN {
if ($] < 5.007003) {
warn "$0 requires Perl v5.7.3 or later.\n";
exit;
}
unless (@ARGV) {
warn "Usage: $0 string [string ...]\n";
exit;
}
}
use 5.007003;
use strict;
use warnings;
use open OUT => qw{:encoding(UTF-8) :std};
use constant {
SEP1 => '=' x 60 . "\n",
SEP2 => '-' x 60 . "\n",
FMT => "%s\tU+%-6X %s\n",
NO_PRINT => "\N{REPLACEMENT CHARACTER}",
};
use Encode qw{decode is_utf8};
use Unicode::UCD 'charinfo';
for my $raw_str (@ARGV) {
my $str = is_utf8($raw_str)
? $raw_str
: decode('UTF-8', $raw_str);
print "\n", SEP1;
print "String: '$str'\n";
print SEP1;
for my $char (split //, $str) {
my $code_point = ord $char;
my $char_info = charinfo($code_point);
if (! defined $char_info) {
$char_info->{name}
= "<unknown> Perl $^V supports Unicode "
. Unicode::UCD::UnicodeVersion();
}
printf FMT, ($char =~ /^\p{Print}$/ ? $char : NO_PRINT),
$code_point, $char_info->{name};
}
print SEP2;
}
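The fallback for unassigned code points in the inner loop can be tried in isolation. This is a sketch only: char_name is a hypothetical wrapper, not part of the script above, and U+0378 is used as an example of a code point that is unassigned as far as I know.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical wrapper: the character name, or an "<unknown>" message naming
# the Unicode version this Perl supports when charinfo() returns undef.
sub char_name {
    my ($code_point) = @_;
    my $info = charinfo($code_point);
    return $info->{name} if defined $info;
    return "<unknown> Perl $^V supports Unicode "
         . Unicode::UCD::UnicodeVersion();
}

# U+1FA7C (CRUTCH) is only named on Perls whose Unicode version includes it.
for my $code_point (0x58, 0x1FA7C, 0x378) {
    printf "U+%-6X %s\n", $code_point, char_name($code_point);
}
```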
Here's a test run with just uparse:
$ uparse 👮🏼 👮🏼♀️ 👮🏼♂️
============================================================
String: '👮🏼'
============================================================
👮 U+1F46E POLICE OFFICER
🏼 U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
------------------------------------------------------------
============================================================
String: '👮🏼♀️'
============================================================
👮 U+1F46E POLICE OFFICER
🏼 U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
U+200D ZERO WIDTH JOINER
♀ U+2640 FEMALE SIGN
U+FE0F VARIATION SELECTOR-16
------------------------------------------------------------
============================================================
String: '👮🏼♂️'
============================================================
👮 U+1F46E POLICE OFFICER
🏼 U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
U+200D ZERO WIDTH JOINER
♂ U+2642 MALE SIGN
U+FE0F VARIATION SELECTOR-16
------------------------------------------------------------
And again, this time with PERL_UNICODE=A:
$ PERL_UNICODE=A uparse 👮🏼 👮🏼♀️ 👮🏼♂️
============================================================
String: '👮🏼'
============================================================
👮 U+1F46E POLICE OFFICER
🏼 U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
------------------------------------------------------------
============================================================
String: '👮🏼♀️'
============================================================
👮 U+1F46E POLICE OFFICER
🏼 U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
U+200D ZERO WIDTH JOINER
♀ U+2640 FEMALE SIGN
U+FE0F VARIATION SELECTOR-16
------------------------------------------------------------
============================================================
String: '👮🏼♂️'
============================================================
👮 U+1F46E POLICE OFFICER
🏼 U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
U+200D ZERO WIDTH JOINER
♂ U+2642 MALE SIGN
U+FE0F VARIATION SELECTOR-16
------------------------------------------------------------
Using "PERL_UNICODE=SDAL" gives the same output as "PERL_UNICODE=A".
Re: uparse - Parse Unicode strings
by jo37 (Deacon) on Nov 19, 2023 at 21:52 UTC
I don't know what is wrong with my locale setup.
Neither uparse nor uchar works on my old perl 5.032001 on Debian 11.
$ ./uparse.pl äöü
============================================================
String: '���'
============================================================
� U+FFFD REPLACEMENT CHARACTER
� U+FFFD REPLACEMENT CHARACTER
� U+FFFD REPLACEMENT CHARACTER
------------------------------------------------------------
$ ./uchar.pl -v äöü
� U0fffd \N{REPLACEMENT CHARACTER}
� U0fffd \N{REPLACEMENT CHARACTER}
� U0fffd \N{REPLACEMENT CHARACTER}
Removing decode from uparse.pl resolves the problem:
$ ./uparse.pl äöü
============================================================
String: 'äöü'
============================================================
ä U+E4 LATIN SMALL LETTER A WITH DIAERESIS
ö U+F6 LATIN SMALL LETTER O WITH DIAERESIS
ü U+FC LATIN SMALL LETTER U WITH DIAERESIS
------------------------------------------------------------
Greetings, -jo
$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
$ perl -v | head -2 | tail -1
This is perl 5, version 32, subversion 0 (v5.32.0) built for cygwin-thread-multi
I saw the three vowels (WITH DIAERESIS) on the web page.
They didn't change when I pasted them onto my command line;
nor in the uparse output.
However, when I pasted the results back here:
$ uparse äöü
============================================================
String: 'äöü'
============================================================
ä U+E4 LATIN SMALL LETTER A WITH DIAERESIS
ö U+F6 LATIN SMALL LETTER O WITH DIAERESIS
ü U+FC LATIN SMALL LETTER U WITH DIAERESIS
------------------------------------------------------------
And just so that you know what I'm seeing:
$ uparse äöü
============================================================
String: 'äöü'
============================================================
à U+C3 LATIN CAPITAL LETTER A WITH TILDE
¤ U+A4 CURRENCY SIGN
à U+C3 LATIN CAPITAL LETTER A WITH TILDE
¶ U+B6 PILCROW SIGN
à U+C3 LATIN CAPITAL LETTER A WITH TILDE
¼ U+BC VULGAR FRACTION ONE QUARTER
------------------------------------------------------------
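The six characters in that last listing are what you get when each UTF-8 byte of the three umlauts is treated as a character of its own. A short sketch that reproduces the pairing, using a hypothetical helper named bytes_as_code_points:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);
use charnames ();

# Hypothetical helper: encode $str as UTF-8 and return each byte's value,
# i.e. the code points seen when the bytes are read as Latin-1 characters.
sub bytes_as_code_points {
    my ($str) = @_;
    return unpack 'C*', encode('UTF-8', $str);
}

for my $code_point (bytes_as_code_points("\x{E4}\x{F6}\x{FC}")) {
    my $name = charnames::viacode($code_point);
    printf "U+%X %s\n", $code_point, defined $name ? $name : '<unknown>';
}
```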
There were no surprises with my other tests.
$ uparse ���
============================================================
String: '���'
============================================================
� U+FFFD REPLACEMENT CHARACTER
� U+FFFD REPLACEMENT CHARACTER
� U+FFFD REPLACEMENT CHARACTER
------------------------------------------------------------
$ uparse 👨🦳👧👦
============================================================
String: '👨🦳👧👦'
============================================================
👨 U+1F468 MAN
U+200D ZERO WIDTH JOINER
🦳 U+1F9B3 EMOJI COMPONENT WHITE HAIR
U+200D ZERO WIDTH JOINER
👧 U+1F467 GIRL
U+200D ZERO WIDTH JOINER
👦 U+1F466 BOY
------------------------------------------------------------
$ uparse 👨🏽✈️
============================================================
String: '👨🏽✈️'
============================================================
👨 U+1F468 MAN
🏽 U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4
U+200D ZERO WIDTH JOINER
✈ U+2708 AIRPLANE
U+FE0F VARIATION SELECTOR-16
------------------------------------------------------------
$ uparse X🩼X
============================================================
String: 'X🩼X'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+1FA7C <unknown> Perl v5.32.0 supports Unicode 13.0.0
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
$ uparse `perl -C -e 'print "X\x{1fa7d}X"'`
============================================================
String: 'XX'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+1FA7D <unknown> Perl v5.32.0 supports Unicode 13.0.0
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
You mentioned "locale setup" but didn't say what you have. I have:
LANG=en_AU.UTF-8
LC_ALL=en_AU.UTF-8
LC_COLLATE=en_AU.UTF-8
LC_CTYPE=en_AU.UTF-8
LC_MESSAGES=en_AU.UTF-8
LC_MONETARY=en_AU.UTF-8
LC_NUMERIC=en_AU.UTF-8
LC_TIME=en_AU.UTF-8
That's the best I can do.
Perhaps someone with the same O/S and Perl version as you can shed more light on your problem.
$ perlbrew switch perl-5.32.0
$ perl -v | head -2 | tail -1
This is perl 5, version 32, subversion 0 (v5.32.0) built for cygwin-thread-multi
Copy-pasted from the Unicode PDF code chart
"C1 Controls and Latin-1 Supplement (Range: 0080-00FF)":
$ uparse äöü
============================================================
String: 'äöü'
============================================================
ä U+E4 LATIN SMALL LETTER A WITH DIAERESIS
ö U+F6 LATIN SMALL LETTER O WITH DIAERESIS
ü U+FC LATIN SMALL LETTER U WITH DIAERESIS
------------------------------------------------------------
Generated directly from a perl command:
$ uparse `perl -C -e 'print "\x{e4}\x{f6}\x{fc}"'`
============================================================
String: 'äöü'
============================================================
ä U+E4 LATIN SMALL LETTER A WITH DIAERESIS
ö U+F6 LATIN SMALL LETTER O WITH DIAERESIS
ü U+FC LATIN SMALL LETTER U WITH DIAERESIS
------------------------------------------------------------
Generated separately then copy-pasted as an argument to uparse:
$ perl -C -e 'print "\N{LATIN SMALL LETTER A WITH DIAERESIS}\N{LATIN SMALL LETTER O WITH DIAERESIS}\N{LATIN SMALL LETTER U WITH DIAERESIS}"'
äöü
$ uparse äöü
============================================================
String: 'äöü'
============================================================
ä U+E4 LATIN SMALL LETTER A WITH DIAERESIS
ö U+F6 LATIN SMALL LETTER O WITH DIAERESIS
ü U+FC LATIN SMALL LETTER U WITH DIAERESIS
------------------------------------------------------------
Hi Ken!
I found the reason for the strange behaviour:
I didn't even remember, but I have PERL_UNICODE=SDAL set. Without this variable the script works correctly.
More specifically, it's the "A" in it.
From perlrun:
A 32 the @ARGV elements are expected to be strings encoded
in UTF-8
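perlrun also documents that the numeric value of this setting is readable through the magic variable ${^UNICODE}, so a script can at least see whether the A flag was requested. A rough sketch follows; argv_flag_set is a hypothetical helper, and it only reports the requested flags. If L (64) is set as well, whether the flags actually take effect depends on the locale, which this sketch does not check.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use constant FLAG_ARGV => 32;    # "A" in the perlrun table quoted above

# Hypothetical helper: 1 if the A bit is set in the given -C / PERL_UNICODE value.
sub argv_flag_set {
    my ($flags) = @_;
    return ($flags & FLAG_ARGV) ? 1 : 0;
}

printf '${^UNICODE} = %d, A flag %s' . "\n",
    ${^UNICODE}, argv_flag_set(${^UNICODE}) ? 'requested' : 'not requested';
```

Running it once plainly and once with PERL_UNICODE=SDAL in the environment shows the difference.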
Thank you very much for your investigations!
Greetings, -jo
$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
The script assumes your terminal uses UTF-8. However, you are not using a UTF-8 locale. You should look into switching to a UTF-8 locale.
I didn't notice there were other comments already.