Thanks for the feedback. I don't have a Debian system available; I'm running Cygwin with Perlbrew and was able to wind back to v5.32.0 (the closest I have to your v5.32.1).
Under that version I have Unicode::UCD 0.75 and Encode 3.06 — what do you have?
Here are a few tests.
$ perl -v | head -2 | tail -1
This is perl 5, version 32, subversion 0 (v5.32.0) built for cygwin-thread-multi
I saw the three vowels (WITH DIAERESIS) on the web page.
They didn't change when I pasted them onto my command line,
nor in the uparse output.
However, when I pasted the results back here:
$ uparse äöü
============================================================
String: 'äöü'
============================================================
ä U+E4 LATIN SMALL LETTER A WITH DIAERESIS
ö U+F6 LATIN SMALL LETTER O WITH DIAERESIS
ü U+FC LATIN SMALL LETTER U WITH DIAERESIS
------------------------------------------------------------
And just so that you know what I'm seeing:
$ uparse Ã¤Ã¶Ã¼
============================================================
String: 'Ã¤Ã¶Ã¼'
============================================================
Ã U+C3 LATIN CAPITAL LETTER A WITH TILDE
¤ U+A4 CURRENCY SIGN
Ã U+C3 LATIN CAPITAL LETTER A WITH TILDE
¶ U+B6 PILCROW SIGN
Ã U+C3 LATIN CAPITAL LETTER A WITH TILDE
¼ U+BC VULGAR FRACTION ONE QUARTER
------------------------------------------------------------
------------------------------------------------------------
There were no surprises with my other tests.
$ uparse ���
============================================================
String: '���'
============================================================
� U+FFFD REPLACEMENT CHARACTER
� U+FFFD REPLACEMENT CHARACTER
� U+FFFD REPLACEMENT CHARACTER
------------------------------------------------------------
$ uparse 👨🦳👧👦
============================================================
String: '👨🦳👧👦'
============================================================
👨 U+1F468 MAN
U+200D ZERO WIDTH JOINER
🦳 U+1F9B3 EMOJI COMPONENT WHITE HAIR
U+200D ZERO WIDTH JOINER
👧 U+1F467 GIRL
U+200D ZERO WIDTH JOINER
👦 U+1F466 BOY
------------------------------------------------------------
$ uparse 👨🏽✈️
============================================================
String: '👨🏽✈️'
============================================================
👨 U+1F468 MAN
🏽 U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4
U+200D ZERO WIDTH JOINER
✈ U+2708 AIRPLANE
U+FE0F VARIATION SELECTOR-16
------------------------------------------------------------
$ uparse X🩼X
============================================================
String: 'X🩼X'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+1FA7C <unknown> Perl v5.32.0 supports Unicode 13.0.0
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
$ uparse `perl -C -e 'print "X\x{1fa7d}X"'`
============================================================
String: 'XX'
============================================================
X U+58 LATIN CAPITAL LETTER X
� U+1FA7D <unknown> Perl v5.32.0 supports Unicode 13.0.0
X U+58 LATIN CAPITAL LETTER X
------------------------------------------------------------
You mentioned "locale setup" but didn't say what you have. I have:
LANG=en_AU.UTF-8
LC_ALL=en_AU.UTF-8
LC_COLLATE=en_AU.UTF-8
LC_CTYPE=en_AU.UTF-8
LC_MESSAGES=en_AU.UTF-8
LC_MONETARY=en_AU.UTF-8
LC_NUMERIC=en_AU.UTF-8
LC_TIME=en_AU.UTF-8
That's the best I can do.
Perhaps someone with the same O/S and Perl version as you can shed more light on your problem.