Re: getting Unicode character names from string

by Jim (Curate)
on Oct 11, 2012 at 01:30 UTC (#998348)


in reply to getting Unicode character names from string

Sometimes a very specific example makes for a more immediately understandable demo:

#!perl

use v5.14;
use strict;
use warnings;

use charnames qw( :full );
use Unicode::Normalize qw( NFKD );

binmode STDOUT, ':encoding(UTF-8)';

# Decompose each precomposed letter into base letter + combining mark
my $greek_small_letter_alpha_with_oxia
    = NFKD("\N{GREEK SMALL LETTER ALPHA WITH OXIA}");
my $greek_small_letter_alpha_with_varia
    = NFKD("\N{GREEK SMALL LETTER ALPHA WITH VARIA}");

my $greek_small_letter_alpha_without_oxia
    = $greek_small_letter_alpha_with_oxia;
my $greek_small_letter_alpha_without_varia
    = $greek_small_letter_alpha_with_varia;

# Strip the combining marks, leaving only the base letter
$greek_small_letter_alpha_without_oxia  =~ s/\p{Nonspacing_Mark}//g;
$greek_small_letter_alpha_without_varia =~ s/\p{Nonspacing_Mark}//g;

my $greek_small_letter_alpha_without_oxia_code_point
    = sprintf 'U+%04x', ord $greek_small_letter_alpha_without_oxia;
my $greek_small_letter_alpha_without_varia_code_point
    = sprintf 'U+%04x', ord $greek_small_letter_alpha_without_varia;

my $output = <<END;
\$greek_small_letter_alpha_with_oxia                =
 $greek_small_letter_alpha_with_oxia
\$greek_small_letter_alpha_with_varia               =
 $greek_small_letter_alpha_with_varia
\$greek_small_letter_alpha_without_oxia             =
 $greek_small_letter_alpha_without_oxia
\$greek_small_letter_alpha_without_varia            =
 $greek_small_letter_alpha_without_varia
\$greek_small_letter_alpha_without_oxia_code_point  =
 $greek_small_letter_alpha_without_oxia_code_point
\$greek_small_letter_alpha_without_varia_code_point =
 $greek_small_letter_alpha_without_varia_code_point
END

# Join each "name =" line with the value line that follows it
$output =~ s/(?<==)\n(?= )//g;

print $output;

exit 0;

This script prints…

$greek_small_letter_alpha_with_oxia                = ά
$greek_small_letter_alpha_with_varia               = ὰ
$greek_small_letter_alpha_without_oxia             = α
$greek_small_letter_alpha_without_varia            = α
$greek_small_letter_alpha_without_oxia_code_point  = U+03b1
$greek_small_letter_alpha_without_varia_code_point = U+03b1

The pattern here is to normalize the string to Unicode NFKD, which decomposes each precomposed character into a base letter followed by its combining marks, and then strip all non-spacing marks (\p{Nonspacing_Mark}). (But see http://stackoverflow.com/questions/5697171/regex-what-is-incombiningdiacriticalmarks for tchrist's much more detailed information about this pattern.)
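
For arbitrary strings rather than two hard-coded letters, the same pattern can be wrapped in a couple of small subroutines. This is only a sketch: strip_marks and code_point_names are hypothetical helper names, not part of any module, and the sample string is just the two letters from the demo above. It uses charnames::viacode to look up the name of each code point.

```perl
#!perl

use v5.14;
use strict;
use warnings;

use charnames ();
use Unicode::Normalize qw( NFKD );

# Hypothetical helper: decompose to NFKD, then drop every non-spacing mark.
sub strip_marks {
    my ($string) = @_;
    my $decomposed = NFKD($string);
    $decomposed =~ s/\p{Nonspacing_Mark}//g;
    return $decomposed;
}

# Hypothetical helper: return one "U+XXXX NAME" entry per code point.
sub code_point_names {
    my ($string) = @_;
    return map {
        my $code = ord $_;
        sprintf 'U+%04X %s', $code, charnames::viacode($code) // '<unnamed>';
    } split //, $string;
}

binmode STDOUT, ':encoding(UTF-8)';

# ALPHA WITH OXIA followed by ALPHA WITH VARIA
my $word = "\x{1F71}\x{1F70}";

say 'Decomposed:';
say "  $_" for code_point_names(NFKD($word));

say 'Marks stripped:';
say "  $_" for code_point_names(strip_marks($word));
```

Listing the names of the decomposed string before stripping shows exactly which combining marks the substitution is about to remove, which makes it easier to check that nothing unexpected is being dropped.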
