in reply to getting Unicode character names from string
Sometimes a very specific example makes for a more immediately understandable demo:
#!perl

use v5.14;
use strict;
use warnings;
use charnames qw( :full );
use Unicode::Normalize qw( NFKD );

binmode STDOUT, ':encoding(UTF-8)';

my $greek_small_letter_alpha_with_oxia
    = NFKD("\N{GREEK SMALL LETTER ALPHA WITH OXIA}");
my $greek_small_letter_alpha_with_varia
    = NFKD("\N{GREEK SMALL LETTER ALPHA WITH VARIA}");

my $greek_small_letter_alpha_without_oxia
    = $greek_small_letter_alpha_with_oxia;
my $greek_small_letter_alpha_without_varia
    = $greek_small_letter_alpha_with_varia;

$greek_small_letter_alpha_without_oxia  =~ s/\p{Nonspacing_Mark}//g;
$greek_small_letter_alpha_without_varia =~ s/\p{Nonspacing_Mark}//g;

my $greek_small_letter_alpha_without_oxia_code_point
    = sprintf 'U+%04x', ord $greek_small_letter_alpha_without_oxia;
my $greek_small_letter_alpha_without_varia_code_point
    = sprintf 'U+%04x', ord $greek_small_letter_alpha_without_varia;

my $output = <<END;
\$greek_small_letter_alpha_with_oxia = $greek_small_letter_alpha_with_oxia
\$greek_small_letter_alpha_with_varia = $greek_small_letter_alpha_with_varia
\$greek_small_letter_alpha_without_oxia = $greek_small_letter_alpha_without_oxia
\$greek_small_letter_alpha_without_varia = $greek_small_letter_alpha_without_varia
\$greek_small_letter_alpha_without_oxia_code_point = $greek_small_letter_alpha_without_oxia_code_point
\$greek_small_letter_alpha_without_varia_code_point = $greek_small_letter_alpha_without_varia_code_point
END

$output =~ s/(?<==)\n(?= )//g;

print $output;

exit 0;
This script prints…
$greek_small_letter_alpha_with_oxia = ά
$greek_small_letter_alpha_with_varia = ὰ
$greek_small_letter_alpha_without_oxia = α
$greek_small_letter_alpha_without_varia = α
$greek_small_letter_alpha_without_oxia_code_point = U+03b1
$greek_small_letter_alpha_without_varia_code_point = U+03b1
The pattern here is to normalize the string to Unicode Normalization Form KD (NFKD), which splits each precomposed character into a base letter followed by its combining marks, and then to strip out every non-spacing mark (\p{Nonspacing_Mark}). (But see http://stackoverflow.com/questions/5697171/regex-what-is-incombiningdiacriticalmarks for tchrist's much more detailed information about this pattern.)
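For what it's worth, here is a rough sketch of the same pattern wrapped in a reusable sub and tied back to the original question of getting character names. The sub name strip_marks is my own invention for illustration and is not part of any module; charnames::viacode is the stock way to look up a name from a code point. Treat it as a starting point under those assumptions, not a finished solution.

```perl
#!perl

use v5.14;
use strict;
use warnings;
use charnames qw( :full );
use Unicode::Normalize qw( NFKD );

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper (name is illustrative only): decompose the string
# to NFKD, then delete every non-spacing mark that the decomposition
# exposed.
sub strip_marks {
    my ($string) = @_;

    my $decomposed = NFKD($string);
    $decomposed =~ s/\p{Nonspacing_Mark}//g;

    return $decomposed;
}

my $word = "\N{GREEK SMALL LETTER ALPHA WITH OXIA}"
         . "\N{GREEK SMALL LETTER ALPHA WITH VARIA}";

# Report the code point and Unicode name of each remaining character.
for my $char (split //, strip_marks($word)) {
    printf "U+%04X %s\n", ord $char, charnames::viacode(ord $char);
}

# Expected to print, once per input character:
#   U+03B1 GREEK SMALL LETTER ALPHA
```

Note that this only removes non-spacing marks (general category Mn). If you also want to drop spacing and enclosing marks, \p{Mark} is the broader property, but whether that is appropriate depends on the script you are processing.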