Check out the numeric fields (decimal, digit, numeric) that charinfo returns in Unicode::UCD. Untested, but I suspect you'll find what you need in there.
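For what it's worth, here is a minimal sketch of the charinfo route. It assumes the "numeric" field comes back as a string such as "3/4" or "8" (empty for non-numeric characters); numeric_value is a made-up helper name, not part of Unicode::UCD.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper (not part of Unicode::UCD): convert the "numeric"
# field reported by charinfo into an ordinary Perl number.
sub numeric_value {
    my ($code_point) = @_;

    # charinfo returns undef for code points it knows nothing about
    my $info = charinfo($code_point) or return;

    # Assumed to be a string like "3/4", "-1/2" or "8"; empty when the
    # character has no numeric value
    my $numeric = $info->{numeric};
    return unless defined $numeric && length $numeric;

    # Split on "/" instead of eval'ing the string
    my ($num, $den) = split m{/}, $numeric;
    return defined $den ? $num / $den : $num + 0;
}

# VULGAR FRACTION THREE QUARTERS, ROMAN NUMERAL EIGHT, LATIN CAPITAL LETTER A
printf "U+%04X => %s\n", $_, numeric_value($_) // "NaN" for 0xBE, 0x2167, 0x41;
```

This only handles one code point at a time, so multi-digit strings would need their own loop.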
| [reply] |
I am only concerned right now about getting a clue about how to convert things like "\x{be}" into things like 0.75.
You mean like this?
use 5.14.0;
use strict;
use warnings;
use open qw/:std :utf8/;    # encode output as UTF-8; avoids "Wide character" warnings
use charnames qw/:full/;
use Unicode::UCD 0.32 qw/num/;

# Every entry except "bogus" is either a single numeric character
# or a run of decimal digits from one script.
my @cp = (
    "bogus",
    "\N{DIGIT FOUR}\N{DIGIT TWO}",
    "\N{VULGAR FRACTION THREE QUARTERS}",
    "\N{VULGAR FRACTION TWO THIRDS}",
    "\N{VULGAR FRACTION ONE SEVENTH}",
    "\N{VULGAR FRACTION SEVEN EIGHTHS}",
    "\N{SUPERSCRIPT THREE}",
    "\N{SUBSCRIPT EIGHT}",
    "\N{FULLWIDTH DIGIT TWO}\N{FULLWIDTH DIGIT FIVE}",
    "\N{ROMAN NUMERAL EIGHT}",
    "\N{ROMAN NUMERAL ONE HUNDRED THOUSAND}",
    "\N{BENGALI DIGIT FOUR}\N{BENGALI DIGIT SEVEN}\N{BENGALI DIGIT FIVE}\N{BENGALI DIGIT SIX}",
    "\N{RUMI NUMBER SEVEN HUNDRED}",
    "\N{AEGEAN NUMBER NINETY THOUSAND}",
    "\N{ORIYA FRACTION THREE SIXTEENTHS}",
    "\N{TIBETAN DIGIT HALF ZERO}",
    "\N{TIBETAN DIGIT HALF ONE}",
    "\N{TIBETAN DIGIT HALF SEVEN}",
    "\N{BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR}",
    "\N{GREEK ACROPHONIC ATTIC FIFTY THOUSAND STATERS}",
);

for my $cp (@cp) {
    # num() returns undef when the string has no safe numeric value;
    # %vX prints the string's code points in hex, joined by dots
    printf "%s\t= %20s\tU+%vX\n", $cp, num($cp) // "NaN", $cp;
}
__END__
bogus = NaN U+62.6F.67.75.73
42 = 42 U+34.32
¾ = 0.75 U+BE
⅔ = 0.666666666666667 U+2154
⅐ = 0.142857142857143 U+2150
⅞ = 0.875 U+215E
³ = 3 U+B3
₈ = 8 U+2088
２５ = 25 U+FF12.FF15
Ⅷ = 8 U+2167
ↈ = 100000 U+2188
৪৭৫৬ = 4756 U+9EA.9ED.9EB.9EC
𐹸 = 700 U+10E78
𐄳 = 90000 U+10133
୷ = 0.1875 U+B77
༳ = -0.5 U+F33
༪ = 0.5 U+F2A
༰ = 6.5 U+F30
৸ = 0.75 U+9F8
𐅖 = 50000 U+10156
| [reply] [d/l] |
ↈ = 100000 U+2188
৪৭৫৬ = 4756 U+9EA.9ED.9EB.9EC
𐹸 = 700 U+10E78
This is why Unicode is so great: "four hollow boxes" apparently means 4756, and "one hollow box" means 700, except when it means 1e5. BTW, this is in the latest version of Safari, which seems to make a real effort to do Unicode. It's hard to implement a standard that tries to do text, pictographs, and a bit of typesetting.
| [reply] |
You see boxes (or worse, the wrong glyph) if you don't have appropriate fonts for a charset, whether that charset is Unicode or not. Your problem has nothing to do with Unicode.
| [reply] |
Thanks, tchrist, for pointing out &Unicode::UCD::num.
(In retrospect, I should have mentioned in the OP that I was concerned only about the vulgar-ity of Unicode.)
| [reply] |