Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction

parv has asked for the wisdom of the Perl Monks concerning the following question:

Re: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction by Your Mother (Archbishop) on Oct 21, 2010 at 21:11 UTC
Check out the charinfo number stuff in Unicode::UCD. Untested but I suspect you'll find what you need in there.
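A minimal sketch of the charinfo route suggested above, for a single character. It assumes the `numeric` field returned by `charinfo` holds a string such as "3/4" for fractions and a plain integer string for whole numbers, and is empty when the character has no numeric value. The helper name `char_to_number` is hypothetical, not part of Unicode::UCD.

```perl
use 5.010;
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper (not from the thread): map one character to a plain
# number via the "numeric" field of charinfo(), or return undef if the
# character has no numeric value.
sub char_to_number {
    my ($char) = @_;

    # charinfo() takes a code point and returns undef for unassigned ones.
    my $info = charinfo(ord $char) or return undef;

    my $numeric = $info->{numeric};
    return undef unless defined $numeric && length $numeric;

    # Assumed field format: "3/4" for a fraction, "8" for a whole number.
    my ($num, $den) = split m{/}, $numeric;
    return defined $den ? $num / $den : $num + 0;
}

# U+00BE (three quarters), U+2154 (two thirds), U+2167 (Roman numeral eight),
# and a plain letter that has no numeric value.
for my $char ("\x{BE}", "\x{2154}", "\x{2167}", "A") {
    printf "U+%04X => %s\n", ord($char), char_to_number($char) // 'undef';
}
```

This handles one character at a time; a run of digits such as "42" would need its own handling.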
Re: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction by tchrist (Pilgrim) on Apr 10, 2011 at 02:52 UTC
I am only concerned right now about getting a clue about how to convert things like `"\x{be}"` into things like 0.75.

You mean like this?

    use 5.14.0;
    use strict;
    use warnings;
    use charnames qw/:full/;
    use Unicode::UCD 0.32 qw/num/;

    my @cp = (
        "bogus",
        "\N{DIGIT FOUR}\N{DIGIT TWO}",
        "\N{VULGAR FRACTION THREE QUARTERS}",
        "\N{VULGAR FRACTION TWO THIRDS}",
        "\N{VULGAR FRACTION ONE SEVENTH}",
        "\N{VULGAR FRACTION SEVEN EIGHTHS}",
        "\N{SUPERSCRIPT THREE}",
        "\N{SUBSCRIPT EIGHT}",
        "\N{FULLWIDTH DIGIT TWO}\N{FULLWIDTH DIGIT FIVE}",
        "\N{ROMAN NUMERAL EIGHT}",
        "\N{ROMAN NUMERAL ONE HUNDRED THOUSAND}",
        "\N{BENGALI DIGIT FOUR}\N{BENGALI DIGIT SEVEN}\N{BENGALI DIGIT FIVE}\N{BENGALI DIGIT SIX}",
        "\N{RUMI NUMBER SEVEN HUNDRED}",
        "\N{AEGEAN NUMBER NINETY THOUSAND}",
        "\N{ORIYA FRACTION THREE SIXTEENTHS}",
        "\N{TIBETAN DIGIT HALF ZERO}",
        "\N{TIBETAN DIGIT HALF ONE}",
        "\N{TIBETAN DIGIT HALF SEVEN}",
        "\N{BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR}",
        "\N{GREEK ACROPHONIC ATTIC FIFTY THOUSAND STATERS}",
    );

    for my $cp (@cp) {
        printf "%s\t= %20s\tU+%vX\n", $cp, num($cp) // "NaN", $cp;
    }

    __END__
    bogus	=                  NaN	U+62.6F.67.75.73
    42	=                   42	U+34.32
    ¾	=                 0.75	U+BE
    ⅔	=    0.666666666666667	U+2154
    ⅐	=    0.142857142857143	U+2150
    ⅞	=                0.875	U+215E
    ³	=                    3	U+B3
    ₈	=                    8	U+2088
    ２５	=                   25	U+FF12.FF15
    Ⅷ	=                    8	U+2167
    ↈ	=               100000	U+2188
    ৪৭৫৬	=                 4756	U+9EA.9ED.9EB.9EC
    𐹸	=                  700	U+10E78
    𐄳	=                90000	U+10133
    ୷	=               0.1875	U+B77
    ༳	=                 -0.5	U+F33
    ༪	=                  0.5	U+F2A
    ༰	=                  6.5	U+F30
    ৸	=                 0.75	U+9F8
    𐅖	=                50000	U+10156
Re^2: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction by educated_foo (Vicar) on Apr 10, 2011 at 04:59 UTC
    ↈ = 100000 U+2188
    ৪৭৫৬ = 4756 U+9EA.9ED.9EB.9EC
    𐹸 = 700 U+10E78

This is why Unicode is so great: "four hollow boxes" apparently means 4756, and "one hollow box" means 700, except when it means 1e5. BTW, this is in the latest version of Safari, which seems to make a real effort to do Unicode. It's hard to implement a standard that tries to do text, pictographs, and a bit of typesetting.
Re^3: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction by Your Mother (Archbishop) on Jul 13, 2011 at 22:45 UTC
I discovered Symbola through tchrist here and it allows pretty much everything to render.
Re^3: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction by ikegami (Patriarch) on Jul 13, 2011 at 21:27 UTC
You see boxes (or worse, the wrong glyph) if you don't have appropriate fonts for a charset, whether that charset is Unicode or not. Your problem has nothing to do with Unicode.
Re^2: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction by parv (Parson) on Jul 13, 2011 at 21:12 UTC
Thanks, tchrist, for pointing out &Unicode::UCD::num. (In retrospect I would have mentioned in OP that I was concerned only about the vulgar-ity of Unicode.)