PerlMonks  

Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction

by parv (Priest)
on Oct 21, 2010 at 20:13 UTC (#866633)
parv has asked for the wisdom of the Perl Monks concerning the following question:

Is there a Perl equivalent of the Python unicodedata module? Right now I only want a clue about how to convert things like "\x{be}" into things like 0.75. I am trying to avoid starting out by building a hash by hand (even though that might be the fastest way to the goal).

All I could find were references to various Unicode-related Perl manual pages, or to using (un)pack and Encode. Perhaps I missed something there?
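For comparison, the hand-built hash the question hopes to avoid could start as small as this sketch. %VULGAR and vulgar_value are hypothetical names, and the table covers only the three Latin-1 vulgar fractions, so every other numeric character would need its own entry.

```perl
use strict;
use warnings;

# Hypothetical lookup table: only the Latin-1 vulgar fractions U+00BC..U+00BE.
my %VULGAR = (
    "\x{BC}" => 1/4,
    "\x{BD}" => 1/2,
    "\x{BE}" => 3/4,
);

# Look up one character; characters missing from the table give undef.
sub vulgar_value { return $VULGAR{ $_[0] } }

my $value = vulgar_value("\x{BE}");
print defined $value ? $value : 'NaN', "\n";   # prints 0.75
```

The lookup is fast, but the table is exactly the maintenance burden the question wants to skip.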

Re: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction
by Your Mother (Chancellor) on Oct 21, 2010 at 21:11 UTC

    Check out the numeric fields returned by charinfo in Unicode::UCD. Untested, but I suspect you'll find what you need in there.
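A minimal sketch of that suggestion, using only the core Unicode::UCD module: charinfo() returns a hash ref whose numeric field holds the character's Unicode numeric value as a string (for U+00BE that string is "3/4"), so a fraction still has to be evaluated by hand. The helper name fraction_value is hypothetical.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: numeric value of one character, or undef if it has none.
sub fraction_value {
    my ($char) = @_;
    my $info = charinfo(ord $char) or return undef;   # undef for unassigned code points
    my $numeric = $info->{numeric};                   # e.g. "3/4", "-1/2", "8", or ""
    return undef unless defined $numeric && length $numeric;
    # Evaluate "n/d" strings with a regex rather than a string eval.
    return $1 / $2 if $numeric =~ m{\A(-?\d+)/(\d+)\z};
    return $numeric + 0;
}

print fraction_value("\x{BE}"), "\n";   # 0.75
print defined fraction_value("x") ? "numeric\n" : "no numeric value\n";
```

Parsing the "n/d" string with a regex keeps the sketch free of eval; the optional minus sign covers entries such as TIBETAN DIGIT HALF ZERO, whose value is "-1/2".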

Re: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction
by tchrist (Pilgrim) on Apr 10, 2011 at 02:52 UTC
    I am only concerned right now about getting a clue about how to convert things like "\x{be}" into things like 0.75.

    You mean like this?

    
    use 5.14.0;
    use strict;
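    use warnings;
    use open qw/:std :encoding(UTF-8)/;   # print the characters as UTF-8, without "Wide character" warnings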
    use charnames qw/:full/;
    use Unicode::UCD 0.32 qw/num/;
    
    my @cp = (
        "bogus",
        "\N{DIGIT FOUR}\N{DIGIT TWO}",
        "\N{VULGAR FRACTION THREE QUARTERS}",
        "\N{VULGAR FRACTION TWO THIRDS}",
        "\N{VULGAR FRACTION ONE SEVENTH}",
        "\N{VULGAR FRACTION SEVEN EIGHTHS}",
        "\N{SUPERSCRIPT THREE}",
        "\N{SUBSCRIPT EIGHT}",
        "\N{FULLWIDTH DIGIT TWO}\N{FULLWIDTH DIGIT FIVE}",
        "\N{ROMAN NUMERAL EIGHT}",
        "\N{ROMAN NUMERAL ONE HUNDRED THOUSAND}",
        "\N{BENGALI DIGIT FOUR}\N{BENGALI DIGIT SEVEN}\N{BENGALI DIGIT FIVE}\N{BENGALI DIGIT SIX}",
        "\N{RUMI NUMBER SEVEN HUNDRED}",
        "\N{AEGEAN NUMBER NINETY THOUSAND}",
        "\N{ORIYA FRACTION THREE SIXTEENTHS}",
        "\N{TIBETAN DIGIT HALF ZERO}",
        "\N{TIBETAN DIGIT HALF ONE}",
        "\N{TIBETAN DIGIT HALF SEVEN}",
        "\N{BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR}",
        "\N{GREEK ACROPHONIC ATTIC FIFTY THOUSAND STATERS}",
    );
    
    for my $cp (@cp) {
        printf "%s\t= %20s\tU+%vX\n", $cp, num($cp) // "NaN", $cp;
    }
    __END__
    bogus   =                  NaN  U+62.6F.67.75.73
    42      =                   42  U+34.32
    ¾       =                 0.75  U+BE
    ⅔       =    0.666666666666667  U+2154
    ⅐       =    0.142857142857143  U+2150
    ⅞       =                0.875  U+215E
    ³       =                    3  U+B3
    ₈       =                    8  U+2088
    ２５    =                   25  U+FF12.FF15
    Ⅷ       =                    8  U+2167
    ↈ       =               100000  U+2188
    ৪৭৫৬    =                 4756  U+9EA.9ED.9EB.9EC
    𐹸       =                  700  U+10E78
    𐄳       =                90000  U+10133
    ୷       =               0.1875  U+B77
    ༳       =                 -0.5  U+F33
    ༪       =                  0.5  U+F2A
    ༰       =                  6.5  U+F30
    ৸       =                 0.75  U+9F8
    𐅖       =                50000  U+10156
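The num() function used above has one limit worth knowing when the input is a mixed number such as "1½": per the Unicode::UCD documentation, a multi-character string only gets a value if every character is a decimal digit from the same script. A minimal sketch of that boundary, assuming Unicode::UCD 0.32 or later:

```perl
use strict;
use warnings;
use Unicode::UCD 0.32 qw(num);

# A single numeric character yields its full numeric value.
my $three_quarters = num("\x{BE}");
# A run of decimal digits from one script is read positionally.
my $forty_two = num("42");
# A digit followed by a fraction is not a digit run, so num() returns undef.
my $mixed = num("1\x{BD}");

printf "%s %s %s\n", $three_quarters, $forty_two, $mixed // 'undef';
```

So a string like "1½" has to be split into its digit run and its fraction character before each part is passed to num().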
    
      ↈ = 100000 U+2188
      ৪৭৫৬ = 4756 U+9EA.9ED.9EB.9EC
      𐹸 = 700 U+10E78
      This is why Unicode is so great: "four hollow boxes" apparently means 4756, and "one hollow box" means 700, except when it means 1e5. BTW, this is in the latest version of Safari, which seems to make a real effort to do Unicode. It's hard to implement a standard that tries to do text, pictographs, and a bit of typesetting.
        You see boxes (or worse, the wrong glyph) if you don't have appropriate fonts for a charset, whether that charset is Unicode or not. Your problem has nothing to do with Unicode.

        I discovered the Symbola font through tchrist here, and it renders pretty much everything.

      Thanks, tchrist, for pointing out &Unicode::UCD::num.

      (In retrospect, I should have mentioned in the OP that I was concerned only with the vulgar-ity of Unicode, i.e. its vulgar fractions.)

Node Type: perlquestion [id://866633]
Approved by Corion