PerlMonks  

Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction

by parv (Parson)
on Oct 21, 2010 at 20:13 UTC ( [id://866633]=perlquestion )

parv has asked for the wisdom of the Perl Monks concerning the following question:

Is there a Perl equivalent of Python's unicodedata module? For now I only need a clue about how to convert things like "\x{be}" into things like 0.75. I would rather not start by building a hash by hand (though that could be the fastest way to the goal).

All I could find were references to various Perl Unicode-related manual pages, or uses of (un)pack & Encode. Perhaps I missed something there?


Replies are listed 'Best First'.
Re: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction
by Your Mother (Archbishop) on Oct 21, 2010 at 21:11 UTC

    Check out the charinfo number stuff in Unicode::UCD. Untested but I suspect you'll find what you need in there.
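
A minimal sketch of the charinfo route suggested above, assuming the `numeric` field of the returned hash holds either a plain number or a "numerator/denominator" string, and is empty for non-numeric characters. The helper name `numeric_value` is made up for illustration and is not part of Unicode::UCD.

```perl
use strict;
use warnings;
use Unicode::UCD qw/charinfo/;

# Hypothetical helper: numeric value of a single character, or undef
# when the character has no numeric value.
sub numeric_value {
    my ($char) = @_;

    # charinfo() takes a code point and returns a hash reference,
    # or undef for an unassigned code point.
    my $info = charinfo(ord $char) or return;

    my $numeric = $info->{numeric};
    return unless defined $numeric && length $numeric;

    # Fractions are stored as text such as "3/4", so divide them out.
    if ($numeric =~ m{\A(-?\d+)/(\d+)\z}) {
        return $1 / $2;
    }
    return $numeric + 0;
}

printf "%s\n", numeric_value("\x{be}");    # prints 0.75
```

This handles one character at a time; strings of several digits would need extra handling.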

Re: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction
by tchrist (Pilgrim) on Apr 10, 2011 at 02:52 UTC
    I am only concerned right now about getting a clue about how to convert things like "\x{be}" into things like 0.75.

    You mean like this?

    
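    use 5.14.0;
    use strict;
    use warnings;
    use charnames qw/:full/;          # \N{...} by full character name
    use Unicode::UCD 0.32 qw/num/;    # num(): numeric value of a string

    # Encode output as UTF-8 so the non-ASCII characters print without
    # "Wide character" warnings.
    binmode STDOUT, ':encoding(UTF-8)';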
    
    my @cp = (
        "bogus",
        "\N{DIGIT FOUR}\N{DIGIT TWO}",
        "\N{VULGAR FRACTION THREE QUARTERS}",
        "\N{VULGAR FRACTION TWO THIRDS}",
        "\N{VULGAR FRACTION ONE SEVENTH}",
        "\N{VULGAR FRACTION SEVEN EIGHTHS}",
        "\N{SUPERSCRIPT THREE}",
        "\N{SUBSCRIPT EIGHT}",
        "\N{FULLWIDTH DIGIT TWO}\N{FULLWIDTH DIGIT FIVE}",
        "\N{ROMAN NUMERAL EIGHT}",
        "\N{ROMAN NUMERAL ONE HUNDRED THOUSAND}",
        "\N{BENGALI DIGIT FOUR}\N{BENGALI DIGIT SEVEN}\N{BENGALI DIGIT FIVE}\N{BENGALI DIGIT SIX}",
        "\N{RUMI NUMBER SEVEN HUNDRED}",
        "\N{AEGEAN NUMBER NINETY THOUSAND}",
        "\N{ORIYA FRACTION THREE SIXTEENTHS}",
        "\N{TIBETAN DIGIT HALF ZERO}",
        "\N{TIBETAN DIGIT HALF ONE}",
        "\N{TIBETAN DIGIT HALF SEVEN}",
        "\N{BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR}",
        "\N{GREEK ACROPHONIC ATTIC FIFTY THOUSAND STATERS}",
    );
    
    # num() returns undef for strings with no single numeric value, hence // "NaN".
    for my $cp (@cp) {
        printf "%s\t= %20s\tU+%vX\n", $cp, num($cp) // "NaN", $cp;
    }
    __END__
    bogus   =                  NaN  U+62.6F.67.75.73
    42      =                   42  U+34.32
    ¾       =                 0.75  U+BE
    ⅔       =    0.666666666666667  U+2154
    ⅐       =    0.142857142857143  U+2150
    ⅞       =                0.875  U+215E
    ³       =                    3  U+B3
    ₈       =                    8  U+2088
    ２５    =                   25  U+FF12.FF15
    Ⅷ       =                    8  U+2167
    ↈ       =               100000  U+2188
    ৪৭৫৬    =                 4756  U+9EA.9ED.9EB.9EC
    𐹸       =                  700  U+10E78
    𐄳       =                90000  U+10133
    ୷       =               0.1875  U+B77
    ༳       =                 -0.5  U+F33
    ༪       =                  0.5  U+F2A
    ༰       =                  6.5  U+F30
    ৸       =                 0.75  U+9F8
    𐅖       =                50000  U+10156
    
      ↈ = 100000 U+2188
      ৪৭৫৬ = 4756 U+9EA.9ED.9EB.9EC
      𐹸 = 700 U+10E78
      This is why Unicode is so great: "four hollow boxes" apparently means 4756, and "one hollow box" means 700, except when it means 1e5. BTW, this is in the latest version of Safari, which seems to make a real effort to do Unicode. It's hard to implement a standard that tries to do text, pictographs, and a bit of typesetting.

        I discovered Symbola through tchrist here and it allows pretty much everything to render.

        You see boxes (or worse, the wrong glyph) if you don't have appropriate fonts for a charset, whether that charset is Unicode or not. Your problem has nothing to do with Unicode.

      Thanks, tchrist, for pointing out &Unicode::UCD::num.

      (In retrospect I would have mentioned in OP that I was concerned only about the vulgar-ity of Unicode.)
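
For the narrower case in the original question, a short sketch using num on single characters written as "\x{..}" escapes. The helper name `fraction_value` is hypothetical; it only wraps num so that anything longer than one character is rejected.

```perl
use strict;
use warnings;
use Unicode::UCD 0.32 qw/num/;

# Hypothetical helper: numeric value of exactly one character,
# or undef if the input is longer or has no numeric value.
sub fraction_value {
    my ($char) = @_;
    return unless defined $char && length($char) == 1;
    my $value = num($char);    # scalar context: the value, or undef
    return $value;
}

for my $char ("\x{be}", "\x{bd}", "\x{2154}", "x") {
    my $value = fraction_value($char);
    printf "U+%04X = %s\n", ord $char,
        defined $value ? $value : 'not numeric';
}
```

The result is an ordinary floating-point number, so thirds and sevenths are approximations rather than exact fractions.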

Approved by Corion