PerlMonks  

Re^3: Counting text with ligatures

by haukex (Archbishop)
on Sep 13, 2017 at 14:02 UTC


in reply to Re^2: Counting text with ligatures
in thread Counting text with ligatures

My initial understanding of the OP's question was that it concerns Unicode's ability to represent the same user-visible character in several different ways, for example with combining characters. That is, the two strings "\N{LATIN SMALL LETTER E WITH ACUTE}" and "e\N{COMBINING ACUTE ACCENT}" report different lengths (1 and 2, respectively), even though both are displayed as "é" (one "grapheme"), so users would expect the "length" of each string to be reported as 1. I may have misunderstood the OP's question, though: if you have the strings "ﬃ" (the single-character ligature) vs. "ffi" (three separate letters), and you want to know whether they have the same length and/or are equal, then perhaps what the OP is looking for is Unicode equivalence (normalization).

use Unicode::Normalize;
use Data::Dump;
dd NFD("\N{LATIN SMALL LETTER E WITH ACUTE}"), NFD("e\N{COMBINING ACUTE ACCENT}");
dd NFC("\N{LATIN SMALL LETTER E WITH ACUTE}"), NFC("e\N{COMBINING ACUTE ACCENT}");
dd NFKD("\N{LATIN SMALL LIGATURE FFI}");
__END__
("e\x{301}", "e\x{301}")
("\xE9", "\xE9")
"ffi"

Updated example code to include the "é" examples.
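
To make the difference between code points and graphemes described above concrete, here is a minimal sketch that is not part of the original example. It assumes that counting matches of \X (an extended grapheme cluster) is an acceptable notion of "user-visible length"; grapheme_count is a hypothetical helper name, not a function from any module. It also shows that the ligature needs a compatibility normalization form (NFKC or NFKD) before it compares equal to the three plain letters.

```perl
use strict;
use warnings;
use charnames ':full';            # only needed on Perls older than 5.16
use Unicode::Normalize qw(NFC NFKC);

# Hypothetical helper: count extended grapheme clusters by matching \X
# repeatedly and counting the matches.
sub grapheme_count {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;
    return $count;
}

my $precomposed = "\N{LATIN SMALL LETTER E WITH ACUTE}";
my $decomposed  = "e\N{COMBINING ACUTE ACCENT}";
my $ligature    = "\N{LATIN SMALL LIGATURE FFI}";

# length() counts code points, so the two spellings of "é" differ.
printf "code points: %d vs. %d\n", length($precomposed), length($decomposed);

# Counting grapheme clusters gives the same answer for both spellings.
printf "graphemes:   %d vs. %d\n",
    grapheme_count($precomposed), grapheme_count($decomposed);

# Canonical normalization makes the two spellings compare equal.
print "NFC equal\n" if NFC($precomposed) eq NFC($decomposed);

# The ligature is a single code point; only compatibility
# normalization expands it to the three letters "ffi".
printf "ligature:    %d vs. %d after NFKC\n",
    length($ligature), length(NFKC($ligature));
print "NFKC equal\n" if NFKC($ligature) eq "ffi";
```

Note that \X leaves the ligature as a single grapheme, so whether it should count as 1 or 3 depends on whether you normalize with a compatibility form first.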
