
My initial understanding of the OP's question was that it has to do with Unicode being able to represent the same user-visible character in several different ways, for example with combining characters. That is, the two strings "\N{LATIN SMALL LETTER E WITH ACUTE}" and "e\N{COMBINING ACUTE ACCENT}" report different lengths (1 and 2, respectively), even though on the screen they both look like "é" (one "grapheme"), so users would expect the "length" of each string to be reported as 1. I may have misunderstood the OP's question though - if you have the strings "ffi" vs. "ﬃ", and you want to know whether they have the same length and/or are equal, then perhaps what the OP is looking for is Unicode equivalence (normalization).

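use Unicode::Normalize;
use Data::Dump;
# NFD: canonical decomposition, both forms become "e" + combining accent
dd NFD("\N{LATIN SMALL LETTER E WITH ACUTE}"),
   NFD("e\N{COMBINING ACUTE ACCENT}");
# NFC: canonical composition, both forms become the single precomposed "é"
dd NFC("\N{LATIN SMALL LETTER E WITH ACUTE}"),
   NFC("e\N{COMBINING ACUTE ACCENT}");
# NFKD: compatibility decomposition, the ligature becomes three plain letters
dd NFKD("\N{LATIN SMALL LIGATURE FFI}");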
__END__
("e\x{301}", "e\x{301}")
("\xE9", "\xE9")
"ffi"

Updated example code to include the "é" examples.
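
To tie this back to counting, here is a minimal sketch of how the two ideas could be combined; it is only one possible approach, not necessarily what the OP needs. The helper name grapheme_count is made up for this illustration. It counts matches of \X (one extended grapheme cluster), while NFC and NFKC come from Unicode::Normalize as above.

```perl
use strict;
use warnings;
use charnames ':full';   # only needed for \N{...} on Perls older than 5.16
use Unicode::Normalize qw(NFC NFKC);

my $precomposed = "\N{LATIN SMALL LETTER E WITH ACUTE}";
my $decomposed  = "e\N{COMBINING ACUTE ACCENT}";
my $ligature    = "\N{LATIN SMALL LIGATURE FFI}";

# Hypothetical helper: count user-visible characters by counting
# matches of \X, which matches one extended grapheme cluster.
sub grapheme_count {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;
    return $count;
}

# length() counts code points, so the two spellings of "é" differ.
printf "length:    %d vs. %d\n", length($precomposed), length($decomposed);

# Counting grapheme clusters gives the same answer for both.
printf "graphemes: %d vs. %d\n",
    grapheme_count($precomposed), grapheme_count($decomposed);

# After canonical normalization the two strings compare equal.
print "NFC equal: ", (NFC($precomposed) eq NFC($decomposed) ? "yes" : "no"), "\n";

# Compatibility normalization expands the ligature to plain letters.
printf "ligature:  %d before, %d after NFKC\n",
    length($ligature), length(NFKC($ligature));
```

Whether the ligature should count as one character or three depends on what is being measured, so the NFKC step is a choice to make deliberately rather than a default.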

In reply to Re^3: Counting text with ligatures by haukex
in thread Counting text with ligatures by albert
