
Re^2: Counting text with ligatures

by Corion (Pope)
on Sep 13, 2017 at 13:51 UTC

in reply to Re: Counting text with ligatures
in thread Counting text with ligatures

At least on Perl 5.14 and Perl 5.20, this doesn't work (and I don't understand why):

    use strict;
    use charnames ":full";

    my $string = "\N{LATIN SMALL LIGATURE FFI}";
    print "length: ",length($string),"\n"; # wrong way
    my $len = () = $string=~/\X/g;
    print "len: $len\n";
    my @graphs = split /\X\K(?=\X)/, $string;
    print "graphs: ", 0+@graphs, "\n";
    __END__
    length: 1
    len: 1
    graphs: 1

Is our understanding of graphemes perhaps different from the idea of the separate letters that make up a ligature?
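A minimal sketch, assuming the goal is to count the letters a ligature stands for: `\X` matches extended grapheme clusters, and U+FB03 is a single code point that forms a single cluster, so a count of 1 is the expected result rather than a bug. Applying compatibility decomposition first (`NFKD` from the core Unicode::Normalize module) splits the ligature into "f", "f", "i", after which `\X` finds three clusters. The helper name `count_graphemes` is hypothetical.

```perl
use strict;
use warnings;
use charnames ":full";
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: count extended grapheme clusters (\X) in a string.
sub count_graphemes {
    my ($s) = @_;
    my $n = () = $s =~ /\X/g;    # list-assignment-in-scalar-context count idiom
    return $n;
}

my $lig = "\N{LATIN SMALL LIGATURE FFI}";

# The ligature is one code point and one grapheme cluster.
printf "code points: %d, graphemes: %d\n", length($lig), count_graphemes($lig);

# Compatibility decomposition maps U+FB03 to the three letters "ffi".
my $plain = NFKD($lig);
printf "after NFKD: code points: %d, graphemes: %d\n",
    length($plain), count_graphemes($plain);
```

Whether counting after NFKD is appropriate depends on whether the OP considers "ﬃ" to be one character or three; Unicode's segmentation rules treat it as one.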

Re^3: Counting text with ligatures
by haukex (Abbot) on Sep 13, 2017 at 14:02 UTC

    My initial understanding of the OP's question was that it concerns Unicode's ability to represent the same user-visible character in several different ways, for example with combining characters. That is, the two strings "\N{LATIN SMALL LETTER E WITH ACUTE}" and "e\N{COMBINING ACUTE ACCENT}" report different lengths (1 and 2, respectively), even though on screen both look like "é" (one "grapheme"), so users would expect length to report 1 for each string. I may have misunderstood the OP's question, though: if you have the strings "ﬃ" vs. "ffi" and want to know whether they have the same length and/or are equal, then perhaps what the OP is looking for is Unicode equivalence (normalization).

        use Unicode::Normalize;
        use Data::Dump;
        dd NFD("\N{LATIN SMALL LETTER E WITH ACUTE}"), NFD("e\N{COMBINING ACUTE ACCENT}");
        dd NFC("\N{LATIN SMALL LETTER E WITH ACUTE}"), NFC("e\N{COMBINING ACUTE ACCENT}");
        dd NFKD("\N{LATIN SMALL LIGATURE FFI}");
        __END__
        ("e\x{301}", "e\x{301}")
        ("\xE9", "\xE9")
        "ffi"

    Updated example code to include the "é" examples.
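    The "same length and/or equal" comparison described above could be sketched as follows, assuming compatibility equivalence (NFKD) is the notion of equality wanted: normalize both strings before comparing with eq, and count user-visible characters with \X instead of length. The helper names `same_text` and `graphemes` are hypothetical. Note that NFKD is deliberately lossy; it also maps other compatibility characters, such as superscript "²", to their plain forms, which may or may not suit the OP.

```perl
use strict;
use warnings;
use charnames ":full";
use Unicode::Normalize qw(NFD NFKD);

# Hypothetical helper: true if two strings are compatibility-equivalent.
sub same_text {
    my ($x, $y) = @_;
    return NFKD($x) eq NFKD($y);
}

# Hypothetical helper: count extended grapheme clusters (user-visible characters).
sub graphemes {
    my ($s) = @_;
    my $n = () = $s =~ /\X/g;
    return $n;
}

my $precomposed = "\N{LATIN SMALL LETTER E WITH ACUTE}";
my $combining   = "e\N{COMBINING ACUTE ACCENT}";
my $ligature    = "\N{LATIN SMALL LIGATURE FFI}";

# length() counts code points (1 vs 2); \X sees one grapheme in each.
printf "length: %d vs %d\n", length($precomposed), length($combining);
printf "graphemes: %d vs %d\n", graphemes($precomposed), graphemes($combining);

# A raw eq fails; comparing canonical (NFD) forms succeeds.
print "eq: ",     ($precomposed eq $combining ? "yes" : "no"), "\n";
print "NFD eq: ", (NFD($precomposed) eq NFD($combining) ? "yes" : "no"), "\n";

# Only compatibility decomposition treats the ligature and "ffi" as equal.
print "ligature vs ffi: ", (same_text($ligature, "ffi") ? "yes" : "no"), "\n";
```

    NFD alone is enough for the "é" case because that is canonical equivalence; the ligature case needs the compatibility (K) forms.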
