in reply to Re^4: Unicode vulgar fraction composition
in thread Unicode vulgar fraction composition

Absolutely true if (as you wrote) a U+002F SOLIDUS appears between the 3 and the 8. This is why I've been limiting my scope to the case where a U+2044 FRACTION SLASH appears between them, i.e., the specific sequence that NFKC or NFKD decomposes a Unicode vulgar fraction into.
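A minimal sketch of the sequence in question, using Python's standard `unicodedata` module rather than the Perl library this thread is about. The names `SEVEN_EIGHTHS`, `FRACTION_SLASH`, and `describe` are illustrative only.

```python
import unicodedata

SEVEN_EIGHTHS = "\u215E"   # VULGAR FRACTION SEVEN EIGHTHS
FRACTION_SLASH = "\u2044"  # FRACTION SLASH, not U+002F SOLIDUS


def describe(text):
    """Return the Unicode character names of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Both compatibility forms expand the single character into three code points.
nfkd = unicodedata.normalize("NFKD", SEVEN_EIGHTHS)
nfkc = unicodedata.normalize("NFKC", SEVEN_EIGHTHS)

print(describe(nfkd))
print(nfkd == "7" + FRACTION_SLASH + "8")
```

NFKC does not put the fraction back together afterwards, because its composition step only applies canonical compositions.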

Re^6: Unicode vulgar fraction composition
by ikegami (Pope) on Oct 06, 2020 at 20:17 UTC

    My mistake.

    Then yeah, one could possibly argue that this should have been a standard decomposition rather than a compatibility decomposition. But they'd be wrong.

    A program is free to switch between the NFC and the NFD of a string at any time. As such, they should be visually and semantically indistinguishable. In other words, the two forms are simply two different ways of encoding graphemes internally.
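To illustrate the point about switching freely between NFC and NFD, here is a small sketch in Python (standard `unicodedata` module, not the Perl library discussed in this thread). The choice of "é" as the sample character and all variable names are assumptions made for the example.

```python
import unicodedata

composed = "\u00E9"      # LATIN SMALL LETTER E WITH ACUTE as one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# NFD and NFC convert between the two encodings of the same grapheme.
to_nfd = unicodedata.normalize("NFD", composed)
to_nfc = unicodedata.normalize("NFC", decomposed)
print(to_nfd == decomposed, to_nfc == composed)

# The vulgar fraction is untouched by either canonical form, because its
# decomposition is a compatibility one.
fraction = "\u215E"
canonical_forms = {unicodedata.normalize(form, fraction) for form in ("NFC", "NFD")}
print(canonical_forms == {fraction})
```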

    Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system. For example, b and d are distinct graphemes in English writing systems because there exist distinct words like big and dig. Conversely, a lowercase italiform letter a and a lowercase Roman letter a are not distinct graphemes because no word is distinguished on the basis of these two different forms. (2) What a user thinks of as a character.

    "7/8" isn't a grapheme[1], much less the same one as "⅞". As such, the two strings could have different appearances or meanings, and it's easy to come up with an example where someone might intentionally use "7/8" over "⅞". Imagine a document containing "... between 7/8 and 15/16 of the ...". The author might purposefully not use "⅞" for stylistic consistency. It would not be proper for a program to automatically convert "7/8" to "⅞" wherever it occurs.

    The short version is that no one can guess what transformations you want to perform, so it's up to you to determine the rules you want to follow, which is to say write a program that does what you want. Do you want to change "7/8" into "⅞" unconditionally? Conditionally? What about LATIN CAPITAL LETTER A WITH RING ABOVE (Å, U+00C5)? Is there a time it should become ANGSTROM SIGN (Å, U+212B)? And so on. These are decisions for you to make.
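One possible shape for such a program, sketched in Python rather than Perl. Everything here is hypothetical: the helper names `build_fraction_table` and `compose_fractions`, the `require_all` policy flag, and the decision to match only ASCII digits around U+2044 FRACTION SLASH. It is one set of rules among many, not a recommended one.

```python
import re
import sys
import unicodedata

# ASCII digits, FRACTION SLASH, ASCII digits, not embedded in a longer number.
FRACTION_RE = re.compile(r"(?<![0-9])[0-9]+\u2044[0-9]+(?![0-9])")


def build_fraction_table():
    """Map expansions such as '7\u20448' to their single-character fraction."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.decomposition(ch).startswith("<fraction>"):
            expanded = unicodedata.normalize("NFKD", ch)
            # Skip partial forms such as FRACTION NUMERATOR ONE ('1' + slash).
            if FRACTION_RE.fullmatch(expanded):
                table[expanded] = ch
    return table


def compose_fractions(text, table, require_all=False):
    """Replace digit/FRACTION SLASH/digit runs that have a precomposed form.

    With require_all=True the text is left alone unless every fraction in it
    can be composed, so that mixed text keeps a consistent style.
    """
    found = FRACTION_RE.findall(text)
    if require_all and any(f not in table for f in found):
        return text
    return FRACTION_RE.sub(lambda m: table.get(m.group(0), m.group(0)), text)


table = build_fraction_table()
print(compose_fractions("between 7\u20448 and 15\u204416 of the", table))
print(compose_fractions("between 7\u20448 and 15\u204416 of the", table, require_all=True))
```

The `require_all` flag is there only to show that the "between 7/8 and 15/16" case can be handled by a rule the author chooses. A string that uses U+002F SOLIDUS is never touched by this sketch.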


    1. Note I used a normal slash instead of a FRACTION SLASH throughout this post to avoid confusion, because my browser rendered fractions with a FRACTION SLASH much like "⅞", and yours might too. But it is under no obligation to do so, and other renderers won't do this.

      one could possibly argue that this should have been a standard decomposition rather than a compatibility decomposition.
      I might agree with someone making this argument, but it is not the argument I'm making or have made in this thread.
      "7/8" isn't a grapheme, much less the same one as "⅞".
      (where the "/" above is shorthand for U+2044 FRACTION SLASH)

      My understanding of the intent of the Unicode FRACTION SLASH character is that it's intended to draw exactly such an equivalency. Your web browser agrees with this interpretation, rendering the sequences identically, as your footnote explains. Any use of a slash-like character to separate 7 and 8 with another meaning (such as part of a date) would have to use a slash other than U+2044. What exactly is the purpose of having a slash specifically earmarked for fractions if not to indicate that the string containing it does in fact represent a fraction?

      Nonetheless, as you note, Unicode does make clear that this decomposition is a compatibility one rather than a standard one, so the Perl library correctly reflects this.
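The distinction can be read straight from the character database. The sketch below uses Python's `unicodedata.decomposition`, which returns the raw mapping with a `<tag>` prefix for compatibility mappings. What this thread calls a "standard" decomposition is named "canonical" in Unicode's own terms. The helper `decomposition_kind` is a hypothetical name, and it only reflects the per-character mapping Python exposes.

```python
import unicodedata


def decomposition_kind(ch):
    """Classify a character's decomposition mapping as Python reports it."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings carry a formatting tag such as <fraction>.
    return "compatibility" if mapping.startswith("<") else "canonical"


for ch in ("\u215E", "\u212B", "\u00C5", "7"):
    print(unicodedata.name(ch), decomposition_kind(ch), unicodedata.decomposition(ch))

# ANGSTROM SIGN is canonically equivalent to A WITH RING ABOVE, so NFC
# replaces it, while the vulgar fraction survives NFC unchanged.
angstrom_nfc = unicodedata.normalize("NFC", "\u212B")
fraction_nfc = unicodedata.normalize("NFC", "\u215E")
```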

        My understanding of the intent of the Unicode FRACTION SLASH character is that it's intended to draw exactly such an equivalency

        It could very well be intended that agents may draw them as a fraction when possible, but it's not always possible (e.g. pic), and that's what matters.