Re^5: How to reverse a (Unicode) string

by Jim (Curate)
on Jan 31, 2011 at 09:28 UTC (#885224)


in reply to Re^4: Repurposing reverse
in thread How to reverse a (Unicode) string

You're just blowing the same old tired, incomprehensible smoke here as in so many other discussions on PerlMonks about Unicode. Until you can make a compelling case for why this Perl code…

use utf8; binmode STDOUT, ':encoding(UTF-8)'; print scalar reverse "Réaliste";

…should produce different output than this Perl code…

use utf8; binmode STDOUT, ':encoding(UTF-8)'; print join '', reverse "Réaliste" =~ m/\X/g;

…you're just arguing for the sake of argument about esoteric matters that aren't relevant at all to the topic at hand.
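For readers following along, here is a minimal sketch of the case where the two approaches can differ. It assumes the input is in decomposed form (a plain "e" followed by U+0301 COMBINING ACUTE ACCENT), which is an assumption made for illustration and not something stated in the post above; the variable names are hypothetical.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Assumed input: "Réaliste" in decomposed form, i.e. "e" + U+0301.
my $text = "Re\x{301}aliste";

# scalar forces string reversal; in list context (as imposed by print),
# reverse would reverse its argument list instead of the string.
my $by_code_point = scalar reverse $text;

# \X matches one extended grapheme cluster, so the accent stays with its base.
my $by_grapheme = join '', reverse $text =~ m/\X/g;

print "code points: $by_code_point\n";
print "graphemes:   $by_grapheme\n";
```

With this input, the code-point reversal leaves the combining accent in front of the "a" instead of after the "e", while the grapheme reversal keeps "e" and its accent together. With a precomposed "é" (U+00E9), both lines would print the same string.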


Re^6: Repurposing reverse
by ikegami (Pope) on Jan 31, 2011 at 15:31 UTC

    in so many other discussions on PerlMonks about Unicode.

    The discussion isn't about Unicode, it's about reverse. reverse doesn't perform a Unicode operation.

    Until you can make a compelling case

    So it doesn't break this code. Not everyone limits themselves to using strings as you do. I'm sorry that I'm failing to make you see that, but I'm at my wits' end.

      So we're in agreement, the documentation for reverse needs to be updated to clarify what it does, right?
      perlunicode
      And finally, scalar reverse() reverses by character rather than by byte.
      perldoc -f reverse
      In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.

        Second one first: I've already asked Jim what he thinks would be a clearer name for an element of a string. But,

        The first one is a counterargument to what I said. With that, one could declare reverse to be buggy because it should be reversing text, and change it so that it reverses sequences of graphemes, which is a much more common use case.

        Update: Didn't realize I wasn't replying to Jim. Fixed.

        The problem isn't one of characters versus bytes. The problem is the definition of character in the context of Unicode text. The scalar reverse function and other built-in string functions operate on Unicode text using a naïve and inadequate definition of character. Pointing this out and offering a workaround is the raison d'être of moritz's 2008 tutorial.

        The issue of what reverse does when fed, say, the bytes of a JPEG image is utterly irrelevant to this discussion, which is about Unicode text. I don't understand ikegami's insistence on trying to fold unrelated contexts into this discussion. Your reply dramatizes how ikegami's contrarian non sequitur needlessly confused the simple and self-evident conclusion I made in my post.

        Here's what I wrote:

        The documentation of Perl's reverse function states: "In scalar context, [the reverse function] ... returns a string value with all characters in the opposite order." But it doesn't, at least not for a sufficiently modern, multilingual, Unicode-conformant definition of "character." It reverses Unicode code points, not characters in the usual, well-understood sense of the word.
        One or the other is wrong: the behavior of the reverse function or the reverse function's documentation.
        If I understand the design principles of Perl correctly, the reverse function should properly reverse extended grapheme clusters when the thing being reversed is Unicode text (and Perl understands it is Unicode text), and it should reverse bytes otherwise.
      The discussion isn't about Unicode…

      Yes it is about Unicode. "Unicode" is in the subject. Please go start your own discussion of water level measurements somewhere else.

      reverse doesn't perform a Unicode operation.

      Yes it does. Every time it reverses a Unicode string, it performs a Unicode operation.

      What kind of operation is this Perl code performing?

      use utf8; $utf8_text =~ s/\p{General_Category=Currency_Symbol}+/€/g;

      Not a Unicode operation?

        Yes it is about Unicode. "Unicode" is in the subject.

        Sorry for not setting the subject correctly. Fixed.

        Every time it reverses a Unicode string, it performs a Unicode operation.

        We've both said that it doesn't. At issue is whether it should. I would like it to, but I'm not sure it's possible.

        What kind of operation is this Perl code performing?

        A substitution based on a Unicode property match.

        Not a Unicode operation?

        I agree that it is. substr, length and reverse are not, though.
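To make the last point concrete, here is a small sketch of how length and substr treat a string holding one user-perceived character built from two code points. The sample string and variable names are assumptions chosen for illustration, not taken from the thread.

```perl
use strict;
use warnings;

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived character.
my $e_acute = "e\x{301}";

my $code_points = length $e_acute;          # counts code points
my $graphemes   = () = $e_acute =~ m/\X/g;  # counts extended grapheme clusters
my $first       = substr $e_acute, 0, 1;    # first code point only: a bare "e"

# prints: length: 2, graphemes: 1, substr is bare e: yes
printf "length: %d, graphemes: %d, substr is bare e: %s\n",
    $code_points, $graphemes, ($first eq 'e' ? 'yes' : 'no');
```

Like scalar reverse, both functions work on code points here; only the \X match groups the accent with its base letter.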
