PerlMonks  


The problem is the definition of character in the context of Unicode text.

No, I fully agree with you on the definition of character in the context of Unicode text.

At issue is that reverse cannot recognise the presence of Unicode text. How do you think reverse can tell the difference between chr(113).chr(101).chr(769) and "qe\N{COMBINING ACUTE ACCENT}"?
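To make that question concrete, here is a minimal sketch showing that the two constructions yield the same string. The variable names ($samples, $text) are mine, chosen for illustration; nothing else is assumed beyond core Perl.

```perl
use strict;
use warnings;
use charnames qw( :full );

# Built from raw numbers, e.g. three unrelated measurements.
my $samples = chr(113) . chr(101) . chr(769);

# Built as text: "q", "e", and a combining acute accent (U+0301 = 769).
my $text = "qe\N{COMBINING ACUTE ACCENT}";

# Perl keeps no record of which construction was used.
my $same        = $samples eq $text ? 1 : 0;
my $length      = length($text);
my @code_points = map { ord } split //, $text;

# Print code points as numbers so no wide characters reach STDOUT.
print "same=$same length=$length code_points=@code_points\n";
```

This prints same=1 length=3 code_points=113 101 769. Whatever reverse receives, it sees only those three code points, with no indication of whether they were meant as text.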

It can either always treat the string as Unicode text, or never. Currently, it never does. Changing that would be backwards incompatible, so you'd have to demonstrate a bug in order to justify changing that behaviour.

    use strict;
    use warnings;
    use charnames qw( :full );

    # The built-in as it stands: in scalar context it reverses by code point.
    sub current_reverse {
        return reverse(@_);
    }

    # The same behaviour spelled out: split into code points, then reverse.
    sub string_reverse {
        return reverse(@_) if wantarray;
        my @chars = join('', @_) =~ /./sg;
        return join '', @chars[ reverse 0..$#chars ];
    }

    # Reverse by extended grapheme cluster (\X) instead of by code point.
    sub unicode_reverse {
        return reverse(@_) if wantarray;
        my @chars = join('', @_) =~ /\X/g;
        return join '', @chars[ reverse 0..$#chars ];
    }

    printf("%-7s %-7s %-7s\n", "", "samples", "text");
    printf("%-7s %-7s %-7s\n", "", "-------", "-------");

    for (qw( current string unicode )) {
        # Look up the sub named "<prefix>_reverse".
        my $reverser = do { no strict 'refs'; \&{ $_."_reverse" } };

        # Non-text data: three numeric samples stored as a string.
        my $water_samples = join '', map chr, 113, 101, 769;
        $water_samples = $reverser->($water_samples);
        my $last_sample = substr($water_samples, 0, 1);

        # Text data: "Cafe" followed by a combining acute accent.
        my $text = "Cafe\N{COMBINING ACUTE ACCENT}";
        $text = $reverser->($text);
        my ($last_char) = $text =~ /^(\X)/;

        printf("%-7s %-7s %-7s\n",
            $_,
            ord($last_sample) == 769 ? 'ok' : 'not ok',
            $last_char eq "e\N{COMBINING ACUTE ACCENT}" ? 'ok' : 'not ok',
        );
    }
            samples text
            ------- -------
    current ok      not ok
    string  ok      not ok
    unicode not ok  ok

Your whole argument for the presence of a bug is that reverse's use of "character" could be confused with Unicode's definition of the word.

One or the other is wrong: the behavior of the reverse function or the reverse function's documentation.

Those are the only two options if and only if reverse's documentation uses the same definition of "character" as the Unicode standard.

Update: Added code.


In reply to Re^9: Repurposing reverse by ikegami
in thread How to reverse a (Unicode) string by moritz
