Re: encoding question

by JavaFan (Canon)
on May 20, 2010 at 07:54 UTC

in reply to encoding question

The substitution should happen regardless of whether the string has the UTF-8 flag set.

However, what matters is how the "A with circumflex" is encoded in the source code, and what Perl thinks that encoding is.

To avoid problems, I always try not to have any characters with code points over 127 in my source code, and I especially avoid the code points 128-255. If I were to write the line, I would write it as:

my $str = "Th\x{92}t \x{92}pple";
That should work regardless of whether perl thinks my source code is in UTF-8 or not. (It may still get confused if it thinks my source code is written in EBCDIC, but that's no worry for me.)
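
A minimal sketch of the first claim, that a substitution matches whether or not the UTF-8 flag is set. It assumes the character in question is U+00C2 (A with circumflex); the variable names are illustrative only.

```perl
use strict;
use warnings;

# Two strings with identical characters but different internal storage.
my $downgraded = "Th\x{C2}t";   # UTF-8 flag off
my $upgraded   = "Th\x{C2}t";
utf8::upgrade($upgraded);       # same characters, UTF-8 flag on

for my $s ($downgraded, $upgraded) {
    # Substitute on a copy so the originals stay available for comparison.
    (my $copy = $s) =~ s/\x{C2}/a/g;
    printf "flag=%d result=%s\n", (utf8::is_utf8($s) ? 1 : 0), $copy;
}
```

Both iterations should end up with the same result string; only the reported flag differs.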

Re^2: encoding question
by Krambambuli (Curate) on May 20, 2010 at 08:56 UTC
    I don't really like that, sorry.

    If someone else has to look at the code (and let's suppose you have more such characters following each other in the source), 'reading' what's supposed to be there will be a chore.

    Instead, I'd emphasize in comments how the source code should be viewed _and_ what exactly should be seen.

    Your solution makes it easier for the machines, but harder for humans.

    I much prefer it the other way round.

      Instead, I'd emphasize in comments how the source code should be viewed

      Two forms of such "comments" can actually be readable by programs:

      # for perl:
      use utf8;

      # at the end, for my favourite editor:
      # vim: fileencoding=utf-8
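
A rough sketch of what the pragma changes, assuming the file itself is saved as UTF-8. The variable names are illustrative; `use utf8` is lexically scoped, so it only affects the inner block here.

```perl
use strict;
use warnings;

# Without the pragma, perl reads the literal as the raw bytes in the file.
my $as_bytes = "Â";

my $as_chars;
{
    use utf8;            # tell perl the source text is UTF-8
    $as_chars = "Â";     # now decoded to a single character
}

printf "without utf8: length %d\n", length $as_bytes;
printf "with utf8:    length %d\n", length $as_chars;
```

If the file really is UTF-8, the first length counts the encoded bytes and the second counts one character.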
      Your solution makes it easier for the machines
      But you aren't getting the correct results. And since there's an additional cut-and-paste step involved, it's hard to debug from a website how your source is encoded.
Re^2: encoding question
by ikegami (Pope) on May 21, 2010 at 04:21 UTC

    \x{92} is not very readable. To support that point, I bet no one even noticed you used the wrong number. (It should be C2.) \N provides a more readable mechanism:
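
A minimal sketch of the \N form, assuming the intended character is LATIN CAPITAL LETTER A WITH CIRCUMFLEX (U+00C2). The variable name is illustrative.

```perl
use strict;
use warnings;
use charnames ':full';   # required before Perl 5.16; harmless afterwards

# Name the character instead of spelling out its code point.
my $a_circ = "\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}";

# It is the same character the numeric escape \x{C2} produces.
print "matches \\x{C2}\n" if $a_circ eq "\x{C2}";
printf "U+%04X\n", ord $a_circ;
```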


    It's pretty long, but shortcuts are provided and you can create your own.
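
One way to define your own shortcut is the `:alias` option of the charnames pragma. This is only a sketch: the alias name `A_CIRC` is made up for illustration and is not part of any standard set.

```perl
use strict;
use warnings;

# Register a short, hypothetical alias for the long official name.
use charnames ':full', ':alias' => {
    A_CIRC => 'LATIN CAPITAL LETTER A WITH CIRCUMFLEX',
};

my $short = "\N{A_CIRC}";
my $long  = "\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}";

print "alias resolves to the same character\n" if $short eq $long;
```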

      I forgot about \N characters. Thanks.
Re^2: encoding question
by choroba (Bishop) on May 20, 2010 at 08:34 UTC
    This approach gets a bit complicated when one works with texts in languages that use many letters not present in the 32..127 range, though.