Re: encoding question

by JavaFan (Canon)
on May 20, 2010 at 07:54 UTC


in reply to encoding question

The substitution should happen regardless of whether the string has the UTF-8 flag set.
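
As a minimal sketch of that claim (the sample string and the replacement are illustrative assumptions, not the code from the original question), the same substitution can be run on two copies of a string that differ only in their internal UTF-8 flag:

```perl
use strict;
use warnings;

# Hypothetical sample string; \x{E9} is U+00E9 (e with acute).
my $unflagged = "caf\x{E9}";
my $flagged   = $unflagged;
utf8::upgrade($flagged);    # same characters, but the UTF-8 flag is now set

for my $str ($unflagged, $flagged) {
    my $flag_state = utf8::is_utf8($str) ? "set" : "not set";
    (my $result = $str) =~ s/\x{E9}/e/;    # work on a copy, keep the original
    print "flag $flag_state: $result\n";
}
```

Both iterations should print the same substituted text, since the flag only describes the internal storage, not which characters the string holds.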

However, what is important is how the "A with circumflex" is encoded in the source code, and what Perl thinks the encoding is.

To avoid problems, I always try not to have any characters with code points over 127 in my source code, and I especially avoid the code points 128-255. If I were to write the line, I would write it as:

my $str = "Th\x{92}t \x{92}pple";
That should work regardless of whether perl thinks my source code is in UTF-8 format or not. (It may still get confused if it thinks my source code is written in EBCDIC, but that's no worry for me.)


Re^2: encoding question
by choroba (Abbot) on May 20, 2010 at 08:34 UTC
    This approach gets a bit complicated when one works with texts in languages that use many letters not present in the 32..127 range, though.
Re^2: encoding question
by Krambambuli (Deacon) on May 20, 2010 at 08:56 UTC
    I don't really like that, sorry.

    If someone else has to look at the code (and suppose you have several such characters following each other in the source), 'reading' what's supposed to be there will be a chore.

    Instead, I'd emphasize in comments how the source code should be viewed _and_ what exactly should be seen.

    Your solution makes it easier for the machines, but harder for humans.

    I much prefer it the other way round.


    Krambambuli
    ---
      Instead, I'd emphasize in comments how the source code should be viewed

      Two forms of such "comments" can actually be readable by programs:

      # for perl:
      use utf8;

      # at the end, for my favourite editor:
      # vim: fileencoding=utf-8
      Perl 6 - links to (nearly) everything that is Perl 6.
      Your solution makes it easier for the machines
      But you aren't getting the correct results. And since there's an additional cut-and-paste step involved, it's hard to debug from a website how your source is encoded.
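
      To make the effect of the `use utf8` pragma mentioned above concrete, here is a small sketch. It assumes this file itself is saved as UTF-8; the variable names are made up for illustration. The pragma is lexically scoped, so the same literal can be read both ways in one file:

```perl
use strict;
use warnings;

my ($with, $without);
{
    use utf8;       # the literal below is decoded as UTF-8: one character
    $with = "Â";
}
{
    no utf8;        # the same literal is taken byte by byte: two characters
    $without = "Â";
}

printf "with 'use utf8':    length %d\n", length($with);
printf "without 'use utf8': length %d\n", length($without);
```

      If the file were saved in Latin-1 instead, the results would differ, which is exactly why the declared and the actual source encoding need to agree.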
Re^2: encoding question
by ikegami (Pope) on May 21, 2010 at 04:21 UTC

    \x{92} is not very readable. To support that point, I bet no one even noticed you used the wrong number. (It should be C2.) \N provides a more readable mechanism:

    use charnames ':full';
    my $str = "Th\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}t \N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}pple";

    It's pretty long, but shortcuts are provided and you can create your own.
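
    One way to create your own shortcut is the `:alias` option of the charnames pragma. The sketch below is only an illustration; the alias name `A_CIRC` is an arbitrary choice, not anything predefined:

```perl
use strict;
use warnings;
use charnames ':full', ':alias' => {
    A_CIRC => 'LATIN CAPITAL LETTER A WITH CIRCUMFLEX',   # hypothetical alias name
};

my $long  = "Th\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}t";
my $short = "Th\N{A_CIRC}t";
my $hex   = "Th\x{C2}t";

# All three spellings should produce the same three-character string.
print "all three spellings agree\n"
    if $long eq $short and $short eq $hex;
printf "code point: U+%04X\n", ord(substr($long, 2, 1));
```

    The alias keeps the literal short while the full Unicode name still appears once, at the top of the file, where a reader can look it up.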

      I forgot about \N characters. Thanks.
