Re^3: DWIM with non ASCII characters

by moritz (Cardinal)
on May 07, 2010 at 07:58 UTC (#838885)

in reply to Re^2: DWIM with non ASCII characters
in thread DWIM with non ASCII characters

Why use utf8;? As I understand the documentation, its purpose is to enable the source code to be in UTF-8

Yes. With use utf8; the string literals in your source are decoded as well, so you avoid concatenating decoded and non-decoded strings.
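To make the concatenation problem concrete, here is a minimal sketch. It assumes the script itself is saved as UTF-8; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                      # the source is UTF-8, so literals become character strings
use Encode qw(decode encode);

# A literal from the source file; under "use utf8" it is 5 characters long.
my $literal = "Grüße";

# Simulated input (hypothetical sample data): raw UTF-8 bytes as they
# would arrive from a file or socket, and the same data after decoding.
my $raw_bytes = encode('UTF-8', "Köln");
my $decoded   = decode('UTF-8', $raw_bytes);

# Both operands are decoded character strings, so the result is consistent.
my $good = "$literal aus $decoded";

# Decoded string joined with undecoded bytes: Perl treats every byte as a
# Latin-1 character, so the two bytes of "ö" become two separate characters.
my $bad = "$literal aus $raw_bytes";

printf "good: %d characters\n", length $good;   # 14
printf "bad:  %d characters\n", length $bad;    # 15
```

The extra character in the second string is exactly the kind of thing that shows up as Mojibake once the string is printed.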

Of course it requires your script to actually be stored in UTF-8. But since the more general solution (use encoding $your_encoding) is severely broken (with respect to AUTOLOAD, thread safety and other issues), that's currently the only sane way to store non-ASCII Perl programs.

As for the rest, I can only agree with what ikegami wrote: using IO layers is much more convenient than calling encode() and decode() on every IO operation. More importantly, since there are fewer spots where you have to care about encoding, the probability of forgetting it somewhere (and getting Mojibake in response) is much lower.
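A rough side-by-side of the two approaches, as a sketch only. It uses in-memory filehandles so it runs without touching the disk; with real files you would pass a filename instead of a scalar reference. The sample text and variable names are made up.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

my $text = "Grüße aus Köln\n";   # hypothetical sample data

# Manual approach: every write needs encode(), every read needs decode().
open my $out_manual, '>', \my $buf_manual or die "open: $!";
print {$out_manual} encode('UTF-8', $text);
close $out_manual;

open my $in_manual, '<', \$buf_manual or die "open: $!";
my $line_manual = decode('UTF-8', scalar <$in_manual>);
close $in_manual;

# IO layer approach: state the encoding once, when the handle is opened.
open my $out_layer, '>:encoding(UTF-8)', \my $buf_layer or die "open: $!";
print {$out_layer} $text;
close $out_layer;

open my $in_layer, '<:encoding(UTF-8)', \$buf_layer or die "open: $!";
my $line_layer = <$in_layer>;
close $in_layer;

print "same bytes written\n"  if $buf_manual eq $buf_layer;
print "same text read back\n" if $line_manual eq $line_layer && $line_layer eq $text;
```

Both variants produce the same bytes, but with the layer there is one place per handle where the encoding is stated, instead of one per print or readline.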

