PerlMonks  

Re^3: Decoding, Encoding string, how to?

by ikegami (Pope)
on Apr 03, 2009 at 08:50 UTC (#755203)


in reply to Re^2: Decoding, Encoding string, how to?
in thread Decoding, Encoding string, how to?

I see question marks, but I'm not sure if there's a question in there. You seem to have a good grasp of the concept.

if I want to print using iso-8859-1, it could be done by downgrading

You'd get the right result, at the cost of confusing your readers. You'd be saying you're doing one thing (changing the internal format) while actually doing another (changing the encoding of the string).
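
For characters that iso-8859-1 can represent, the two approaches should yield the same bytes; only the stated intent differs. A minimal sketch of that comparison, assuming a sample string of my own choosing ("caf\x{E9}"):

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00E9 fits in iso-8859-1, so both routes can succeed.
my $text = "caf\x{E9}";

# Route 1: change the internal storage format.
my $downgraded = $text;
utf8::upgrade($downgraded);      # force the UTF-8 internal format first
utf8::downgrade($downgraded);    # back to one byte per character

# Route 2: say what is meant, i.e. encode the text as iso-8859-1.
my $encoded = encode('iso-8859-1', $text);

printf "same bytes: %s\n", ($downgraded eq $encoded ? 'yes' : 'no');
```

The second route documents the intent (producing iso-8859-1 output), which is why it is the less confusing choice for readers of the code.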

I see that it's not representable in the iso-8859-1 charset

Correct, iso-8859-1 cannot encode U+201C. cp1252 can. cp1252 is Microsoft's extension of iso-8859-1. It's a commonly used encoding in the Windows world, which is why U+201C is encountered frequently.
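
To see the difference between the two encodings, one can encode U+201C with each and look at the resulting bytes. This is only an illustrative sketch; the variable names are mine.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $quote = "\x{201C}";    # LEFT DOUBLE QUOTATION MARK

my $cp1252 = encode('cp1252',     $quote);   # cp1252 has a slot for it
my $latin1 = encode('iso-8859-1', $quote);   # no slot: substituted

printf "cp1252:     %s\n", unpack('H*', $cp1252);   # 93
printf "iso-8859-1: %s\n", unpack('H*', $latin1);   # 3f, i.e. "?"
```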

we can check if it's downgradable or not using utf8::downgrade($str, 1)

Indeed. I have used that very code to make sure a sub was only given bytes before calling a function that expects to only get bytes. At the same time, it makes sure the bytes aren't internally encoded as UTF-8. Most XS functions can't handle that (which is really a bug in the XS function).
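
A sketch of that kind of guard might look like the following. The sub name bytes_only is hypothetical, and length() merely stands in for whatever bytes-only function would really be called.

```perl
use strict;
use warnings;

# Hypothetical wrapper; length() stands in for a bytes-only XS function.
sub bytes_only {
    my ($bytes) = @_;                 # work on a copy of the argument
    utf8::downgrade($bytes, 1)        # second arg: return false, don't die
        or die "expected bytes, got a character above 0xFF\n";
    # From here on, $bytes is stored one byte per character.
    return length $bytes;
}

my $upgraded = "caf\x{E9}";
utf8::upgrade($upgraded);             # same string, UTF-8 internally

my $ok_len = bytes_only($upgraded);
my $failed = !defined eval { bytes_only("\x{201C}"); 1 };

print "length: $ok_len\n";
print "wide string rejected: ", ($failed ? 'yes' : 'no'), "\n";
```

The same call does both jobs: it rejects strings that contain characters above 0xFF, and it normalizes the internal format of the strings it accepts.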

using Encode::encode, the unmapped character is printed as a ? (question mark) and no notice is reported

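How encode handles unmappable characters is configurable using its third parameter (CHECK). For example, passing Encode::FB_CROAK makes it die instead of silently substituting a question mark.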
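
As a rough illustration of the third parameter, the sketch below contrasts the default behaviour with two of the fallback constants that Encode provides. The sample text is an assumption; copies are passed because encode may modify its source string when a check value is given.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{201C}quoted\x{201D}";

# No third argument: unmappable characters become "?".
my $lossy = encode('iso-8859-1', $text);

# FB_CROAK: die on the first unmappable character.
my $copy   = $text;
my $strict = eval { encode('iso-8859-1', $copy, Encode::FB_CROAK) };
my $error  = $@;

# FB_HTMLCREF: substitute a numeric character reference.
$copy = $text;
my $html = encode('iso-8859-1', $copy, Encode::FB_HTMLCREF);

print "default:     $lossy\n";
print "FB_CROAK:    ", ($error ? "died" : "ok"), "\n";
print "FB_HTMLCREF: $html\n";
```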


Re^4: Decoding, Encoding string, how to?
by way (Sexton) on Apr 03, 2009 at 18:23 UTC
    You'd get the right result, at the cost of confusing your readers. You'd be saying you're doing one thing (changing the internal format) while actually doing another (changing the encoding of the string).

    Yes, my example is meant for understanding the basic steps of how Perl works internally and what I can do with them. It is mostly a theoretical test, because in practice this should be handled with functions like Encode::encode.

    Correct, iso-8859-1 cannot encode U+201C. cp1252 can. cp1252 is Microsoft's extension of iso-8859-1, and it's a commonly used encoding in the Windows world. That's why U+201C is encountered frequently.

    You're right. I really didn't know that, but checking cp1252 I can see the character U+201C.

    Here it is: http://en.wikipedia.org/wiki/Windows-1252

    Indeed. I have used that very code to make sure a sub was only given bytes before calling a function that expects to only get bytes. At the same time, it makes sure the bytes aren't internally encoded as UTF-8. Most XS functions can't handle that (which is really a bug in the XS function).

    That's really important. I'll make a mental note of this common error with XS functions.

    You've been very helpful in understanding this topic. Your explanation is short but clear. Thank you again.
