in reply to Re: text encodings and perl
in thread text encodings and perl
Just like concatenation imposes string context, and multiplication numeric context, print imposes "binary" context (and encodes and warns if necessary), and uc imposes "character" context (and decodes with Latin-1 if the string holds undecoded octets).
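Here is a small sketch of the quoted point. It assumes perl 5.12 or later for the unicode_strings feature, and the variable names are only illustrative. It runs uc over the same text twice: once as undecoded UTF-8 octets and once after Encode::decode.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # treat non-UTF8-flagged strings as Latin-1 characters (perl 5.12+)
use Encode qw(decode);

# "é" as it might arrive from a file or socket: two UTF-8 octets, not yet decoded.
my $octets = "\xc3\xa9";

# The same text after decoding: one character, U+00E9.
my $chars = decode('UTF-8', $octets);

# uc reads the octets as the Latin-1 characters "Ã" and "©".
# Neither has a different upper-case form, so nothing changes.
my $uc_octets = uc $octets;

# uc reads the decoded character "é" and upper-cases it to "É" (U+00C9).
my $uc_chars = uc $chars;

# %vX prints each character's code point in hex, joined by dots.
printf "octets: %vX -> %vX\n", $octets, $uc_octets;
printf "chars:  %vX -> %vX\n", $chars,  $uc_chars;
```

The context is imposed in both cases. Only the decoded string gives the result a reader of the text would expect.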
Well, my point was different. It is true that perl does certain conversions behind the scenes, and it emits certain warnings because it has to produce some result. But my point was that, without help from the developer, perl cannot do 100% correct work; it just does what works most of the time. The context is imposed, but if the string is not in the proper internal form, then the "characters" perl works with may be quite wrong from the developer's standpoint.
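A minimal sketch of "works most of the time". The input strings are made up for illustration. For pure ASCII text, octets and characters coincide, so a missing decode goes unnoticed. Once a non-ASCII character appears, length (and anything else that counts characters) silently gives a different answer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two inputs as they might arrive from a file opened without an :encoding layer.
my %raw = (
    ascii     => "cafe",          # pure ASCII
    non_ascii => "caf\xc3\xa9",   # "café" encoded as UTF-8 octets
);

my %len_octets;   # length perl reports when nobody decodes the input
my %len_chars;    # length after the developer decodes it explicitly
for my $name (sort keys %raw) {
    $len_octets{$name} = length $raw{$name};
    $len_chars{$name}  = length decode('UTF-8', $raw{$name});
    print "$name: undecoded length $len_octets{$name}, decoded length $len_chars{$name}\n";
}
```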
Here are some examples of the kind of confusion I had in mind.
The module MP3::Tag::ID3v2 provides a method "get_frame", which returns a string as a sequence of octets, so to convert its encoding the developer has to use "Encode::from_to". But the method "change_frame" of the same module expects a string in "internal form", because internally it calls Encode::encode on its input. So the developer can't pass the string returned by "get_frame" as input to "change_frame" without first calling "Encode::decode" on it.
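The mismatch can be reproduced without MP3::Tag itself. In this sketch, fake_get_frame and fake_change_frame are hypothetical stand-ins. They only mimic the behaviour described above: one returns octets (assumed here to be Latin-1), and the other calls Encode::encode on its argument. They are not the real MP3::Tag::ID3v2 methods. After converting the octets with Encode::from_to, passing them straight through double-encodes the text. Decoding first gives the intended bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# Hypothetical stand-in for a getter that returns raw octets (assumed Latin-1 here).
sub fake_get_frame { return "caf\xe9" }

# Hypothetical stand-in for a setter that calls Encode::encode on its
# argument, and therefore expects characters, not octets.
sub fake_change_frame {
    my ($text) = @_;
    return encode('UTF-8', $text);
}

# Octets to octets: from_to re-encodes the buffer in place.
my $utf8_octets = fake_get_frame();
from_to($utf8_octets, 'iso-8859-1', 'UTF-8');   # buffer now holds "caf\xc3\xa9"

# Wrong: the two octets \xc3 \xa9 are taken as two Latin-1 characters
# and each one is encoded again (double encoding, i.e. mojibake).
my $double = fake_change_frame($utf8_octets);

# Right: decode to perl's internal form before handing the string over.
my $right = fake_change_frame(decode('UTF-8', $utf8_octets));
```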
Another example: the DBD modules may return strings from the database either as octets or in "internal form", but strings passed to, say, the Gtk2 modules must be in "internal form". So the developer has to take care about what kind of output he or she gets from the DBD modules.
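One way to take that care is to normalise each row at the boundary, before it reaches GUI code. This sketch assumes the driver is known to return UTF-8 octets. decode_row is a hypothetical helper, and the hashref stands in for what a call such as DBI's fetchrow_hashref might return. If a driver already returns decoded strings, decoding them again would be wrong. That is exactly why the developer has to know which form the driver uses.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: convert every column of a row of UTF-8 octets into
# perl's internal character form, leaving NULLs (undef) alone.
sub decode_row {
    my ($row) = @_;
    my %out;
    for my $col (keys %$row) {
        my $value = $row->{$col};
        $out{$col} = defined $value ? decode('UTF-8', $value) : undef;
    }
    return \%out;
}

# Stand-in for a row fetched from a driver that returns octets.
my $row = { name => "Jos\xc3\xa9", city => undef };

# $decoded->{name} now holds 4 characters, the form GUI toolkits expect.
my $decoded = decode_row($row);
```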
I believe that part of the confusion lies in badly written modules. Since perl provides the function "is_utf8", it is very easy to check what kind of input the user has provided and apply "Encode::encode" or "Encode::decode" as appropriate to get the desired form. But we have what we have, so developers have to watch out for the kind of strings they work with.
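Here is a sketch of that defensive style: a module routine that accepts either form and normalises it to characters. as_characters is a hypothetical name. The routine uses utf8::is_utf8, the core counterpart of Encode::is_utf8. By convention, it assumes that any string without the UTF8 flag is UTF-8 octets. That assumption is the weak spot. The flag records perl's internal storage, not whether the caller decoded anything, so an unflagged string can also hold decoded Latin-1 characters. A documented contract about which form a function expects is more robust than the flag alone.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper a module could apply to its string arguments.
# Assumption: a string WITHOUT the UTF8 flag is UTF-8 octets; a flagged
# string is already in perl's internal character form. This is only a
# heuristic, since an unflagged string may also hold decoded Latin-1 text.
sub as_characters {
    my ($s) = @_;
    return $s if utf8::is_utf8($s);
    return decode('UTF-8', $s);
}

my $from_octets = as_characters("caf\xc3\xa9");                  # undecoded input
my $from_chars  = as_characters(decode('UTF-8', "caf\xc3\xa9"));  # already decoded
print "same text either way\n" if $from_octets eq $from_chars;
```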