Problems with packing upper ASCII - differences across perl versions
by JayBonci (Curate)
on Nov 29, 2002 at 11:48 UTC
JayBonci has asked for the wisdom of the Perl Monks concerning the following question:
Good day everyone. I've run into a problem with Unicode characters under perl 5.6.1.
I'm trying to escape arbitrary upper ASCII characters that people give me through various web forms. These are in turn passed back to the browser inside various XML constructs, and inside those constructs the content has to be UTF-8 compliant.
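Since the target is markup, one option that sidesteps pack entirely is to replace every non-ASCII character with a numeric character reference built from `ord()`. A minimal sketch, assuming the text is already a Perl character string and is going into element content; `escape_non_ascii` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: make arbitrary text safe to embed in XML element
# content as plain ASCII. XML's own special characters are escaped first,
# then every character outside 7-bit ASCII becomes a numeric character
# reference built from its code point.
sub escape_non_ascii {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $text =~ s/([&<>])/$entity{$1}/g;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print escape_non_ascii("caf\x{E9} <\x{F5}>"), "\n";
# prints: caf&#233; &lt;&#245;&gt;
```

If the form data arrives as raw UTF-8 bytes rather than decoded characters, each byte would be escaped on its own (õ would come out as `&#195;&#181;`), so the input would need to be decoded first, for example with `Encode::decode('UTF-8', $bytes)` from the Encode module that ships with perl 5.8 and later.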
For instance, I'd like to turn õ into &#245;. I figure the best way to do that would be to pack the string. Take this code snippet for example:
Under perl5.6.x, I receive:
    jaybonci@willowisp:~/perl$ ./pack.pl
    Malformed UTF-8 character (1 byte, need 4) in unpack at ./pack.pl line 9.
    Malformed UTF-8 character (1 byte, need 4) in unpack at ./pack.pl line 9.

However, under perl 5.8, I receive the proper output:
Checking perldelta, I see it mentions changes to and improved support for Unicode in perl 5.8.0, but I don't see how the "U" template of pack could even work under 5.6.1.
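One plausible reading of the 5.6.1 warning: if unpack's "U" is applied to a plain byte string there, the Latin-1 byte for õ (0xF5) is read as the first byte of a UTF-8 sequence. Its bit pattern 11110101 starts with four 1 bits, which announces a 4-byte sequence, and only 1 byte is present, matching "(1 byte, need 4)". The sketch below reads the claimed length off a lead byte; `utf8_seq_len` is a hypothetical helper and deliberately ignores the old 5- and 6-byte forms:

```perl
use strict;
use warnings;

# Hypothetical helper: how many bytes a UTF-8 sequence starting with this
# byte claims to have, judged only from its high bits.
sub utf8_seq_len {
    my ($byte) = @_;
    return 1 if $byte < 0x80;    # 0xxxxxxx: plain ASCII
    return 0 if $byte < 0xC0;    # 10xxxxxx: continuation byte, not a lead
    return 2 if $byte < 0xE0;    # 110xxxxx
    return 3 if $byte < 0xF0;    # 1110xxxx
    return 4 if $byte < 0xF8;    # 11110xxx
    return 0;                    # 5- and 6-byte forms are not handled here
}

my $byte = 0xF5;                 # "õ" in ISO-8859-1
printf "%08b needs %d bytes\n", $byte, utf8_seq_len($byte);
# prints: 11110101 needs 4 bytes
```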
Griping aside, my first reaction would be to pack the characters out to 32 bits apiece (I think that is what the warning is getting at). It also occurs to me that my snippet sort of works if I pack with "C" instead, but only in limited cases. With the euro symbol on either platform (€ or €), neither pack sequence seems to pad out the bits correctly.
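For the euro sign, the difference between the two templates can be seen directly: its code point U+20AC does not fit in the 8-bit "C" template, while "U" keeps it as a single character that the Encode module can then turn into the three UTF-8 bytes E2 82 AC. A sketch assuming perl 5.8 or later semantics for pack and the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = 0x20AC;    # code point of the euro sign

# "C" is an unsigned 8-bit value: 0x20AC is wrapped to its low byte, 0xAC.
my $c;
{
    no warnings 'pack';    # silence "Character in 'C' format wrapped in pack"
    $c = pack("C", $euro);
}
printf "C: %d char, value 0x%02X\n", length($c), ord($c);    # C: 1 char, value 0xAC

# "U" keeps the full code point as a single character.
my $u = pack("U", $euro);
printf "U: %d char, value 0x%04X\n", length($u), ord($u);    # U: 1 char, value 0x20AC

# Encoding that character gives the bytes a UTF-8 consumer expects.
my $octets = encode("UTF-8", $u);
my $hex    = unpack("H*", $octets);
printf "UTF-8 bytes: %s\n", uc $hex;                          # UTF-8 bytes: E282AC
```

The point of the split is that "U" deals in characters, and only the final `encode` step decides how many bytes each one becomes, so no manual padding to 32 bits is needed.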
So my questions are:
Thanks a bunch for any help. I'm beating my head against this one.