Re: utf8 encoding bug?

by zemplen (Novice)
on Feb 25, 2003 at 17:50 UTC

in reply to utf8 encoding bug?

This fixed one and broke another
sub test2 () {
    print "-"x20, "\n";
    print $_[0], "\n";
    print "Before = ", unpack ("U*", ${text}), "\n\n";
    my $tmp = ${dcy[1]};
    $text=~s/${a_lc_diaeresis[1]}/$tmp/g;
    $tmp = ${acy[1]};
    $text=~s/${a_lc_grave[1]}/$tmp/g;
    print "After = ", unpack ("U*", ${text}), "\n\n";
    print $text, "\n\n";
}

Re: Re: utf8 encoding bug?
by hv (Parson) on Feb 25, 2003 at 18:49 UTC

    Ah sorry, I didn't think to check the original test again.

    At least some of the problems are occurring because of bugs in perl-5.8.0 when upgrading a non-utf8 string to utf8 at odd times, so the easiest workaround I could find was to force every string to be upgraded before messing with it. I'm not sure what your Encode::encode_utf8() calls are supposed to be doing - they don't appear to be having any effect - but if I replace each of them with a force_utf8($text) using the definition below all tests appear to do the right thing:

    sub force_utf8 { chop( $_[0] .= "\x{100}" ); }

    Since your original test already works correctly under the latest development sources, I am confident that the next maintenance release (i.e. 5.8.1) will also include the fix.
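
For anyone wanting to see what the force_utf8 workaround does to a string, here is a minimal sketch. It is an illustration only, not part of the original test script: the sample string "caf\xe9" is made up, and it assumes a perl where utf8::is_utf8() is available (5.8.1 or later, as far as I know). Appending a character above 0xFF forces perl to upgrade the string's internal representation, and chop then removes that extra character while leaving the string upgraded.

```perl
use strict;
use warnings;

# Same trick as force_utf8 in the reply: append a character that cannot be
# stored in a byte string (forcing an upgrade to UTF-8), then chop it off.
sub force_utf8 { chop( $_[0] .= "\x{100}" ); }

# Hypothetical sample data: an 8-bit string that has not been upgraded yet.
my $text = "caf\xe9";

my $before = utf8::is_utf8($text) ? 1 : 0;   # internal UTF-8 flag before
force_utf8($text);                           # modifies $text through the @_ alias
my $after  = utf8::is_utf8($text) ? 1 : 0;   # internal UTF-8 flag after

# The characters themselves are unchanged; only the internal storage differs.
printf "flag before=%d after=%d length=%d last=U+%04X\n",
    $before, $after, length($text), ord(substr($text, -1));
```

The built-in utf8::upgrade($text) should have the same effect on the internal representation and may read more clearly than the append-and-chop idiom, though I have not checked how it behaves with the 5.8.0 bugs discussed above.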



