in reply to utf8 problems

Thanks for both answers. Yes, the problem is vague, and hard to describe...
We have a string and have switched its internal utf8 flag on:
# my example may be any string from a database, user input, etc.
my $example = "just a string that i nééd to be utf8 encoded, but can't see it in the chars, so guess::encode won't work";
Encode::_utf8_on($example);
# ...
use open ':utf8';
use open ':std';
# ...
my $cgi = CGI->new;
print $cgi->header(
    -type    => 'text/html',
    -expires => '-1d',
    -cookie  => [$cookie],
    -charset => 'UTF-8',
);
print $example;
Now, the first time it's called under the mod_perl registry, it prints:
just a string that i nd to be utf8 encoded, but can't see it in the chars, so guess::encode won't work
but the second time:
just a string that i nééd to be utf8 encoded, but can't see it in the chars, so guess::encode won't work
This is weird, and I have no control over how mod_perl internally stores its values.

"We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.

Re^2: utf8 problems
by bpphillips (Friar) on Jun 03, 2006 at 13:18 UTC
    Two things you should check to make this example work the way you intend:

    - Is your file UTF-8 encoded? (I usually use the *NIX file command or check vi's :set fileencoding to verify this, although there may be other ways to do it.)
    - Do you have a use utf8 at the beginning of your script?

    Whenever you have UTF-8 content in the body of your script (as you do in your example, at least), you need to tell Perl to use character semantics rather than byte semantics on that data. You do this by placing a use utf8 in the lexical scope where the UTF-8 data appears. This also makes the Encode::_utf8_on() call unnecessary.
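As a rough illustration of the difference, here is a minimal sketch that assumes the script file itself is saved as UTF-8. The variable names $chars and $bytes are mine, not from the original post. It measures the same literal once with use utf8 in scope and once without:

```perl
use strict;
use warnings;

my ($chars, $bytes);
{
    use utf8;                 # literals in this block are read as UTF-8 text
    $chars = length "nééd";   # length counted in characters
}
{
    no utf8;                  # same literal, now treated as raw bytes
    $bytes = length "nééd";   # length counted in bytes (each é is two bytes in UTF-8)
}

# Print only ASCII so no output layer is needed for this check.
print "with use utf8: $chars, without: $bytes\n";
```

If the file really is UTF-8, this should report 4 characters with the pragma and 6 bytes without it, which is a quick way to see which semantics a given scope is using.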

    However, as noted in bold in the utf8 docs: "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8". If you're retrieving data from a GET/POST parameter or from a database, it's a different story.
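For data that arrives from outside the script, one common approach (not necessarily what fits your setup) is to decode the incoming bytes explicitly with the Encode module rather than flipping the flag by hand. The sketch below assumes the external source delivers UTF-8 bytes; $raw is a hypothetical stand-in for a form parameter or a database column value:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for bytes read from a GET/POST parameter or a database.
my $raw = "n\xC3\xA9\xC3\xA9d";       # the UTF-8 bytes of "nééd"

my $text = decode('UTF-8', $raw);    # bytes -> Perl character string
my $out  = encode('UTF-8', $text);   # characters -> bytes, ready for output

printf "%d bytes in, %d characters, %d bytes out\n",
    length $raw, length $text, length $out;
```

Decoding at the point where data enters the program and encoding at the point where it leaves keeps everything in between in character semantics, without relying on Encode::_utf8_on().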