Re: character encoding & french accents

I have plenty of experience with mismatched character encodings, and I agree that they can be hard to track down.

First of all, where does your source document come from? Which factors determine its encoding?

Ultimately the best course of action is to ensure that you're using utf8 everywhere (and advertising that to the viewing programs). You can tell the web browser that your content is in utf8 by adding a "charset" parameter to the Content-Type header, like this:

  use CGI;
  my $cgi = CGI->new;

  ...

  print $cgi->header(-type=>'text/html', -charset=>'utf-8');

You can do the same for emails, by giving them the appropriate MIME headers:

Content-Type: text/plain; charset="utf-8"

How you do that depends on how your email-sending code is designed.
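As a rough illustration only, here is a minimal sketch that builds such a message by hand using nothing but the core Encode module. The helper name build_utf8_message and the example addresses are made up for this post, and a real application would more likely let its mail module set these headers. It assumes the body is already a decoded character string and that the subject is plain ASCII.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of any library): returns the raw bytes
# of a minimal plain-text message whose body is UTF-8 encoded.
sub build_utf8_message {
    my ($from, $to, $subject, $body) = @_;

    my $header = join "\r\n",
        "From: $from",
        "To: $to",
        "Subject: $subject",    # assumed ASCII-only in this sketch
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset="utf-8"',
        'Content-Transfer-Encoding: 8bit';

    # Encode characters to UTF-8 bytes exactly once, right before sending.
    return encode('UTF-8', "$header\r\n\r\n$body");
}

# "\x{e9}" is the character é; using an escape keeps this file pure ASCII.
my $message = build_utf8_message(
    'sender@example.com', 'reader@example.com',
    'Accent test', "Caf\x{e9}\n",
);
```

The point is that the charset named in the header and the encoding actually applied to the body have to agree; declaring utf-8 while sending Latin-1 bytes produces the same kind of garbage as declaring nothing at all.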

I'd like to point out that recent perls (version 5.8 and later) can store strings internally as utf8, and decoded text containing non-ASCII characters normally is stored that way. So it makes sense to be consistent about that in the rest of your application.
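To make the difference between bytes and characters concrete, here is a small sketch using the core Encode module. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoded bytes of "café": the é takes two bytes (C3 A9).
my $bytes = "caf\xC3\xA9";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

my $byte_count = length $bytes;    # counts bytes
my $char_count = length $chars;    # counts characters

print "bytes: $byte_count, characters: $char_count\n";
# prints "bytes: 5, characters: 4"
```

Until the data is decoded, perl has no way of knowing that those two bytes are meant to be a single é.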

Re^2: character encoding & french accents by dstamos (Initiate) on Feb 03, 2006 at 15:08 UTC
Thank you to rhesa, graff & fraktalisman for your constructive input. "First of all, where does your source document come from? Which factors determine its encoding?" The content comes from an input box (form), and the CGI program writes it to a text file. What happens after that I don't know. I was able to modify the script to insert a MIME header as you suggested, but it did not change anything. I don't know what determines its encoding. I think I'm going to have to get the programmer involved here and come back with some code and more info. Thank you again to everyone.
Re^3: character encoding & french accents by rhesa (Vicar) on Feb 03, 2006 at 16:33 UTC
I think getting your developer in here to discuss the details is a very good idea. I've found on several occasions that it's necessary to manually upgrade (decode) form input to utf8. For some reason, CGI returns raw byte strings, and those might end up being encoded to utf8 once more, resulting in lots of squiggly characters. I did this like so:

  use Encode;
  my $email_body = $cgi->param( 'email_body' );
  $email_body = decode_utf8( $email_body );

After this, most conversions and display issues are a snap.
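The double-encoding effect described above can be reproduced without CGI at all. This is only a sketch: $raw stands in for what a form parameter might contain when the browser submitted an é as UTF-8, and the variable names are mine.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for raw form input: the two UTF-8 bytes of é.
my $raw = "\xC3\xA9";

# Wrong: encoding the undecoded bytes treats each byte as a Latin-1
# character, so the two bytes become four (seen as "Ã©" in a browser).
my $double = encode('UTF-8', $raw);

# Right: decode the input once, work with characters, encode on output.
my $text  = decode('UTF-8', $raw);
my $fixed = encode('UTF-8', $text);

printf "double-encoded: %d bytes, decoded first: %d bytes\n",
    length $double, length $fixed;
# prints "double-encoded: 4 bytes, decoded first: 2 bytes"
```

If your output looks like the first case, something in the chain is encoding data that was never decoded.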

