PerlMonks  

Re: character encoding & french accents

by rhesa (Vicar)
on Feb 02, 2006 at 21:45 UTC ( [id://527460]=note )


in reply to character encoding & french accents

I have plenty of experience with mismatched character encodings, and I agree that they can be hard to track down.

First of all, where does your source document come from? Which factors determine its encoding?

Ultimately the best course of action is to ensure that you're using utf8 everywhere (and advertising that to the viewing programs). You can tell the web browser that your content is in utf8 by sending an additional "charset" header attribute like this:

    use CGI;
    my $cgi = CGI->new;
    ...
    print $cgi->header(-type=>'text/html', -charset=>'utf-8');

You can do the same for emails, by giving them the appropriate MIME headers:
Content-Type: text/plain; charset="utf-8"
How you do that depends on how your email-sending code is designed.
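
For illustration only, here is a minimal sketch of assembling such a message by hand with the core Encode module. The helper name `build_utf8_message`, the address, and the sample text are all hypothetical, and your mailer module may well set these headers for you:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the original post): build a plain-text
# message whose headers declare the body as UTF-8.
sub build_utf8_message {
    my ($to, $subject, $body) = @_;    # $body is a decoded character string

    # Turn the characters into UTF-8 octets before they leave the program.
    my $octets = encode('UTF-8', $body);

    return join("\r\n",
        "To: $to",
        "Subject: $subject",           # assumed ASCII-only in this sketch
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset="utf-8"',
        'Content-Transfer-Encoding: 8bit',
        '',                            # blank line separates headers from body
        $octets,
    );
}

my $message = build_utf8_message(
    'someone@example.com', 'Bonjour', "Caf\x{e9} cr\x{e8}me",
);
```

The point is only that the declared charset and the actual bytes of the body have to agree; a non-ASCII subject line would need its own MIME encoding, which this sketch leaves out.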

I'd like to point out that recent perls (version 5.8 and later) have native Unicode support, and strings holding decoded characters are stored internally as utf8. So it makes sense to be consistent about that in the rest of your application.
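
To make the bytes-versus-characters distinction concrete, here is a small sketch using the core Encode module. The sample string is my own, not taken from the original question:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as UTF-8 octets, the way it might arrive from a file or socket.
my $bytes = "caf\xc3\xa9";

# Decode at the input boundary: now Perl sees four characters.
my $chars = decode('UTF-8', $bytes);

my $byte_len = length $bytes;    # counts octets
my $char_len = length $chars;    # counts characters

# Encode again at the output boundary to get the original octets back.
my $out = encode('UTF-8', $chars);

printf "%d bytes, %d characters\n", $byte_len, $char_len;
```

This should print "5 bytes, 4 characters". Decoding on the way in and encoding on the way out is what keeps the rest of the program working in characters.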

Replies are listed 'Best First'.
Re^2: character encoding & french accents
by dstamos (Initiate) on Feb 03, 2006 at 15:08 UTC
    Thank you to rhesa, graff & fraktalisman for your constructive input.

    "First of all, where does your source document come from? Which factors determine its encoding?"

    The content comes from an input box (form), and the CGI program writes it to a text file. What happens after that, I don't know. I was able to modify the script to insert a MIME header as you suggested, but it did not change anything. I don't know what determines its encoding. I think I'm going to have to get the programmer involved here and come back with some code and more info. Thank you again to everyone.
      I think getting your developer in here to discuss the details is a very good idea.

      I've found on several occasions that it's necessary to manually decode form input from utf8. CGI returns raw byte strings, and if those are not decoded first they might end up being encoded as utf8 once more on output, resulting in lots of squiggly characters. I did this like so:

      use Encode;
      my $email_body = $cgi->param( 'email_body' );
      $email_body = decode_utf8( $email_body );

      After this, most conversions and display issues are a snap.
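
To show where the squiggly characters come from, here is a hedged sketch of the double-encoding effect. The variable `$raw` stands in for a value returned by `$cgi->param`; no CGI request is involved, and the sample character is my own:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for raw form input: "é" as two UTF-8 octets.
my $raw = "\xc3\xa9";

# Without decoding, each octet is treated as a separate Latin-1 character,
# so encoding for output produces four octets (displayed as "Ã©").
my $double = encode('UTF-8', $raw);

# Decoding first yields one character, which encodes back to two octets.
my $text  = decode('UTF-8', $raw);
my $fixed = encode('UTF-8', $text);

# Dump both results as hex so the difference is visible.
printf "double-encoded: %s\n",
    join ' ', map { sprintf '%02x', ord($_) } split //, $double;
printf "decoded first:  %s\n",
    join ' ', map { sprintf '%02x', ord($_) } split //, $fixed;
```

The first line should show c3 83 c2 a9 and the second c3 a9, which is why decoding the form input once, right where it enters the program, makes the later steps behave.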
