PerlMonks  

Encoding problems

by Nik
on May 17, 2006 at 21:17 UTC ( #550092=perlquestion )
Nik has asked for the wisdom of the Perl Monks concerning the following question:

OK, encoding is driving me nuts!
My web page is encoded in UTF-8, and when I load nikos.no-ip.org I see all the Greek correctly except the contents of the drop-down menu and the date/time.
The contents of the drop-down menu correspond to actual Greek filenames; the files are written as UTF-8 with Unix line endings.

I just don't get why they won't display correctly as Greek like the rest of the page.

Re: Encoding problems
by spiritway (Vicar) on May 17, 2006 at 21:37 UTC

    I can't check it out myself, but my guess would be that you need to adjust the 'enctype' of your form.

      That's not what a form's enctype is for!

      -sam

Re: Encoding problems
by samtregar (Abbot) on May 17, 2006 at 21:45 UTC
    Can you show us the code that produced the data in the select box? My browser doesn't think it's valid UTF-8, or at least it's not UTF-8 for characters I have a font to display. The rest of the page looks Greek to me (hahah).

    -sam

      Here is the code that produces and fills the contents of the drop-down menu:
      my @files = <../data/text/*.txt>;
      my @display_files = map /([^\/]+)\.txt/, @files;

      print br;
      print start_form( action=>'index.pl' );
      print h1( {class=>'lime'}, "&#917;&#960;&#941;&#955;&#949;&#958;&#949; &#964;&#959; &#954;&#949;&#943;&#956;&#949;&#957;&#959; &#960;&#959;&#965; &#963;&#949; &#949;&#957;&#948;&#953;&#945;&#966;&#941;&#961;&#949;&#953; => ",
                popup_menu( -name=>'select', -values=>\@display_files ),
                submit('&#917;&#956;&#966;&#940;&#957;&#953;&#963;&#951;'));
      As Ani from EFnet told me, maybe the OS I use (XP from MicroSuck) insists on writing file contents as UTF-8 but saving the filenames as ISO-8859-7 instead!
      What do you think, folks?
        Are you sure the rest of the data on your page is really UTF-8? Those "&#XXX" things you're outputting are HTML entities, NOT UTF-8.

        Also, what version of Perl are you using?

        -sam
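To illustrate the point about the "&#XXX" references above, here is a minimal sketch using the core Encode module. The variable names are illustrative only. It shows that a numeric character reference is plain ASCII markup, while the UTF-8 form of the same letter is a pair of high bytes; a browser expands the reference regardless of the page's declared charset, which is probably why that text displays correctly while the raw file name bytes do not.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A numeric character reference is plain ASCII markup that the browser expands.
my $reference = '&#917;';

# The same letter (GREEK CAPITAL LETTER EPSILON, U+0395) as UTF-8 encoded bytes.
my $utf8_bytes = encode('UTF-8', "\x{0395}");

printf "reference: %d bytes (%s)\n", length($reference), $reference;
printf "UTF-8:     %d bytes (%s)\n", length($utf8_bytes),
    join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
```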

        Actually, looking closer, I think your friend is right. If I tell Firefox to interpret the page as "Greek ISO-8859-7" then the select box looks a lot like a list of Greek words. I can't read Greek so it's hard to know for sure!

        Try using something like Unicode::Map8 (old-school) or Encode (new-school) to go from ISO-8859-7 to UTF-8.

        -sam

Re: Encoding problems
by graff (Chancellor) on May 18, 2006 at 01:14 UTC
    If I understand the discussion so far, you are putting utf8 character data into the html, but you are loading file name strings with iso-8859-7 characters into a popup menu.

    So you just need to do one of the following (not both): (1) use iso-8859-7 as the text data for your html content (not utf8), and expect client browsers to use that encoding when displaying the page; or (2) convert the file name strings to utf8 before you load them into the popup menu. Either way, the point is to make sure all the page content (text and form data) uses the same encoding, and either way, Encode is the easiest to use. Here's how to convert the file name strings to utf8:

    use Encode;  ## add this line near the top

    my @files = <../data/text/*.txt>;
    my @display_files = map { /([^\/]+)\.txt/; decode('iso-8859-7', $1) } @files;
    In other words, just add the "decode" call to your map block; it returns the utf8 version of the 8859-7 string stored in its second arg. I don't think anything else needs to change in your code (unless you have other 8859 strings coming from other places that you haven't shown us).
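One detail worth spelling out, as a hedged aside: in current versions of Encode, decode returns a Perl character string rather than UTF-8 bytes, so the text still has to be encoded on the way out (explicitly, or with an encoding layer on the output handle). The sketch below assumes the file name bytes really are ISO-8859-7; the sample name is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical file name as raw ISO-8859-7 bytes (capital nu, alpha, iota).
my $raw_name = "\xCD\xE1\xE9";

# decode() turns the bytes into a string of three Unicode characters.
my $name = decode('iso-8859-7', $raw_name);

# encode() produces the bytes that a page declared as UTF-8 must carry.
my $utf8_bytes = encode('UTF-8', $name);

printf "characters: %d, UTF-8 bytes: %d\n", length($name), length($utf8_bytes);
```

Calling binmode(STDOUT, ':encoding(UTF-8)') once near the top of the script is an alternative to the explicit encode step.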
      Well, thanks for answering. I wish I had seen your answer earlier, but I only saw it just now. In the end, my approach was this:
      my @files = <../data/text/*.txt>;
      my @display_files = map /([^\/]+)\.txt/, @files;
      Encode::from_to($_, "ISO-8859-7", "utf8") for @display_files;

      print br;
      print start_form( action=>'index.pl' );
      print h1( {class=>'lime'}, "&#917;&#960;&#941;&#955;&#949;&#958;&#949; &#964;&#959; &#954;&#949;&#943;&#956;&#949;&#957;&#959; &#960;&#959;&#965; &#963;&#949; &#949;&#957;&#948;&#953;&#945;&#966;&#941;&#961;&#949;&#953; => ",
                popup_menu( -name=>'select', -values=>\@display_files ),
                submit('&#917;&#956;&#966;&#940;&#957;&#953;&#963;&#951;'));
      print end_form;

      my $passage = param('select') || "&#913;&#961;&#967;&#953;&#954;&#942; &#931;&#949;&#955;&#943;&#948;&#945;!";
      Encode::from_to($passage, "utf8", "ISO-8859-7") if param();

      if ( param('select') )
      {
          open(FILE, "<../data/text/$passage.txt") or die $!;
          .....
      Because Windows is unable to save Greek filenames in UTF-8 as well (it saves only the file contents that way, I'm afraid), I have to do this tedious encoding transformation very often and in several places in my code. In the example above I transform the filename string from Greek => UTF-8 so that it appears correctly in the client's browser, and then, when the client selects something and submits, I *also* have to transform the string back from UTF-8 => Greek because, as you can see, the file the user selected must be opened immediately afterwards.
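One possible way to reduce the back-and-forth conversion described here is to convert each name once and remember the mapping, so the submitted value is looked up instead of being converted back. This is only a sketch: build_name_map is a hypothetical helper, the sample path is made up, and the file names are assumed to come back from the directory listing as ISO-8859-7 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: map each UTF-8 display name to its on-disk path.
# Assumes the directory listing returns ISO-8859-7 bytes, as in the thread.
sub build_name_map {
    my @paths = @_;
    my %path_for;
    for my $path (@paths) {
        my ($base) = $path =~ m{([^/]+)\.txt\z} or next;
        my $display = encode('UTF-8', decode('iso-8859-7', $base));
        $path_for{$display} = $path;
    }
    return \%path_for;
}

# Made-up path standing in for the result of <../data/text/*.txt>.
my $path_for = build_name_map("../data/text/\xCD\xE1\xE9.txt");

# The browser sends back the UTF-8 display name; look up the real path.
my $submitted = "\xCE\x9D\xCE\xB1\xCE\xB9";
my $path = $path_for->{$submitted};
print defined $path ? "would open a known file\n" : "unknown selection\n";
```

A side effect of the lookup is that the submitted value never goes straight into open, so only names that actually exist in the directory can be selected.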

      This becomes even more tedious when I also try to convert other Greek text to UTF-8, since otherwise it will not appear correctly. Look at this, for example:
      use POSIX qw(strftime);
      use Encode;

      print header( -charset=>'utf8' );
      print start_html( -style=>'/data/css/style.css', -title=>'&#936;&#965;&#967;&#969;&#966;&#949;&#955;&#942; &#928;&#957;&#949;&#965;&#956;&#945;&#964;&#953;&#954;&#940; &#922;&#949;&#943;&#956;&#949;&#957;&#945;!' );

      my ($select, $row, $data);

      my $date = strftime('%y-%m-%d %H:%M:%S', localtime);
      my $display_date = strftime('%a %d %b, %I:%M %p', localtime);
      Encode::from_to($display_date, 'ISO-8859-7', 'utf8');

      my $host = gethostbyaddr (pack ("C4", split (/\./, $ENV{'REMOTE_ADDR'})), 2) || $ENV{REMOTE_ADDR};
      $host = "&#925;&#943;&#954;&#959;&#962;" if ( ($host =~ /dell/) or ($host =~ /dsldevice/) or ($host =~ /localhost/) );
      # date is like this because MySQL needs this specific layout in order to insert the date into its date field.
      # As for display_date, it not only needs formatting but also encoding to UTF-8; otherwise in my index.pl I cannot see the date correctly (only the numbers show, and all the other date strings turn into question marks).
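The date conversion can be wrapped in one small function so that it is written only once. The sketch below is an assumption-laden illustration: utf8_date is a hypothetical helper, and the charset argument is a guess about the encoding the locale uses for day and month names (ISO-8859-7 in this thread). Some newer Perls may hand back already-decoded text under a UTF-8 locale, so the helper checks for that before decoding.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Encode qw(decode encode);

# Hypothetical helper: format a time and return the result as UTF-8 bytes.
# $charset is an assumption about the locale's encoding for day/month names.
sub utf8_date {
    my ($format, $charset, @time) = @_;
    my $text = strftime($format, @time);
    # Decode only if strftime returned raw bytes rather than character data.
    $text = decode($charset, $text) unless utf8::is_utf8($text);
    return encode('UTF-8', $text);
}

my $display_date = utf8_date('%a %d %b, %I:%M %p', 'iso-8859-7', localtime);
print "$display_date\n";
```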

      Today I saw that another element of my script needs transformation to UTF-8. This has become very tedious and annoying work, and I have to repeat it in other scripts as well.

      I would be very grateful if you guys could find a way for me to get rid of these transformations and have all page content input and output as UTF-8 by default.

      Thank you.
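A common pattern for this kind of request, offered here as a sketch rather than as an answer from the thread, is to decode bytes once where they enter the program and let a single encoding layer on the output handle do all the encoding. The sample bytes and their ISO-8859-7 charset are assumptions standing in for a file name or a locale-formatted date.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Encode every printed character string as UTF-8, once, at the output handle.
binmode(STDOUT, ':encoding(UTF-8)');

# Decode at the input boundary. The bytes and their charset are assumptions
# standing in for a file name or a locale-formatted date.
my $text = decode('iso-8859-7', "\xCD\xE1\xE9");

# From here on the program works with characters; print encodes them itself.
print "$text\n";
```

With this arrangement the individual from_to calls before each print are no longer needed, though names used to open files on disk still have to be in whatever encoding the file system stores.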
