PerlMonks  

help needed in file encoding

by uva (Sexton)
on Mar 20, 2006 at 16:10 UTC (perlquestion, id://537977)

uva has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
I can find out whether a particular string is UTF-8 by means of utf8::is_utf8().
But how do I detect which encoding is used for a particular file? I want to check whether the file is encoded in UTF-8, UTF-16, or some other format.

Edited by planetscape - fixed br tags

Replies are listed 'Best First'.
Re: help needed in file encoding
by zentara (Cardinal) on Mar 20, 2006 at 16:37 UTC
Re: help needed in file encoding
by ikegami (Patriarch) on Mar 20, 2006 at 17:54 UTC

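    You could check whether the first few bytes contain a BOM.

        # $text must hold the raw, undecoded bytes from the start of the file.
        # The 4-byte UTF-32 checks come first because the UTF-32LE BOM
        # begins with the UTF-16LE BOM.
        my $encoding;
        if    (substr($text, 0, 4) eq "\x00\x00\xFE\xFF") { $encoding = 'UTF-32BE' }
        elsif (substr($text, 0, 4) eq "\xFF\xFE\x00\x00") { $encoding = 'UTF-32LE' }
        elsif (substr($text, 0, 2) eq "\xFE\xFF")         { $encoding = 'UTF-16BE' }
        elsif (substr($text, 0, 2) eq "\xFF\xFE")         { $encoding = 'UTF-16LE' }
        elsif (substr($text, 0, 3) eq "\xEF\xBB\xBF")     { $encoding = 'UTF-8' }
        else {
            # No BOM found. Might not be UTF.
            # Use another method to guess the encoding.
        }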

    It's not very reliable. The protocol which encases your stream/data must really specify the encoding for things to work smoothly.
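
    When no BOM is present, one "other method" is to test whether the bytes are at least well-formed UTF-8. The sketch below uses the core Encode module's strict 'UTF-8' decoder with the FB_CROAK check; looks_like_utf8 is a hypothetical helper name, not something from this thread or from Encode. Treat a positive result as a hint only: plain ASCII passes too, and a negative result says nothing about which encoding the data actually uses.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    # FB_CROAK makes decode() die on the first malformed sequence.
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# "caf\xC3\xA9" is "café" encoded as UTF-8; "\xE9" alone is the Latin-1 byte.
print looks_like_utf8("caf\xC3\xA9")       ? "valid\n" : "invalid\n";
print looks_like_utf8("caf\xE9 au lait")   ? "valid\n" : "invalid\n";
```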

      Consider File::BOM if you need to do something like this.
        I tried utf8::is_utf8($string), where $string was read from a Big5-encoded file, and got 1 (true) as the output. How does that happen? Is the internal representation always UTF-8 on Windows?
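
        A possible explanation, offered as an assumption because the post does not show how the file was read: utf8::is_utf8() reports how Perl stores the string internally, not how the file was encoded. If the data passed through a decoding step (an :encoding layer or Encode::decode), the resulting character string normally has the flag set, whatever the source encoding was. The sketch below uses two sample bytes chosen for illustration and the 'big5-eten' encoding name from the core Encode distribution.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two bytes forming a single Big5 character (sample data for illustration).
my $bytes = "\xA4\xA4";

# Raw, undecoded bytes: the internal UTF-8 flag is off.
my $flag_before = utf8::is_utf8($bytes) ? 1 : 0;

# Decoding turns the bytes into a character string, which Perl stores
# internally as UTF-8, so the flag is on even though the source was Big5.
my $chars      = decode('big5-eten', $bytes);
my $flag_after = utf8::is_utf8($chars) ? 1 : 0;

printf "bytes: flag=%d length=%d\n", $flag_before, length $bytes;
printf "chars: flag=%d length=%d\n", $flag_after,  length $chars;
```

        So a true result tells you the string has been decoded into characters; it does not identify the file's encoding.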
Re: help needed in file encoding
by timos (Beadle) on Mar 20, 2006 at 16:45 UTC
    On Linux, wc -m tells you the number of characters (counted according to the current locale) and wc -c the number of bytes. If the latter is greater than the former, the file is probably in a multibyte encoding such as UTF-8/16.
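
    The same byte-versus-character comparison can be sketched in Perl. This is only an analogue of the wc idea, assuming the data is UTF-8; count_bytes_and_chars is a hypothetical helper name. Without a CHECK argument, Encode's decode() substitutes U+FFFD for malformed input, so the character count is only meaningful when the assumption holds.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns (byte count, character count),
# treating $bytes as UTF-8.
sub count_bytes_and_chars {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return (length($bytes), length($chars));
}

# "na\xC3\xAFve" is "naïve" encoded as UTF-8.
my ($nbytes, $nchars) = count_bytes_and_chars("na\xC3\xAFve");
print "bytes=$nbytes chars=$nchars\n";
print "multibyte characters present\n" if $nbytes > $nchars;
```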
