
Re^7: convert files to ansi (8859-1)

by Corion (Patriarch)
on Mar 29, 2017 at 08:40 UTC ( [id://1186338]=note )


in reply to Re^6: convert files to ansi (8859-1)
in thread convert files to ansi (8859-1)

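Every file is valid ISO-8859-1, because ISO-8859-1 is a single-byte encoding and decoders such as Perl's Encode map every byte value from 0x00 to 0xFF to a character. Decoding as ISO-8859-1 therefore never fails, so it cannot be used to validate anything.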
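
A minimal sketch of that point, assuming Perl's core Encode module: decoding all 256 byte values as ISO-8859-1 succeeds, while a strict UTF-8 decode rejects a Latin-1 "über". The helper name is_valid_utf8 is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if $bytes is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# A string containing every possible byte value, 0x00 through 0xFF.
my $all_bytes = join '', map { chr } 0 .. 255;

# Decoding as ISO-8859-1 does not fail: byte N becomes code point N.
my $text = decode('ISO-8859-1', $all_bytes);
print "decoded ", length($text), " characters as ISO-8859-1\n";

# The same kind of check against UTF-8 does discriminate.
print "Latin-1 bytes valid UTF-8? ", is_valid_utf8("\xFCber"),     "\n";
print "UTF-8 bytes valid UTF-8?   ", is_valid_utf8("\xC3\xBCber"), "\n";
```

In practice this means the useful question is usually "is this file valid UTF-8?" rather than "is this file valid ISO-8859-1?".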

Replies are listed 'Best First'.
Re^8: convert files to ansi (8859-1)
by Yaerox (Scribe) on Mar 29, 2017 at 08:42 UTC
    Well, that explains a lot ... so I need to look for another way to validate the encoding. Is there any known way to do this?

    I read about Encode::Guess; maybe I should take a look at it?
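
For reference, a small sketch of what Encode::Guess does with the two candidate encodings in this thread. It assumes the module's default suspects (ASCII and UTF-8) plus ISO-8859-1 passed as an extra suspect; the helper name describe_guess is made up here. guess_encoding() returns an encoding object when exactly one suspect fits, and a plain error string otherwise.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: report the guess, or the reason there is none.
sub describe_guess {
    my ($bytes) = @_;

    # ISO-8859-1 is added as a suspect for this call only.
    my $guess = guess_encoding($bytes, 'iso-8859-1');

    # An object means a unique match; a string is an error message.
    return ref($guess) ? $guess->name : "ambiguous: $guess";
}

print describe_guess("\xFCber"),     "\n";    # Latin-1 bytes
print describe_guess("\xC3\xBCber"), "\n";    # UTF-8 bytes
```

Because every byte string decodes as ISO-8859-1, valid UTF-8 input matches both suspects and the guess comes back ambiguous. The module can only say "this is not UTF-8, so it must be the other suspect", which is the same limitation described above.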

      My approach to guessing the encoding would be to look for well-known phrases/trigrams. For example, if you know the language of the text, look for trigrams (or longer sequences) that indicate the encoding.

      "über" would be a good German word, since it appears commonly (enough) in text, and the bytes you find for it tell you the encoding:

      "\xFCber"      # ANSI / Latin-1
      "\xC3\xBCber"  # UTF-8
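
The marker idea can be sketched roughly as follows. This is only an illustration under the assumption that the text is German and contains "über"; the table %MARKER and the function names are hypothetical, and a real detector would use several words or trigrams.

```perl
use strict;
use warnings;

# Hypothetical marker table: the bytes of "über" in each candidate encoding.
my %MARKER = (
    'ISO-8859-1' => "\xFCber",
    'UTF-8'      => "\xC3\xBCber",
);

# Count non-overlapping occurrences of $needle in $haystack.
sub count_occurrences {
    my ($haystack, $needle) = @_;
    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) >= 0) {
        $count++;
        $pos += length $needle;
    }
    return $count;
}

# Return the encoding whose marker occurs most often,
# or nothing if no marker is found or the counts tie.
sub guess_by_marker {
    my ($bytes) = @_;
    my %hits = map { $_ => count_occurrences($bytes, $MARKER{$_}) } keys %MARKER;
    my ($best, $second) = sort { $hits{$b} <=> $hits{$a} } keys %hits;
    return if $hits{$best} == 0 || $hits{$best} == $hits{$second};
    return $best;
}

my $guess = guess_by_marker("Die Br\xFCcke f\xFChrt \xFCber den Fluss");
print defined $guess ? $guess : 'unknown', "\n";
```

The two markers cannot be confused with each other, because the UTF-8 form contains the byte 0xBC before "ber" and never 0xFC.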
        We decided to develop a bytewise reader and converter. We need such an algorithm in multiple places anyway. Putting the effort in seems to us the most productive approach for now ...

        Thanks a lot.
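
One possible shape for such a converter, shown only as a sketch and not as what was actually built: it assumes the input is either UTF-8 or already ISO-8859-1, and uses the core Encode module rather than a hand-written bytewise reader. The function name to_latin1 is made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical converter: returns ISO-8859-1 bytes for input that is
# assumed to be either UTF-8 or already ISO-8859-1.
sub to_latin1 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument

    # Strict decode: dies on malformed UTF-8, which eval turns into undef.
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };

    # Not valid UTF-8: assume the bytes are ISO-8859-1 already.
    return $bytes unless defined $text;

    # Valid UTF-8: re-encode. This dies if a character has no
    # ISO-8859-1 equivalent, rather than silently substituting "?".
    return encode('ISO-8859-1', $text, Encode::FB_CROAK);
}

printf "%vX\n", to_latin1("\xC3\xBCber");    # dot-separated hex of the result
```

The weak spot is the fallback branch: ISO-8859-1 text whose accented bytes happen to form valid UTF-8 sequences would be misread, which is where a phrase or trigram check helps.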
