PerlMonks  

Re^4: Perl detect utf8, iso-8859-1 encoding

by swiftlet (Acolyte)
on Jul 24, 2020 at 15:07 UTC ( [id://11119765] )


in reply to Re^3: Perl detect utf8, iso-8859-1 encoding
in thread Perl detect utf8, iso-8859-1 encoding

Thanks for your code, and sorry I didn't explain the problem clearly enough.

The input could be encoded in iso-8859-1, e.g. \x{f6}\x{f6}, or in utf-8, e.g. \x{c3}\x{b6}, so I have to find out the charset first.

I am using Encode::Detect::Detector to find out whether the charset of the string is utf-8 or iso-8859-1.

The logic is like this:

$charset = Encode::Detect::Detector::detect($input);
if ($charset eq 'UTF-8') {
    # do NFC ...
} elsif ($charset eq 'iso-8859-1') {
    # do NFD ...
}

In my case the actual call is Text::Unaccent's unac_string($charset, $str). Text::Unaccent works well when Detector finds the correct charset; when Detector fails there is no charset to pass, so of course it fails too.

Encode::Detect::Detector normally works well, but it fails when the input is \x{f6}\x{f6}.
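
One common fallback for the "utf-8 or iso-8859-1" question, sketched here as an illustration rather than as anything proposed in this thread, is to try a strict UTF-8 decode with the core Encode module and treat any failure as iso-8859-1. The helper name guess_charset is hypothetical, and the sketch assumes the input can only be one of these two charsets.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 'UTF-8' if the bytes decode as strict UTF-8,
# otherwise assumes 'iso-8859-1' (an assumption, since every byte sequence
# is valid iso-8859-1 and cannot be ruled out by decoding).
sub guess_charset {
    my ($bytes) = @_;
    my $ok = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC keeps decode() from modifying $bytes.
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 'UTF-8' : 'iso-8859-1';
}

# The two byte strings from the post above.
for my $input ("\xf6\xf6", "\xc3\xb6") {
    printf "%s => %s\n", unpack('H*', $input), guess_charset($input);
}
```

The returned name could then stand in for the charset when the detector gives no answer. Note that pure ASCII input also passes the strict decode and is reported as UTF-8, and that this check cannot tell iso-8859-1 apart from other single-byte encodings such as windows-1252.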

Replies are listed 'Best First'.
Re^5: Perl detect utf8, iso-8859-1 encoding
by jeffenstein (Hermit) on Jul 24, 2020 at 20:06 UTC

    Ah, oops. My fault there. You did explain it well, but I wasn't paying enough attention. I'll leave it as is, just in case someone finds it useful one day.
