PerlMonks  

Re: Is utf8, ascii ?

by clinton (Priest)
on Aug 07, 2007 at 19:01 UTC (#631116)


in reply to Is utf8, ascii ?

From the core utf8 module, you can use:

utf8::valid($string)

But presumably you don't want to just discard data. Instead, you want to convert it to UTF-8 and insert it safely. If you know what character set it is in, use Encode to convert it. Otherwise, as you have done, you can use Encode::Guess to try to figure out the character set first.
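The two paths described above (convert from a known character set, or guess first) could be sketched roughly as follows. This is only an illustration: to_utf8 and its charset/suspects options are hypothetical names, not part of Encode. Note that Encode::Guess only considers ASCII, UTF-8 and BOM-marked UTF-16/32 unless you pass extra suspects.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: return $bytes re-encoded as UTF-8,
# or an empty return (undef in scalar context) if it cannot be decoded.
sub to_utf8 {
    my ($bytes, %opt) = @_;
    my $text;

    if (defined $opt{charset}) {
        # Known charset: decode strictly; FB_CROAK dies on malformed input.
        $text = eval { decode($opt{charset}, $bytes, Encode::FB_CROAK) };
        return unless defined $text;
    }
    else {
        # Unknown charset: guess, optionally with extra suspects.
        my $decoder = guess_encoding($bytes, @{ $opt{suspects} || [] });
        return unless ref $decoder;    # a plain string is an error message
        $text = $decoder->decode($bytes);
    }

    return encode('UTF-8', $text);
}

# "caf\xE9" is "cafe" with an e-acute in ISO-8859-1.
my $known   = to_utf8("caf\xE9", charset  => 'iso-8859-1');
my $guessed = to_utf8("caf\xE9", suspects => ['latin1']);
```

A guess is only a guess: input that is valid in more than one suspect encoding comes back as an error string rather than a decoder, which the sketch treats as "cannot convert".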

Clint

Replies are listed 'Best First'.
Re^2: Is utf8, ascii ?
by rootcho (Pilgrim) on Aug 07, 2007 at 19:38 UTC
    I see.
    I'm new to this encoding stuff, but now I understand: check, guess, try to encode, and if that fails, discard.
    At the moment I just want to discard; later, when I have time, I will do more tests.
    But my next question is: if I check for a valid UTF-8 string and discard the rest, will this discard the string if it is ASCII?
      No. U+0000 to U+007F (the first 128 Unicode characters) are each represented in UTF-8 by a single byte, the same byte that ASCII uses. So ASCII (7-bit ASCII, not e.g. ISO-8859-* or Windows-1252) is a subset of UTF-8.
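A quick way to convince yourself of this is a small check like the one below. It is a sketch only: is_valid_utf8 is a hypothetical helper name, and it uses a strict Encode::decode rather than utf8::valid, because as far as I understand the utf8 documentation, utf8::valid tests the internal consistency of a scalar and also returns true for any plain byte string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() with a CHECK value may modify its input
    # 'UTF-8' (with the hyphen) is Encode's strict decoder;
    # FB_CROAK makes it die on malformed input, which eval catches.
    my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

my $ascii  = is_valid_utf8("plain ASCII text");   # only bytes 0x00-0x7F
my $utf8   = is_valid_utf8("caf\xC3\xA9");        # e-acute as two UTF-8 bytes
my $latin1 = is_valid_utf8("caf\xE9");            # e-acute as one ISO-8859-1 byte
```

The pure ASCII string passes, so a "discard anything that is not valid UTF-8" filter keeps it. Only the ISO-8859-1 string with the lone 0xE9 byte is rejected.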
