PerlMonks  

Re: utf weirdness in regex

by borisz (Canon)
on Jul 23, 2004 at 08:24 UTC ( [id://376828] )


in reply to utf weirdness in regex

Decoding as utf8 here is wrong. decode is for when you have a sequence of octets that is already in UTF-8, but Perl does not know it. Yours is in Latin-1, and those octets are not valid UTF-8. To see the failure, retry it with:
$string1 = Encode::decode(utf8 => $string1, Encode::FB_CROAK);
To convert all three strings to valid Unicode, decode them as Latin-1 instead:
$string1 = Encode::decode(latin1 => $string1, Encode::FB_CROAK);
$string2 = Encode::decode(latin1 => $string2, Encode::FB_CROAK);
$string3 = Encode::decode(latin1 => $string3, Encode::FB_CROAK);
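The difference between the two calls can be seen with a small self-contained sketch. The sample string below is an assumption (the original strings from the question are not shown here); it stands in for any Latin-1 octets containing a non-ASCII byte. LEAVE_SRC is added so that decode does not touch the input scalar.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: "cafe" with e-acute, as Latin-1 octets (0xE9).
my $octets = "caf\xE9";

# Decoding as Latin-1 succeeds: every octet maps to exactly one code point.
my $chars = Encode::decode('latin1', $octets,
                           Encode::FB_CROAK | Encode::LEAVE_SRC);

# Decoding the same octets as UTF-8 croaks: a lone 0xE9 starts a
# multi-byte sequence that is never completed.
my $ok = eval {
    Encode::decode('utf8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};

printf "latin1: %d character(s), last is U+%04X\n",
    length($chars), ord(substr($chars, -1));
print "utf8:   ", ($ok ? "decoded" : "croaked"), "\n";
```

With this input the Latin-1 decode yields four characters, while the utf8 attempt should be trapped by the eval rather than returning a string.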
Boris

Re^2: utf weirdness in regex
by december (Pilgrim) on Jul 24, 2004 at 04:35 UTC

    Thanks, that looks a lot more like what I expected!

    In the future, I will use the CHECK argument to see if something went wrong in the conversion. I hope that will lift some of my initial confusion as to what is in which charset...

      Of course, when using FB_CROAK as the CHECK argument, you normally want to wrap it in an eval:
      use Encode;
      my $encoding = "whatever";
      my $octets   = "characters in whatever encoding...";
      # A block eval traps the croak without compiling code at run time.
      eval { $_ = decode( $encoding, $octets, Encode::FB_CROAK ); };
      if( $@ ) { report_an_error(); ... }
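      If the check is needed in several places, the eval can be folded into a small helper. This is only a sketch: try_decode is a hypothetical name, not part of Encode, and the return convention (decoded string or error message) is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): returns (decoded, undef) on
# success, or (undef, error message) if decode croaks.
sub try_decode {
    my ($encoding, $octets) = @_;
    my $chars = eval {
        # LEAVE_SRC keeps decode from modifying $octets in place.
        Encode::decode($encoding, $octets,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $chars
        ? ($chars, undef)
        : (undef, $@ || 'unknown decode error');
}

# Latin-1 octets passed off as UTF-8 are reported instead of dying.
my ($text, $err) = try_decode('utf8', "caf\xE9");
print defined $text ? "decoded ok\n" : "decode failed: $err";
```

      Returning the error message rather than calling a reporting routine directly leaves it to the caller to decide whether a failed decode is fatal.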
