PerlMonks  

Re: Regexp and Linux (is it utf issue?)

by aitap (Deacon)
on Jun 28, 2013 at 06:21 UTC (#1041153)


in reply to Regexp and Linux (is it utf issue?)

When you read text files, you should decode them. This is easy using PerlIO layers, the Encode module and the three-argument form of open:

use Encode;
open my $fh, "<:encoding(whatever)", $filename or die $!;
This way, Perl decodes everything automatically, and you only have to work with characters, not bytes.
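To make the difference between bytes and characters concrete, here is a minimal sketch. It assumes the file is UTF-8 (the "whatever" above stands for the real encoding of your data); the temporary file and the sample word are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write "café" as raw bytes: the "é" (U+00E9) takes two bytes in UTF-8.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;                      # no layers, plain bytes
print {$tmp} "caf\xC3\xA9";
close $tmp or die "close: $!";

# Without a decoding layer Perl hands back the bytes as they are.
open my $raw, '<:raw', $filename or die "open $filename: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# With :encoding(UTF-8) the same file is read as characters.
open my $fh, '<:encoding(UTF-8)', $filename or die "open $filename: $!";
my $chars = do { local $/; <$fh> };
close $fh;

printf "bytes: %d, characters: %d\n", length $bytes, length $chars;
# prints "bytes: 5, characters: 4"
```

Only the decoded string behaves as expected with things like /^caf.$/ or length, which is usually what a regexp problem of this kind comes down to.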

When you write text to files, printing characters above U+00FF to a filehandle that has no encoding layer produces the famous warning: "Wide character in (sub name)...". You need to encode them using the same technique: open my $write, ">:encoding(whatever)", $filename or die $!; For output, you can use the :utf8 layer to encode characters because they are internally stored as valid UTF-8.
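The writing side can be sketched the same way. This again assumes UTF-8 as the target encoding; the temporary file and the sample string are only there for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";

my $text = "\x{263A} smile";       # U+263A does not fit in a single byte

# The layer encodes on the way out, so no "Wide character" warning is raised.
open my $write, '>:encoding(UTF-8)', $filename or die "open $filename: $!";
print {$write} $text;
close $write or die "close $filename: $!";

# Read the file back as raw bytes to see what actually landed on disk.
open my $raw, '<:raw', $filename or die "open $filename: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

printf "%d characters became %d bytes\n", length $text, length $bytes;
# prints "7 characters became 9 bytes"
```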

Do not use the :utf8 I/O layer to decode text: it simply sets the "character" flag on the strings read from the filehandle without any checks, and this is generally unsafe. See: UTF8 related proof of concept exploit released at T-DOSE.
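To show what "checks" means here, the sketch below calls Encode directly instead of going through a filehandle, because that makes the validation step easy to see. The helper name decode_strict and the sample byte strings are hypothetical. FB_CROAK makes decode die on malformed input, and LEAVE_SRC keeps it from modifying the source string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded characters,
# or undef if the bytes are not well-formed UTF-8.
sub decode_strict {
    my ($bytes) = @_;
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars;
}

my $good = "caf\xC3\xA9";   # well-formed: "é" encoded as two bytes
my $bad  = "caf\xFFe";      # 0xFF never occurs in valid UTF-8

for my $input ($good, $bad) {
    my $chars = decode_strict($input);
    print defined $chars
        ? "ok, " . length($chars) . " characters\n"
        : "rejected malformed input\n";
}
```

A string flagged as characters without this kind of validation can carry malformed sequences into the rest of the program, which is the danger the warning above is about.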

