Re: Windows-1252 characters from \x{0080} thru \x{009f}

by graff (Chancellor)
on Apr 19, 2012 at 02:01 UTC (#965830)

in reply to Windows-1252 characters from \x{0080} thru \x{009f}

tye has covered most of the important stuff. I'll just add that in order for your first code snippet to DWYM, it would have to go something like this (note the addition of "use Encode", setting the io layer on STDOUT, and applying "decode" to the literals being assigned to @words):

    #!perl
    use strict;
    use warnings;
    use Encode;

    binmode STDOUT, ":encoding(cp1252)";

    my $pattern = qr/\A\w+\z/;
    my @words = map { decode( "cp1252", $_ ) } qw( Tšekissä Žena Œdipus Rex );

    for my $word (@words) {
        my $result = $word =~ $pattern ? "matches" : "doesn't match";
        printf qq/The word "%s" %s the pattern %s\n/, $word, $result, $pattern;
    }

When I run that in a terminal that is using cp1252 (aka "Windows Latin1"), the resulting output is:

    The word "Tšekissä" matches the pattern (?-xism:\A\w+\z)
    The word "Žena" matches the pattern (?-xism:\A\w+\z)
    The word "Œdipus" matches the pattern (?-xism:\A\w+\z)
    The word "Rex" matches the pattern (?-xism:\A\w+\z)

UPDATE: To clarify, the point here is that when it comes to matching things outside the ASCII range, regex escapes like '\w' only ever apply Unicode semantics, never cp1252 or any other legacy encoding's semantics. They therefore need to operate on strings that have their perl-internal utf8 flag set (i.e. strings that have been decoded from their "external" form, whether by reading through the appropriate I/O layer or by explicit decoding).
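
To make that concrete, here is a minimal sketch (not part of the original snippet) that runs the same pattern against one word in three forms: raw cp1252 bytes, the result of decode(), and the result of reading through an :encoding(cp1252) layer. The describe() helper and the in-memory filehandle are my own illustrative choices, and the word is spelled with a \x escape so the script does not depend on how the source file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $pattern = qr/\A\w+\z/;

# Hypothetical helper: report the utf8 flag and whether \w+ covers the whole string.
sub describe {
    my ($string) = @_;
    my $flag  = utf8::is_utf8($string) ? "on" : "off";
    my $match = $string =~ $pattern ? "matches" : "doesn't match";
    return "utf8 flag $flag, $match";
}

# "Zena" with a caron on the Z, as raw cp1252 bytes (0x8E is that letter in cp1252).
my $bytes = "\x8Eena";

# Explicit decoding produces a character string starting with U+017D.
my $decoded = decode( "cp1252", $bytes );

# Reading through an I/O layer decodes as well (in-memory file, just for the demo).
open my $fh, "<:encoding(cp1252)", \$bytes or die "open failed: $!";
my $via_layer = <$fh>;
close $fh;

print "raw bytes: ", describe($bytes),     "\n";
print "decode():  ", describe($decoded),   "\n";
print "I/O layer: ", describe($via_layer), "\n";
```

Only the two decoded strings should satisfy the pattern. The raw byte 0x8E is not a word character under the rules perl applies to undecoded strings, so the first case fails even though the bytes "look" identical in a cp1252 terminal.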

Re^2: Windows-1252 characters from \x{0080} thru \x{009f}
by Jim (Curate) on Apr 19, 2012 at 05:34 UTC

    Thank you very much, graff. Your reply filled in the all-important How-do-you-do-it? gap.

