jkahn has asked for the wisdom of the Perl Monks concerning the following question:
I've been trying to get utf-8 encoded files to read in properly, and to parse with character semantics after loading. It seems to me that the first two lines of output should be the same, but the string read from the filehandle while the utf8 pragma was in scope (output line 2) reports the wrong length, or so it appears.
Note it wasn't funny ampersands in the data, but an actual utf-8 character (the upside down e, U+0259 LATIN SMALL LETTER SCHWA). (darn conversions!)

    #!perl -w
    use warnings;
    use strict;

    {
        use utf8;
        my $string = 'ə'; # this is a schwa in UTF-8, darned handy in linguistics
        print length $string,"\t",$string, "\n";
        my $filestring = <DATA>;
        chomp $filestring;
        print length $filestring, "\t", $filestring, "\n";
        # seems like it should print "1" here... but it prints 2!
    }

    {
        my $string = 'ə';
        print length $string,"\t",$string, "\n";
        my $filestring = <DATA>;
        chomp $filestring;
        print length $filestring, "\t", $filestring, "\n";
    }

    __DATA__
    ə
    ə
Here's the results (as pre):
    1	ə
    2	ə
    2	ə
    2	ə

It's the second line that really surprises me... shouldn't that be a '1'? The only apparent difference is that it was read off a filehandle. How can I "reset" that data to be utf8?
Here's my version of Perl (I used pre tags so that d/l code would work!):
    C:\>perl -v

    This is perl, v5.6.1 built for MSWin32-x86-multi-thread
    (with 1 registered patch, see perl -V for more detail)

    Copyright 1987-2001, Larry Wall

    Binary build 633 provided by ActiveState Corp. http://www.ActiveState.com
    Built 21:33:05 Jun 17 2002

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.com/, the Perl Home Page.

Anybody have any idea what's wrong here, or why it gets the length wrong?
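For anyone landing here with the same question, below is a minimal sketch of how the data could be read with character semantics. It assumes Perl 5.8 or later, where the Encode module and PerlIO layers ship with the core; neither is available on the 5.6.1 build shown above, so this is not a fix for that exact setup. The variable names and the in-memory filehandle are only for illustration, and the schwa is spelled as byte escapes so the source file stays plain ASCII.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of U+0259 (schwa) is the two bytes C9 99.
my $bytes = "\xC9\x99\n";

# 1. Plain read with no layer: the line comes back as raw bytes.
open my $raw, '<', \$bytes or die "open failed: $!";
my $line = <$raw>;
chomp $line;
close $raw;
my $raw_len = length $line;          # counts bytes: 2

# 2. Decode the already-read bytes into a character string.
my $decoded     = decode('UTF-8', $line);
my $decoded_len = length $decoded;   # counts characters: 1

# 3. Or put a decoding layer on the handle so every read is decoded.
open my $in, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $chars = <$in>;
chomp $chars;
close $in;
my $layer_len = length $chars;       # counts characters: 1

print "raw read:     $raw_len\n";
print "after decode: $decoded_len\n";
print "with layer:   $layer_len\n";
```

On an already-open handle such as DATA, the same layer can be applied with binmode(DATA, ':encoding(UTF-8)') before the first read. The utf8 pragma itself only tells perl that the source code is UTF-8; it says nothing about what arrives through a filehandle, which would explain why the literal and the line read from the handle report different lengths.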
Replies are listed 'Best First'.
Re: Setting UTF-8 mode on filehandle reads?
by grantm (Parson) on Dec 06, 2002 at 01:14 UTC
by ph0enix (Friar) on Dec 06, 2002 at 12:32 UTC
by grantm (Parson) on Dec 06, 2002 at 18:14 UTC
Re: Setting UTF-8 mode on filehandle reads?
by diotalevi (Canon) on Dec 06, 2002 at 01:03 UTC
Re: Setting UTF-8 mode on filehandle reads?
by pg (Canon) on Dec 06, 2002 at 15:42 UTC
by grantm (Parson) on Dec 07, 2002 at 01:47 UTC
by Anonymous Monk on Dec 20, 2012 at 19:30 UTC