PerlMonks
Re: Match full utf-8 characters
by afoken (Chancellor) on Apr 29, 2019 at 12:51 UTC ( [id://1233111]=note )
    echo -n 'a…b' | perl -pe 's@(.).(.)@$1$2@'; echo

echo may generate a byte stream representing three Unicode characters, but Perl reads it as a byte stream, not as Unicode characters. So you are cutting out a byte, not a character, and get back garbage. Also, perl writes out bytes, not Unicode characters.

Tell perl to treat STDIN and STDOUT as UTF-8 encoded character streams (-CIO) and everything works as expected:

    >echo -n 'a…b' | perl -pe 's@(.).(.)@$1$2@'
    a▒▒b
    >echo -n 'a…b' | perl -CIO -pe 's@(.).(.)@$1$2@'
    ab
    >perl -v

    This is perl 5, version 22, subversion 2 (v5.22.2) built for x86_64-linux-thread-multi

    Copyright 1987-2015, Larry Wall

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using "man perl" or "perldoc perl". If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    >

See also -C in perlrun, and the thread "any use of 'use locale'?", especially the subthread "Re^3: any use of 'use locale'? (source encoding)".

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
In Section
Seekers of Perl Wisdom