<?xml version="1.0" encoding="windows-1252"?>
<node id="1005554" title="Re: How to remove other language character from a string" created="2012-11-26 00:27:25" updated="2012-11-26 00:27:25">
<type id="11">
note</type>
<author id="616540">
moritz</author>
<data>
<field name="doctext">
&lt;p&gt;You need to &lt;c&gt;use &lt;/c&gt;[doc://utf8]&lt;c&gt;;&lt;/c&gt; to tell Perl that your source file is encoded in UTF-8. That way non-ASCII string literals are decoded into characters, so Unicode properties such as &lt;c&gt;\p{Latin}&lt;/c&gt; match them the way you want.

&lt;pre&gt;
use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $str = "&amp;#3588;&amp;#3619;&amp;#3633;&amp;#3623;&amp;#3595;&amp;#3629;&amp;#3591;&amp;#3648;&amp;#3648;&amp;#3595;&amp;#3609;&amp;#3604;&amp;#3660;&amp;#3623;&amp;#3636;&amp;#3594;&amp;#3652;&amp;#3586;&amp;#3656;&amp;#3604;&amp;#3634;&amp;#3623; Croissant Egg Sandwich &amp;#3588;&amp;#3619;&amp;#3633;&amp;#3623;&amp;#3595;&amp;#3629;&amp;#3591;&amp;#3648;&amp;#3648;&amp;#3595;&amp;#3609;&amp;#3604;&amp;#3660;&amp;#3623;&amp;#3636;&amp;#3594;&amp;#3652;&amp;#3586;&amp;#3656;&amp;#3604;&amp;#3634;&amp;#3623;";
$str =~ s/&amp;#91;^\p{Latin}\p{Common}]//g;
$str =~ s/^\s+|\s+$//g;
say $str;
__END__
Croissant Egg Sandwich
&lt;/pre&gt;
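
&lt;p&gt;To see why the pragma matters, here is a minimal sketch (not part of the original reply) that compares the raw UTF-8 bytes of one Thai letter with the decoded character. It uses &lt;c&gt;decode&lt;/c&gt; from the core Encode module as a stand-in for what &lt;c&gt;use utf8;&lt;/c&gt; does to string literals; the helper name &lt;c&gt;describe&lt;/c&gt; is made up for illustration.

&lt;c&gt;
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the length of a string and whether it
# contains a character in the Thai script.
sub describe {
    my ($s) = @_;
    my $thai = $s =~ /\p{Thai}/ ? 'yes' : 'no';
    return sprintf 'length %d, Thai: %s', length($s), $thai;
}

# The UTF-8 encoding of U+0E04 (a Thai letter), written as byte escapes
# so this file stays pure ASCII. Without "use utf8", a literal typed as
# that letter in a UTF-8 source file holds these three bytes.
my $bytes = "\xE0\xB8\x84";

# Decoding turns the three bytes into one character, which is what a
# literal contains when "use utf8" is in effect.
my $chars = decode('UTF-8', $bytes);

print 'bytes: ', describe($bytes), "\n";
print 'chars: ', describe($chars), "\n";
&lt;/c&gt;

&lt;p&gt;The byte string should report length 3 and no Thai character, while the decoded string reports length 1 and matches &lt;c&gt;\p{Thai}&lt;/c&gt;. A script-based substitution can only do the right thing on the decoded form.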



&lt;p&gt;See also: [http://perlgeek.de/en/article/encodings-and-unicode|Character Encodings in Perl].

&lt;p&gt;&lt;b&gt;Updated&lt;/b&gt; to unlinkify the brackets in the regex, and to spare \p{Common} characters (instead of only \s) from removal.
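
&lt;p&gt;The difference between the two character classes mentioned in the update can be sketched as follows. This is an illustration under my own assumptions, not code from the thread: the sample string and the helper names &lt;c&gt;trim&lt;/c&gt;, &lt;c&gt;keep_latin_and_space&lt;/c&gt; and &lt;c&gt;keep_latin_and_common&lt;/c&gt; are invented for the example.

&lt;c&gt;
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Keep only Latin-script characters and whitespace.
sub keep_latin_and_space {
    my ($s) = @_;
    $s =~ s/[^\p{Latin}\s]//g;
    return trim($s);
}

# Keep Latin-script characters plus everything in the Common script
# (spaces, ASCII digits, punctuation and so on).
sub keep_latin_and_common {
    my ($s) = @_;
    $s =~ s/[^\p{Latin}\p{Common}]//g;
    return trim($s);
}

# Thai letters are written as \x{...} escapes so the file stays pure ASCII.
my $str = "\x{0E04}\x{0E23} Egg Sandwich, 2 pcs! \x{0E04}";

print keep_latin_and_space($str),  "\n";
print keep_latin_and_common($str), "\n";
&lt;/c&gt;

&lt;p&gt;With this sample the first line should come out as "Egg Sandwich  pcs" (comma, digit and exclamation mark gone, leaving a double space), and the second as "Egg Sandwich, 2 pcs!". Which one you want depends on whether digits and punctuation count as part of the text you are keeping.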

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-616540"&gt;
[http://perl6.org/|Perl 6 - the future is here, just unevenly distributed]
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
1005553</field>
<field name="parent_node">
1005553</field>
</data>
</node>
