Re: How to remove other language character from a string

by moritz (Cardinal)
on Nov 26, 2012 at 05:27 UTC (#1005554)

in reply to How to remove other language character from a string

You need to use utf8; to tell Perl that your source file is in UTF-8. That way non-ASCII literal strings work the way you want them to.

use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $str = "ครัวซองเเซนด์วิชไข่ดาว Croissant Egg Sandwich ครัวซองเเซนด์วิชไข่ดาว";
$str =~ s/[^\p{Latin}\p{Common}]//g;
$str =~ s/^\s+|\s+$//g;
say $str;
Croissant Egg Sandwich

See also: Character Encodings in Perl.

Updated to unlinkify the brackets, and to exclude \p{Common} instead of \s from removal.
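
As a rough illustration of what that update changes, the sketch below runs both character classes over the same text. The sample string and the sub names keep_latin_and_space and keep_latin_and_common are made up for this example and are not from the original question.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: keep only Latin-script letters and whitespace.
sub keep_latin_and_space {
    my ($text) = @_;
    $text =~ s/[^\p{Latin}\s]//g;
    return $text;
}

# Hypothetical helper: keep Latin-script letters plus script-neutral
# characters (spaces, digits, punctuation), as in the updated code.
sub keep_latin_and_common {
    my ($text) = @_;
    $text =~ s/[^\p{Latin}\p{Common}]//g;
    return $text;
}

# Made-up sample input with Thai text, digits and punctuation.
my $input = "ไข่ดาว Egg Sandwich (2 pcs.), 45 THB!";

print "[", keep_latin_and_space($input),  "]\n";
print "[", keep_latin_and_common($input), "]\n";
```

The first variant drops the digits and punctuation along with the Thai text; the second keeps them, because those characters belong to the Common script rather than to any one language.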

Re^2: How to remove other language character from a string
by Anonymous Monk on Nov 26, 2012 at 05:36 UTC
    Thanks moritz, but when I tried this I got the output like this:
    α╕α╕α╕▒α╕α╕α╕α╕α╣α╣α╕α╕α╕α╣α╕α╕┤α╕α╣α╕α╣α╕α╕▓α╕ Croissant Egg Sandwich α╕α╕α╕▒α╕α╕α╕α╕α╣α╣α╕α╕α╕α╣α╕α╕┤α╕α╣α╕α╣α╕α╕▓α╕

      Your post looks that way because it is missing code tags (which were presumably left out so that the input text would be shown properly). When I first ran moritz's code, I just got the original string back, but when I used:

      $str =~ s/[^\p{Latin}\s]//g;

      in place of:

      $str =~ s/^\p{Latin}\s//g;

      it worked. Without the brackets, the pattern only matches a single Latin letter followed by whitespace at the very start of the string, so nothing is removed from this input.

      EDIT: If you have lots of extra spaces in your output, you could run it through $str =~ s/ {2,}/ /g;, too. Something to keep in mind is that moritz's approach (as is) will remove punctuation.
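
A minimal sketch of that cleanup step, assuming the filter from moritz's updated code; the sub name latin_only_tidy and the sample string are hypothetical and only there for illustration.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: filter, then tidy the whitespace left behind.
sub latin_only_tidy {
    my ($text) = @_;
    $text =~ s/[^\p{Latin}\p{Common}]//g;  # drop characters from other scripts
    $text =~ s/ {2,}/ /g;                  # collapse runs of spaces into one
    $text =~ s/^\s+|\s+$//g;               # trim leading and trailing whitespace
    return $text;
}

# Removing text from the middle of a string leaves a double space behind,
# which the second substitution collapses.
print latin_only_tidy("Egg ไข่ดาว Sandwich"), "\n";   # prints "Egg Sandwich"
```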

        It worked smoothly. Thanks Frozenwithjoy and moritz.
