PerlMonks  

Re: How to remove other language character from a string

by moritz (Cardinal)
on Nov 26, 2012 at 05:27 UTC (#1005554)


in reply to How to remove other language character from a string

You need to add use utf8; to tell Perl that your source file is encoded in UTF-8. That way string literals containing non-ASCII characters are decoded into characters instead of raw bytes, and they work the way you want them to.

use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $str = "ครัวซองเเซนด์วิชไข่ดาว Croissant Egg Sandwich ครัวซองเเซนด์วิชไข่ดาว";
$str =~ s/[^\p{Latin}\p{Common}]//g;
$str =~ s/^\s+|\s+$//g;
say $str;
__END__
Croissant Egg Sandwich

See also: Character Encodings in Perl.

Updated to unlinkify the brackets, and to exclude \p{Common} instead of \s from removal.
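
To make the difference between the two character classes mentioned in the update concrete, here is a small sketch. The sample string is made up for illustration and is not from the original question; it only assumes that punctuation, digits and spaces belong to the Common script, which is how Unicode classifies them.

```perl
use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample input with punctuation and a digit mixed in.
my $input = "ไข่ดาว Egg Sandwich, 2 pcs. (hot!) ไข่ดาว";

# Keep only Latin letters and whitespace: punctuation and digits are removed too.
(my $ws_only = $input) =~ s/[^\p{Latin}\s]//g;

# Keep Latin letters plus everything in the Common script
# (spaces, digits, punctuation), so only the Thai text is removed.
(my $with_common = $input) =~ s/[^\p{Latin}\p{Common}]//g;

say "[$ws_only]";
say "[$with_common]";
```

The first substitution leaves only the letters and spaces; the second also keeps the comma, the digit, the periods and the parentheses.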


Re^2: How to remove other language character from a string
by Anonymous Monk on Nov 26, 2012 at 05:36 UTC
    Thanks moritz, but when I tried this I got the output like this:
    α╕α╕α╕▒α╕α╕α╕α╕α╣α╣α╕α╕α╕α╣α╕α╕┤α╕α╣α╕α╣α╕α╕▓α╕ Croissant Egg Sandwich α╕α╕α╕▒α╕α╕α╕α╕α╣α╣α╕α╕α╕α╣α╕α╕┤α╕α╣α╕α╣α╕α╕▓α╕

      That's because the code in moritz's post wasn't formatted correctly: it was missing code tags (presumably left out so that the input text would be shown properly), so the square brackets were swallowed as link markup. When I first ran moritz's code, I just got the original string, but when I substituted:

      $str =~ s/[^\p{Latin}\s]//g;

      for this:

      $str =~ s/^\p{Latin}\s//g;

      it worked.
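
A minimal sketch of why the bracket-less version returns the original string. The shortened sample string is an assumption for illustration; the two patterns are the ones quoted above.

```perl
use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $original = "ครัวซอง Croissant Egg Sandwich ครัวซอง";

# Without brackets, ^ is a start-of-string anchor: the pattern asks for one
# Latin letter followed by one whitespace character at the very beginning.
# The string starts with Thai text, so nothing matches and nothing changes.
(my $anchored = $original) =~ s/^\p{Latin}\s//g;

# Inside brackets, ^ negates the class: every character that is neither a
# Latin letter nor whitespace is removed.
(my $negated = $original) =~ s/[^\p{Latin}\s]//g;

say $anchored eq $original ? "anchored pattern: unchanged" : "anchored pattern: changed";
say "negated class: [$negated]";
```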

      EDIT: If you have lots of extra spaces in your output, you could run it through $str =~ s/ {2,}/ /g;, too. Something to keep in mind is that moritz's approach (as is) will remove punctuation.
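
Putting the steps from this sub-thread together, a rough sketch might look like the following. The helper name keep_latin and the sample input are hypothetical; the three substitutions are the ones discussed above, using the \s variant that also strips punctuation.

```perl
use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# keep_latin is a made-up helper name, not something from the thread.
sub keep_latin {
    my ($str) = @_;
    $str =~ s/[^\p{Latin}\s]//g;   # drop everything except Latin letters and whitespace
    $str =~ s/ {2,}/ /g;           # collapse runs of spaces left behind by the removal
    $str =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    return $str;
}

say keep_latin("ไข่ดาว Egg ไข่ดาว Sandwich, please! ไข่ดาว");
```

Note that the comma and the exclamation mark disappear along with the Thai text, which is the punctuation caveat mentioned above.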

        It worked smoothly. Thanks Frozenwithjoy and moritz.
