PerlMonks  

Re: The Unicode Bug with Transliteration or Substitution

by Anonymous Monk
on May 03, 2014 at 00:11 UTC


in reply to The Unicode Bug with Transliteration or Substitution

It could be :) Do you have sample data to play with?

Have you tried utf8::upgrade($string)? You could also try Unicode::Semantics.
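A minimal sketch of what that call does, assuming a made-up Latin-1 sample string (`$string` here is hypothetical, not the original poster's data). It only shows that utf8::upgrade switches the internal representation while leaving the characters and the length alone:

```perl
use strict;
use warnings;

# Hypothetical sample: U+00E9 stored as a single byte, so the string
# starts out without the internal UTF-8 flag.
my $string = "caf\xE9  au   lait";

my $before = utf8::is_utf8($string) ? 1 : 0;
my $length_before = length $string;

# Switch the internal representation to UTF-8; the characters stay the same.
utf8::upgrade($string);

my $after = utf8::is_utf8($string) ? 1 : 0;
my $length_after = length $string;

print "flag before: $before, flag after: $after\n";
print "length unchanged\n" if $length_before == $length_after;
```

If the regex or tr behaviour differs before and after the upgrade, that points at the Unicode Bug; if nothing changes, the cause is probably elsewhere.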

Re^2: The Unicode Bug with Transliteration or Substitution
by choroba (Cardinal) on May 03, 2014 at 20:07 UTC
    You can use the Japanese Wikipedia Perl page. Perl 5.8.3 at work produces different output files for

    tr/ / /s; tr/\t/ /s;

    and

    s/ +/ /g; s/\t+/ /g;

    I tested with diff -w against the original, i.e. ignoring whitespace.

    utf8::upgrade didn't change anything, before or after the substitution/transliteration.
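A small self-contained way to repeat the comparison without the saved page, as a sketch only: `$text` is a hypothetical stand-in containing wide characters, spaces and tabs, not the Wikipedia data. On a perl without the problem the two results should be identical:

```perl
use strict;
use warnings;

# Hypothetical sample with wide characters, runs of spaces and runs of tabs.
my $text = "Perl\x{3000}\x{306F}  \t\t  5.8\n";

# Variant 1: transliteration with the squeeze flag.
(my $via_tr = $text) =~ tr/ / /s;
$via_tr =~ tr/\t/ /s;

# Variant 2: substitution of runs by a single space.
(my $via_s = $text) =~ s/ +/ /g;
$via_s =~ s/\t+/ /g;

print $via_tr eq $via_s ? "same\n" : "different\n";
```

Running this on the affected 5.8.3 and on a newer perl would show whether the difference depends on the interpreter or on the input.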

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
