Re^2: tr{}{} doesn't wanna work.. what am I doing wrong?

by Eliya (Vicar)
on Feb 24, 2012 at 14:19 UTC


in reply to Re: tr{}{} doesn't wanna work.. what am I doing wrong?
in thread tr{}{} doesn't wanna work.. what am I doing wrong?

tr and Unicode don't mix well

In what way?  Seems to work fine for me.  Could you provide an example that fails? (just curious)

use Devel::Peek;

my $test = "\x{2345}\x{3456}";
Dump $test;
$test =~ tr/\x{2345}\x{3456}/XY/;
Dump $test;

__END__
SV = PV(0x768bc8) at 0x7907d8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x782630 "\342\215\205\343\221\226"\0 [UTF8 "\x{2345}\x{3456}"]
  CUR = 6
  LEN = 8
SV = PV(0x768bc8) at 0x7907d8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x782630 "XY"\0 [UTF8 "XY"]
  CUR = 2
  LEN = 8

(I'm only using \x{...} here because PM code sections don't support Unicode — it works the same way with a UTF-8 encoded source file when using "use utf8;")
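For comparison, a minimal sketch of the "use utf8;" variant, assuming the file is saved as UTF-8. The Greek sample string and the variable names are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;
use utf8;    # the source file is UTF-8, so the literals below are characters

my $greek = "αβγ";

# Literal non-ASCII characters in both the string and the tr/// search list.
(my $latin = $greek) =~ tr/αβγ/abg/;

# Report lengths rather than the strings themselves, so nothing
# non-ASCII is printed and no output encoding layer is needed.
printf "%d characters in, %d characters out\n", length($greek), length($latin);
```

Without "use utf8;" the same literals would be read as bytes, and the search list would no longer line up with the replacement list one character at a time.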

Replies are listed 'Best First'.
Re^3: tr{}{} doesn't wanna work.. what am I doing wrong?
by moritz (Cardinal) on Feb 24, 2012 at 14:43 UTC
    In what way?

    By not supporting Unicode-aware character classes, and listing all Unicode characters in a certain category is usually a moot endeavor.

    The OP is the best example: it doesn't list all accented characters that could be ASCIIfied.

      By not supporting Unicode-aware character classes

      Well, tr/// doesn't support character classes in general (only certain kinds of ranges), so this is not specifically a Unicode problem, but a feature of tr///. (I'd agree if you had said "tr and character classes don't mix well"...)
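To make the distinction concrete, here is a small sketch of what tr/// does accept (explicit characters and ranges, including non-ASCII ones) next to a Unicode property, which is a regex feature. The sample string and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# "naive cafe Skoda" with i-diaeresis, e-acute and S-caron, written as
# \x{...} escapes so the source stays pure ASCII.
my $text = "na\x{EF}ve caf\x{E9} \x{160}koda";

# tr/// takes explicit characters and ranges only. With an empty
# replacement list and no /d it leaves the string alone and returns
# the number of characters that matched.
my $in_range = ($text =~ tr/\x{E0}-\x{FF}//);

# A property such as \p{Lu} (uppercase letter) needs the regex engine.
my $uppercase = () = $text =~ /\p{Lu}/g;

print "in range U+00E0..U+00FF: $in_range\n";
print "uppercase letters:       $uppercase\n";
```

The range works on code points whether or not they are ASCII; what is missing in tr/// is the class syntax, not Unicode support.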

      What you're pointing out is kind of a different problem, i.e. doing sanitization based on picking out an incomplete list of individual characters as opposed to using a catch-all character class.
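A rough sketch of that difference, assuming the goal is merely to drop combining marks. The hand-picked list is hypothetical, and Unicode::Normalize (a core module) is swapped in here as one possible catch-all approach; nobody in this thread posted this code.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical input: e-acute, o-circumflex and c-caron.
my $input = "caf\x{E9} r\x{F4}le \x{10D}aj";

# Approach 1: a hand-picked tr/// list. It only knows the two
# characters someone remembered to put in it.
(my $listed = $input) =~ tr/\x{E9}\x{F4}/eo/;

# Approach 2: decompose to base character plus combining marks,
# then remove every nonspacing mark with a character class.
(my $stripped = NFD($input)) =~ s/\p{Mn}//g;

# Count what is still outside ASCII after each approach.
my $left_listed   = () = $listed   =~ /\P{ASCII}/g;
my $left_stripped = () = $stripped =~ /\P{ASCII}/g;

print "after tr list:   $left_listed non-ASCII left\n";
print "after NFD strip: $left_stripped non-ASCII left\n";
```

The list misses the c-caron because nobody put it there, while the class-based version catches it. Letters with no decomposition are untouched by either approach.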

      >>> tr and Unicode don't mix well
      >> In what way? Seems to work fine for me.
      > By not supporting Unicode-aware character classes,
      > and listing all Unicode characters in a certain category
      > is usually a moot endeavor.

      > The OP is the best example: it doesn't list all accented
      > characters that could be ASCIIfied.

      The original statement — that tr/// and Unicode don’t mix well — is FUD-raking nonsense. It’s baseless fear, uncertainty, and doubt, and we don’t need it.

      As for character classes, since tr/// never worked on character classes back in the caveman-ASCII days, it is a strawman to complain that it doesn’t work on them now.

      Finally, the idea that there exists such a thing as an “accented character”, or that these can be meaningfully “ASCII-fied”, does not hold up.

      • How do you convert a £10-pound note or a 5¢-coin to ASCII?
      • How do you convert Ævar Arnfjörð Bjarmason to ASCII?
      • How do you convert φ ≠ π to ASCII?
      • How do you convert /ɪntɚˈnæʃənəl/ to ASCII?
      • How do you convert ♲ ♳ ♴ ♵ ♶ ♷ ♸ ♹ ♺ ♻ ♼ ♽ to ASCII?
      • How do you convert 👪 💗 🐪 to ASCII?
      • How do you convert my $ʇndʇno = uʍopəpᴉsdn($input) to ASCII?
      • How do you convert Allerød or ψ-ionone or 「文字化け」 to ASCII?
      • How do you convert ♀♂🜫⚩⚥ 🜭⚧🜥🜠⚨⚣🜤🜧🜦🜟⚤🜜⚦🜡⚢🜪 to ASCII?
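One way to see the point for two of the strings above is a sketch that tries the usual decompose-and-strip trick and counts what survives. The helper name leftover_count is made up for this illustration, and NFKD plus \p{Mn} is just one common attempt at "ASCII-fying", not a recommendation.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: decompose, drop combining marks, and count the
# characters that are still outside ASCII afterwards.
sub leftover_count {
    my ($string) = @_;
    (my $attempt = NFKD($string)) =~ s/\p{Mn}//g;
    my @left = $attempt =~ /(\P{ASCII})/g;
    return scalar @left;
}

# Two strings from the list above, written as escapes.
my %samples = (
    name  => "\x{C6}var Arnfj\x{F6}r\x{F0} Bjarmason",
    place => "Aller\x{F8}d",
);

for my $key (sort keys %samples) {
    printf "%s: %d non-ASCII character(s) remain\n",
        $key, leftover_count($samples{$key});
}
```

Only the o-diaeresis decomposes into a base letter plus a mark. The ash, eth and o-with-stroke are letters in their own right, so no amount of mark stripping turns them into ASCII.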

      More importantly, why in the world do you want to? You can’t put the djinn back in the bottle and go back to a Beaver Cleaver world of a 52-character Latin alphabet that never existed in the first place. Even Gutenberg had 230 sorts, and he was the very first printer for heaven’s sake! If we cannot do at least as well as the very first printer from half a millennium ago, what does that say about us?

      I can only repeat the Bringhurst quote: The fact that such a character set was long considered adequate tells us something about the cultural narrowness of American civilization, or American technocracy, in the midst of the twentieth century.

      Guess what? Unlike Beaver Cleaver himself, we are no longer in the midst of the twentieth century, so why should we strive to recreate that Neverland that never was?

      I say that we’re better than that, and I’m proud of that fact. To see such obvious Ludditism amongst soi-disant technologists is very troubling. What sort of example are we setting for the future?
