PerlMonks  

How can I safely unescape a string.

by Skeeve (Parson)
on Sep 12, 2012 at 20:07 UTC ( [id://993301]=perlquestion )

Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

In another forum someone noticed that certain characters, like umlauts, are printed on OS X as octal escapes (e.g. "\334") and asked how to get back the real character ("Ü").

My solution to this was a small perl snippet:

    binmode STDOUT => ':utf8';
    $_ = shift;
    s/([\@\$%:])/\\$1/g;
    eval qq(print qq:$_:);

This seems to work fine as for example
deescape '\334ber@n\374ber.com'
correctly prints "Über@nüber.com".

But you might have noticed the disadvantage: I'm using an "eval" and so might be open to code injection. I also needed to escape @, $, % and :, the latter because I use it as the quote character.

Currently I have no idea how I could safely get strings like this unescaped.
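To make the injection worry concrete, here is a minimal sketch. It assumes the same escaping step as the snippet above, wrapped in a hypothetical helper named escape_for_qq, and it drops the print so that only the quoted string is evaluated. An input that starts with a backslash followed by the quote character ends the qq:: string early, and the rest of the input runs as code. The payload here is a harmless die.

```perl
use strict;
use warnings;

# Hypothetical helper: the escaping step from the question, unchanged.
sub escape_for_qq {
    my ($text) = @_;
    $text =~ s/([\@\$%:])/\\$1/g;
    return $text;
}

# The leading \: becomes \\: after escaping, i.e. an escaped backslash
# followed by an unescaped closing delimiter. The trailing # comments out
# the closing ':' that the wrapper appends.
my $input = '\:;die "injected\n";#';

my $code   = 'qq:' . escape_for_qq($input) . ':';
my $result = eval $code;    # string eval, as in the question
my $error  = $@;

print "eval died with: $error" if $error;
```

Escaping the backslash itself as well would close this particular hole, but avoiding eval altogether removes the whole class of problem.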


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Replies are listed 'Best First'.
Re: How can I safely unescape a string.
by tobyink (Canon) on Sep 12, 2012 at 20:43 UTC
    use 5.010;
    use utf8::all;
    my $string = q[\334ber@n\374ber.com];
    $string =~ s/\\([0-7]+)/chr oct $1/eg;
    say $string;
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      A big ++ from me for presenting the solution I was about to post, but with the utf8::all pragma instead of the "binmode STDOUT, ':encoding(utf8)';" that my solution would have had. It prompted me to look into this new pragma, which I hadn't heard about or seen used before. Nice job!

      Makes me wonder what our resident Unicode expert would have to say about it. There must be a list of gotchas.


      Dave

        Hello Dave

        It seems that characters in the 0x80-0xFF range created with chr() still have to be upgraded, even under utf8::all. The code below shows that a string containing such chr() characters does not have the UTF-8 flag set.

        # 1) pack('U', ...): the result has the UTF-8 flag
        use strict;
        use warnings;
        my $str = '\334ber@n\374ber.com';
        $str =~ s/\\([0-7]+)/pack('U', oct('0'.$1))/eg;
        binmode STDOUT, ":encoding(UTF-8)";
        print "$str\n";
        print utf8::is_utf8($str) ? "str is ...utf8\n" : "str is ...not utf8\n";

        # 2) chr() under utf8::all: no UTF-8 flag
        use 5.010;
        use utf8::all;
        my $string = '\334ber@n\374ber.com';
        $string =~ s/\\([0-7]+)/chr oct $1/eg;
        print "$string\n";
        print utf8::is_utf8($string) ? "string is ...utf8\n" : "string is ...not utf8\n";

        # 3) chr() followed by utf8::upgrade(): the flag is set
        $string = '\334ber@n\374ber.com';
        $string =~ s/\\([0-7]+)/chr oct $1/eg;
        utf8::upgrade($string);
        print "$string\n";
        print utf8::is_utf8($string) ? "string is ...utf8\n" : "string is ...not utf8\n";
        I think utf8::all is utf8::almost.
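For anyone wondering what the flag changes in practice, here is a small sketch using only core Perl and Encode (the variable names are made up for illustration). As far as I understand it, the flag describes internal storage only: the two strings compare equal and encode to the same bytes. Where it does show up is the 0x80-0xFF behaviour under default regex semantics, where \w only matches Ü once the string has been upgraded.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $plain    = chr 0xDC;       # U+00DC, stored without the UTF-8 flag
my $upgraded = $plain;
utf8::upgrade($upgraded);      # same character, UTF-8 flag now set

my $flag_plain    = utf8::is_utf8($plain)    ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# Same characters and same encoded output, whatever the flag says.
my $same_string = $plain eq $upgraded ? 1 : 0;
my $same_bytes  = encode('UTF-8', $plain) eq encode('UTF-8', $upgraded) ? 1 : 0;

# Default regex semantics (no /u, no unicode_strings feature) depend on the flag.
my $word_plain    = $plain    =~ /\w/ ? 1 : 0;
my $word_upgraded = $upgraded =~ /\w/ ? 1 : 0;

printf "flag: %d/%d  same string: %d  same bytes: %d  \\w: %d/%d\n",
    $flag_plain, $flag_upgraded, $same_string, $same_bytes,
    $word_plain, $word_upgraded;
```

So for the original task of printing the unescaped string through an encoding layer, the missing flag should not change the output. It matters once the string is matched or case-mapped.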

Re: How can I safely unescape a string.
by remiah (Hermit) on Sep 13, 2012 at 12:55 UTC

    oct('0' . NNN) converts an octal string to a number.
    pack('U', code point) produces the character with that Unicode code point, as a decoded (character) string.

    use strict;
    use warnings;
    my $str = '\334ber@n\374ber.com';
    $str =~ s/\\([0-7]{3})/pack('U', oct('0'.$1))/eg;
    binmode STDOUT, ":encoding(UTF-8)";
    print $str;
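The same idea can be wrapped in a small function, which makes it easier to reuse and to test. This is only a sketch: the name unescape_octal is made up, it uses chr instead of pack('U', ...), and it assumes escapes are a backslash followed by one to three octal digits. No eval is involved, so nothing in the input is ever run as code.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each backslash followed by one to three
# octal digits with the character at that code point.
sub unescape_octal {
    my ($text) = @_;
    $text =~ s/\\([0-7]{1,3})/chr oct $1/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print unescape_octal('\334ber@n\374ber.com'), "\n";
```

Characters such as @, $ and : pass through untouched because the input is only ever treated as data.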

    I made a table of Unicode documents for Perl. Please have a look at them; they are not very long.

    perlunitut (6 pages): Very short overview of Unicode in Perl, plus FAQ.
    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) (8 pages): About character sets, code pages and Unicode itself. Short history of Unicode.
    perluniintro (12 pages): This is the first thing to read (I think).
    Character Encodings in Perl (7 pages): All-in-one document on encoding. Written by a German author.
    perlunicode (20 pages): The main document on Perl's Unicode support. Thorough and precise, but perhaps too much for a beginner.
    Perl Programming/Unicode UTF-8 (15 pages): Explains Perl's internal encoding (N8CS, utf-8) and also describes other problems. If you have stumbled over the 0x80-0xFF problem, this document explains the reason.

Node Type: perlquestion [id://993301]
Approved by davido
Front-paged by Old_Gray_Bear