PerlMonks  

How can I safely unescape a string.

by Skeeve (Vicar)
on Sep 12, 2012 at 20:07 UTC (#993301=perlquestion)
Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

In another forum someone noticed that on OS X certain characters, such as umlauts, are printed as octal escapes (e.g. "\334") and asked how to get back the real character ("Ü").

My solution to this was a small perl snippet:

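    binmode STDOUT => ':utf8';
    $_ = shift;
    # escape @, $, % and : (the quote delimiter) so they are taken literally
    s/([\@\$%:])/\\$1/g;
    # let Perl's qq:...: interpolation turn \334 etc. into characters
    eval qq(print qq:$_:);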

This seems to work fine; for example,

    deescape '\334ber@n\374ber.com'

correctly prints "Über@nüber.com".

But you may have noticed the disadvantage: I'm using eval, so the script is open to code injection. I also had to escape @, $, % and :, the latter because I use : as the quote character.
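A minimal sketch of why the eval is dangerous, assuming the snippet is wrapped in a hypothetical sub deescape_eval that returns the string instead of printing it. A backslash in the input cancels the backslash that the substitution adds in front of :, so the qq:...: string ends early and the rest of the input is compiled and run as Perl. The payload here is only the harmless expression 6*7.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the snippet above: same escaping and the same
# qq:...: eval, but it returns the result instead of printing it.
sub deescape_eval {
    local $_ = shift;
    s/([\@\$%:])/\\$1/g;              # protect @, $, % and the : delimiter
    my $result = eval qq(qq:$_:);     # Perl interpolation does the unescaping
    die $@ if $@;
    return $result;
}

# Ordinary input behaves as intended: \334 -> U+00DC, \374 -> U+00FC.
my $ok = deescape_eval('\334ber@n\374ber.com');

# Input '\:' becomes '\\:', i.e. a literal backslash followed by the closing
# delimiter; the remaining '.(6*7);#' is compiled as code (# hides the final :).
my $injected = deescape_eval('\:.(6*7);#');   # evaluates to '\42'
```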

Currently I have no idea how to unescape strings like this safely.



Re: How can I safely unescape a string.
by tobyink (Abbot) on Sep 12, 2012 at 20:43 UTC
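    use 5.010;
    use utf8::all;    # among other things, puts UTF-8 layers on the standard handles
    my $string = q[\334ber@n\374ber.com];
    # replace each backslash-octal escape with the character it encodes
    $string =~ s/\\([0-7]+)/chr oct $1/eg;
    say $string;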
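The s///e substitution above never compiles the input, which is what makes it safe. It can be generalised to a few more escapes in one left-to-right pass; doing them together matters so that "\\334" yields a literal backslash followed by "334" rather than Ü. A minimal sketch: the sub name unescape and the chosen escape set (\\, \", \n, \r, \t, 1 to 3 octal digits) are assumptions, not part of tobyink's answer.

```perl
use strict;
use warnings;

# Hypothetical helper, not from any module: unescape in a single pass with
# s///e, so no part of the input is ever compiled as Perl code.
# Handles \\, \", \n, \r, \t and 1-3 digit octal escapes such as \334;
# any other backslash sequence is left unchanged.
my %simple = ('n' => "\n", 'r' => "\r", 't' => "\t", '"' => '"', '\\' => '\\');

sub unescape {
    my ($s) = @_;
    $s =~ s{\\([0-7]{1,3}|[nrt"\\])}{
        exists $simple{$1} ? $simple{$1} : chr oct $1
    }ge;
    return $s;
}

my $name = unescape('\334ber@n\374ber.com');   # "\x{DC}ber\@n\x{FC}ber.com"
```

Output still needs an encoding layer (binmode or utf8::all) when printed.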

      A big ++ from me for presenting the solution I was about to post, but with the utf8::all pragma instead of the binmode STDOUT, ':encoding(utf8)'; that my solution would have had. The ++ is because it prompted me to look into this new pragma, which I hadn't heard of or seen used before. Nice job!

      Makes me wonder what our resident Unicode expert would have to say about it. There must be a list of gotchas.


      Dave

        Hello Dave

        It seems that characters in the 0x80-0xFF range produced by chr() still have to be upgraded, even under utf8::all. The code below shows that a character created with chr() in the 0x80-0xFF range does not have the UTF-8 flag set.

        use strict;
        use warnings;

        # pack('U', ...): the result has the UTF-8 flag
        my $str = '\334ber@n\374ber.com';
        $str =~ s/\\([0-7]+)/pack('U', oct('0'.$1))/eg;
        binmode STDOUT, ":encoding(UTF-8)";
        print "$str\n";
        print utf8::is_utf8($str) ? "str is ...utf8\n" : "str is ...not utf8\n";

        # chr() under utf8::all: no UTF-8 flag for 0x80-0xFF
        use 5.010;
        use utf8::all;
        my $string = '\334ber@n\374ber.com';
        $string =~ s/\\([0-7]+)/chr oct $1/eg;
        print "$string\n";
        print utf8::is_utf8($string) ? "string is ...utf8\n" : "string is ...not utf8\n";

        # chr() followed by utf8::upgrade(): the flag is set
        $string = '\334ber@n\374ber.com';   # reuse $string; a second "my" would warn
        $string =~ s/\\([0-7]+)/chr oct $1/eg;
        utf8::upgrade($string);
        print "$string\n";
        print utf8::is_utf8($string) ? "string is ...utf8\n" : "string is ...not utf8\n";
        I think utf8::all is utf8::almost.
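Whether the missing flag changes the printed result can be checked directly. A minimal sketch using the core Encode module: the UTF-8 flag describes how Perl stores the string internally, so a flagged and an unflagged copy of U+00DC compare equal and encode to the same bytes. The flag can still affect some operations on 0x80-0xFF characters (what perlunicode calls "the Unicode bug", addressed by the unicode_strings feature), which is one reason an explicit upgrade can still be useful.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $plain    = chr 0xDC;     # U+00DC; chr() below 256 gives a string without the UTF-8 flag
my $upgraded = $plain;
utf8::upgrade($upgraded);    # same character, internal storage switched to UTF-8

my $flags_differ   = utf8::is_utf8($upgraded) && !utf8::is_utf8($plain);
my $same_text      = $plain eq $upgraded;          # compared as characters
my $bytes_plain    = encode('UTF-8', $plain);      # "\xC3\x9C"
my $bytes_upgraded = encode('UTF-8', $upgraded);   # "\xC3\x9C"
```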

Re: How can I safely unescape a string.
by remiah (Hermit) on Sep 13, 2012 at 12:55 UTC

    oct('0' . NNN) converts the octal string NNN to a number.
    pack('U', code point) returns the character with that code point as a decoded (UTF-8-flagged) string.

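    use strict;
    use warnings;
    my $str = '\334ber@n\374ber.com';
    # turn each \NNN octal escape into the character with that code point
    # ([0-7] rather than \d, so digits 8 and 9 are never passed to oct)
    $str =~ s/\\([0-7]{3})/pack('U', oct('0'.$1))/eg;
    binmode STDOUT, ":encoding(UTF-8)";
    print "$str\n";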
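The two conversions described above can be checked in isolation. A small worked example: oct('0334') is 3*64 + 3*8 + 4 = 220 (0xDC, "Ü"). Per the perlfunc entry for oct, a string without a 0x or 0b prefix is already read as octal, so the '0' . $1 prefix is harmless but optional. pack('U', ...) and chr() give the same character; only the internal flag differs.

```perl
use strict;
use warnings;

my $with_zero = oct('0' . '334');        # 3*64 + 3*8 + 4 = 220, i.e. 0xDC
my $bare      = oct('334');              # also read as octal: 220
my $via_pack  = pack('U', $with_zero);   # "\x{DC}" with the UTF-8 flag on
my $via_chr   = chr($with_zero);         # "\x{DC}" without the flag
my $same      = $via_pack eq $via_chr;   # true: the same character either way
```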

    I made a table of Unicode documents for Perl. Please have a look at them; they are not very long.

    perlunitut (6 pages): A very short overview of Unicode in Perl, plus an FAQ.
    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) (8 pages): About character sets, code pages and Unicode itself, with a short history of Unicode.
    perluniintro (12 pages): This is the first thing to read (I think).
    Character Encodings in Perl (7 pages): An all-in-one document on encodings, written by a German author.
    perlunicode (20 pages): The main document on Unicode in Perl. Thorough and precise, but perhaps too much for a beginner.
    Perl Programming/Unicode UTF-8 (15 pages): Explains Perl's internal encoding (N8CS, utf-8) and also describes other problems. If you stumble over the 0x80-0xFF problem, this document explains the reason.

Node Type: perlquestion [id://993301]
Approved by davido
Front-paged by Old_Gray_Bear