
Interpreting character combinations as special characters.

by music_man1352000 (Novice)
on Dec 10, 2009 at 12:52 UTC (#812171)
music_man1352000 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. Here is a tricky little problem for you!

I am writing a script that reads in some textual content in which special characters (newline, tab, etc.) have been converted to their string representations. For example, all newline characters in the input have been converted to "\n" (the string literal; literally '\' then 'n'). The input also contains escape sequences such as "\(" and "\)" (parentheses were metacharacters in the context where the input was produced). My script needs to convert the special character sequences back into the original special characters (i.e. "\n" needs to be interpreted as the newline character again) and un-escape the metacharacters as well. How do I do this?

I have tried a couple of approaches:
my $char = 'n'; $char = sprintf("\\%s", $char);
my $char = 'n'; $char =~ s/(n)/\\$1/;
I would prefer the solution to be general so I don't have to do a regex for each special character (newline, tab etc.) and meta character.

Thanks in advance...

Replies are listed 'Best First'.
Re: Interpreting character combinations as special characters.
by Corion (Pope) on Dec 10, 2009 at 13:00 UTC
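    my $string = 'This is\na special\tstring';
    # Build the lookup table by letting Perl interpret each escape once;
    # the eval'd text must itself be a double-quoted string such as "\n".
    my %replace = map { $_ => eval qq{"\\$_"} } qw( n r t );
    $string =~ s/\\(.)/exists $replace{$1} ? $replace{$1} : $1/ge;
    print $string;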

      Three string evals and a map just for laziness? Why don't you just write my %replace=( n => "\n", t => "\t", r => "\r" );?


      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Exactly for that reason - I'm lazy, as I want to be able to add 0 without much hassle, and possibly other special characters.

Re: Interpreting character combinations as special characters.
by moritz (Cardinal) on Dec 10, 2009 at 13:24 UTC
    my %mapping = ( '\t' => "\t", '\n' => "\n", '\)' => ')', '\(' => '(', );
    my $re = join '|', map quotemeta, keys %mapping;
    $str =~ s/($re)/$mapping{$1}/g;

    This has the advantage of not relying on perl's escaping rules (and thus gives you fine control over what's happening), but has the disadvantage that you have to list all replacements.

      Listing only those mappings that are different from taking off the leading backslash:
      my %mapping = (n => "\n", t => "\t");
      $str =~ s{\\(.)}{$mapping{$1}//$1}seg;
      This still gives fine control, but you don't have to list everything.
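
      For readers on a Perl older than 5.10 (where // is not available), the same idea can be sketched with exists. This is only an illustration: the sub name unescape and the contents of %special are assumptions, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical lookup table: only the escapes whose meaning differs from
# "drop the backslash" need to be listed here.
my %special = (
    n => "\n",
    t => "\t",
    r => "\r",
);

# Hypothetical helper name, chosen for this sketch.
sub unescape {
    my ($str) = @_;
    # \\(.) matches a backslash plus the following character; /s lets "."
    # match a newline as well. Listed letters are looked up, anything else
    # is kept without its backslash (so "\(" becomes "(" and "\\" becomes "\").
    $str =~ s{\\(.)}{ exists $special{$1} ? $special{$1} : $1 }seg;
    return $str;
}

print unescape('line one\nindented:\t\(text\) and a backslash: \\\\'), "\n";
```

      Because each match consumes the backslash together with the character after it, an escaped backslash is handled in the same pass and cannot be mistaken for the start of another escape.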
Re: Interpreting character combinations as special characters.
by vitoco (Friar) on Dec 10, 2009 at 13:18 UTC
    $text = 'h\te\nll\t\(o\)\n';
    print "$text\n";
    $text = eval '"'.$text.'"';
    print "$text\n";

    If they are not already escaped, you will probably have to escape backslashes, double quotes, dollar signs, and some other special characters... I don't know which is worse!

      @vitoco: I know what you mean! However, your suggestion should actually work perfectly for me because I am reading the input character by character (which means I can do evals only where I find a \ and do not have to escape $, @, %, etc.).

      @all: thanks once again for all your help!

      String eval for arbitrary strings? A very optimistic approach. Escaping all those characters that can make perl execute malicious code is surely more work than the other solutions. And if the OP did not understand or ignored the text below your posting, you just made him insert a gaping security hole into his code.
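
      A harmless sketch of why this matters: when text is wrapped in double quotes and passed to string eval, any interpolation construct inside it is executed as Perl. The input string here is made up for illustration.

```perl
use strict;
use warnings;

# Made-up "data" that carries Perl code inside an interpolation construct.
my $input = 'total: @{[ 6 * 7 ]}';

# The approach under discussion: wrap the text in double quotes and eval it.
my $evaled = eval '"' . $input . '"';

# The arithmetic inside @{[ ... ]} has been run rather than kept as text.
print "$evaled\n";
```

      Here the embedded expression is only arithmetic, but the same mechanism would run any other Perl expression that appears in the input.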

      So: --



        You are right... I should have begun with a big warning about how evil eval can be.

        Fortunately, OP is processing one char at a time, and I couldn't think of a one-byte malicious code. ;-)

        BTW, the first solution I thought of was this:

        $text = 'h\te\nl\l\t\(o\)\n';
        print "$text\n";
        %token = ('\n'=>"\n", '\t'=>"\t" , '\('=>'(', '\)'=>')');
        $text =~ s/(\\.)/$token{$1}||$1/ges;
        print "$text\n";

        That will preserve unknown escape codes from text...

        Also, thanks to JavaFan, // operator was new to me...
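
        A small sketch of where || and // differ in this kind of substitution. The '\z' entry is a hypothetical escape that maps to the digit 0, invented only to show the effect; // needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;   # the // (defined-or) operator was added in Perl 5.10

# Hypothetical table: '\z' maps to the string '0', which is false in Perl.
my %token = ( '\n' => "\n", '\z' => '0' );

my $with_or  = 'a\zb\qc';
my $with_dor = 'a\zb\qc';

# || falls back to the original text whenever the looked-up value is false,
# so the '0' replacement is skipped. Brace delimiters are used because a
# literal // would otherwise end an s/// expression early.
$with_or  =~ s{(\\.)}{ $token{$1} || $1 }ges;

# // falls back only when the lookup is undefined, so '0' is used and the
# unknown escape \q is still preserved.
$with_dor =~ s{(\\.)}{ $token{$1} // $1 }ges;

print "$with_or\n";
print "$with_dor\n";
```

        With the four mappings in the post above, both operators behave the same, since none of those values is false; the difference only shows up for a value such as '0' or the empty string.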
