PerlMonks  

efficient char escape sequence substitution

by mifflin (Curate)
on May 27, 2005 at 18:22 UTC (#461186=perlquestion)

mifflin has asked for the wisdom of the Perl Monks concerning the following question:

I have a string that I have read from a file (delivered by a client). This string may contain character escape sequences like '\r', '\n' and '\t'. I would like to replace them with the real characters. For example:

$var = 'this is a string\twith some text\r\n';
$var =~ s/\\t/\t/;
$var =~ s/\\r/\r/;
$var =~ s/\\n/\n/;

I know this works, but is there a better way to handle character escape sequence substitutions, so that if the file started having other escape sequences in it I could do the proper substitutions without adding more code? For example, if the string contained a new escape sequence \x09:

$var = 'this is a string\twith some\x09text\r\n';

I would then have to add another substitution to handle it

$var =~ s/\\x09/\x09/;

What I'm looking for is some way to handle any escape sequence that they may throw at me.

Replies are listed 'Best First'.
String::Escape
by marnanel (Beadle) on May 27, 2005 at 18:29 UTC
    String::Escape has a function "unprintable" which does pretty much what you're looking for.

      I've been trying out String::Escape.
      The docs show double-backslash escapes; however, it appears to work the same with single-backslashed r, t and n.
      However, it does not appear to work with hex or octal escapes.

      erickn@cosmora01d:/home/erickn/String-Escape-2002.001/blib/lib> cat x
      use lib '.';
      use String::Escape qw(unprintable);
      $var = 'this\tis\ta\011string\x09with some text\r\n';
      print unprintable($var);
      $var = 'this\\tis\\ta\\011string\\x09with some text\\r\\n';
      print unprintable($var);
      erickn@cosmora01d:/home/erickn/String-Escape-2002.001/blib/lib> perl x
      this is a1string\x09with some text
      this is a1string\x09with some text

      Am I using it incorrectly?

        Try printing the value of $var. It will show something you don't expect: \\ in single quotes still escapes the backslash and results in only one backslash, so both assignments produce the same string.
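        A quick way to check the point about single quotes, as a minimal sketch (the variable names are only illustrative):

```perl
use strict;
use warnings;

# In a single-quoted literal, \\ collapses to one backslash,
# while \t stays as the two characters backslash and t.
my $single = 'a\tb';
my $double = 'a\\tb';

print "same string\n" if $single eq $double;
print length($double), "\n";   # counts a, backslash, t, b
```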

        Converting the hex-strings can be done like this: s#(?<!\\)(\\{2})*\\x([A-F0-9a-f]{2})#$1 . chr (hex ($2))#eg;

        It uses a look-behind to check that there is no backslash before the double backslashes, then it matches pairs of backslashes (a backslash escaping a backslash), then the backslash and the x, and of course the hex digits.

        If you want to have the correct result after that you should replace all double backslashes with a single backslash... or better said, remove the backslash before every symbol... but perhaps unprintable does that too...
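        The hex conversion described above could be wrapped up roughly as follows. This is only a sketch: `expand_hex` is a made-up name, and the grouping differs slightly from the one-liner, capturing the whole run of backslash pairs so that $1 is always defined and keeps every pair.

```perl
use strict;
use warnings;

# Hypothetical helper: expand \xHH escapes, leaving escaped backslashes alone.
sub expand_hex {
    my ($text) = @_;
    # (?<!\\)              not preceded by a backslash
    # ((?:\\\\)*)          any run of backslash pairs, captured whole in $1
    # \\x([0-9A-Fa-f]{2})  the escape itself, hex digits in $2
    $text =~ s/(?<!\\)((?:\\\\)*)\\x([0-9A-Fa-f]{2})/$1 . chr(hex($2))/ge;
    return $text;
}

print expand_hex('tab\x09here'), "\n";        # \x09 becomes a real tab
print expand_hex('literal\\\\x09here'), "\n"; # escaped backslash: left as is
```

        The backslash pairs themselves are not collapsed here; that would still be a separate step, as noted above.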

        Update: I decided to look at the source of String::Escape, and the code that matches the hex characters seems to be buggy: it matches \AF (basically m/\\[A-Fa-f0-9]/), but not \xAF with the x in front of the digits. It doesn't seem to have code for octal characters either, but you should be able to work that out from my previous regex (which is why I leave it in this post). (I will send a mail to the author about this...)

        Update2: Reply from the author:

        Looks like you're right. I've gotten a few other suggestions, so I'll try to roll them into a new release soon.

        Update3: I noticed that djohnston's regex had a flaw and I explained to him how to fix it. After looking back at the original regex (of the OP), I noticed that it has the same flaw. So here is the explanation. (in readmore tags, of course)

        Oh, no: I don't think it does hex or octal escapes. Sorry. It ought to!

        Maybe you could base something that does on the String::Escape code, though.

Re: efficient char escape sequence substitution
by Roy Johnson (Monsignor) on May 27, 2005 at 21:58 UTC
    This is safer than an overall string-eval, and seems to work as desired, replacing only backslash-escapes:
    use strict;
    use warnings;
    s/(\\(?:\W|\w{1,3}))/qq("$1")/gee, print for <DATA>;
    __DATA__
    this is a string\t$foowith some text\r\n
    this is a string\010with some\x09text\r\n

    Caution: Contents may have been coded under pressure.
      That doesn't handle wide-character escapes.

      In comp.lang.perl.misc I recently offered this solution.

      s/(\\[^"\$\@]+)/qq("$1")/eeg;

      Note that's not 100% infallible, but AFAIK it's not a security problem and it copes with all reasonable strings. It does have the interesting side effect of stripping \$, \" and \@, but those have no business being in most of the sort of strings you'll encounter.
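      As a rough illustration of the wide-character case, here is that substitution applied to a sample string. The sample text and variable names are assumptions made for the demo, not part of the original post.

```perl
use strict;
use warnings;

# Single quotes: every escape below is still literal text at this point.
my $raw = 'smile: \x{263A}\ttab\n';

# Copy, then expand everything from a backslash up to the next ", $ or @.
(my $cooked = $raw) =~ s/(\\[^"\$\@]+)/qq("$1")/eeg;

binmode STDOUT, ':encoding(UTF-8)';
print $cooked;
```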

        How wide is a wide character escape? There's no reason that you can't change the quantifier in my regex to accommodate them. I just wasn't aware of any escapes wider than three characters.

        Caution: Contents may have been coded under pressure.
      Can you help me understand your regex modifiers?
      gee
      Where in perldoc can I find an explanation?
        Documentation.

        g means global replace. e means eval the replacement before substituting it. Repeating the e means eval it again. So it starts out looking like the string qq("\\t"); after an eval, it looks like "\t", and after a 2nd eval, it's just a tab. (Actually, it may have captured more than just the tab, but anything after the tab is left intact.)


        Caution: Contents may have been coded under pressure.
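        The two evaluations behind /ee can be traced by hand. A minimal sketch, where `$captured` stands in for whatever $1 holds during the substitution:

```perl
use strict;
use warnings;

my $captured = '\t';                     # what $1 holds: backslash, t

# First evaluation: qq("$1") interpolates $1 between literal double quotes.
my $after_first = qq("$captured");       # the four characters "\t"

# Second evaluation: that text is compiled as Perl source,
# i.e. as a double-quoted string literal.
my $after_second = eval $after_first;    # a single tab character

printf "%s -> %d char(s)\n", $after_first, length $after_second;
```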
Re: efficient char escape sequence substitution
by djohnston (Monk) on May 27, 2005 at 20:27 UTC
    Funny you should ask, as I just used this code last night. Here's how I did it:
    sub de_slashify {
        my $string = shift;
        my $esc = { n => "\n", r => "\r", t => "\t", };
        $string =~ s/\\([nrt])/$esc->{ $1 }/ge;
        return $string
    }
    I couldn't tell you if it's more efficient, but it's at least an alternative.

    Update:
    (On a tip from Animator)
    I should point out that this very simple code snippet won't handle all escape sequences, only the (initial) three you asked about. If you need to parse anything that might contain other more complex escape sequences you should probably use a module.
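    If a module is not an option, the lookup-table idea can be stretched to cover hex and octal escapes in a single pass. The following is only a sketch under assumptions: `unescape` and `%ESC` are made-up names, and unknown escapes are simply left alone.

```perl
use strict;
use warnings;

# Assumed mapping of single-character escapes; extend as needed.
my %ESC = (
    n => "\n", r => "\r", t => "\t",
    '\\' => '\\',
);

# One left-to-right pass, so an escaped backslash is consumed
# before the character after it is examined.
sub unescape {
    my ($string) = @_;
    $string =~ s{
        \\ (?: x([0-9A-Fa-f]{1,2})    # hex escape
             | ([0-7]{1,3})           # octal escape
             | (.)                    # single-character escape
           )
    }{
        defined $1        ? chr(hex($1))
      : defined $2        ? chr(oct($2))
      : exists $ESC{$3}   ? $ESC{$3}
      :                     "\\$3"    # unknown escape: leave it untouched
    }gex;
    return $string;
}

print unescape('this\tis\ta\011string\x09with some text\r\n');
```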

Re: efficient char escape sequence substitution
by Roy Johnson (Monsignor) on May 27, 2005 at 18:59 UTC
    eval qq("$var"), with the usual caveats about the dangers of string-eval, as well as the fact that it's going to interpolate variables.

    Caution: Contents may have been coded under pressure.
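    The caveats about string-eval are easy to reproduce. A small sketch with made-up sample data: `$secret` stands in for any variable that happens to be in scope, and the second string shows how an embedded double quote stops the generated code from compiling.

```perl
use strict;
use warnings;

my $secret = 'hunter2';                  # stand-in for any variable in scope
my $var    = 'tab:\t secret: $secret';   # file content, escapes still literal
my $out    = eval qq("$var");            # $secret is interpolated along with \t

my $quoted = 'she said "hi"\n';
my $broken = eval qq("$quoted");         # embedded quotes end the string early

print "interpolated: $out\n";
print "quoted input failed to compile\n" unless defined $broken;
```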
