PerlMonks  

Regexes and backslashes

by oko1 (Deacon)
on Feb 16, 2011 at 00:55 UTC ( [id://888430]=perlquestion )

oko1 has asked for the wisdom of the Perl Monks concerning the following question:

Trying to puzzle this one out, and it's enough fun that I wanted to share it with my fellow monks (as well as hoping that someone has an answer. :) A regex _should_ be able to do this, but I'm kinda stuck on how.

#!/usr/bin/perl -w
use strict;

# Need to 'unescape' the escaped characters
$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\\n\\n';

# s/\\([;,\n])/$+/g;            # Doesn't match '\\n'
# s/\\([;,n])/$1/g;             # Matches '\\n', but turns it into 'n'
# s#\\n|\\([;,])#$1||$/#ge;     # Works, but... UGH.
# s/\\n/\n/g; s/\\(;|,)/$1/g;   # Yah, yah, shure.
print;

So, is there a way to have the '\n' be part of the character class that's being captured? Or did I just run up against an actual limitation, and absolutely have to do it "in two parts", like one of the last two lines?
-- 
Education is not the filling of a pail, but the lighting of a fire.
 -- W. B. Yeats

Replies are listed 'Best First'.
Re: Regexes and backslashes
by wind (Priest) on Feb 16, 2011 at 01:16 UTC
    Your first regex works fine for including a return character in a character class; you simply didn't have any return characters in your string. It helps to print out the string you're matching before and after you make changes to it:
    $_ = "Goodbye\\; Good luck\\, and thanks for all the fish!\\\n\\\n";
    print "<$_>\n";
    s/\\([;,\n])/$+/g; # Does work
    print "<$_>\n";

    However, it's also possible that you meant the string to be as it was, in which case maybe you're wanting to reeval that string and therefore turn the escaped characters into their corresponding code like the following:
    $_ = "Goodbye\\; Good luck\\, and thanks for all the fish!\\n\\n";
    print qq{<$_>\n};
    s/(\\.)/qq{"$1"}/eeg;
    print qq{<$_>\n};
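
As far as I understand it, the doubled /e in the substitution above works in two stages: the first evaluation builds a small piece of Perl source text, and the second runs that text through a string eval. A minimal sketch of the two stages on a single escape (the variable names are illustrative and not from the thread):

```perl
use strict;
use warnings;

my $pair = '\n';             # two characters: backslash, n
my $code = qq{"$pair"};      # stage 1: build the source text "\n" (4 characters)
my $char = eval $code;       # stage 2: string eval compiles that text; result is a newline

printf "source text: %s (%d chars)\n", $code, length $code;
printf "result is a newline: %s\n", $char eq "\n" ? 'yes' : 'no';
```

Because stage 2 compiles text taken from the input, this is the step that deserves caution when the input comes from somewhere untrusted.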

      I did indeed mean the string to be as it was; it's actually text that I'm getting from an email that I want to parse. The part where I said "Ugh" was specifically because I'm not a fan of 'eval'ing arbitrary text; that's a really, really bad idea. :)

      There are lots of ways to solve "the problem" (which isn't really a problem; I mean, two substitutions and it's done.) I was just curious to see if, given that original string - one with single backslashes in it - a capture mechanism could be constructed along the lines of what I was trying to do. At the moment, it looks like the answer is 'no' - but I hope that someone here will prove me wrong.


        Pay particular attention to the way the  \ (backslash) character interpolates into a single-quoted string, and how many of them you need to use to get a  \\ double-backslash into the actual string (and printed).

        >perl -wMstrict -le
        "$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\\\\n\\\\n';
         print;
         s{ \\(.) }{$1}xmsg;
         print;
        "
        Goodbye\; Good luck\, and thanks for all the fish!\\n\\n
        Goodbye; Good luck, and thanks for all the fish!\n\n
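
One way to check how many backslashes actually end up in a single-quoted string is to print its length alongside its contents. A small sketch (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# In single quotes only \\ and \' are escapes; \n stays as backslash + 'n'.
my $one  = '\n';       # 2 chars: backslash, n
my $two  = '\\n';      # 2 chars: \\ collapses to a single backslash
my $four = '\\\\n';    # 3 chars: two backslashes, then n

printf "%-6s length %d\n", $_, length($_) for $one, $two, $four;
```

The first two literals produce the same two-character string, which is why doubling the backslash in single quotes changes nothing until you reach four of them.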
Re: Regexes and backslashes
by ELISHEVA (Prior) on Feb 16, 2011 at 10:05 UTC

    If '\\n' (3 chars) is supposed to map to a single newline, what did you want '\n' to map to? '\n' (2 chars) or 'n'?

    A hash isn't necessary if you use y///. I also wouldn't worry about the e modifier in this case. You are doing nothing more in the second half of substitution than what you would have to do anyway with an arbitrary string to untaint it: testing if it is defined and eliminating and substituting characters. These examples work with all one character backslash sequences (\t\n\r\f\b\a\e), not just newlines.

    For '\n' to '\n' (2 chars):

    s
     /\\([^\\tnrfbae])|\\\\([tnrfbae])
     /(my$x=defined($1)?$1:$2)=~y#tnrfbae#\t\n\r\f\b\a\e#;$x;
     /gex;

    For '\n' to 'n', this regex would do it:

    s
     /\\([^\\tnrfbae])|\\\\([tnrfbae])|\\([tnrfbae])
     /(my$x=defined($1)?$1:$2?$2:'')=~y#tnrfbae#\t\n\r\f\b\a\e#;$x?$x:$3;
     /gex;

      > If '\\n' (3 chars) is supposed to map to a single newline, what did you want '\n' to map to? '\n' (2 chars) or 'n'?

      I don't want '\n' remapped at all. The only three things that I do want remapped are \, \; \\n - everything else should just be left as is.

      > A hash isn't necessary if you use y///.

      But I'm not trying to replace anything like a list of similar metacharacters. The whole challenge here is that I'm dealing with a set of things that are unlike each other: two sets of literal characters and one that is a literal plus a metacharacter. I'm just wondering if they can be handled by a common mechanism; that's what I'm asking about.

      Thank you for trying.

        To get a two character literal '\n' to convert to a newline "\n", which is not two characters, I think you have to use eval().
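
For what it's worth, the /e modifier on its own is not a string eval of the input: the replacement is ordinary Perl code compiled with the rest of the program, and only its result is substituted. A minimal sketch, assuming the input holds literal backslash-n pairs as in the original post (the helper name `unescape` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: unescape \; \, and \n in one pass, leave everything else alone.
sub unescape {
    my ($text) = @_;
    # The replacement is a plain Perl expression; no text from the input is compiled.
    $text =~ s/\\([;,n])/$1 eq 'n' ? "\n" : $1/ge;
    return $text;
}

my $in = 'Goodbye\; Good luck\, and thanks for all the fish!\n\n';
print unescape($in);
```

Other sequences such as a literal backslash-t are not in the character class, so they pass through unchanged.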
Re: Regexes and backslashes
by 7stud (Deacon) on Feb 16, 2011 at 03:44 UTC
    my $text = 'Goodbye\; Good luck\, and thanks for all the fish!\\n\\n';
    $text =~ s{ \\ ([^n]) } {$1}xmsg;
    print "-->$text<--";

    --output:--
    -->Goodbye; Good luck, and thanks for all the fish!\n\n<--

      Oh, dear. I guess I'm extra awful at explaining what I mean tonight. Pity; I suppose the only thing left is to shoot myself and get it over with. :)

      Your example does indeed result in the above output - but that's not what I'm looking for. I would like for the escaped semicolons to be converted to non-escaped semicolons; for escaped commas to be converted to non-escaped commas; and for escaped newlines to be converted to actual newlines. Metacharacters, not literal '\n's. Ones that actually result in newlines being produced on the screen when a line is printed - i.e.,

      Line 1:Goodbye; Good luck, and thanks for all the fish!
      Line 2: 

      and NOT

      Line 1:Goodbye; Good luck, and thanks for all the fish!\n\n

      (I'm not trying to come across as being snarky; I'm just trying to make *sure* that I'm getting across what I mean, since I've obviously failed to do so before now.)


        OK, how about this one. It's a little bit klunky in its use of a conversion hash, but the hash can easily be expanded as needed.

        >perl -wMstrict -le
        "$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\n\n';
         my %conv = ( ',' => ',', ';' => ';', n => qq{\n} );
         print qq{[[$_]]};
         s{ \\([,;n]) }{$conv{$1}}xmsg;
         print qq{[[$_]]};;
        "
        [[Goodbye\; Good luck\, and thanks for all the fish!\n\n]]
        [[Goodbye; Good luck, and thanks for all the fish!

        ]]

        Aye, this does make it more clear, although I already understood what you meant in the end.

        The problem, of course, was that the extra backslashes in your original string obscured your intent.

        $_ = 'Goodbye\; Good luck\, and thanks for all the fish!\\n\\n';

        Being synonymous with and more simply stated as

        $_ = 'Goodbye\; Good luck\, and thanks for all the fish!\n\n';
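
The equivalence of the two spellings above can be checked directly. A short sketch using a shortened version of the string (the variable names are illustrative):

```perl
use strict;
use warnings;

my $doubled = 'fish!\\n\\n';   # spelled with doubled backslashes, as in the original post
my $single  = 'fish!\n\n';     # the simpler spelling

# Both literals hold: fish! \ n \ n  (9 characters, no real newlines)
print $doubled eq $single ? "same string\n" : "different strings\n";
print length($doubled), "\n";
```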

        I certainly understand your beef with evaling, but given you know the content of the eval to a certainty, I don't believe there is any security risk. Would be interested if proven wrong there though.
