Regexes and backslashes

oko1 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Regexes and backslashes
by wind (Priest) on Feb 16, 2011 at 01:16 UTC

$_ = "Goodbye\\; Good luck\\, and thanks for all the fish!\\\n\\\n";

print "<$_>\n";

s/\\([;,\n])/$+/g;            # Does work

print "<$_>\n";

$_ = "Goodbye\\; Good luck\\, and thanks for all the fish!\\n\\n";

print qq{<$_>\n};

s/(\\.)/qq{"$1"}/eeg;

print qq{<$_>\n};

Re^2: Regexes and backslashes

by oko1 (Deacon) on Feb 16, 2011 at 01:47 UTC

I did indeed mean the string to be as it was; it's actually text that I'm getting from an email that I want to parse. The part where I said "Ugh" was specifically because I'm not a fan of 'eval'ing arbitrary text; that's a really, really bad idea. :)

There are lots of ways to solve "the problem" (which isn't really a problem; I mean, two substitutions and it's done.) I was just curious to see if, given that original string - one with single backslashes in it - a capture mechanism could be constructed along the lines of what I was trying to do. At the moment, it looks like the answer is 'no' - but I hope that someone here will prove me wrong.

-- 
Education is not the filling of a pail, but the lighting of a fire.
 -- W. B. Yeats

Re^3: Regexes and backslashes

by AnomalousMonk (Archbishop) on Feb 16, 2011 at 04:08 UTC

Pay particular attention to how the \ (backslash) character is handled in a single-quoted string, and how many backslashes you need to type to get a \\ double backslash into the actual (and printed) string.

>perl -wMstrict -le
"$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\\\\n\\\\n';
 print;
 s{ \\(.) }{$1}xmsg;
 print;
"
Goodbye\; Good luck\, and thanks for all the fish!\\n\\n
Goodbye; Good luck, and thanks for all the fish!\n\n
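
A minimal sketch of the quoting rule at work here, written as a standalone script rather than a one-liner so that no shell quoting is involved. The variable names are illustrative only. In single quotes, the only escapes Perl recognizes are \\ and \'; any other backslash stays in the string as typed.

```perl
use strict;
use warnings;

my $one_typed  = '\n';      # backslash + n: 2 characters
my $two_typed  = '\\n';     # \\ collapses to one backslash, then n: 2 characters
my $four_typed = '\\\\n';   # two backslashes, then n: 3 characters
my $newline    = "\n";      # double quotes: 1 newline character

printf "%-28s length %d\n", 'one backslash typed',   length $one_typed;
printf "%-28s length %d\n", 'two backslashes typed', length $two_typed;
printf "%-28s length %d\n", 'four backslashes typed', length $four_typed;
printf "%-28s length %d\n", 'double-quoted newline', length $newline;

print "one and two typed backslashes give the same string\n"
    if $one_typed eq $two_typed;
```

So a literal double backslash in the data takes four typed backslashes inside single quotes, which is what the one-liner above relies on.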

Re: Regexes and backslashes
by ELISHEVA (Prior) on Feb 16, 2011 at 10:05 UTC

If '\\n' (3 chars) is supposed to map to a single newline, what did you want '\n' to map to? '\n' (2 chars) or 'n'?

A hash isn't necessary if you use y///. I also wouldn't worry about the e modifier in this case. You are doing nothing more in the second half of substitution than what you would have to do anyway with an arbitrary string to untaint it: testing if it is defined and eliminating and substituting characters. These examples work with all one character backslash sequences (\t\n\r\f\b\a\e), not just newlines.

For '\n' to '\n' (2 chars):

  s /\\([^\\tnrfbae])|\\\\([tnrfbae])
    /(my$x=defined($1)?$1:$2)=~y#tnrfbae#\t\n\r\f\b\a\e#;$x;
    /gex;

For '\n' to 'n', this regex would do it:

  s /\\([^\\tnrfbae])|\\\\([tnrfbae])|\\([tnrfbae])
    /(my$x=defined($1)?$1:$2?$2:'')=~y#tnrfbae#\t\n\r\f\b\a\e#;$x?$x:$3;
    /gex;

Re^2: Regexes and backslashes

by oko1 (Deacon) on Feb 17, 2011 at 00:09 UTC

> If '\\n' (3 chars) is supposed to map to a single newline, what did you want '\n' to map to? '\n' (2 chars) or 'n'?

I don't want '\n' remapped at all. The only three things that I do want remapped are \, \; \\n - everything else should just be left as is.

> A hash isn't necessary if you use y///.

But I'm not trying to replace anything like a list of similar metacharacters. The whole challenge here is that I'm dealing with a set of things that are unlike each other: two sets of literal characters and one that is a literal plus a metacharacter. I'm just wondering if they can be handled by a common mechanism; that's what I'm asking about.

Thank you for trying.

Re^3: Regexes and backslashes

by 7stud (Deacon) on Feb 17, 2011 at 03:56 UTC

To get a two-character literal '\n' to convert to a newline "\n", which is a single character, I think you have to use eval().
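
For comparison, here is a hedged sketch of one way to do the conversion without a string eval. The sample string and the variable names are assumptions for illustration. The /e modifier compiles the replacement once as ordinary Perl code, so nothing taken from the input text is ever executed.

```perl
use strict;
use warnings;

# Assumed sample input: single quotes keep every backslash literal.
my $text = 'Goodbye\; Good luck\, and thanks for all the fish!\n\n';

# Match a backslash followed by one of , ; n and pick the replacement
# in code: a real newline for 'n', otherwise the captured character itself.
(my $clean = $text) =~ s{ \\ ([,;n]) }{ $1 eq 'n' ? "\n" : $1 }gex;

print "[[$clean]]\n";
```

Any backslash sequence not listed in the character class is left untouched.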

Re^4: Regexes and backslashes

by Anonymous Monk on Feb 17, 2011 at 04:05 UTC

Re: Regexes and backslashes
by 7stud (Deacon) on Feb 16, 2011 at 03:44 UTC

my $text = 'Goodbye\; Good luck\, and thanks for all the fish!\\n\\n';

$text =~ s{ \\ ([^n]) }
                {$1}xmsg;

print "-->$text<--";

--output:--
-->Goodbye; Good luck, and thanks for all the fish!\n\n<--

Re^2: Regexes and backslashes

by oko1 (Deacon) on Feb 16, 2011 at 05:01 UTC

Oh, dear. I guess I'm extra awful at explaining what I mean tonight. Pity; I suppose the only thing left is to shoot myself and get it over with. :)

Your example does indeed result in the above output - but that's not what I'm looking for. I would like for the escaped semicolons to be converted to non-escaped semicolons; for escaped commas to be converted to non-escaped commas; and for escaped newlines to be converted to actual newlines. Metacharacters, not literal '\n's. Ones that actually result in newlines being produced on the screen when a line is printed - i.e.,

Line 1:	Goodbye; Good luck, and thanks for all the fish!
Line 2:

and NOT

Line 1:	Goodbye; Good luck, and thanks for all the fish!\n\n

(I'm not trying to come across as being snarky; I'm just trying to make *sure* that I'm getting across what I mean, since I've obviously failed to do so before now.)

Re^3: Regexes and backslashes

by AnomalousMonk (Archbishop) on Feb 16, 2011 at 06:14 UTC

OK, how about this one. It's a little bit klunky in its use of a conversion hash, but the hash can easily be expanded as needed.

>perl -wMstrict -le
"$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\n\n';
 my %conv = ( ',' => ',',  ';' => ';',  n => qq{\n} );
 print qq{[[$_]]};
 s{ \\([,;n]) }{$conv{$1}}xmsg;
 print qq{[[$_]]};
"
[[Goodbye\; Good luck\, and thanks for all the fish!\n\n]]
[[Goodbye; Good luck, and thanks for all the fish!

]]

Re^4: Regexes and backslashes

by oko1 (Deacon) on Feb 16, 2011 at 08:24 UTC

Re^5: Regexes and backslashes

by wind (Priest) on Feb 16, 2011 at 08:44 UTC

Re^3: Regexes and backslashes

by wind (Priest) on Feb 16, 2011 at 08:34 UTC

Aye, this does make it more clear, although I already understood what you meant in the end.

The problem, of course, was that the extra backslashes in your original string obscured your intent:

$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\\n\\n';

In a single-quoted string, that is synonymous with, and more simply stated as:

$_ = 'Goodbye\; Good luck\, and thanks for all the fish!\n\n';

I certainly understand your beef with evaling, but given you know the content of the eval to a certainty, I don't believe there is any security risk. Would be interested if proven wrong there though.
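
Security aside, a hedged sketch of a correctness difference between the two approaches. The input string here is hypothetical. Because the /ee version evaluates every backslash pair as a double-quoted escape, it also converts sequences such as \t that were meant to be left alone, while a whitelist touches only the three requested sequences.

```perl
use strict;
use warnings;

# Hypothetical input containing \t, which should stay as two characters.
my $input = 'a\tb\; c\n';

# Double-eval approach: every backslash pair is interpreted by Perl.
(my $via_eval = $input) =~ s/(\\.)/qq{"$1"}/eeg;

# Whitelist approach: only \, \; and \n are converted.
my %conv = ( ',' => ',', ';' => ';', n => "\n" );
(my $via_hash = $input) =~ s/\\([,;n])/$conv{$1}/g;

print "eval turned \\t into a tab\n" if $via_eval =~ /\t/;
print "hash left \\t alone\n"        if index($via_hash, '\t') >= 0;
```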
