grinder has asked for the wisdom of the Perl Monks concerning the following question:
I was playing around with log files and regular expressions, wishing to be able to supply a RE on the command line to operate on a log file. And I noticed something odd.
Perl's regular expressions admit \L, \U and \Q directives. The latter is quite useful: it applies quotemeta to the remainder of the string, or up until a \E is encountered. This comes in handy for matching strings containing brackets, dots and all those pesky metacharacters that tend to abound in log files.
The trouble is, it doesn't work.
I'll use \U as an example, because it's slightly less mind-bending to follow what's going on. But the same thing applies to all three directives (and it's really only \Q that I'm really interested in).
Consider:
print qr/a\Ubc/; # prints (?-xism:aBC)
All is well and good, but what if you want to fetch the pattern from the command line?
perl -le '$patt = shift; print qr/$patt/' 'a\Ubc'
# prints (?-xism:a\Ubc)
perl -le '$patt = shift; print qr/$patt/' 'a\\Ubc'
# prints (?-xism:a\\Ubc)
I.e., I tried doubling up the backslashes just in case the shell was giving me grief, but that's not the case. And regardless of that, I don't particularly care what it looks like, the main issue is that it doesn't match what it should:
my $patt = shift; # e.g. 'a\Ubc' from the shell
$patt = qr/$patt/;
my $target = 'aBC';
print $target =~ /$patt/; # prints nothing
Now this doesn't match aBC. It doesn't match 'a\Ubc' literally either, for that matter. In fact, I don't know what, if anything, it does match.
I have figured out one way to make it work: put the qr// expression inside a string eval and all is well:
my $patt = shift;
$patt = eval "qr/$patt/"; # eeeww
# patt is now (?-xism:aBC) if given 'a\Ubc'
my $target = 'aBC';
print $target =~ /$patt/; # prints 1
Now all is fine, but the cure is worse than the disease. Any person reading the code will quickly spot that they could have a lot of fun by specifying a pattern such as /.`rm -rf /`./ and then you are in a world of pain.
At this point, the only way out of this conundrum that I can see is to either hand parse the pattern (erk) or use a Safe compartment (re-erk).
I think, however, that my thinking is stuck in some sort of conceptual rut. I can't be the first person to stumble across this behaviour and there must be something really obvious I'm missing. In which case, upside smacks to the head would be most appreciated.
- another intruder with the mooring in the heart of the Perl
Re: qr/string/ is not the same as qr/$var/ ?
by Anonymous Monk on Apr 19, 2005 at 06:51 UTC
\L, \U, \Q, \l and \u are dealt with during parsing of a quoted construct, which takes up to five passes:
- Find the end of the quoted construct.
- Removal of backslashes before delimiters.
- Interpolation (this is where \L and friends are processed).
- Interpolation of regular expressions.
- Optimization of regular expressions.
For details, see the section Gory details of parsing quoted constructs in the perlop manual page.
So, by the time the regex engine gets the string, any \L it finds is treated as an escaped L, which will trigger a warning.
Strings obtained from the environment, such as program arguments, are never parsed as quoted constructs (unless you 'eval' them).
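To make the difference concrete, here is a minimal sketch (not from the original posts) that compiles the same characters once as a literal pattern and once from a variable. The single-quoted string stands in for a shell argument, and the variable names are only illustrative. As far as I can tell, the variable version ends up matching a literal U, since the regex engine passes the unrecognized escape through.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence "Unrecognized escape \U passed through"

my $literal  = qr/a\Ubc/;      # \U is handled at interpolation time: pattern is aBC
my $from_var = 'a\Ubc';        # single quotes: same characters a shell argument would carry
my $compiled = qr/$from_var/;  # the regex engine itself sees \U and treats it as a plain U

for my $target (qw(aBC aUbc)) {
    printf "%-5s literal:%-3s variable:%s\n", $target,
        ($target =~ $literal  ? 'yes' : 'no'),
        ($target =~ $compiled ? 'yes' : 'no');
}
```

So the pattern from the command line does match something, just not what was intended.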
Re: qr/string/ is not the same as qr/$var/ ?
by blazar (Canon) on Apr 19, 2005 at 04:15 UTC
I'll use \U as an example, because it's slightly less mind-bending to follow what's going on. But the same thing applies to all three directives (and it's really only \Q that I'm really interested in).
I think \U has to do with string interpolation rather than with actual regexen. In fact perlre says:
Because patterns are processed as double quoted strings, the following also work:
Now all is fine, but the cure is worse than the disease. Any person reading the code will quickly spot that they could have a lot of fun by specifying a pattern such as /.`rm -rf /`./ and then you are in a world of pain.
At this point, the only way out of this conundrum that I can see is to either hand parse the pattern (erk) or use a Safe compartment (re-erk).
I'm not sure how relevant this may be to your security concerns, but it is often said that allowing arbitrary regexen to be passed in is risky in any case. It is also known that Safe.pm itself has holes that experienced hackers (certainly not me, that is!) can exploit, due to it being more of an afterthought than something designed into the language ab initio.
Just my two eurocents.
Re: qr/string/ is not the same as qr/$var/ ?
by tlm (Prior) on Apr 19, 2005 at 07:54 UTC
I have no solution to the eval problem, but FWIW, -MO=Deparse,-p clarifies the issue:
% perl -MO=Deparse,-p -le '$patt = "a\Ubc"; print qr/$patt/'
BEGIN { $/ = "\n"; $\ = "\n"; }
($patt = 'aBC');
print(qr/$patt/);
-e syntax OK
Likewise for \Q...\E:
% perl -MO=Deparse,-p -le '$patt = "a\Q[bc]\E"; print qr/$patt/'
BEGIN { $/ = "\n"; $\ = "\n"; }
($patt = 'a\\[bc\\]');
print(qr/$patt/);
-e syntax OK
In other words, by the time that qr sees it, perl has already transformed the original string, which explains both the problem and the eval workaround.
Note that this behavior occurs only if the RHS of the first assignment is double-quoted; with single quotes you end up with the same problem as if the pattern had been passed in via @ARGV:
% perl -le '$patt = q(a\Ubc); print qr/$patt/'
(?-xism:a\Ubc)
Update: Deleted a bit of stray text that had snuck into one of the original code snippets; added the output for the \Q...\E case.
Re: qr/string/ is not the same as qr/$var/ ?
by Roy Johnson (Monsignor) on Apr 19, 2005 at 08:15 UTC
Seems like you could eval it safely without compromising too much functionality if you chose some unlikely delimiters and eliminated them from the input:
use strict;
use warnings;
my $scary_regex = shift;
$scary_regex =~ tr/\cA//d;
# There are now no control-As in the string,
# so I can safely use them as delimiters
my $safe_pat = eval "qq\cA$scary_regex\cA";
my $safe_reg = qr/$safe_pat/;
print "Safe pat is $safe_pat; reg is $safe_reg\n";
Is there danger here that I don't see?
Update: Pustular Postulant pointed out that you could go straight to the regex, rather than having the intermediate $safe_pat string (just use qr instead of qq). When I was putting it together, something told me that wasn't safe, but I think it is.
Also note that \cA on the input works fine, if you actually want control-A in your pattern.
Caution: Contents may have been coded under pressure.
|
Eval, even just for double-quote(ish) interpolation, is still not safe, because it can interpolate ${BLOCK} type expressions, and BLOCK can contain any arbitrary code. Try calling the above script with ${print "I coulda killed ya"} as an argument.
[I had a recommendation for plugging the hole, but it was wrong!]
I am not aware of any other holes, but that doesn't mean there can't be any.
Caution: Contents may have been coded under pressure.
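A harmless way to see the hole, as a sketch: instead of the ${print ...} argument mentioned above, this uses the @{[ ... ]} interpolation idiom, which runs arbitrary code in the same way. The hostile input and the $side_effect variable are made up for the demonstration; the control-A delimiters are the ones from the script above.

```perl
use strict;
use warnings;

our $side_effect = 0;

# Hypothetical hostile "pattern": it contains no control-A at all,
# so stripping the delimiter does nothing to defuse it.
my $input = 'a@{[ $main::side_effect = 42 ]}b';
(my $clean = $input) =~ tr/\cA//d;

# Same construct as in the script above: qq with control-A delimiters.
my $pat = eval "qq\cA$clean\cA";

print "pattern is now: $pat\n";
print "side effect:    $side_effect\n";   # non-zero means the embedded code ran
```

Anything that can assign to a variable this way can equally call system or unlink, so removing the delimiter from the input only prevents breaking out of the quotes, not code execution.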
|
% perl -e '/$ARGV[0]/' '(?{print "nasty\n"})'
Eval-group not allowed at runtime, use re 'eval' in regex m/(?{print "nasty\n"})/ at -e line 1.
% perl -e 'eval qq(/$ARGV[0]/)' '(?{print "nasty\n"})'
nasty
Re: qr/string/ is not the same as qr/$var/ ?
by ambs (Pilgrim) on Apr 19, 2005 at 04:16 UTC
This is not a regular expression problem, nor a qr problem.
If you try foo.pl as:
my $foo = shift;
print $foo,"\n";
and do
[ambs@eremita tmp]$ perl foo.pl 'fo\o'
fo\o
I know I didn't answer your question. Basically, it seems program arguments are quoted. Can't say why.
Re: qr/string/ is not the same as qr/$var/ ?
by inman (Curate) on Apr 19, 2005 at 05:32 UTC
use strict;
use warnings;
my $pattern = shift;
my $target = 'aBC';
my $regex;
eval "\$regex = qr ($pattern)";
print $regex, "\n";
print $target =~ $regex ? "$pattern Worked!\n" : "$pattern Failed!\n";
The eval line interpolates the pattern before it is turned into a regex by qr.
Also consider $pattern =~ s/(.*)/qq(qq($1))/ee; as a way of doing the interpolation without using a direct eval. This method can still be poisoned using a carefully crafted regex.
|
What about a pattern like 'a\Ubc) ; `rm -rf /` ; ('? Eval is always dangerous on untrusted data, as the original post claims.
Re: qr/string/ is not the same as qr/$var/ ?
by Anonymous Monk on Apr 19, 2005 at 06:36 UTC
Now all is fine, but the cure is worse than the disease. Any person reading the code will quickly spot that they could have a lot of fun by specifying a pattern such as /.`rm -rf /`./ and then you are in a world of pain.
Only if your code is SUID (or SGID). Otherwise, if they want to remove all files they can, they just type rm -rf from the prompt to get the same effect.
|
First, that presupposes that they have a shell account on the machine. They may not. Second, that would allow them to remove all files that /they/ have access to delete. The OP's program might be (e.g.) a CGI that is not S(U|G)ID but is not run as that user, either.
Re: qr/string/ is not the same as qr/$var/ ?
by Anonymous Monk on Apr 19, 2005 at 07:06 UTC
At this point, the only way out of this conundrum that I can see is to either hand parse the pattern (erk) or use a Safe compartment (re-erk).
Well, for starters, the user doesn't really benefit from \U or \L - it only requires the user to type more characters. For \Q, one could consider adding a flag to the program - if the flag is given, the string should be searched for as is (that is, as if \Q was prepended at the start of the search string), or else, as a regex (kind of like the -F option for (GNU) grep). Else, you can always use something like:
s{\G([^\\]*(?:\\[^Q][^\\]*)*)
\\Q
([^\\]*(?:\\[^E][^\\]*)*)
(?:\\E|$)}
{$1\Q$2}gx;
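The flag idea above could be sketched roughly as follows. The --fixed option name and the build_pattern helper are assumptions made for illustration; only quotemeta and Getopt::Long's GetOptionsFromArray are real, existing pieces.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# build_pattern is a hypothetical helper: it takes an argument list such as
# @ARGV and returns a compiled regex. With --fixed (an assumed flag name) the
# pattern is treated as a literal string, much like grep -F.
sub build_pattern {
    my @args  = @_;
    my $fixed = 0;
    GetOptionsFromArray(\@args, 'fixed' => \$fixed)
        or die "usage: $0 [--fixed] PATTERN\n";
    my $patt = shift @args;
    die "no pattern given\n" unless defined $patt;
    $patt = quotemeta $patt if $fixed;   # same effect as a leading \Q
    return qr/$patt/;
}

my $re = build_pattern('--fixed', '[error] 10.0.0.1');
print "matched\n" if 'Apr 19 [error] 10.0.0.1 refused' =~ $re;
```

In a real script one would call build_pattern(@ARGV). No eval is involved, so the pattern is never run as Perl code; a pattern that itself begins with a dash would need the usual -- separator in front of it.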
Re: qr/string/ is not the same as qr/$var/ ?
by duelafn (Parson) on Apr 19, 2005 at 10:14 UTC
Hmm, my perl seems to issue a warning:
[dean:~]$ perl -v
This is perl, v5.8.1-RC3 built for darwin-thread-multi-2level
(with 1 registered patch, see perl -V for more detail)
Copyright 1987-2003, Larry Wall
...
[dean:~]$ cat /tmp/deleteme.pl
#!/usr/bin/perl
use strict;
use warnings;
my $patt = shift;
$patt = qr/$patt/;
my $target = shift || 'aBC';
print "'$target' =~ $patt\n";
print $target =~ $patt;
print $/;
[dean:~]$ /tmp/deleteme.pl 'a\Ubc'
Unrecognized escape \U passed through in regex; marked by <-- HERE in m/a\U <-- HERE bc/ at /tmp/deleteme.pl line 6.
'aBC' =~ (?-xism:a\Ubc)
I also get the same warning on my home machine (v5.8.4 built for i386-linux-thread-multi)
Good Day,
Dean
Update: If you're only interested in \Q, a saner hack than eval might be to call quotemeta yourself. Something like this, but with quotemeta in place of uc (and \Q in place of \U):
#!/usr/bin/perl
use strict;
use warnings;
no warnings "uninitialized";
my $patt = 'a\Ubc';
$patt =~ s/(^|[^\\])((?:\\\\)*)\\U(.*?)(?:\\E|$)/$1.$2.uc($3)/se; # $2 keeps every escaped-backslash pair
$patt = qr/$patt/;
my $target = 'aBC';
print "'$target' =~ $patt\n";
print $target =~ $patt;
print $/;
Update 2: Oops, I missed your "erk" about hand-parsing the pattern. Sorry to suggest it.
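For what it's worth, the quotemeta variant could look something like the sketch below. expand_quotemeta is a made-up helper name, and the tokenizer is deliberately simple: it only understands \Q and \E, and leaves every other escape for the regex engine.

```perl
use strict;
use warnings;

# expand_quotemeta is a hypothetical helper, not a core function.
# It walks the string token by token so that an escaped backslash
# followed by Q (\\Q) is not mistaken for the \Q directive.
sub expand_quotemeta {
    my ($src) = @_;
    my ($out, $quoting) = ('', 0);
    # Tokens: \Q, \E, any other backslash pair, a run of ordinary
    # characters, or a lone trailing backslash.
    while ($src =~ /\G(\\Q|\\E|\\.|[^\\]+|\\)/gs) {
        my $tok = $1;
        if    ($tok eq '\Q') { $quoting = 1 }
        elsif ($tok eq '\E') { $quoting = 0 }
        else                 { $out .= $quoting ? quotemeta($tok) : $tok }
    }
    return $out;
}

my $patt = expand_quotemeta('a\Q[bc]\E.');
my $re   = qr/$patt/;
print "$patt\n";                        # prints a\[bc\].
print "match\n" if 'a[bc]x' =~ $re;
```

There is no eval anywhere, so nothing in the pattern is run as Perl code. An unterminated \Q simply quotes to the end of the string, as it would in a literal pattern.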