qr// and user provided regex patterns...

by misterMatt (Novice)
on Jul 30, 2009 at 06:20 UTC (#784500=perlquestion)
misterMatt has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys. The function of my program is to evaluate user-provided patterns on user-provided text and then spit back the results. Of course, I've encountered a snag pretty early on. Right now, I'm trying to get the -c flag working. I'm calling the script like this: regexTester -c /Bill Clinton/i, and I'm trying to evaluate the $text 'bill clinton'. It's failing! Could you guys enlighten me as to why? Here is the code (I've omitted the pod stuff to save space).
    my %options=();
    getopts('c:f:', \%options);
    pod2usage(2) if (keys %options != 1);

    if($options{c}){
        #variable grab
        my $pattern = qr/$options{c}/;
        print "Please enter the text you wish to run the pattern on: ";
        my $text = <STDIN>;
        chomp $text;
        #do work and display
        if($text =~ $pattern){
            print $&; #prints entire match
            print " " . $text;
        }
        else{
            print "$pattern on $text Failed. ";
        }
    }
Thanks monks. I know the problem probably lies in me not exactly understanding how qr// works.

Re: qr// and user provided regex patterns...
by ikegami (Pope) on Jul 30, 2009 at 06:27 UTC

    The function of my program is to evaluate user provided patterns

    /Bill Clinton/i is NOT a regex pattern. It's a match operator. If you want to execute arbitrary Perl code such as the match operator, you need eval EXPR (and all the attendant validation and security problems) or parse the input yourself (which could be done simply enough in this case).

    If you wish to continue using qr//, you'll need to pass an actual regex pattern such as (?i:Bill Clinton)
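A minimal sketch of the difference, assuming the option value reaches the script exactly as typed. The variable names are made up for illustration; only qr// and the inline (?i:...) modifier come from the reply above.

```perl
use strict;
use warnings;

my $text = 'bill clinton';

# What the original call passes: the text of a match operator,
# slashes and trailing flag included.
my $operator_text = '/Bill Clinton/i';

# What qr// expects: a bare pattern, with case-insensitivity
# written as an inline modifier.
my $pattern_text = '(?i:Bill Clinton)';

my $from_operator = qr/$operator_text/;
my $from_pattern  = qr/$pattern_text/;

print "operator text: ", ($text =~ $from_operator ? "match" : "no match"), "\n";
print "pattern text:  ", ($text =~ $from_pattern  ? "match" : "no match"), "\n";
```

The first pattern only matches text that literally contains "/Bill Clinton/i", which is why 'bill clinton' fails against it.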

    By the way, I'd avoid using $& since simply using it can negatively affect distant parts of your program. You could change

    if($text =~ $pattern){
        print $&; #prints entire match
    to
    if($text =~ /($pattern)/){
        print $1; #prints entire match
    but I'd avoid globals entirely and use
    if(my ($match) = $text =~ /($pattern)/){
        print $match; #prints entire match
      Okay, I decided to go the eval route, because as you pointed out, I'm only using the match operator, which I thought was a regex pattern (I'm really still not clear on the difference). I'm also going that route because, as another monk pointed out, the use of qr// messes with the user pattern that's already wrapped in // (the interpolation issue). Right now I'm looking at this:
      if($options{c}){
          #variable grab
          my $pattern = eval "qr$options{c}" or die $@;
          print "Please enter the text you wish to run the pattern on: ";
          my $text = <STDIN>;
          chomp $text;
          if($text =~ $pattern){
              print $&; #prints entire match
              print " " . $text;
          }
          else{
              print "$pattern on $text Failed. ";
          }
      }
      It works, as long as what the user provides doesn't have any spaces in it. (ex: /Bill Clinton/i causes the program to fail, but /BillClinton/i doesn't.) How do I fix that?

        I'm really still not clear on the difference.

        A multiplier tells the multiplication operator by how much it should multiply. The multiplier is data, the operator is Perl code.

        A regex pattern tells the match operator what it should match. The pattern is data to Perl, the operator is Perl code.

        You don't expect $c = '4 * 5'; 3 + $c to execute the contents of $c, so why would you expect differently from $c = '/foo/'; $x =~ $c?

        It works, as long as what the user provides doesn't have any spaces in it. (ex: /Bill Clinton/i causes the program to fail, but /BillClinton/i doesn't.) How do I fix that?

        I'm guessing the problem is that the user did NOT provide an argument with spaces in it, but rather passed two arguments. Quote the argument appropriately for your shell.
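If that guess is right, the effect can be reproduced without a shell by handing getopts the argument list the shell would have produced. This is only a sketch: parse_args is a hypothetical helper, and the argument lists assume a typical shell that splits unquoted words on spaces.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse a given argument list the same way the
# script parses @ARGV, and return the -c value plus whatever is left.
sub parse_args {
    local @ARGV = @_;
    my %options;
    getopts('c:f:', \%options);
    return ($options{c}, [@ARGV]);
}

# Unquoted (-c /Bill Clinton/i): the script receives three arguments.
my ($c_unquoted, $rest_unquoted) = parse_args('-c', '/Bill', 'Clinton/i');
print "unquoted: -c got '$c_unquoted', left over: '@$rest_unquoted'\n";

# Quoted (-c '/Bill Clinton/i'): the script receives two arguments.
my ($c_quoted, $rest_quoted) = parse_args('-c', '/Bill Clinton/i');
print "quoted:   -c got '$c_quoted', left over: '@$rest_quoted'\n";
```

In the unquoted case -c only ever sees '/Bill', so eval "qr$options{c}" is handed an unterminated pattern.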

      Using the eval works great for matches : $pattern = eval "qr$pattern" or die $@;
      But of course, when I do a 'substitution pattern' (s///), it fails because it ends up looking like eval "qrs/stuff/morestuff/". So I tried adding a bang to wrap around the interpolated variable in the original: $pattern = eval "qr!$pattern!" or die $@; But now my patterns fail to match anything. Example:
      s/Bill Clinton/The ex-President/gi on Bill Clinton is the ex-President. bill clinton. results in Failure.

        Using the eval works great for matches : $pattern = eval "qr$pattern" or die $@;

        No, it doesn't.

        First of all, it doesn't work at all if $pattern actually contains a pattern.

        And then there are issues with improper escaping. If you pass /a\+/, it won't match "a" followed by "+" as desired.

      Okay, I've managed to get this to work perfectly with a regular match (//) - but when I try to do a substitution(s///), the pattern fails to match anything - and no substitution is done. I'm calling the program like this: -r "s/matt/matthew/" -i -g, with the text 'matt, Matt'. Here is the code that processes the substitution bit:
      if($options{r}){
          #variable grab, add flags to pattern if they exist.
          my $pattern = $options{r};
          $pattern .= 'g' if $options{g};
          $pattern .= 'i' if $options{i};
          $pattern .= 's' if $options{s};
          #compile that stuff with eval
          my $compd_pattern = eval "qr($pattern)" or die $@;
          print "Please enter the text you wish to run the pattern on: ";
          my $text = <STDIN>;
          chomp $text;
          #do work and display
          if($text =~ $compd_pattern){
              print $text;
          }
          else{
              print "$pattern on \n\t{$text} Failed. ";
          }
      } #end R FLAG
        I suspect that qr was not intended to be used on an entire substitution operator like that. I believe it should only be used on the regular expression portion (left side) of the substitution. See perlop.

        Here is an example script which uses s/// as a command-line option:
        http://www.cpan.org/authors/id/F/FO/FORMAN/ren-regexp-1.5

        As toolic points out, qr// is meant to quote a pattern, not an entire substitution. If you want to pass in literal substitutions, rather than just patterns, then just eval at a different point:
        eval "\$text =~ $pattern";

        By the way, ikegami suggested using (?i:) in place of a string-based eval. Since such evals are always a cause for concern, and should be a cause for terror when run on user-provided input, are you sure that you considered the alternative properly before making your decision?

        If you're not happy with the inability of a (?i:)-type solution to handle substitutions, then you can probably just offer --search and --replace --by flags so that the user can specify directly what he or she wants.

        • It makes no sense to let the user specify /g for a match. You can't even use it on qr// because it makes no sense. ixsm are the four modifiers that apply to the pattern as opposed to the operator.

        • The following doesn't make much sense:

          my $compd_pattern = eval "qr($pattern)" or die $@;

          It removes the ability of qr// to quote, which is what you want. The code should be

          my $compd_pattern = qr($pattern);
        • You say you have problems doing substitutions, but you didn't show us your attempt (despite your claim). You should have no problems using a qr// pattern in a substitution.

        my $pat = ...;
        my $repl = ...;
        my $mods = '';
        $mods .= 'i' if ...;
        $mods .= 's' if ...;
        $mods .= 'm' if ...;
        $mods .= 'x' if ...;
        my $re = qr/(?$mods:$pat)/;
        if (...) {
            s/$re/$repl/g;
        } else {
            s/$re/$repl/;
        }
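One way the outline above could be filled in, as a sketch only. The substitute helper, its argument order, and the sample values are assumptions for illustration; the outline itself leaves those parts open.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a user-supplied pattern and replacement.
# $mods holds pattern modifiers (any of i, s, m, x); $global stands in
# for /g, which belongs to the operator and not to the pattern.
sub substitute {
    my ($text, $pat, $repl, $mods, $global) = @_;
    my $re = qr/(?$mods:$pat)/;
    if ($global) {
        $text =~ s/$re/$repl/g;
    }
    else {
        $text =~ s/$re/$repl/;
    }
    return $text;
}

print substitute('matt, Matt', 'matt', 'matthew', 'i', 1), "\n";
print substitute('matt, Matt', 'matt', 'matthew', '',  0), "\n";
```

Note that the replacement is inserted as plain text here, so a user-supplied $1 in it would not refer to a capture group.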
Re: qr// and user provided regex patterns...
by moritz (Cardinal) on Jul 30, 2009 at 06:30 UTC

    It would be helpful if you told us in which way it is failing.

    I'm trying to make a guess: If you interpolate the string /Bill Clinton/ into a regex or use it as a regex, then the slashes are part of the pattern - probably not what you want/need.
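If the slashes are indeed ending up inside the pattern, one option is to take the input apart before compiling it, along the lines of the "parse the input yourself" suggestion earlier in the thread. A rough sketch, where compile_match_input is a hypothetical helper and only the /PATTERN/FLAGS form with the i, s, m and x modifiers is assumed to be supported:

```perl
use strict;
use warnings;

# Hypothetical helper: turn input of the form /PATTERN/FLAGS into a
# compiled regex. Anything that is not in that form is rejected.
sub compile_match_input {
    my ($input) = @_;
    my ($pat, $mods) = $input =~ m{\A/(.*)/([ismx]*)\z}s
        or die "Expected /PATTERN/FLAGS, got: $input\n";
    return qr/(?$mods:$pat)/;
}

my $re = compile_match_input('/Bill Clinton/i');
print 'bill clinton' =~ $re ? "match\n" : "no match\n";
```

This avoids string eval entirely, at the cost of not accepting other delimiters or a substitution.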

Re: qr// and user provided regex patterns...
by Skeeve (Vicar) on Jul 30, 2009 at 06:38 UTC

    The main problem is in     if($text =~ $pattern){

    You need to put $pattern between slashes like /$pattern/

    ++ to ikegami. I didn't know that. Maybe because I usually match against $_ and so seldom need =~

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      Not true, not even if $pattern contained an ordinary string instead of a compiled regex pattern. On the RHS of =~, Perl inserts a match operator if none of the applicable operators is found.
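A small sketch of that behaviour, with made-up sample values: the right-hand side of =~ can be a plain string, a compiled qr// object, or either one between slashes.

```perl
use strict;
use warnings;

my $text = 'bill clinton';

my $as_string = 'clint.n';     # plain string, treated as a pattern at run time
my $compiled  = qr/clint.n/;   # precompiled regex object

print "string on RHS:   ", ($text =~ $as_string  ? 'match' : 'no match'), "\n";
print "qr// on RHS:     ", ($text =~ $compiled   ? 'match' : 'no match'), "\n";
print "between slashes: ", ($text =~ /$compiled/ ? 'match' : 'no match'), "\n";
```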
Re: qr// and user provided regex patterns...
by toolic (Chancellor) on Jul 30, 2009 at 13:09 UTC
    ikegami provided a solution to your direct question.

    An alternate approach that I usually employ is to provide another command-line option for a case-insensitive match.

    my $pattern = ($options{i}) ? qr/$options{c}/i : qr/$options{c}/;

    The following does a case-sensitive match:

    regexTester -c 'Bill Clinton'

    The following does a case-insensitive match:

    regexTester -i -c 'Bill Clinton'

    See Finding commands in Unix PATH for a complete example with POD instructions.

      Is there a graceful way to do this with any combination of 3 flags?
        What is "any combination of 3 flags"?
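Assuming the question means combining several pattern modifiers at once, one possible sketch is to collect the flags into a string and use the (?mods:pattern) form shown earlier in the thread. The option letters -i, -s and -x and the pattern_from_args helper are assumptions, not something the thread settled on.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: build a compiled pattern from -c plus any mix
# of the assumed flags -i, -s and -x.
sub pattern_from_args {
    local @ARGV = @_;
    my %options;
    getopts('isxc:', \%options) or die "bad options\n";
    my $mods = join '', grep { $options{$_} } qw(i s x);
    return qr/(?$mods:$options{c})/;
}

my $re = pattern_from_args('-i', '-x', '-c', 'Bill \s+ Clinton');
print "bill   clinton" =~ $re ? "match\n" : "no match\n";
```

With no flags given, $mods is empty and the pattern is compiled as a plain non-capturing group.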

Node Type: perlquestion [id://784500]
Approved by moritz