PerlMonks  

Parse::RandGen::Regexp

by paulski (Beadle)
on Aug 06, 2005 at 15:38 UTC ( #481499=perlquestion )
paulski has asked for the wisdom of the Perl Monks concerning the following question:

I found a better module on CPAN that does exactly what I need: Parse::RandGen::Regexp.

The problem I'm having now is how to get a regexp into its constructor. I wrote a stub program to test.

#!/usr/bin/perl -w
use strict;
use Parse::RandGen::Regexp;

my $regexp = "/^STOR\s[^\n]{100}/smi";
my $r = Parse::RandGen::Regexp->new($regexp);
my $string = $r->pick(match=>1, captures=>{});
print("\$string: $string\n");

This throws the following error.

Unrecognized escape \s passed through at ./regexp2.pl line 6.
%Error: Parse::RandGen::Regexp has an element that is not a Regexp reference (ref="")! at /usr/lib/perl5/site_perl/5.8.6/Parse/RandGen/Regexp.pm line 36
    Parse::RandGen::Regexp::_newDerived('Parse::RandGen::Regexp=HASH(0x9dcdd88)', 'HASH(0x9e5f138)') called at /usr/lib/perl5/site_perl/5.8.6/Parse/RandGen/Condition.pm line 81
    Parse::RandGen::Condition::new('Parse::RandGen::Regexp', '/^STORs[^\x{a}]{100}/smi') called at ./regexp2.pl line 7

Now I need the string in regexp form to pass to the constructor. I could just write the pattern as a qr// literal, but in my real program I need to read the regexps from a list, so they will arrive as plain string scalars.

i.e. How do I convert:

"/^STOR\s[^\n]{100}/smi";
to
qr/^STOR\s[^\n]{100}/smi
I'm not sure how to do this conversion.

Thanks,

Paul

Janitored by Arunbear - replaced pre tags with code tags, to prevent distortion of site layout and allow code extraction.

Re: Parse::RandGen::Regexp
by CountZero (Bishop) on Aug 06, 2005 at 16:11 UTC
    Since qr// interpolates variables, one can do:

    use strict;
    my $match = 'test';
    my $regex_match = qr/$match/i;
    my $test_value = 'This is a Test';
    print 'It matches' if $test_value =~ m/$regex_match/;

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Parse::RandGen::Regexp
by ikegami (Pope) on Aug 06, 2005 at 16:11 UTC
    How do I convert ...

    You just did. qr/^STOR\s[^\n]{100}/smi will do nicely. Use it as if it was a string. In other words, you don't need to change anything else.

    By the way, please edit your post, changing <pre>...</pre> to <code>...</code>, to save the janitors from doing it for you.

      This won't work. If the string comes in as "/test/smi" and I simply place the string inside the qr, it breaks. E.g. I'd get qr//test/smi//;, which doesn't help me. I can get rid of the enclosing slashes easily enough, but getting the /smi outside the regexp is harder.

        You mean you start with /test/smi from some outside source? The only way to use that is to eval it. Of course, that means using eval on something from an outside source, which is a big no-no.

        If the user can only use / as the delimiter, then a simple regexp will process the options:

        use strict;
        use warnings;

        # For example, get regexp from command line:
        my $regexp = $ARGV[0];
        # String to test against (assumed here to be the second argument):
        my $str = $ARGV[1];

        # Look for and remove end slashes and modifiers.
        $regexp =~ s{^/}{} or die("Bad input\n");
        $regexp =~ s{/([msix]*)$}{} or die("Bad input\n");
        my $modifiers_on = $1;
        my $modifiers_off = join '', grep { index($modifiers_on, $_) < 0 } qw( x i s m );

        # Add the modifiers:
        $regexp = "(?${modifiers_on}-${modifiers_off}:${regexp})";

        # Compile the regexp and check for errors:
        $regexp = eval { qr/$regexp/ } or die("Bad regexp: $@\n");

        # Now you can use it:
        print($str =~ $regexp ? 'match' : 'no match', "\n");

        It's trickier if you want to support substitutions and the (g)lobal modifier. You're better off asking for these as separate arguments. (The search string, the replace string and the modifiers.)
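
The separate-arguments idea above might look roughly like the following. This is only a sketch: apply_substitution is a hypothetical helper name, and it assumes the replacement is meant as a literal string (no $1 or backslash escapes are expanded).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply a search/replace whose
# search string, replace string and modifiers arrive as three separate
# strings, without eval-ing any of the input as Perl code.
sub apply_substitution {
    my ($text, $search, $replace, $modifiers) = @_;
    $modifiers = '' unless defined $modifiers;

    # Accept only the modifiers handled below.
    $modifiers =~ /^[msixg]*$/ or die("Bad modifiers\n");

    # g is not a pattern modifier, so it cannot go inside (?...); split it off.
    my $global = ($modifiers =~ s/g//g) ? 1 : 0;

    # Embed the remaining modifiers inline, then compile and check for errors.
    my $pattern = "(?${modifiers}:${search})";
    my $re = eval { qr/$pattern/ } or die("Bad regexp: $@\n");

    # Assumption: the replacement is inserted literally.
    if ($global) { $text =~ s/$re/$replace/g }
    else         { $text =~ s/$re/$replace/  }
    return $text;
}

print apply_substitution('Foo foo FOO', 'foo', 'bar', 'gi'), "\n";   # bar bar bar
print apply_substitution('Foo foo FOO', 'foo', 'bar', 'i'),  "\n";   # bar foo FOO
```

Supporting $1-style references in the replacement would need extra parsing of the replace string, which is part of why substitutions are trickier than plain matches.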

Re: Parse::RandGen::Regexp
by Tanktalus (Canon) on Aug 06, 2005 at 16:46 UTC

    There are a number of aspects here. First, your initial problem: the \s. Try escaping the \:

    my $regexp = "/^STOR\\s[^\n]{100}/smi";
    However, then you need to still use the qr operator:
    my $r = Parse::RandGen::Regexp->new(qr/$regexp/);
    But even that won't work the way you think it will because the leading and trailing delimiters will be treated literally - so the regular expression will match something that has a slash, then a beginning-of-line zero-width assertion, and has a trailing "/smi", literally. You could do something like:
    my $r = Parse::RandGen::Regexp->new(eval "qr$regexp");
    but that's unsafe if you get your regexp from an unsafe source. Then again, if you're getting regexps from unsafe sources, I'm not sure how easy it is to strip out unsafe aspects of regular expressions which could execute arbitrary perl code during a match.
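
On the worry about patterns that execute code: as far as I know, perl refuses by default to compile an interpolated pattern containing a (?{ ... }) code block unless "use re 'eval'" is in scope. A minimal sketch, where $untrusted is just an illustrative variable standing in for outside input:

```perl
use strict;
use warnings;

# Pattern text as it might arrive from an untrusted source. The (?{ ... })
# construct asks the regexp engine to run embedded Perl code.
my $untrusted = '(?{ print "code ran\n" })foo';

# No "use re 'eval'" is in scope here, so compiling the interpolated
# pattern is expected to fail instead of accepting the code block.
my $compiled = eval { qr/$untrusted/ };
my $error    = $@;

if (defined $compiled) {
    print "compiled\n";
} else {
    print "refused: $error";
}
```

This does not make arbitrary patterns harmless: a pattern can still be written to backtrack for a very long time, so untrusted input deserves care either way.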

    To remove the eval, you would also have to limit your input to not include the leading/trailing delimiters. Nor would the smi flags be allowed (or they're mandatory). However, even then, the input can still control these flags inside a regular expression:

    (?smi)^STOR\s[^\n]{100}
    Note how the \'s aren't escaped here. Because this is data, and not interpreted by perl (until we get to the regular expression handler), we don't need to escape here. Nothing is escapable because nothing is treated as special. Once you've read this in, you can go back to using qr/$regexp/ in your call to the P::RG::R constructor.
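
A small sketch of that workflow, assuming the patterns arrive one per line. The single-quoted heredoc is only a stand-in for a real file, and the second pattern is made up for illustration:

```perl
use strict;
use warnings;

# Stand-in for a file of patterns. The single-quoted heredoc keeps every
# backslash exactly as written, just as reading from a file would.
my @lines = split /\n/, <<'END_PATTERNS';
(?smi)^STOR\s[^\n]{100}
(?i)^quit$
END_PATTERNS

# Compile each line; the embedded (?smi) supplies the flags.
my @regexps = map { qr/$_/ } @lines;

my $command = 'stor ' . ('A' x 100);
print(($command     =~ $regexps[0]) ? "match\n" : "no match\n");   # match
print(('STOR short' =~ $regexps[0]) ? "match\n" : "no match\n");   # no match
```

Each element of @regexps is a compiled Regexp reference, which is what the error message in the original post says the constructor expects.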

    Hope this helps.

      Your first suggestions are close to what I need but I don't want to have to escape characters. What other characters would I have to escape besides '\s'? This gets hard to manage.

      So I still have the problem: how do I convert a string

      (?smi)^STOR\s[^\n]{100}

      OR

      /^STOR\s[^\n]{100}/smi

      into a regexp that the P::RG::R will handle?

        To convert the first string, (?smi)^STOR\s[^\n]{100}, into a compiled regular expression for P::RG::R, just use qr/$string/, assuming $string contained the string as read from the external source (don't forget to chomp it if that's the case!).

        To convert the second string, /^STOR\s[^\n]{100}/smi, again, assuming $string is read from an external source (not perl code - perl DATA is fine), just use eval "qr$string".

        No matter what, if the string is starting out inside a double-quoted string in perl code, you have to escape the \'s that have special meaning to the regular expression parser but not to perl, such as \s, \w, \S, \W, \d, \D, ... so that they survive to the parser. (Single quotes avoid most of this, since only \\ and \' are special there.) This is not needed if the string is stored outside of code because then the perl compiler won't see the \'s.
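
To make the escaping point concrete, here is a short sketch comparing the two quoting styles. The {3} count is shortened from the thread's {100} purely to keep the test string small:

```perl
use strict;
use warnings;

# In double quotes, each backslash meant for the regexp parser is doubled.
my $double = "^STOR\\s[^\\n]{3}";

# In single quotes, \s and \n stay exactly as typed.
my $single = '^STOR\s[^\n]{3}';

print "same text\n" if $double eq $single;

# Either string can then be compiled with qr//.
my $re = qr/$single/i;
print "match\n" if 'stor abc' =~ $re;
```

Both variables end up holding the same pattern text, so the choice is only about how the string is written in the source file.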

Node Type: perlquestion [id://481499]
Approved by fauria
Front-paged by fauria