Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Generating Regular Expressions

by cjf-II (Monk)
on Nov 22, 2002 at 01:36 UTC ( [id://214976]=perlquestion: print w/replies, xml ) Need Help??

cjf-II has asked for the wisdom of the Perl Monks concerning the following question:

I recently started working on a CGI program that acted as a gateway between a few Python clients and a large number of XML files on a server. When I started writing the parameter checking code I found myself writing virtually the same code for each subroutine. Having a single separate sub that was flexible enough to check all parameters would keep the code far simpler, shorter, and more maintainable.

What I quickly threw together was a sub that takes a list of acceptable symbols and two numbers indicating whether or not alpha and numeric characters are allowed in the given parameter. It then generates a very simple regex based on the input.

This seems like it would be a fairly common task (there's even a brief mention of it in Programming Perl 3rd ed.) but a quick search here didn't turn up much other than generating regexes? which wasn't quite what I was looking for. Does anyone routinely generate regexes, and if so what types of tasks have you found generating them most useful for. Also, are there any other examples and/or modules available that I should take a look at? Thanks.

Here's the code (any suggestions are greatly appreciated):

use diagnostics; use strict; use warnings; my @randomParameters = ('Abc.123', '../../', 'a@b.c', '789'); my @allowedSymbols = ('.', '@'); my (@goodParams, @badParams); checkParam($_, 1, 1, @allowedSymbols) for (@randomParameters); print $_, " is good\n" for @goodParams; print $_, " is bad\n" for @badParams; sub checkParam { my $param = shift; my $alpha = shift; # if true allow alpha my $numeric = shift; # if true allow numeric my @symbols = @_; # allow all symbols in @_ my $regex = ""; $regex .= '([a-zA-Z' if $alpha; $regex .= '0-9' if $numeric; for my $symbol (@symbols) { $regex .= $symbol; } $regex .= ']*)'; my $pattern = qr/^${regex}$/; if ($param =~ /$pattern/) { push(@goodParams, $param); } else { push(@badParams, $param); } }

Replies are listed 'Best First'.
Re: Generating Regular Expressions
by Abigail-II (Bishop) on Nov 22, 2002 at 10:07 UTC
Re: Generating Regular Expressions
by DamnDirtyApe (Curate) on Nov 22, 2002 at 03:50 UTC

    Not bad, but there's a couple things I'd do slightly different, in the interest of scalability. Also, assigning to arrays outside the sub from inside the sub couples your routine too tightly to this particular program. Here's my version:

    use diagnostics; use strict; use warnings; use Data::Dumper ; my @randomParameters = ('Abc.123', '../../', 'a@b.c', '789'); my @allowedSymbols = ('.', '@'); my (@goodParams, @badParams); my @results = checkParam( \@randomParameters, ALPHA => 1, NUMERIC => 1, SYMBOLS => \@allowedSymbols ) ; print Dumper \@results ; sub checkParam { my ( $param_list, %options ) = @_ ; my %chartypes = ( 'ALPHA' => '[:alpha:]', 'NUMERIC' => '[:digit:]' ) ; my $regex = '[' ; ## include the character classes. $regex .= join '' => map { $chartypes{$_} } grep { $options{$_} } keys %chartypes ; ## include the symbols. $regex .= join '' => @{$options{'SYMBOLS'}} ; $regex .= ']+' ; my $pattern = qr/^$regex$/; my @results = ([], []) ; for ( @$param_list ) { if ( $_ =~ /$pattern/ ) { push @{$results[0]}, $_ ; } else { push @{$results[1]}, $_ ; } } return @results ; }

    _______________
    DamnDirtyApe
    Those who know that they are profound strive for clarity. Those who
    would like to seem profound to the crowd strive for obscurity.
                --Friedrich Nietzsche

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://214976]
Approved by particle
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2024-04-20 01:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found