Heidegger has asked for the wisdom of the Perl Monks concerning the following question:
Hello, Monks,
I have a number of string values that I want to compare the data being tested against:
if($data =~ /val1|val2|val3/)
{
doSomething();
}
The number of values is quite large, and the regular expression would get ugly if I kept extending it that way. I'd like to populate the regexp dynamically from an array, but I can't think of a nice way to do it.
So the question is: how can I improve such code? More generally: how do I check whether the data is in a given set defined by an array?
Thank you very much.
Re: Dynamic regexp from array values
by broquaint (Abbot) on Feb 11, 2003 at 17:59 UTC
You could either build a regex
my $regex = join '|', map "\Q$_\E", @vals;
doSomething() if $data =~ $regex;
Or better yet use Regex::PreSuf
use Regex::PreSuf;
doSomething() if $data =~ presuf(@vals);
Or put grep() to use
doSomething() if grep $data =~ /\Q$_/, @vals;
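A minimal, self-contained sketch of the first and third approaches above (Regex::PreSuf is left out because it is not a core module). The sample values, the extra qr// compile step and the helper name matches_any are illustrative assumptions, not part of the original post. Both tests are unanchored, so they succeed when any value occurs anywhere inside $data.

```perl
use strict;
use warnings;

# Hypothetical sample values; stand-ins for the poster's @vals.
my @vals = ('val1', 'val2', 'a.b');

# Approach 1: escape each value with \Q...\E, join them into one
# alternation, and compile it once with qr//.
my $alternation = join '|', map "\Q$_\E", @vals;
my $regex       = qr/$alternation/;

# Approach 2: grep over the values. In scalar context grep returns how many
# escaped values occur somewhere in $data (0 means no match).
sub matches_any {
    my ($data, @candidates) = @_;
    return scalar grep { $data =~ /\Q$_\E/ } @candidates;
}

for my $data ('xxval2yy', 'axb', 'a.b', 'nothing') {
    my $by_regex = $data =~ $regex ? 1 : 0;
    my $by_grep  = matches_any($data, @vals) ? 1 : 0;
    print "$data: regex=$by_regex grep=$by_grep\n";
}
```

Because of the escaping, 'axb' does not match even though the unescaped pattern a.b would have matched it.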
HTH
_________ broquaint
Actually my $regex = qr(join '|', map "\Q$_\E", @vals) will optimize the pattern matching
Hrm, that's debatable, check out diotalevi's reply in meaning of /o in regexes.
HTH
_________ broquaint
Actually my $regex = qr(join '|', map "\Q$_\E", @vals) will optimize the pattern matching
Unfortunately, that'll be a totally different pattern, though. Despite the looks of it, qr() is not a function call. It's a quote operator with () as delimiters. You need to either interpolate the join() call (with e.g. @{[join ...]}), or just do $regex = qr/$regex/.
Update: Had misplaced the parenthesis.
ihb
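To illustrate ihb's point, here is a small sketch with hypothetical sample values. Printing the compiled objects shows that qr(join '|', @vals) turns the code itself into pattern text, while both suggested fixes produce the intended alternation. The exact stringified wrapper, such as (?^:...), varies between Perl versions.

```perl
use strict;
use warnings;

my @vals = ('val1', 'val2', 'val3');

# Pitfall: qr() is a quote operator with ( ) as delimiters, not a function
# call, so the code inside becomes pattern text (with @vals interpolated).
my $wrong = qr(join '|', @vals);

# Fix 1: build the alternation string first, then compile it.
my $alternation = join '|', map quotemeta, @vals;
my $right1      = qr/$alternation/;

# Fix 2: interpolate the join() call with the @{[ ... ]} idiom.
my $right2 = qr/@{[ join '|', map quotemeta, @vals ]}/;

print "wrong:  $wrong\n";
print "right1: $right1\n";
print "right2: $right2\n";
```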
Re: Dynamic regexp from array values
by jdporter (Chancellor) on Feb 11, 2003 at 18:01 UTC
Regexes are just strings (well, at least you can think of them that way), which means you can build them up just as you would any other strings.
my $regex = join '|', @all_the_different_values;
if ( $data =~ /$regex/ ) . . .
If you need only exact matches, then you should probably write it as
if ( $data =~ /^($regex)$/ ) . . .
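A sketch of why the parentheses in /^($regex)$/ matter, assuming hypothetical values 'cat' and 'dog'. Without a group, ^ and $ attach only to the first and last alternatives. This version uses a non-capturing group (?:...), since the capture isn't needed, and \z, which, unlike $, does not also match just before a trailing newline.

```perl
use strict;
use warnings;

my @vals  = ('cat', 'dog');
my $regex = join '|', map quotemeta, @vals;

# Ungrouped: means "starts with cat" OR "ends with dog".
my $loose = qr/^$regex$/;

# Grouped: both anchors apply to every alternative; \z rejects "dog\n".
my $exact = qr/^(?:$regex)\z/;

for my $data ('catalog', 'hotdog', 'cat') {
    my $l = $data =~ $loose ? 1 : 0;
    my $e = $data =~ $exact ? 1 : 0;
    print "$data: loose=$l exact=$e\n";
}
```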
On the other hand, unless you're using the power of regexes to do something like "wildcard" matching, there's another way to do it you should consider: make a "set" of the values, and test for the existence of $data in that set.
my %set;
@set{ @all_the_different_values } = (); # make the set
if ( exists $set{ $data } ) . . .
jdporter The 6th Rule of Perl Club is -- There is no Rule #6.
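A runnable version of the set idea, with hypothetical sample values and a hypothetical helper name in_set. The hash is built once; after that, each test is a single exists lookup rather than a scan over every value, and it is an exact, case-sensitive string comparison.

```perl
use strict;
use warnings;

my @all_the_different_values = ('val1', 'val2', 'val3');

# Build the set once: the hash slice creates one key per value.
# The hash values are undef; only the keys matter.
my %set;
@set{ @all_the_different_values } = ();

# exists tests exact string equality: no metacharacters, no partial matches.
sub in_set { return exists $set{ $_[0] } }

for my $data ('val2', 'val22', 'VAL1') {
    print "$data: ", (in_set($data) ? 'yes' : 'no'), "\n";
}
```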
jdporter makes a very good point above. I just want to make it explicit because it's a fairly common over-generalization: If you're only testing for string equality, don't use regular expressions at all. I can't tell from your note whether that's the case or not, but if so, a regular expression is much more work than necessary.
In that case, using a hash as shown above will work, and will be faster, though it may consume more memory. Another alternative is to use a simple loop:
foreach (@target_strings)
{
if ($data eq $_) { doSomething(); last; }
}
(I'm guessing that you can short-circuit the loop if any one of the targets matches.)
my $re = join '|', map{quotemeta}@vals;
$re = qr/$re/;
if ( $var =~ m/$re/ ) { blah }
If you fail to quotemeta, your regex will either die or commit weirdness if any regex metacharacters, such as ( ) [ { ^ $ * ? + | . or \, are present in @vals. quotemeta is a vital part of doing this.
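A short demonstration of both failure modes, using made-up values that contain metacharacters: an unbalanced ( makes the unescaped pattern die when it is compiled, and an unescaped . quietly matches any character. The eval {} is only there so the script can report the error instead of stopping.

```perl
use strict;
use warnings;

# Hypothetical values that contain regex metacharacters.
my @vals = ('1.5', 'C++', 'a(b');

# Unescaped: the lone '(' in 'a(b' is an unmatched group, so compiling the
# pattern dies at run time; eval {} traps the error for this demonstration.
my $raw    = join '|', @vals;
my $raw_re = eval { qr/$raw/ };
print defined $raw_re ? "unescaped pattern compiled\n"
                      : "unescaped pattern died: $@";

# Escaped: quotemeta backslashes every non-word character, so each value
# is matched literally.
my $quoted = join '|', map quotemeta, @vals;
my $re     = qr/$quoted/;
print "'C++' ", ('C++' =~ $re ? 'matches' : 'does not match'), "\n";

# Even without a syntax error, an unescaped '.' silently acts as a wildcard.
my $value  = '1.5';
my $sloppy = '105' =~ /$value/     ? 1 : 0;   # 1: '.' matched the '0'
my $strict = '105' =~ /\Q$value\E/ ? 1 : 0;   # 0: '.' must be literal
print "sloppy=$sloppy strict=$strict\n";
```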
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Dynamic regexp from array values
by Zaxo (Archbishop) on Feb 11, 2003 at 18:11 UTC
If these are fixed strings, as your example suggests, I'd check for existence in a hash constructed with the strings as keys. That will be much faster than sifting through a collection of regexen.
my %match;
@match{@strings} = ();
# to test a var, $check_this
if (exists $match{$check_this}) {
# ...
}
Update: This doesn't act like a regex, of course. A string from @strings that merely occurs as a substring of $check_this will not test true. You can adapt by parsing $check_this according to how you expect the strings to appear in it, or else by iterating over @strings and testing each one with substr instead of a regex; no hash is involved there.
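A sketch of the substring variant described in the update. It uses the built-in index rather than substr, which is an assumption about the simplest way to do it; contains_any and the sample strings are hypothetical. index returns -1 when the needle is absent, and because no pattern is involved, nothing needs escaping.

```perl
use strict;
use warnings;

# Hypothetical fixed strings to look for inside $check_this.
my @strings = ('foo', 'bar');

# index() returns the offset of the first occurrence of $needle, or -1 if
# it is absent. Returning from inside the loop stops at the first hit.
sub contains_any {
    my ($check_this, @needles) = @_;
    for my $needle (@needles) {
        return 1 if index($check_this, $needle) >= 0;
    }
    return 0;
}

print "a barn: ", contains_any('a barn', @strings), "\n";   # a barn: 1
print "baz: ",    contains_any('baz', @strings),    "\n";   # baz: 0
```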
After Compline, Zaxo
Re: Dynamic regexp from array values
by DaveH (Monk) on Feb 11, 2003 at 21:07 UTC
Hi.
A similar discussion was had a few months ago here: Set Operators. Everyone seemed to come up with different answers, but like so much other stuff with Perl, "it depends" was the best answer.
If you can afford the memory, and your list of values to test against is reasonably small, building a hash slice is a nice "Perlish" solution. It can also pay long-term if you are going to be doing these searches over and over again.
On the other hand, if you only have a few values to test against, hashes may be more trouble than they're worth - just use a foreach loop. An example of where this would be perfectly appropriate would be parsing command-line arguments manually (although you should generally use some sort of Getopt).
However, since it is implemented directly in C as a core function, you would have to work hard to beat grep in terms of raw speed. At the same time, you are also guaranteed that grep will examine every item of the search list, every time. You get no chance to `last' out of the loop early if your needs have already been matched. Raw speed only scales so far...
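A sketch that counts comparisons to make the point about grep concrete (the data and counters are hypothetical): grep runs its block for every element even after a match, whereas a foreach loop can last out at the first hit.

```perl
use strict;
use warnings;

my @targets = map { "val$_" } 1 .. 1000;
my $data    = 'val3';

# grep evaluates its block once per element, even after a match.
my $grep_checks = 0;
my $found_grep  = grep { $grep_checks++; $data eq $_ } @targets;

# A foreach loop can leave as soon as it finds a match.
my ($loop_checks, $found_loop) = (0, 0);
for my $t (@targets) {
    $loop_checks++;
    if ($data eq $t) { $found_loop = 1; last }
}

print "grep: found=$found_grep checks=$grep_checks\n";   # checks=1000
print "loop: found=$found_loop checks=$loop_checks\n";   # checks=3
```

Whether the early exit wins in wall-clock time still depends on where matches tend to sit in the list, which is why measuring is the next step.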
Unfortunately, the old adage still stands - know your data. Only you can decide which approach works best; and if in doubt - "use Benchmark;". :-)
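Following the "use Benchmark;" advice, a minimal harness comparing the approaches from this thread with cmpthese from the core Benchmark module. The workload (200 strings, one probe that is present) and the iteration count are arbitrary assumptions; relative results will depend on your data and Perl version, so treat the printed table as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical workload: 200 fixed strings, one probe that is present.
my @vals = map { "value$_" } 1 .. 200;
my $data = 'value150';

# Precompute the lookup structures outside the timed code.
my %set;
@set{@vals} = ();
my $alternation = join '|', map quotemeta, @vals;
my $re = qr/^(?:$alternation)\z/;

# Each sub returns 1 if $data is one of @vals, else 0.
my %candidates = (
    hash  => sub { exists $set{$data} ? 1 : 0 },
    regex => sub { $data =~ $re ? 1 : 0 },
    grep  => sub { (grep { $_ eq $data } @vals) ? 1 : 0 },
    loop  => sub { for (@vals) { return 1 if $_ eq $data } return 0 },
);

# Sanity check: every approach must agree before timing them.
for my $name (sort keys %candidates) {
    die "$name disagrees\n" unless $candidates{$name}->() == 1;
}

# Run each sub 20_000 times and print a relative-speed table.
cmpthese(20_000, \%candidates);
```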
Cheers, -- Dave :-)
$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print