
Dynamic regexp from array values

by Heidegger (Hermit)
on Feb 11, 2003 at 17:53 UTC ( #234472=perlquestion )

Heidegger has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Monks,

I have a number of string values with which I want to compare my data being tested:

if($data =~ /val1|val2|val3/) { doSomething(); }

The number of values is quite large, and the regular expression would get ugly if I kept extending it this way. I'd like to populate the regexp from an array dynamically, but I can't think of a nice way to do it.

So the question is: how can such code be improved? The problem could be generalized as follows: how do I check whether the data is in a given set defined by an array?

Thank you very much.

Replies are listed 'Best First'.
Re: Dynamic regexp from array values
by broquaint (Abbot) on Feb 11, 2003 at 17:59 UTC
    You could either build a regex
    my $regex = join '|', map "\Q$_\E", @vals;
    doSomething() if $data =~ $regex;
    Or better yet use Regex::PreSuf
    use Regex::PreSuf;
    doSomething() if $data =~ presuf(@vals);
    Or put grep() to use
    doSomething() if grep $data =~ /\Q$_/, @vals;



      You could either build a regex
      my $regex = join '|', map "\Q$_\E", @vals;
      doSomething() if $data =~ $regex;

      Actually my $regex = qr(join '|', map "\Q$_\E", @vals) will optimize the pattern matching. See perldoc -f qr


      The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
      --John M. Dlugosz
        Actually my $regex = qr(join '|', map "\Q$_\E", @vals) will optimize the pattern matching
        Hrm, that's debatable, check out diotalevi's reply in meaning of /o in regexes.


        Actually my $regex = qr(join '|', map "\Q$_\E", @vals) will optimize the pattern matching

        Unfortunately it'll be a totally different pattern though. Despite the looks of it, qr() is not a function call. It's a quote operator with () as delimiters, so the text between the parentheses is taken as the pattern itself and join() is never called. You need to either interpolate the join() call (with e.g. @{[join ...]}), or just do $regex = qr/$regex/.

        Update: Had misplaced the parenthesis.
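
        As a minimal sketch of the two fixes suggested above (the sample values and variable names are made up for illustration, not taken from the thread), either build the alternation as a string and compile it afterwards, or interpolate the join() call inside the quote operator:

        ```perl
        use strict;
        use warnings;

        my @vals = ('foo', 'bar.baz', 'a+b');   # hypothetical sample values

        # Option 1: build the alternation as a plain string, then compile it once.
        my $alternation = join '|', map { quotemeta($_) } @vals;
        my $regex       = qr/$alternation/;

        # Option 2: interpolate the join() call inside the quote operator.
        my $regex2 = qr/@{[ join '|', map { quotemeta($_) } @vals ]}/;

        # By contrast, qr(join '|', ...) would compile the literal text
        # "join '|', ..." as the pattern; join() would never run.

        for my $data ('xx bar.baz xx', 'barXbaz') {
            printf "%-15s %s\n", $data, ($data =~ $regex ? 'matches' : 'no match');
        }
        ```

        Both compiled patterns should behave the same way; the two-step form is usually easier to read.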

Re: Dynamic regexp from array values
by jdporter (Chancellor) on Feb 11, 2003 at 18:01 UTC
    Regexes are just strings (well, at least you can think of them that way), which means you can build them up just as you would any other strings.
    my $regex = join '|', @all_the_different_values;
    if ( $data =~ /$regex/ ) . . .
    If you need only exact matches, then you should probably write it as
    if ( $data =~ /^($regex)$/ ) . . .
    On the other hand, unless you're using the power of regexes to do something like "wildcard" matching, there's another way to do it you should consider: Make a "set" of the values, and test for the existence of $data in that set.
    my %set;
    @set{ @all_the_different_values } = ();   # make the set
    if ( exists $set{ $data } ) . . .

    The 6th Rule of Perl Club is -- There is no Rule #6.

      jdporter makes a very good point above. I just want to make it explicit because it's a fairly common over-generalization: If you're only testing for string equality, don't use regular expressions at all. I can't tell from your note whether that's the case or not, but if so, a regular expression is much more work than necessary.

      In that case, using a hash as shown above will work, and will be faster, though it may consume more memory. Another alternative is to use a simple loop:

      foreach (@target_strings) {
          if ($data eq $_) {
              doSomething();
              last;
          }
      }
      (I'm guessing that you can short-circuit the loop if any one of the targets matches.)

      You *must* use quotemeta or \Q \E if you are going to do this and qr// or a /o modifier on the regex is also not a bad idea:

      my $re = join '|', map { quotemeta } @vals;
      $re = qr/$re/;
      if ( $var =~ m/$re/ ) { blah }

      If you fail to quotemeta then your regex will either die or commit weirdness if there are any / ( ) { } ^ $ * ? + \ - . chars present in @vals. quotemeta is a vital part of doing this.
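
      To illustrate the kind of weirdness meant here, the following sketch compares an unescaped alternation with a quotemeta'd one. The values and test strings are hypothetical, chosen only because they contain metacharacters:

      ```perl
      use strict;
      use warnings;

      my @vals = ('1.5', 'a+b', 'x|y');   # hypothetical values containing metacharacters

      my $raw    = join '|', @vals;                          # metacharacters stay live
      my $quoted = join '|', map { quotemeta($_) } @vals;    # metacharacters escaped

      # Anchor both patterns so only whole-string matches count.
      my $raw_re    = qr/^(?:$raw)$/;
      my $quoted_re = qr/^(?:$quoted)$/;

      for my $data ('1.5', '1x5', 'a+b', 'aab', 'x') {
          printf "%-4s raw=%d quoted=%d\n",
              $data,
              ($data =~ $raw_re    ? 1 : 0),
              ($data =~ $quoted_re ? 1 : 0);
      }
      ```

      With the raw pattern, '1x5', 'aab' and 'x' all match even though none of them is in @vals, while the literal 'a+b' does not; the quoted pattern accepts only the exact values.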




Re: Dynamic regexp from array values
by Zaxo (Archbishop) on Feb 11, 2003 at 18:11 UTC

    If these are fixed strings, as your example suggests, I'd check existence in a hash constructed with the strings as keys. That will be much faster than sifting through a collection of regexen.

    my %match;
    @match{@strings} = ();
    # to test a var, $check_this
    if (exists $match{$check_this}) {
        # ...
    }

    Update: This doesn't act like a regex, of course. The presence of a substring in $check_this will not test true. You can adapt by parsing how you expect the strings to appear, or else by iterating the array of @strings over substr instead of a regex - no hash involved there.
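
    A rough sketch of the non-regex substring test mentioned in the update, assuming the goal is to find whether any of @strings occurs anywhere inside $check_this. It uses index() rather than substr, and first() from List::Util to stop at the first hit; the sample data is hypothetical:

    ```perl
    use strict;
    use warnings;
    use List::Util qw(first);

    my @strings    = ('val1', 'val2', 'val3');   # hypothetical fixed strings
    my $check_this = 'prefix-val2-suffix';       # hypothetical data to test

    # index() returns -1 when the substring is absent;
    # first() stops iterating as soon as the block returns true.
    my $found = first { index($check_this, $_) >= 0 } @strings;

    print defined $found ? "found '$found'\n" : "no match\n";
    ```

    Because index() does a plain substring search, no escaping of metacharacters is needed.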

    After Compline,

Re: Dynamic regexp from array values
by DaveH (Monk) on Feb 11, 2003 at 21:07 UTC


    A similar discussion was had a few months ago here: Set Operators. Everyone seemed to come up with different answers, but like so much other stuff with Perl, "it depends" was the best answer.

    If you can afford the memory, and your list of values to test against is reasonably small, building a hash slice is a nice "Perlish" solution. It can also pay long-term if you are going to be doing these searches over and over again.

    On the other hand, if you only have a few values to test against, hashes may be more trouble than they're worth - just use a foreach loop. An example of where this would be perfectly appropriate would be parsing command-line arguments manually (although you should generally use some sort of Getopt).

    However, since it is implemented directly in C as a core function, you would have to work hard to beat grep in terms of raw speed. At the same time, you are also guaranteed that grep will examine every item of the search list, every time. You get no chance to `last' out of the loop early if your needs have already been matched. Raw speed only scales so far...
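
    The difference in how much of the list gets examined can be seen with a small counting sketch. The list contents are made up, and first() from List::Util stands in here for a hand-written loop with last:

    ```perl
    use strict;
    use warnings;
    use List::Util qw(first);

    my @vals = map { "val$_" } 1 .. 1000;   # hypothetical search list
    my $data = 'val3';

    my ($grep_tests, $first_tests) = (0, 0);

    # grep always walks the whole list, even after a match has been seen.
    my $hit_grep = grep { $grep_tests++; $data eq $_ } @vals;

    # first() returns as soon as the block is true.
    my $hit_first = first { $first_tests++; $data eq $_ } @vals;

    print "grep ran $grep_tests comparisons, first ran $first_tests\n";
    ```

    Whether the early exit pays off depends on where matches tend to sit in the list and how long the list is.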

    Unfortunately, the old adage still stands - know your data. Only you can decide which approach works best; and if in doubt - "use Benchmark;". :-)
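
    As a starting point for such a comparison, here is a sketch using cmpthese from the core Benchmark module. The data set, the value being looked up, and the candidate names are all assumptions; substitute your own data, since the ranking can change with list size and match position:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @vals = map { "val$_" } 1 .. 200;    # hypothetical values
    my $data = 'val150';                    # hypothetical lookup target

    # Precompute the regex and the hash so only the lookup itself is timed.
    my $regex = do {
        my $alt = join '|', map { quotemeta($_) } @vals;
        qr/^(?:$alt)$/;
    };
    my %set;
    @set{@vals} = ();

    my %candidates = (
        regex => sub { $data =~ $regex ? 1 : 0 },
        hash  => sub { exists $set{$data} ? 1 : 0 },
        grep  => sub { (grep { $data eq $_ } @vals) ? 1 : 0 },
        loop  => sub { for (@vals) { return 1 if $data eq $_ } return 0 },
    );

    # A negative count means "run each candidate for at least that many CPU seconds".
    cmpthese(-1, \%candidates);
    ```

    Note that this times exact-match lookups only; substring or wildcard matching would need different candidates.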


    -- Dave :-)

