http://www.perlmonks.org?node_id=1040885

yoda54 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I'm writing a search to go through an array of data, I'm looking for any suggestions on algorithms, structuring, or any language advice to make my code better. Also, what is a better way to build my regexes?

Thanks!

search(["bobcat", "boomerang", "beer", "bat"], ["-bo"], ["b"]); sub search { my ($data, $negative_criteria, $positive_criteria = @_; my $temp_array; my $results; my $match_regex = join "|", @$positive_criteria; # remove trailing "-" which denotes negated criteria my $negative_regex = [map { s/^-//, $_ } @$negative_criteria]; $negative_regex = join "|", @$negative_criteria; # filter through all matches to array foreach(@$data) { if ($_ =~ m/$match_regex/) { push @$temp_array, $_; } } # remove negative criteria and build new array foreach(@$temp_array) { if ($_ !~ m/$negative_regex/) { push @$results, $_; } } my $count = 0; foreach(@$results) { print "$count - $_\n"; $count++; } return $results; } Output: 0 - beer 1 - bat

Replies are listed 'Best First'.
Re: General Advice
by hbm (Hermit) on Jun 27, 2013 at 01:31 UTC

    use strict; use warnings; catches one error - a missing paren.

    I'd build $negative_regex in one pass:

    my $negative_regex = join '|', map { s/^-//, $_ } @$negative_criteria;

    And test both regexes in one pass:

    foreach(@$data) { push @$results, $_ if /$match_regex/ && !/$negative_regex/; }

      use strict; use warnings; catches one error - a missing paren.

      No , perl catches syntax errors

      Thanks!
Re: General Advice
by NetWallah (Canon) on Jun 27, 2013 at 04:57 UTC
    One regex to rule them all !
    Uses negative look-ahead assertion.
    use strict; use warnings; search(["bobcat", "boomerang", "beer", "bat"], ["-bo"], ["b"]); sub search { my ($data, $negative_criteria, $positive_criteria )= @_; my $re="^(?!" . join("|", map{ s/^-//;$_ }@$negative_criteria) . ") +" . join "|", @$positive_criteria; $re=qr|$re|; my @results = grep { m/$re/ } @$data; for (0..$#results){ print $_+1 . " - $results[$_]\n"; } return \@results; }

                 "The trouble with the Internet is that it's replacing masturbation as a leisure activity."
            -- Patrick Murray

      Thanks!!!
Re: General Advice
by kcott (Archbishop) on Jun 27, 2013 at 06:51 UTC

    G'day yoda54,

    Try to match shorter patterns before longer ones. While your example only shows single items for your positive and negative criteria, your code suggests multiple items are possible. You haven't indicated the source of the criteria but, if qw{book and lots of other patterns then b} was valid positive criteria, you'd fail to match "book" in any of the $data items only to later successfully match "b" in all of them.

    ... join '|' => sort { length $a <=> length $b } @$xxxxxx_criteria;

    You've shown interpolating double quotes in many places where you don't actually want to interpolate anything. They may not be an issue in your minimal example; however, consider what happens if you're searching for, say, email addresses:

    ... "whoever@example.com" ... # Oops!

    If the items in your criteria lists are mutually exclusive, consider wrapping your alternations in (?> ... ) to avoid backtracking. See perlre - Extended Patterns and perlre - Backtracking.

    -- Ken

      Thank you!
Re: General Advice
by hdb (Monsignor) on Jun 27, 2013 at 07:56 UTC

    I recommend using grep in this case. One thing to observe on everything proposed so far: the use of s/// within sub search has side effects outside of the the sub as it modifies the input! If it is an anonymous array reference it does not make any difference, but otherwise it might create problems later. See the difference between the line commented out below and the following one.

    use strict; use warnings; sub search { my ( $data, $neg, $pos ) = @_; my $regp = join "|", @$pos; # my $regn = join "|", map { s/^-//, $_ } @$neg; # warning: si +de effects outside of sub search! my $regn = join "|", map { /^-?(.*)/ } @$neg; # no side effect +s my @results = grep { /$regp/ && !/$regn/ } @$data; return \@results; } my $neg = ["-bo"]; my $results = search(["bobcat", "boomerang", "beer", "bat", "aaa" ], $ +neg, ["b"]); print "@$results\n"; print "@$neg\n";
      Thanks
Re: General Advice
by clueless newbie (Curate) on Jun 27, 2013 at 11:56 UTC