PerlMonks  

pattern matching words in any order

by daveatmcafee (Initiate)
on Oct 23, 2009 at 16:25 UTC ( #802936=perlquestion )
daveatmcafee has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I want to match two words in a filename but in any order. I am using the code:

    if (($zipFile =~ /this/i) && ($zipFile =~ /that/)) {
        # do something
    }

But I suspect there is a better way than using the &&. Any ideas?

Thanks, Dave

Re: pattern matching words in any order
by CountZero (Bishop) on Oct 23, 2009 at 16:29 UTC
    Try this:
    $zipFile =~ /(this|that).*?(this|that)/i

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      False positive:

      #!/usr/bin/perl
      use warnings;
      use strict;
      # 802936
      my $zipFile = "foo bar this this blivitz";
      if ( $zipFile =~ /(this|that).*?(this|that)/i ) {
          print "$1 $2";
      }
      else {
          print "nope";
      }
Re: pattern matching words in any order
by ikegami (Pope) on Oct 23, 2009 at 16:34 UTC
    /this/ && /that/
    is probably the best. The alternative would be
    /this.*that|that.*this/s

    Using lookaheads, you can also do

    /^(?=.*this)(?=.*that)/s
    /^(?=.*this).*that/s

    Keep in mind that these aren't all equivalent. Consider the string "thathis", where the two words overlap.
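
As a rough illustration of that point (a sketch, not part of the original reply; the labels in @checks are made up for this example), the four patterns can be run against a non-overlapping string and against "thathis":

```perl
use strict;
use warnings;

# Labels are hypothetical; the patterns are the ones quoted in the reply.
my @checks = (
    [ 'two matches'    => sub { $_[0] =~ /this/ && $_[0] =~ /that/ } ],
    [ 'alternation'    => sub { $_[0] =~ /this.*that|that.*this/s } ],
    [ 'two lookaheads' => sub { $_[0] =~ /^(?=.*this)(?=.*that)/s } ],
    [ 'one lookahead'  => sub { $_[0] =~ /^(?=.*this).*that/s } ],
);

# Returns one "label result" line per pattern for the given string.
sub report {
    my ($str) = @_;
    return map {
        sprintf '%-15s %s', $_->[0], $_->[1]->($str) ? 'match' : 'no match'
    } @checks;
}

for my $str (qw(thisthat thathis)) {
    print "== $str ==\n";
    print "$_\n" for report($str);
}
```

Only the alternation requires the two words to occupy separate, non-overlapping stretches of the string, so it is the only one of the four that rejects "thathis".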

Re: pattern matching words in any order
by BrowserUk (Pope) on Oct 23, 2009 at 16:34 UTC

    This achieves the AND requirement, but note the possibility of false matches if there is overlap between the two words (3rd example):

    [0] Perl> $re = qr[(?=^.*this)(?=^.*that)];;
    [0] Perl> $_ =~ $re and print "$_ matched" for qw[thisthat thatthis thathis thisnthat thatnthis];;
    thisthat matched
    thatthis matched
    thathis matched **
    thisnthat matched
    thatnthis matched

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Actually /this/ && /that/ would return true in 3rd case, too, so OP should be happy with your (and ikegami's) solution.
Re: pattern matching words in any order
by SuicideJunkie (Priest) on Oct 23, 2009 at 19:02 UTC

    For a list of things to match in any order, what I did was:

    my $regexLockParams = '(?:[\\s,]+'
        ."|(?:$regexSubstringOf{angle}|$regexSubstringOf{offset})\\s*$regexNumber"
        ."|$regexSubstringOf{dispersion}\\s*$regexNumber"
        #."|..."
        .')+';

    This works because the pattern has the form (?:A|B|C)+: any of the three alternatives can match, and then the engine loops back to try for more.
    In this case, A is a run of whitespace and commas, and is not captured. B matches either "angle" or "offset", followed by a real number which it captures into $1. C matches "dispersion" followed by a real number which it captures into $2.

    At the end of the match, you are left with $1 containing the angle, or undef if it was unspecified, and $2 containing the dispersion or again undef.
    Order doesn't matter since the alternation is repeated.

    For example:
    angle 5, dispersion20 ==> $1=5  $2=20
    disp1ang4 ==> $1=4  $2=1
    d42 ==> $1=undef  $2=42
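
The reply does not show how %regexSubstringOf and $regexNumber are defined, so here is a self-contained sketch with assumed stand-ins: prefix_of() (hypothetical) builds a pattern matching any leading substring of a word, $number (hypothetical) is a simple capturing number pattern, and the pattern is anchored at both ends for the demo. It relies on Perl keeping the value of a capture group set in an earlier iteration of the repeated alternation.

```perl
use strict;
use warnings;

# Hypothetical helper: pattern matching any non-empty prefix of $word,
# e.g. "angle" -> a(?:n(?:g(?:l(?:e)?)?)?)?
sub prefix_of {
    my ($word) = @_;
    my @chars = map { quotemeta($_) } split //, $word;
    my $re = pop @chars;
    $re = "$_(?:$re)?" for reverse @chars;
    return $re;
}

# Assumed definitions, standing in for the ones not shown in the reply.
my $number = '(-?\d+(?:\.\d+)?)';    # the capture group lives here
my $angle  = join '|', map { prefix_of($_) } qw(angle offset);
my $disp   = prefix_of('dispersion');

my $lock_params = qr{
    ^ (?: [\s,]+
        | (?:$angle) \s* $number
        | $disp      \s* $number
      )+ \z
}x;

# Returns (angle, dispersion); either may be undef. Empty list on no match.
sub parse_lock_params {
    my ($input) = @_;
    return unless $input =~ $lock_params;
    return ( $1, $2 );
}

for my $input ( 'angle 5, dispersion20', 'disp1ang4', 'd42' ) {
    my ( $angle_val, $disp_val ) = parse_lock_params($input);
    printf "%s ==> \$1=%s  \$2=%s\n",
        $input, $angle_val // 'undef', $disp_val // 'undef';
}
```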

    CAVEAT:
    Each alternation branch must fail to match before the capture is closed, otherwise that capture variable will be overwritten and not get restored to the previous value when the engine backtracks past the capture.
    You can use a lookahead (?=...) to ensure that the remainder of the branch will match if necessary. (In the above example, the capture was last, so there was no postfix to worry about.)

    sub safeCapture {
        # Workaround for Regex issue in which backtracked captures inside
        # alternations inside repetition, will stomp on the capture value.
        my $prefix  = shift;
        my $cap     = shift;
        my $postfix = shift;
        return "$prefix($cap(?=$postfix))$postfix";
    }

Re: pattern matching words in any order
by johngg (Abbot) on Oct 23, 2009 at 20:17 UTC

    Using match time pattern interpolation can avoid the && and it also avoids false positives with overlapping words. I'm not sure you could call it better though. What you have is much easier to understand and maintain.

    use strict;
    use warnings;
    use re q{eval};

    my %alt = ( this => q{that}, that => q{this} );
    my $re = do {
        local $" = q{|};
        qr{(?x) ( @{ [ keys %alt ] } ) .* (??{ $alt{ $1 } }) };
    };

    print sprintf( q{%-9s: }, $_ ), m{$re} ? qq{matched\n} : qq{did not match\n}
        for qw{ thisthat thatthis thathis thisnthat thatnthis };

    The output.

    thisthat : matched
    thatthis : matched
    thathis  : did not match
    thisnthat: matched
    thatnthis: matched

    I hope this is of interest.

    Cheers,

    JohnGG

    Update: I noticed the i flag on one of the OP's patterns but the above didn't work with mixed case. It is fixed in this version.

    Which produces

    thisthat : matched
    thatthis : matched
    thathis  : did not match
    thisnthat: matched
    thatnthis: matched
    ThISnthAt: matched
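
One way to get this mixed-case behaviour with match-time interpolation is sketched below. It is an assumption about how it could be done, not necessarily the change johngg made: the capture is lowercased before the hash lookup, and the pattern returned from (??{ ... }) carries its own (?i) so it does not depend on inheriting the outer flag. The name both_words is made up for the example.

```perl
use strict;
use warnings;
use re 'eval';    # allow (??{ ... }) in a pattern that also interpolates

my %alt    = ( this => 'that', that => 'this' );
my $either = join '|', sort keys %alt;

# True if both words occur, in either order, without overlapping.
sub both_words {
    my ($str) = @_;
    return $str =~ m{
        ($either)                               # first word, any case
        .*
        (??{ '(?i)' . $alt{ lc $1 } })          # then its partner, any case
    }xis ? 1 : 0;
}

printf "%-9s: %s\n", $_, both_words($_) ? 'matched' : 'did not match'
    for qw( thisthat thatthis thathis thisnthat thatnthis ThISnthAt );
```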
Re: pattern matching words in any order
by ikegami (Pope) on Oct 26, 2009 at 13:38 UTC
    Another approach that solves the overlap problem:
    my %seen;
    ++$seen{lc($_)} for split /\s+/, $zipFile;
    if ($seen{this} && $seen{that}) { .... }
    gives the same results as
    if ($zipFile =~ /\bthis\b/i && $zipFile =~ /\bthat\b/i) { .... }
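
The hash-based check generalizes to any number of words. A minimal sketch, assuming the words are separated by whitespace as in the split above; has_all_words is a hypothetical name, not something from the thread:

```perl
use strict;
use warnings;

# True if every word in @words appears as a whitespace-separated token
# of $text, compared case-insensitively and in any order.
sub has_all_words {
    my ( $text, @words ) = @_;
    my %seen;
    ++$seen{ lc $_ } for split ' ', $text;    # split ' ' also skips leading whitespace
    my $missing = grep { !$seen{ lc $_ } } @words;
    return $missing == 0;
}

for my $text ( 'foo that bar THIS', 'thathis', 'this only' ) {
    printf "%-18s %s\n", $text,
        has_all_words( $text, qw(this that) ) ? 'both present' : 'missing a word';
}
```

Because whole tokens are compared, the overlapping string "thathis" is rejected.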

Node Type: perlquestion [id://802936]
Approved by Corion