pattern matching words in any order

by daveatmcafee (Initiate)
on Oct 23, 2009 at 16:25 UTC (node #802936)
daveatmcafee has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I want to match two words in a filename but in any order. I am using the code:

    if (($zipFile =~ /this/i) && ($zipFile =~ /that/)) {
        # do something
    }

But I suspect there is a better way than using the &&. Any ideas?

Thanks,
Dave

Re: pattern matching words in any order
by ikegami (Pope) on Oct 23, 2009 at 16:34 UTC
    /this/ && /that/
    is probably the best. The alternative would be

    Using Perl features, you can also do

    /^(?=.*this)(?=.*that)/s
    /^(?=.*this).*that/s

    Keep in mind that these aren't all equivalent. Consider the string "thathis", where the two words overlap.
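    A small script to compare the patterns quoted above on the overlapping string "thathis". The fourth sub, a plain alternation /this.*that|that.*this/s, is an assumption added for comparison and is not one of the patterns quoted above; it needs no lookaheads, but it requires the two words not to share characters.

```perl
use strict;
use warnings;

# Each sub returns 1 if the string contains both "this" and "that", else 0.
sub two_matches    { my ($s) = @_; return $s =~ /this/ && $s =~ /that/ ? 1 : 0 }
sub two_lookaheads { my ($s) = @_; return $s =~ /^(?=.*this)(?=.*that)/s ? 1 : 0 }
sub lookahead_tail { my ($s) = @_; return $s =~ /^(?=.*this).*that/s ? 1 : 0 }

# Assumed extra form for comparison: one word must follow the other
# without sharing characters, so an overlap such as "thathis" fails.
sub alternation    { my ($s) = @_; return $s =~ /this.*that|that.*this/s ? 1 : 0 }

for my $s (qw(thisthat thatthis thathis)) {
    printf "%-8s %d %d %d %d\n", $s,
        two_matches($s), two_lookaheads($s), lookahead_tail($s), alternation($s);
}
```

    For "thathis" the first three subs return 1 and the alternation returns 0, which is the kind of difference the warning above is about.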

Re: pattern matching words in any order
by BrowserUk (Pope) on Oct 23, 2009 at 16:34 UTC

    This achieves the AND requirement, but note the possibility of false matches if there is overlap between the two words (3rd example):

    [0] Perl> $re = qr[(?=^.*this)(?=^.*that)];;
    [0] Perl> $_ =~ $re and print "$_ matched" for qw[thisthat thatthis thathis thisnthat thatnthis];;
    thisthat matched
    thatthis matched
    thathis matched **
    thisnthat matched
    thatnthis matched
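    The lookahead trick extends to any number of words. The sketch below is one way to build such a pattern from a list; the helper name all_words_re is hypothetical, and it adds /i (to honour the OP's case-insensitive match) and quotemeta (so words containing regex metacharacters are matched literally). Like the pattern above, it still accepts overlapping words such as "thathis".

```perl
use strict;
use warnings;

# Hypothetical helper: one (?=.*WORD) lookahead per word, all anchored at
# the start of the string, so the words may appear in any order.
# quotemeta makes metacharacters in a word match literally;
# /s lets . cross newlines and /i ignores case.
sub all_words_re {
    my @words = @_;
    my $lookaheads = join '', map { '(?=.*' . quotemeta($_) . ')' } @words;
    return qr/^$lookaheads/si;
}

my $re = all_words_re(qw(this that));
for my $name (qw(thisthat thatthis thathis thisnthat THATnThis other)) {
    printf "%-10s %s\n", $name, $name =~ $re ? 'matched' : 'did not match';
}
```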

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Actually, /this/ && /that/ would also return true in the 3rd case, so the OP should be happy with your (and ikegami's) solution.
Re: pattern matching words in any order
by CountZero (Bishop) on Oct 23, 2009 at 16:29 UTC
    Try this:
    $zipFile =~ /(this|that).*?(this|that)/i


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      False positive:

      #!/usr/bin/perl
      use warnings;
      use strict;
      # 802936
      my $zipFile="foo bar this this blivitz";
      if ( $zipFile =~ /(this|that).*?(this|that)/i ) {
          print "$1 $2";
      }else{
          print "nope";
      }

      This prints "this this": both groups are free to match the same word.
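      One way to repair the alternation (an assumption, not something proposed in the thread) is a negative lookahead on a backreference: (?!\1) stops the second group from matching the same word as the first. Under /i the backreference is also compared case-insensitively, so "THIS this" is rejected as well. The test strings other than the one above are made up for illustration.

```perl
use strict;
use warnings;

# (?!\1) forbids the second group from matching the word already captured
# in $1; under /i the backreference also compares case-insensitively.
my $re = qr/(this|that).*?(?!\1)(this|that)/i;

for my $zipFile ('foo bar this this blivitz', 'this and that', 'THAT, then this', 'thathis') {
    if ($zipFile =~ $re) {
        print "$zipFile: $1 $2\n";
    }
    else {
        print "$zipFile: nope\n";
    }
}
```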
Re: pattern matching words in any order
by johngg (Abbot) on Oct 23, 2009 at 20:17 UTC

    Using match-time pattern interpolation avoids the && and also avoids false positives with overlapping words. I'm not sure you could call it better, though. What you have is much easier to understand and maintain.

    use strict;
    use warnings;
    use re q{eval};

    my %alt = ( this => q{that}, that => q{this} );

    my $re = do {
        local $" = q{|};
        qr{(?x) ( @{ [ keys %alt ] } ) .* (??{ $alt{ $1 } }) };
    };

    print sprintf( q{%-9s: }, $_ ),
        m{$re} ? qq{matched\n} : qq{did not match\n}
        for qw{ thisthat thatthis thathis thisnthat thatnthis };

    The output.

    thisthat : matched
    thatthis : matched
    thathis  : did not match
    thisnthat: matched
    thatnthis: matched

    I hope this is of interest.



    Update: I noticed the i flag on one of the OP's patterns, but the above didn't work with mixed case. It is fixed in this version.

    Which produces

    thisthat : matched
    thatthis : matched
    thathis  : did not match
    thisnthat: matched
    thatnthis: matched
    ThISnthAt: matched
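    A sketch of one way to get the mixed-case output above (an assumption; johngg's actual update may differ): add the i flag to the pattern, lowercase the captured word before the hash lookup, and store precompiled qr//i patterns in %alt so the postponed subpattern is case-insensitive on its own.

```perl
use strict;
use warnings;
use re 'eval';

# Each key is a lowercase word; its value is a case-insensitive pattern
# for the word that must appear later in the string.
my %alt = ( this => qr/that/i, that => qr/this/i );

my $re = do {
    local $" = '|';
    # (?xi): /x allows the spacing, /i lets the first word match in any case.
    # lc $1 turns a mixed-case capture such as "ThIS" back into a hash key.
    qr{(?xi) ( @{[ keys %alt ]} ) .* (??{ $alt{ lc $1 } }) };
};

for (qw{ thisthat thatthis thathis thisnthat thatnthis ThISnthAt }) {
    printf "%-9s: %s\n", $_, $_ =~ $re ? 'matched' : 'did not match';
}
```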
Re: pattern matching words in any order
by SuicideJunkie (Vicar) on Oct 23, 2009 at 19:02 UTC

    For a list of things to match in any order, what I did was:

    my $regexLockParams = '(?:[\\s,]+'
        . "|(?:$regexSubstringOf{angle}|$regexSubstringOf{offset})\\s*$regexNumber"
        . "|$regexSubstringOf{dispersion}\\s*$regexNumber"
        #. "|..."
        . ')+';
    It works by matching (?:A|B|C)+: any of the three branches can match, and then the + goes back to try for more.
    In this case, A is whitespace or commas, and is not captured. B matches "angle" or "offset" (or an abbreviation of either), followed by a real number, which it captures into $1. C matches "dispersion" followed by a real number, which it captures into $2.

    At the end of the match, you are left with $1 containing the angle, or undef if it was unspecified, and $2 containing the dispersion or again undef.
    Order doesn't matter since the alternation is repeated.

    angle 5, dispersion20 ==> $1=5  $2=20
    disp1ang4 ==> $1=4  $2=1
    d42 ==> $1=undef  $2=42
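    A self-contained sketch of the repeated-alternation technique. The post does not show %regexSubstringOf or $regexNumber, so both are hypothetical stand-ins here: prefix_of (also hypothetical) accepts any non-empty prefix of a word, and $regexNumber captures a signed decimal. Under those assumptions it should reproduce the three examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: a pattern matching any non-empty prefix of $word,
# e.g. "d", "disp" or "dispersion" for "dispersion".
sub prefix_of {
    my ($word) = @_;
    my ($first, @rest) = split //, $word;
    my $tail = '';
    $tail = "(?:\Q$_\E$tail)?" for reverse @rest;
    return "\Q$first\E$tail";
}

# Hypothetical stand-ins for the post's %regexSubstringOf and $regexNumber.
my %regexSubstringOf = map { $_ => prefix_of($_) } qw(angle offset dispersion);
my $regexNumber = '(-?\d+(?:\.\d+)?)';    # exactly one capturing group

# (?:A|B|C)+ : separators, angle/offset value ($1), dispersion value ($2).
my $regexLockParams = '(?:[\\s,]+'
    . "|(?:$regexSubstringOf{angle}|$regexSubstringOf{offset})\\s*$regexNumber"
    . "|$regexSubstringOf{dispersion}\\s*$regexNumber"
    . ')+';

# Returns (angle, dispersion); a branch that never matched leaves undef.
sub parse_lock {
    my ($text) = @_;
    return unless $text =~ /\A$regexLockParams\z/i;
    return ($1, $2);
}

for my $input ('angle 5, dispersion20', 'disp1ang4', 'd42') {
    my ($angle, $dispersion) = parse_lock($input);
    printf "%s ==> angle=%s dispersion=%s\n", $input,
        $angle // 'undef', $dispersion // 'undef';
}
```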

    Each alternation branch must fail to match before its capture group is closed; otherwise that capture variable will be overwritten and will not be restored to its previous value when the engine backtracks past the capture.
    You can use a lookahead (?=...) to ensure that the remainder of the branch will match, if necessary. (In the above example, the capture was last, so there was no postfix to worry about.)

    sub safeCapture {
        # Workaround for a regex issue in which backtracked captures inside
        # alternations inside repetition will stomp on the capture value.
        my $prefix  = shift;
        my $cap     = shift;
        my $postfix = shift;
        return "$prefix($cap(?=$postfix))$postfix";
    }
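    A short sketch of safeCapture in use. The branch (the word "angle", a number, then "deg") is a hypothetical example; it shows the string the sub builds, in which the lookahead on the postfix is checked before the capture group closes.

```perl
use strict;
use warnings;

# safeCapture as posted, with the three shifts folded into one assignment.
sub safeCapture {
    my ($prefix, $cap, $postfix) = @_;
    # (?=$postfix) is tested inside the group, so the group only closes
    # once the postfix is known to follow.
    return "$prefix($cap(?=$postfix))$postfix";
}

# Hypothetical branch: "angle", a number, then the unit "deg".
my $branch = safeCapture('angle\s*', '\d+', '\s*deg');
my $re = qr/\A(?:[\s,]+|$branch)+\z/;

print "$branch\n";
print "captured: $1\n" if 'angle 30 deg' =~ $re;
```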

Re: pattern matching words in any order
by ikegami (Pope) on Oct 26, 2009 at 13:38 UTC
    Another approach that solves the overlap problem:
    my %seen;
    ++$seen{lc($_)} for split /\s+/, $zipFile;
    if ($seen{this} && $seen{that}) { # do something }

    gives the same results, for words separated only by whitespace, as

    if ($zipFile =~ /\bthis\b/i && $zipFile =~ /\bthat\b/i) { # do something }
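    The counting approach generalises to any list of required words. A minimal sketch (the name has_all_words is hypothetical): it splits on ' ', Perl's awk-style split, which also ignores leading whitespace. Tokens are not broken at punctuation, so "this.zip" does not count as "this" here, whereas /\bthis\b/ would match it.

```perl
use strict;
use warnings;

# Returns true if every word in @required occurs as a whole
# whitespace-separated token of $text, ignoring case.
sub has_all_words {
    my ($text, @required) = @_;
    my %seen;
    ++$seen{ lc $_ } for split ' ', $text;
    # grep counts the required words that were not seen; none means success.
    return !grep { !$seen{ lc $_ } } @required;
}

for my $name ('This and THAT', 'thathis', 'this.zip that.zip') {
    printf "%-18s %s\n", $name,
        has_all_words($name, qw(this that)) ? 'matched' : 'did not match';
}
```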

Node Type: perlquestion [id://802936]
Approved by Corion