Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.

by isha (Sexton)
on Dec 11, 2007 at 04:48 UTC ( #656310=perlquestion: print w/ replies, xml ) Need Help??
isha has asked for the wisdom of the Perl Monks concerning the following question:

Can anybody help to have a regular expression that checks that a,b and c occur in a string in any order.
SO, the valid strings are abc, bca, cab, cba, bac, acb...
Invalid strings are aab, abbc, acc etc.

Comment on REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by narainhere (Monk) on Dec 11, 2007 at 04:55 UTC
    Thanks for making me think about regex in yet another way!! Here is a easy to understand code
    use strict; use warnings; my $str=""; for $str('aaa'..'ccc') { if($str=~m/^[abc]{3}$/) { if($str!~m/(.*a.*a)/) { if($str!~m/(.*b.*b)/) { if($str!~m/(.*c.*c)/) { print "\' $str \' is a Match\n"; } else { print "The string $str contains duplicated 'c'\n"; } } else { print "The string $str contains duplicated 'b'\n"; } } else { print "The string $str contains duplicated 'a'\n"; } } else { print "Either the string \'$str\' contains a character not bel +onging to the class [abc] or is more than 3 characters\n"; } }
    which can be condensed as
    use strict; use warnings; my $str=""; for $str('aaa'..'ccc') { if($str=~m/^[abc]{3}$/ && $str!~m/(.*a.*a)/ && $str !~ m/(.*b.*b)/ + && $str!~ m/(.*c.*c)/) { print "\' $str \' is a Match\n"; } }

    UPDATE: I overlooked the complexity of the problem.Thanks to Punitha,merlyn,KurtSchwind for their inputs

    The world is so big for any individual to conquer

      Update: Nicely re-worked. Previously it didn't work but now it does. Nice job. I'm leaving my initial response here anyway, but it's no longer pertinent.

      That doesn't work because it incorrectly matches invalid strings.

      If you change

      my $str="abc";
      to
      my $str="aaa";
      You'll still get a match even though that's not supposed to be a valid string. You'll have to go a bit more complicated in the regex to exclude the false hits.

      Update: I was missing my 2nd code snip.
      --
      I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by kyle (Abbot) on Dec 11, 2007 at 05:05 UTC
    use Test::More; my @match_wanted = qw( abc bca cab cba bac acb ); my @match_unwanted = qw( aab abbc acc ); sub match { my $s = shift; return ( ($s =~ /(?:[abc].*){3}/) && ! ($s =~ /([abc]).*\1/) ); } plan 'tests' => ( scalar @match_wanted + scalar @match_unwanted ); ok( match( $_ ), "matches '$_'" ) for @match_wanted; ok( ! match( $_ ), "does not match '$_'" ) for @match_unwanted;
Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by BrowserUk (Pope) on Dec 11, 2007 at 05:09 UTC
    m/ (?=^[^a]*a[^a]*$) ## One a and no other a's (?=^[^b]*b[^b]*$) ## One b and no other b's (?=^[^c]*c[^c]*$) ## One c and no other c's ^.{3}$ ## Exactly three chars /x and print "$_: matched" for 'aaa'..'ccc';; abc: matched acb: matched bac: matched bca: matched cab: matched cba: matched

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      This looks like overkill.

      If a string is required to contain all of an "a", a "b" and a "c" and its length is limited to 3, then it can only contain one of each.

      So this ought to suffice:

      /^(?=.*a)(?=.*b)(?=.*c).{3}\z/

      Of course, yours is the way to go if the length is not limited to 3.

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order. (permutations)
by bobf (Monsignor) on Dec 11, 2007 at 05:47 UTC

    I was going to create a nasty little test using length, tr///s, m//, and a couple of evals, but I just couldn't bring myself to complete it. Instead, I reached for CPAN and Math::Combinatorics.

    use strict; use warnings; use Math::Combinatorics qw( permute ); use Test::More; my @match_wanted = qw( abc bca cab cba bac acb ); my @match_unwanted = qw( aab abbc acc ); my @charset = qw( a b c ); my %valid = map { join( '', @$_ ) => 1 } permute( @charset ); sub match { my $s = shift; return( exists $valid{$s} ? 1 : 0 ); } plan 'tests' => ( scalar @match_wanted + scalar @match_unwanted ); ok( match( $_ ), "matches '$_'" ) for @match_wanted; ok( ! match( $_ ), "does not match '$_'" ) for @match_unwanted;
    Output:
    1..9 ok 1 - matches 'abc' ok 2 - matches 'bca' ok 3 - matches 'cab' ok 4 - matches 'cba' ok 5 - matches 'bac' ok 6 - matches 'acb' ok 7 - does not match 'aab' ok 8 - does not match 'abbc' ok 9 - does not match 'acc'

    Thanks and ++ to kyle for the framework.

      It fails when you add adbc to @match_wanted. Easily fixed:

      use Math::Combinatorics qw( permute ); my @charset = qw( a b c ); my %valid = map { join( '', @$_ ) => 1 } permute( @charset ); my ($others) = map qr/[^$_]/, join '', map quotemeta, @charset; sub match { my $s = shift; $s =~ s/$others//g; return $valid{$s}; }

      or

      use Math::Combinatorics qw( permute ); my @charset = qw( a b c ); my ($valid) = map qr/^(?:$_)\z/, join '|', map quotemeta, map join('', + @$_), permute @charset; my ($others) = map qr/[^$_]/, join '', map quotemeta, @charset; sub match { my $s = shift; $s =~ s/$others//g; return $s =~ $valid; }

      or

      use Math::Combinatorics qw( permute ); use Regexp::List qw( ); my @charset = qw( a b c ); my ($valid) = map qr/^$_\z/, Regexp::List->new()->list2re(permute(@cha +rset)); my ($others) = map qr/[^$_]/, join '', map quotemeta, @charset; sub match { my $s = shift; $s =~ s/$others//g; return $s =~ $valid; }

        True. The OP's spec wasn't clear as to whether or not the strings could contain other characters (or were limited to only those in the specified character set), or if a, b, and c were really characters and not stand-ins for longer strings. I went with Occam's razor and took the OP literally. (Update: Since the list of invalid strings contained only a, b, and c; I assumed if other characters were possible there would be an example of it in the invalid list. ikegami is right, though - this was probably more of an assumption than I first thought.)

        This approach could also be problematic if the character set is large and/or if the wanted strings were significantly longer, since the number of permutations would rapidly increase. In that case, a RE-based approach may be better.

        Nonetheless, thanks for providing a more robust solution.

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by parv (Priest) on Dec 11, 2007 at 06:48 UTC

    Note: Please ignore the code listed here as it is buggy inside the while loop. Sorry for the noise.

    First thing I thought of after reading OP was Algorithm::Permute module. (That led me to search for other permutations related modules. Now, I am more in favor of Math::Combinatorics and Algorithm::Combinatorics.)

    I did not spend much time on thinking of a regex as that would have been hairy, though I liked BrowserUk's regex (which could get unwieldy if needs to be modified).

    Below is the version which checks for overlapping strings ("abcXa" is ok, "abca" not). As it is, the program below uses &A::C::permutations. Feel free to exercise any of the other two modules. Basically, the big idea is ...

    sub find_once { my ( $str , ... ) = @_ ; ... my ( $pos , $once ) = ( 0 ) x2; while ( my $p = get_permutation_string() ) { last if $once > 1; ( $pos = index $str , $p , $pos ) >= 0 and $once++ ; } return $once == 1; }

    Update: I just noticed that "once" specification is in the title, but missing from the body.

      The assignment of $pos has a bug in ...

      ( $pos = index $str , $p , $pos ) >= 0 and $once++ ;

      ... which caused index() to some times start from -1, not ot mention $pos was not being reset to 0. Please ignore my previous reply.

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by oha (Friar) on Dec 11, 2007 at 10:12 UTC
    print "ok!\n" unless /([abc]).*\1/;
    the RE above checks if one of [abc] is repeated

    if you want to declare "abcz" wrong too, just:

    print "ok!\n" unless /(([abc]).*\2|[^abc])/;
    which also check if a bad char is present.

    Oha

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by inman (Curate) on Dec 11, 2007 at 12:35 UTC
    I'm not sure whether this is in the rules but it does work. The code below sorts the data before matching. This makes the match fairly trivial since you need to look for the string abc without a an extra "a" or "c".
    #! /usr/bin/perl -w use strict; my @testdata = qw (abc cba cab bac bba cbc bbc abcdefg 0a2c3bc); foreach (@testdata) { my $test = join '', sort split //, lc $_; $test =~ /[^a]*abc(?!c)[^c]*/ ? print "Matched $_\n" : print "Didn +'t match $_\n"; }
      Interesting approach for which ++.

      .oO, however, that this allows cases such as when @testdata contains aabc. IMO that's NOT quite1 (strictly) prohibited by the (very loose) specs from OP:

      Can anybody help to have a regular expression that checks that a,b and c occur in a string in any order.
      SO, the valid strings are abc, bca, cab, cba, bac, acb...
      Invalid strings are aab, abbc, acc etc.

      BUT the presence of abbc in the list of invalid strings suggests that isha may want to requires that each permitted letter occur only once. (Others may well infer differently. As noted, I consider the spec to be imprecise... as is, ISTM, the general case when specs are written by example.)

      1. because, in this case, while the initial "a" is doubled the $test does include an instance of each acceptable character; in other words, ISTM that aabc is NOT - strictly - prohibited in the sample of invalid strings, aab because aab can be read as prohibiting only an incomplete set of the chars a, b and c, while the invalid string abbc does not strictly prohibit having the lead character doubled. Am I being pedantic or fuzzy-brained?

      I do find my own observations to fall short of precision+clarity. Maybe it's too early in the AM... :-)

      #! /usr/bin/perl -w use strict; my @testdata = qw (abc aabc cba cab bac bba cbc bbc abcdefg 0a2c3bc); foreach (@testdata) { my $test = join '', sort split //, lc $_; $test =~ /[^a]*abc(?!c)[^c]*/ ? print "Matched $_\n" : print "Didn +'t match $_\n";

      Output:
      Matched abc
      Matched aabc     <=
      Matched cba
      ....

      Update: added the missing close-paren at the end of para 3

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by MidLifeXis (Prior) on Dec 11, 2007 at 18:22 UTC

    Any chance of homework here? Probably not, looking at rest of OP history, but a good HW question.

    --MidLifeXis

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by snoopy (Deacon) on Dec 11, 2007 at 21:56 UTC
    Here's a solution based on a hash/regex combo:
    my $chars = "abc"; foreach (qw/abc bca cab cba bac acb aab abbc acc/) { my %seen; my $pat = $_; $pat =~ s/([\Q$chars\E])/++$seen{$1}/gex; if ((grep {$_ == 1} values %seen) == length($chars)) { print "$_ matches\n"; } else{ print "$_ doesn't match\n"; } }
Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by AK108 (Friar) on Dec 12, 2007 at 13:11 UTC
    This isn't a regex, but I don't think anyone mentioned it.
    my @letters = qw(abc cba cab bab blab); for (@letters) { if(join('', sort split //, $_) eq "abc") {print "$_ matched\n"} else {print "$_ didn't match\n"} }
    (My initial thought was to add up the ord() values of the characters. That would work too, but wouldn't be as concise since there isn't a reduce operator in Perl 5.)

      That is simple!

      A problem might arise if the desired letters could be part of a bigger string. Lacking any comments by isha on the specification :( so far, your solution is just as good.

Re: REgular expression to check the string that allows "a","b" and "c" to occur only once in any order.
by thundergnat (Deacon) on Dec 12, 2007 at 19:30 UTC

    Does you need to use a regex?

    #!/usr/bin/perl use warnings; use strict; my @maybe = qw( abc bca cab cba bac acb aab abbc acc ); print "$_ ", ok($_) ? 'good' : 'no good', "\n" for @maybe; sub ok{ return y/a// > 1 ? 0 : y/b// > 1 ? 0 : y/c// > 1 ? 0 : 1; }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://656310]
Approved by kyle
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (7)
As of 2014-09-23 07:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (210 votes), past polls