Match only certain characters?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Match only certain characters? by Cristoforo (Curate) on Nov 08, 2009 at 23:44 UTC
One way to do it (using `tr`):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'ABCAAAABBBCCCCCAAABBBCCC';

if (0 == $s =~ tr/ABC//c) {
    print "All letters are A B or C";
}
```

Note that I didn't use $a or $b, as those should be reserved for the sort function.

Chris
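The same `tr` count can be wrapped in a small sub for reuse. This is a minimal sketch; the sub name `count_non_abc` is hypothetical. Because `tr` builds its table at compile time, the list `ABC` cannot come from a variable, which is why a regex is needed when the allowed set is only known at run time. Note also that an empty string counts as zero non-ABC characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tr/ABC//c counts the characters that are NOT A, B or C. With an empty
# replacement list and no /d flag, tr only counts; the string is unchanged.
sub count_non_abc {
    my ($s) = @_;
    return $s =~ tr/ABC//c;
}

for my $s ('ABCAAAABBBCCCCCAAABBBCCC', 'ASRGRTGRT89579843rrrrrr') {
    my $bad = count_non_abc($s);
    print "$s: ", ($bad ? "$bad other character(s)" : 'only A, B or C'), "\n";
}
```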
Re: Match only certain characters? by keszler (Priest) on Nov 08, 2009 at 23:40 UTC
```perl
use strict;

my $a = 'ABCAAAABBBCCCCCAAABBBCCC';
my $b = 'ASRGRTGRT89579843rrrrrr';

print '$a matches', $/ if $a =~ /^[ABC]+$/;
print '$b matches', $/ if $b =~ /^[ABC]+$/;
```

Update: The regex is anchored at the beginning and end of the string, and requires one or more characters from the character class [ABC].

```
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^[ABC]+$/)->explain;'
```

```
The regular expression:

(?-imsx:^[ABC]+$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  [ABC]+                   any character of: 'A', 'B', 'C' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
```

Update2: Of course, this is not at all efficient, checking every character in the string to ensure it is one that you want. The responses below are much more efficient.
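The YAPE output notes that `$` matches "before an optional \n", so a string with a trailing newline still passes `^[ABC]+$`. As a minimal sketch (the sub names are hypothetical), this compares it with `\A...\z`, where `\z` matches only at the true end of the string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '$' can match just before a string-final newline, so "ABC\n" passes.
sub only_abc_dollar { return $_[0] =~ /^[ABC]+$/ }

# '\z' matches only at the very end of the string, so "ABC\n" fails.
sub only_abc_z { return $_[0] =~ /\A[ABC]+\z/ }

for my $s ('ABCAAAABBBCCCCCAAABBBCCC', "ABC\n", 'ASRGRTGRT89579843rrrrrr') {
    (my $shown = $s) =~ s/\n/\\n/g;    # show the newline as \n when printing
    printf "%-26s \$: %-8s \\z: %s\n", "'$shown'",
        (only_abc_dollar($s) ? 'match' : 'no match'),
        (only_abc_z($s)      ? 'match' : 'no match');
}
```

Whether the trailing newline should be accepted depends on where the strings come from (for example, unchomped lines read from a file).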
Re: Match only certain characters? by AnomalousMonk (Archbishop) on Nov 09, 2009 at 00:50 UTC
... you wanted to pick only the string that contains A, B *and* C ...

The question in the OP is not entirely clear; one can infer that at least one `'A'`, `'B'` and `'C'` must all be present to match. If that is so, this regex serves:

```
>perl -wMstrict -le
"my $gotA    = qr{ (?= .*? A) }xms;
 my $gotB    = qr{ (?= .*? B) }xms;
 my $gotC    = qr{ (?= .*? C) }xms;
 my $onlyABC = qr{ (?! .*? [^ABC]) }xms;
 for my $s (@ARGV) {
   my $match = $s =~ m{ \A $gotA $gotB $gotC $onlyABC }xms;
   print qq{'$s' }, $match ? 'matches' : 'no match';
   }
"
"" A B C AB AC BC ABC CAB ABABCBCBCACA xABC ABCx xA xB xC Ax
'' no match
'A' no match
'B' no match
'C' no match
'AB' no match
'AC' no match
'BC' no match
'ABC' matches
'CAB' matches
'ABABCBCBCACA' matches
'xABC' no match
'ABCx' no match
'xA' no match
'xB' no match
'xC' no match
'Ax' no match
```
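The four separate `qr` objects can also be folded into one anchored pattern. This is a sketch of a variant, not the poster's code; the variable name `$has_all_and_only_abc` is hypothetical. The lookaheads check for at least one A, B and C, and `[ABC]+ \z` then has to consume the whole string, so nothing else may appear:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three lookaheads at the start require at least one A, B and C somewhere;
# [ABC]+ \z then requires that the string contains nothing else.
my $has_all_and_only_abc =
    qr{ \A (?= .*? A ) (?= .*? B ) (?= .*? C ) [ABC]+ \z }xs;

for my $s ('', 'A', 'AB', 'ABC', 'CAB', 'ABABCBCBCACA', 'xABC', 'ABCx') {
    printf "'%s' %s\n", $s, $s =~ $has_all_and_only_abc ? 'matches' : 'no match';
}
```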
Re: Match only certain characters? by biohisham (Priest) on Nov 09, 2009 at 00:05 UTC
How about a positive lookahead assertion? Since you only want to pick the string that contains the pattern provided, you don't have to capture the match; it is sufficient to look around for its existence. Reading the documentation for regexes to walk you through this can be a good idea.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# f = first, s = second.
my $f = 'ABCAAAABBBCCCCCAAABBBCCC';
my $s = 'ASRGRTGRT89579843rrrrrr';

print $f =~ /(?=^[ABC]+$)/ ? "Matches [ABC] Combinations\n" : "No pattern found!\n";
print $s =~ /(?=^[ABC]+$)/ ? "Matches [ABC] Combinations\n" : "No pattern found!\n";
```

Output:

```
Matches [ABC] Combinations
No pattern found!
```

Update: added a link that might benefit the OP.

Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
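A pattern without capturing parentheses does not capture anything in the first place, so the lookahead wrapper should give the same true/false answer as the plain anchored match. A quick side-by-side check, as a sketch with hypothetical sub names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Both subs return true only when the whole string is made of A, B and C.
# Neither pattern has capturing parentheses, so neither captures anything.
sub with_lookahead { return $_[0] =~ /(?=^[ABC]+$)/ }
sub plain_anchored { return $_[0] =~ /^[ABC]+$/ }

for my $s ('ABCAAAABBBCCCCCAAABBBCCC', 'ASRGRTGRT89579843rrrrrr', '', 'ABCx') {
    my $agree = (with_lookahead($s) ? 1 : 0) == (plain_anchored($s) ? 1 : 0);
    print "'$s': ", ($agree ? 'same answer' : 'different answers'), "\n";
}
```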
Re: Match only certain characters? by Marshall (Canon) on Nov 09, 2009 at 01:43 UTC
When I first looked at this, I came up with something similar to what Cristoforo did.

```perl
#!/usr/bin/perl -w
use strict;

my @strings = ('ABCAAAABBBCCCCCAAABBBCCC',
               'ASRGRTGRT89579843rrrrrr',
               'A98797BqrtoiquyrtoCafdgagfd');

foreach (@strings) {
    # Here the scalar count value of tr is used.
    # tr is very fast but lacks the flexibility of a regex.
    # Print the string unless it has some char that is
    # not an A, B or C.
    print "$_\n" if !(tr/ABC//c);
}
# Prints: ABCAAAABBBCCCCCAAABBBCCC
```

This is a great idea if you know in advance that A, B and C are the characters you want. Now suppose there is a "standard" string against which others will be compared, and that string is held in a variable. If we are going to compare that "standard" string against many lines, we can create a string of the unique chars in the "standard" string and use that string in a simple regex to check each line. No fancy lookahead required.

```perl
my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';
my %seen;
my @unique_letters = grep { $seen{$_}++ < 1 } split(//, $standard);
my $unique_letters = join("", @unique_letters);
# These two lines above could be combined, but I think it
# reads better this way. And YES, it is completely legal in Perl
# to have an array variable named unique_letters and a
# string with the same name.

foreach (@strings) {
    print "$_\n" if (/[^$unique_letters]/);
}
# Prints:
# ASRGRTGRT89579843rrrrrr
# A98797BqrtoiquyrtoCafdgagfd

# Change the test to the "not" of this: (!/[^$unique_letters]/)
# to get: ABCAAAABBBCCCCCAAABBBCCC
```
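One caveat with interpolating a variable into a character class: characters such as `-`, `]` or `\` in the "standard" string are read as class syntax (for example, `a-c` becomes a range). A minimal sketch, with a hypothetical sub name, that escapes the set with `\Q...\E` (quotemeta) and guards against an empty standard string, which would otherwise produce an invalid class:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns true when every character of $line occurs somewhere in $standard.
# \Q...\E keeps characters such as '-' or ']' in the standard string from
# being read as character-class syntax.
sub uses_only_standard_chars {
    my ($standard, $line) = @_;
    die "standard string must not be empty\n" unless length $standard;
    return $line !~ /[^\Q$standard\E]/;
}

# With 'a-c' taken literally, the allowed set is 'a', '-' and 'c'.
print uses_only_standard_chars('a-c', 'b')   ? "allowed\n" : "rejected\n";  # rejected
print uses_only_standard_chars('a-c', 'c-a') ? "allowed\n" : "rejected\n";  # allowed
```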
Re^2: Match only certain characters? by AnomalousMonk (Archbishop) on Nov 09, 2009 at 07:23 UTC
The steps taken to 'uniqify' the characters of the character set are not necessary; repeated characters in a regex character set have no effect on pattern recognition. (I also think repeated characters make no difference in the execution time of the regex, but I cannot come up with a reference on this at the moment.) However, repeated characters do seem to take up space in the regex object.

```
>perl -wMstrict -le
"my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';
 foreach (@ARGV) {
   my $non_standard = /[^$standard]/;
   print qq{'$_' }, $non_standard ? 'no match' : 'match';
   }
"
"" A B C ABC ABCABCABC xA Ax xABC ABCx
'' match
'A' match
'B' match
'C' match
'ABC' match
'ABCABCABC' match
'xA' no match
'Ax' no match
'xABC' no match
'ABCx' no match
```
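To see that the duplicates are harmless, the class built from the full standard string can be compared directly with a hand-written `[^ABC]`. A short sketch; the sub names are hypothetical and the sample strings are the ones from the run above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The repeated letters in $standard add nothing to the character class:
# [^ABCAAAA...] and [^ABC] reject exactly the same characters.
my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';

sub has_other_char_repeated { return $_[0] =~ /[^$standard]/ }
sub has_other_char_unique   { return $_[0] =~ /[^ABC]/ }

for my $s ('', 'A', 'ABC', 'ABCABCABC', 'xA', 'Ax', 'xABC', 'ABCx') {
    my $r = has_other_char_repeated($s) ? 1 : 0;
    my $u = has_other_char_unique($s)   ? 1 : 0;
    print "'$s' ", ($r == $u ? 'same result' : 'DIFFERENT'), "\n";
}
```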
Re: Match only certain characters? by EvanCarroll (Chaplain) on Nov 08, 2009 at 23:43 UTC
This will look for non-provided letters and stop on the first non-provided letter.

```perl
use feature 'say';

$_ = "ABC";
say /[^ABC]/ ? "other (non-ABC) letters too" : "only provided letters";
```

Evan Carroll
The most respected person in the whole perl community.
www.evancarroll.com
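Because the match succeeds at the first character outside the class, the offset of that character is available in `$-[0]` after a successful match. A minimal sketch (the sub name is hypothetical) that reports where the first unwanted character sits, or -1 when there is none:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# /[^ABC]/ succeeds at the first character that is not A, B or C, so the
# match start offset $-[0] tells us where that character is.
sub first_other_offset {
    my ($s) = @_;
    return $s =~ /[^ABC]/ ? $-[0] : -1;   # -1: only A, B and C (or empty)
}

say first_other_offset('ABCAAAABBBCCCCCAAABBBCCC');   # -1
say first_other_offset('ASRGRTGRT89579843rrrrrr');    # 1
say first_other_offset('ABCx');                       # 3
```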
Re: Match only certain characters? by jffry (Hermit) on Nov 09, 2009 at 08:03 UTC
It seems to me that you are asking for something as simple as this.

```perl
my @ss = ( 'A', 'B', 'C', 'AB', 'CA', 'ABC', 'xABCx', 'BxCxAx', );
for my $s (@ss) {
    if ($s =~ /A/ && $s =~ /B/ && $s =~ /C/) {
        print "$s\n";
    }
}
```

Which outputs:

```
ABC
xABCx
BxCxAx
```

*EDIT:* This response addresses only the first part of the OP's question. That is:

...if you have the strings: $a='ABCAAAABBBCCCCCAAABBBCCC'; $b='ASRGRTGRT89579843rrrrrr'; and you wanted to pick only the string that contains A, B and C (namely $a)...
Re^2: Match only certain characters? by AnomalousMonk (Archbishop) on Nov 09, 2009 at 18:18 UTC
But the OP asked for...

... a way to search only for the string that has the letters A,B,C and no other letter ...

which clearly excludes a string like `'xABCx'`.
Re^3: Match only certain characters? by jffry (Hermit) on Nov 09, 2009 at 22:05 UTC
In that case, I'd have to do this, then.

```perl
my @ss = qw( A B C AB CA ABC xABCx BxCxAx );
for (@ss) {
    if (/A/ && /B/ && /C/ && !/[^ABC]/) {
        print "$_\n";
    }
}
```
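For comparison, the same four conditions can be counted with `tr`, along the lines of the first replies. A sketch only; the sub name `all_and_only_abc` is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same four conditions as the regex version, counted with tr instead:
# at least one A, one B and one C, and zero characters outside A, B, C.
sub all_and_only_abc {
    my ($s) = @_;
    return 0 unless $s =~ tr/A// && $s =~ tr/B// && $s =~ tr/C//;
    return !($s =~ tr/ABC//c);
}

for (qw( A B C AB CA ABC xABCx BxCxAx )) {
    print "$_\n" if all_and_only_abc($_);
}
```

With the list from the post, only `ABC` is printed.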


	PerlMonks