PerlMonks  

Match only certain characters?

by Anonymous Monk
on Nov 08, 2009 at 23:32 UTC ( [id://805819]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!
I wanted to ask if there is a way of checking if a string has only certain characters.
For example, if you have the strings:
$a='ABCAAAABBBCCCCCAAABBBCCC';
$b='ASRGRTGRT89579843rrrrrr';
and you wanted to pick only the string that contains A, B and C (namely $a), what must you insert in the pattern match? Is there a way to search only for the string that has the letters A,B,C and no other letter?
Thank you!

Replies are listed 'Best First'.
Re: Match only certain characters?
by Cristoforo (Curate) on Nov 08, 2009 at 23:44 UTC
    One way to do it (using tr):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $s = 'ABCAAAABBBCCCCCAAABBBCCC';

    # tr with /c counts the characters that are NOT A, B or C
    if (0 == ($s =~ tr/ABC//c)) {
        print "All letters are A B or C";
    }
    Note that I didn't use $a or $b as those should be reserved for the sort function.

    Chris
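
The tr count above can be wrapped in a small helper. This is a minimal sketch, not code from the thread: the names only_abc and only_abc_nonempty are hypothetical. It also shows one edge case worth knowing: an empty string contains no characters outside the set, so the plain count treats it as a match.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $s contains nothing outside A, B, C.
sub only_abc {
    my ($s) = @_;
    # With /c, tr counts the characters NOT in the list. An empty
    # replacement list without /d leaves the string unchanged.
    my $others = ($s =~ tr/ABC//c);
    return $others == 0;
}

# Hypothetical variant that also rejects the empty string.
sub only_abc_nonempty {
    my ($s) = @_;
    return length($s) > 0 && only_abc($s);
}

for my $s ('ABCAAAABBBCCCCCAAABBBCCC', 'ASRGRTGRT89579843rrrrrr', '') {
    printf "'%s' only_abc=%d nonempty=%d\n", $s,
        (only_abc($s) ? 1 : 0), (only_abc_nonempty($s) ? 1 : 0);
}
```

Whether the empty string should count as "only A, B and C" depends on what the OP needs; the second helper is there for the case where it should not.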

Re: Match only certain characters?
by keszler (Priest) on Nov 08, 2009 at 23:40 UTC
    use strict;

    my $a = 'ABCAAAABBBCCCCCAAABBBCCC';
    my $b = 'ASRGRTGRT89579843rrrrrr';

    print '$a matches', $/ if $a =~ /^[ABC]+$/;
    print '$b matches', $/ if $b =~ /^[ABC]+$/;

    Update: The regex is anchored at the beginning and end of the string, and requires 1 or more of character class [ABC].

    perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^[ABC]+$/)->explain;'

    The regular expression:

    (?-imsx:^[ABC]+$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      [ABC]+                   any character of: 'A', 'B', 'C' (1 or more
                               times (matching the most amount possible))
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
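
The "before an optional \n" part of the explanation has a practical consequence: a string with a trailing newline still passes the `$`-anchored pattern. A minimal sketch of the difference, using `\A` and `\z` as the stricter anchors (the test string "ABC\n" is an assumption chosen for illustration):

```perl
use strict;
use warnings;

my $with_newline = "ABC\n";

# '$' also matches just before a trailing newline ...
my $dollar = ($with_newline =~ /^[ABC]+$/)   ? 1 : 0;
# ... while \z matches only at the very end of the string.
my $strict = ($with_newline =~ /\A[ABC]+\z/) ? 1 : 0;

print "with \$:  $dollar\n";   # prints 1
print "with \\z: $strict\n";   # prints 0
```

If the strings come from unchomped input lines, the `$` behaviour may be exactly what is wanted; `\z` is for the case where a newline should count as a foreign character.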

    Update2:

    Of course, this is not at all efficient, checking every character in the string to ensure it is one that you want. The responses below are much more efficient.
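
How large the difference is in practice depends on the Perl version and the input, and for a string that does match, any approach has to look at every character. One way to measure rather than guess is the core Benchmark module. The following is a sketch under stated assumptions: the 3000-character test string and the three labels are made up for illustration, and the anchored variant uses `*` and `\A`/`\z` so that all three checks agree on every input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: a long all-ABC string, so every character
# has to be inspected before the answer is known.
my $s = 'ABC' x 1000;

my %check = (
    tr_count => sub { !($s =~ tr/ABC//c) },   # count foreign characters
    anchored => sub { $s =~ /\A[ABC]*\z/ },    # whole string is A/B/C
    negated  => sub { $s !~ /[^ABC]/ },        # no foreign character found
);

# Run each check for at least one CPU second and print a comparison table.
cmpthese(-1, \%check);
```

No timings are shown here on purpose; the numbers vary from machine to machine.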

Re: Match only certain characters?
by AnomalousMonk (Archbishop) on Nov 09, 2009 at 00:50 UTC
    ... you wanted to pick only the string that contains A, B and C ...
    The question in the OP is not entirely clear; one can infer that at least one 'A' and 'B' and 'C' must all be present to match.

    If that is so, this regex serves:

    >perl -wMstrict -le
    "my $gotA    = qr{ (?= .*? A) }xms;
     my $gotB    = qr{ (?= .*? B) }xms;
     my $gotC    = qr{ (?= .*? C) }xms;
     my $onlyABC = qr{ (?! .*? [^ABC]) }xms;
     for my $s (@ARGV) {
       my $match = $s =~ m{ \A $gotA $gotB $gotC $onlyABC }xms;
       print qq{'$s' }, $match ? 'matches' : 'no match';
       }
    " "" A B C AB AC BC ABC CAB ABABCBCBCACA xABC ABCx xA xB xC Ax
    '' no match
    'A' no match
    'B' no match
    'C' no match
    'AB' no match
    'AC' no match
    'BC' no match
    'ABC' matches
    'CAB' matches
    'ABABCBCBCACA' matches
    'xABC' no match
    'ABCx' no match
    'xA' no match
    'xB' no match
    'xC' no match
    'Ax' no match
Re: Match only certain characters?
by biohisham (Priest) on Nov 09, 2009 at 00:05 UTC
    How about a positive lookahead assertion? Since you only want to pick the string that contains the given pattern, you don't have to capture the match; it is sufficient to look around for its existence. Reading through the regex documentation can be a good idea too.

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # f = first, s = second.
    my $f = 'ABCAAAABBBCCCCCAAABBBCCC';
    my $s = 'ASRGRTGRT89579843rrrrrr';

    print $f =~ /(?=^[ABC]+$)/ ? "Matches [ABC] Combinations\n" : "No pattern found!\n";
    print $s =~ /(?=^[ABC]+$)/ ? "Matches [ABC] Combinations\n" : "No pattern found!\n";

    # OUTPUT:
    # Matches [ABC] Combinations
    # No pattern found!
    Update: added a link that might benefit the OP.


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
Re: Match only certain characters?
by Marshall (Canon) on Nov 09, 2009 at 01:43 UTC
    When I first looked at this, I came up with something similar to what Cristoforo did.

    #!/usr/bin/perl -w
    use strict;

    my @strings = ('ABCAAAABBBCCCCCAAABBBCCC',
                   'ASRGRTGRT89579843rrrrrr',
                   'A98797BqrtoiquyrtoCafdgagfd');

    foreach (@strings)
    {
        # Here the scalar count value of tr is used
        # tr is very fast but lacks the flexibility of
        # regex
        # prints the string unless it has some char that is
        # not an A, B, C.
        print "$_\n" if !(tr/ABC//c);
    }
    # Prints: ABCAAAABBBCCCCCAAABBBCCC
    This is a great idea if you know in advance that A, B and C are the allowed characters, since tr does not interpolate variables into its character list.

    If we have a "standard" string against which others will be compared, and that string is a variable, we can compare it against many lines like this: create a string with the unique chars in the "standard" string and use that string in a simple regex to see whether any other char is present. No fancy lookahead required.

    my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';
    my %seen;
    my @unique_letters = grep { $seen{$_}++ < 1 } split(//, $standard);
    my $unique_letters = join("", @unique_letters);
    # these above two lines could be combined, but I think it
    # reads better this way, and YES, it is completely legal in Perl
    # to have an array variable named unique_letters and a
    # string named the same thing. Update: with same name not same "thing".

    foreach (@strings)
    {
        print "$_\n" if (/[^$unique_letters]/);
    }
    #prints:
    #ASRGRTGRT89579843rrrrrr
    #A98797BqrtoiquyrtoCafdgagfd

    #change to "not" of these: (!/[^$unique_letters]/);
    #to get: ABCAAAABBBCCCCCAAABBBCCC
      The steps taken to 'uniqify' the characters of the character set are not necessary; repeated characters in a regex character set have no effect on pattern recognition. (I also think repeated characters make no difference in the execution time of the regex, but I cannot come up with a reference on this at the moment.) However, repeated characters do seem to take up space in the regex object.
      >perl -wMstrict -le
      "my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';
       foreach (@ARGV) {
         my $non_standard = /[^$standard]/;
         print qq{'$_' }, $non_standard ? 'no match' : 'match';
         }
      " "" A B C ABC ABCABCABC xA Ax xABC ABCx
      '' match
      'A' match
      'B' match
      'C' match
      'ABC' match
      'ABCABCABC' match
      'xA' no match
      'Ax' no match
      'xABC' no match
      'ABCx' no match
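
Interpolating a variable straight into a character class works for a string of plain letters, but characters such as `]`, `^`, `-` or `\` in the "standard" string would change the meaning of the class. A minimal sketch of one way to guard against that with the built-in quotemeta; the helper name outside_set_regex and the sample strings are hypothetical, not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex that matches any character
# outside the set of characters found in $allowed.
sub outside_set_regex {
    my ($allowed) = @_;
    # An empty set would produce an unterminated class, so refuse it.
    die "allowed set must not be empty\n" unless length $allowed;
    my $class = quotemeta $allowed;   # backslash-escapes ], ^, -, \ etc.
    return qr/[^$class]/;
}

# Allowed characters here are literally: A, -, C and ]
my $re = outside_set_regex('A-C]');

for my $s ('A-C', 'AC]', 'ABC') {
    printf "'%s' %s\n", $s, ($s =~ $re ? 'no match' : 'match');
}
```

With the escaping in place, 'ABC' is reported as 'no match' because B is not in the set; without it, `A-C` would have been read as a range.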
Re: Match only certain characters?
by EvanCarroll (Chaplain) on Nov 08, 2009 at 23:43 UTC
    This looks for letters outside the provided set and stops at the first one it finds.

    use feature 'say';
    $_ = "ABC";
    say /[^ABC]/ ? "other (non-ABC) letters too" : "only provided letters";


    Evan Carroll
    The most respected person in the whole perl community.
    www.evancarroll.com
Re: Match only certain characters?
by jffry (Hermit) on Nov 09, 2009 at 08:03 UTC

    It seems to me that you are asking for something as simple as this.

    my @ss = (
        'A',
        'B',
        'C',
        'AB',
        'CA',
        'ABC',
        'xABCx',
        'BxCxAx',
    );

    for my $s (@ss) {
        if ($s =~ /A/ && $s =~ /B/ && $s =~ /C/) {
            print "$s\n";
        }
    }

    Which outputs:

    ABC
    xABCx
    BxCxAx

    EDIT: This response is only to the first part of the OP's question. That is:

    ...if you have the strings:

    $a='ABCAAAABBBCCCCCAAABBBCCC';
    $b='ASRGRTGRT89579843rrrrrr';

    and you wanted to pick only the string that contains A, B and C (namely $a)...
      But the OP asked for...
      ... a way to search only for the string that has the letters A,B,C and no other letter ...
      which clearly excludes a string like 'xABCx'.

        In that case, I'd have to do this, then.

        my @ss = qw( A B C AB CA ABC xABCx BxCxAx );

        for (@ss) {
            if (/A/ && /B/ && /C/ && !/[^ABC]/) {
                print "$_\n";
            }
        }
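
The same "every required character present, and nothing else" test can be written without hard-coding A, B and C. This is a sketch only: the helper name uses_exactly is hypothetical, and it walks the string character by character instead of using the chained regexes above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $s uses every character in $required
# at least once and no character outside that set.
sub uses_exactly {
    my ($s, $required) = @_;
    # 1 means "still waiting to see this character".
    my %need = map { $_ => 1 } split //, $required;
    for my $ch (split //, $s) {
        return 0 unless exists $need{$ch};   # character outside the set
        $need{$ch} = 0;                      # mark this one as seen
    }
    # Count the required characters that never showed up.
    my $missing = grep { $_ } values %need;
    return $missing == 0;
}

for my $s (qw( A B C AB CA ABC xABCx BxCxAx CAB )) {
    print "$s\n" if uses_exactly($s, 'ABC');
}
```

For the list above this prints ABC and CAB, one per line.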

Node Type: perlquestion [id://805819]
Approved by GrandFather