PerlMonks  

Re^2: string match using with an N in any position

by AnomalousMonk (Monsignor)
on Nov 18, 2011 at 22:54 UTC (#938922)


in reply to Re: string match using with an N in any position
in thread string match using with an N in any position

The OP seems to want to match at the start of the target string with either

  • an exact match to the query string, or
  • a match to a string formed by replacing a single character anywhere in the query string with a single 'N'.
The code of Re: string match using with an N in any position matches 'CACGT' against 'CBCGTNNN' ('B' vice 'N').
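
To see why this happens, here is a minimal sketch of the XOR-and-count idea that code relies on. The helper name count_equal is made up for illustration; it is not taken from the referenced post. XORing two equal-length strings gives "\0" wherever the characters agree, so counting nulls counts agreeing positions, and that count is the same whether the odd character out is 'N' or 'B'.

```perl
use strict;
use warnings;

# Hypothetical helper: count positions where two equal-length strings agree.
# String XOR yields "\0" wherever the characters are identical,
# and tr/\0// counts those nulls without modifying the mask.
sub count_equal {
    my ($x, $y) = @_;
    my $mask = $x ^ $y;
    return $mask =~ tr/\0//;
}

my $query = 'CACGT';
for my $target (qw(CNCGTNNN CBCGTNNN)) {
    my $prefix = substr $target, 0, length $query;
    printf "%s vs %s: %d of %d positions equal\n",
        $query, $prefix, count_equal($query, $prefix), length $query;
}
```

Both targets report 4 of 5 positions equal, so a test of "at least length - 1 equal positions" cannot tell them apart.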

>perl -wMstrict -le
"my @queries = qw[ GCGAT CACGT ];
 ;;
 my @targets = qw(GNGATNNN GCGANBBB CNCGTNNN CBCGTNNN );
 ;;
 for my $q (@queries) {
   for my $t (@targets) {
     my $matched = ($q ^ substr($t, 0, length $q)) =~ tr[\0][\0];
     if($matched >= (length($q) - 1)) {
       print qq{'$q' matched '$t'};
       }
     }
   }
 "
'GCGAT' matched 'GNGATNNN'
'GCGAT' matched 'GCGANBBB'
'CACGT' matched 'CNCGTNNN'
'CACGT' matched 'CBCGTNNN'

Here's a variation that avoids this (although the conditional logic is a bit obscure).

>perl -wMstrict -le
"use List::MoreUtils qw(uniq);
 ;;
 my @queries = qw(GCGAT CACGTT);
 ;;
 my $n_diff =
   join '',
   uniq
   map { sprintf '\x%02x', ord($_ ^ 'N') }
   map { split // }
   @queries
   ;
 $n_diff = eval qq{ sub { return \$_[0] =~ tr/$n_diff/$n_diff/; } };
 ;;
 my @targets = qw(
   GNGATNNNHIT    GCGANBBBHIT    CNCGTTNNNHIT    CACGTTNNNHIT
   CBCGTTNNNMISS  CNNGTTNNNMISS  NCACGTTNNNMISS
   );
 ;;
 for my $q (@queries) {
   my $len_q = length $q;
   TARGET:
   for my $t (@targets) {
     my $mask = $q ^ substr $t, 0, $len_q;
     my $nulls = $mask =~ tr{\0}{\0};
     next TARGET if $len_q > $nulls + 1
                 or $len_q > $nulls && $n_diff->($mask) != 1
                 ;
     print qq{'$q' matched '$t'};
     }
   }
 "
'GCGAT' matched 'GNGATNNNHIT'
'GCGAT' matched 'GCGANBBBHIT'
'CACGTT' matched 'CNCGTTNNNHIT'
'CACGTT' matched 'CACGTTNNNHIT'
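
Since the conditional above is admittedly obscure, here is a rough sketch of the same rule spelled out step by step. The sub name matches_with_one_n is hypothetical, and this is a different formulation, not the code posted above: it looks at the actual character in the target at the one differing position instead of testing the XOR value. Under that assumption a target letter such as 'H' opposite a 'G' in the query is rejected, even though 'G' ^ 'H' has the same value as 'A' ^ 'N'.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the start of $target equals $query exactly,
# or equals $query with exactly one character replaced by 'N'.
sub matches_with_one_n {
    my ($query, $target) = @_;
    my $len = length $query;
    return 0 if length($target) < $len;

    my $prefix = substr $target, 0, $len;
    my $mask   = $query ^ $prefix;            # "\0" wherever characters agree
    my $diffs  = $len - ($mask =~ tr/\0//);   # number of differing positions

    return 1 if $diffs == 0;                  # exact match
    return 0 if $diffs > 1;                   # more than one difference

    $mask =~ /[^\0]/;                         # locate the single difference
    return substr($prefix, $-[0], 1) eq 'N' ? 1 : 0;
}

for my $q (qw(GCGAT CACGTT)) {
    for my $t (qw(GNGATNNNHIT CNCGTTNNNHIT CBCGTTNNNMISS CNNGTTNNNMISS)) {
        print qq{'$q' matched '$t'\n} if matches_with_one_n($q, $t);
    }
}
```

The early returns keep the three cases (exact match, too many differences, one difference that must be an 'N') separate, at the cost of one extra regex match per candidate.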


Re^3: string match using with an N in any position
by BrowserUk (Pope) on Nov 18, 2011 at 23:48 UTC
    The code of Re: string match using with an N in any position matches 'CACGT' against 'CBCGTNNN' ('B' vice 'N').

    Agreed. But in the genomic encoding scheme of things, the 'N' means 'aNy'. Whereas 'B' means 'any except A'.

    With my very limited understanding, 'N' therefore encompasses 'B', inasmuch as there is no mention in his post of excluding strings that have an 'A' in the wild-card position. Nor is there any mention in the post of the possibility of "targets"(*) ever containing 'B's in the relevant positions.
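
For readers unfamiliar with the encoding, the sense in which 'N' encompasses 'B' can be sketched with the standard IUPAC nucleotide ambiguity codes. The table holds the standard code meanings; the helper name code_covers is invented for this illustration and does not come from the thread.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Hypothetical helper: true if every base allowed by $narrow
# is also allowed by $wide.
sub code_covers {
    my ($wide, $narrow) = @_;
    my %allowed = map { $_ => 1 } split //, $IUPAC{$wide};
    my $missing = grep { !$allowed{$_} } split //, $IUPAC{$narrow};
    return $missing == 0 ? 1 : 0;
}

print "N covers B\n"         if code_covers('N', 'B');
print "B does not cover A\n" unless code_covers('B', 'A');
```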

    In this case, the OP seems satisfied with the solution for his particular problem. I'll leave it up to him to know his data and problem domain.

    (*An unusual term in this context -- the wild-cards are usually in the query -- but whatever :)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      ... in ... genomic encoding ... 'N' means 'aNy' [w]hereas 'B' means 'any except A'.

      I did not know that.

      With my very limited understanding...

      Well, your understanding is nowhere near as limited as mine and, as you say, the OPer seems happy, so...

Re^3: string match using with an N in any position
by BrowserUk (Pope) on Nov 19, 2011 at 08:07 UTC

    If the possibility you describe can happen, I think this is a computationally simpler solution:

    #! perl -slw
    use strict;

    my @queries = qw[ GCGAT CACGT ];
    chomp( my @targets = <DATA> );

    for my $q ( @queries ) {
        for my $t ( @targets ) {
            my $matched = ( $q ^ substr( $t, 0, length( $q ) ) ) =~ tr[\0][\0];
            if( $matched == length( $q )
                or $matched == length( $q )-1
                and substr( $t, 0, length( $q ) ) =~ tr[N][N] == 1
            ) {
                print "$q matched $t";
            }
        }
    }
    __DATA__
    GNGATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    GCGANBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    CNCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    GBGATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

