PerlMonks  

Re: Finding Neighbours of a String

by Aristotle (Chancellor)
on Mar 01, 2006 at 10:04 UTC


in reply to Finding Neighbours of a String

use Algorithm::Combinatorics qw( combinations );
use Set::CrossProduct;

my @base = split //, $str;

for my $exact_distance ( 1 .. $d ) {
    my $change_idx_iter = combinations( [ 0 .. $#base ], $exact_distance );
    while( my $change_idx = $change_idx_iter->next ) {
        my @base_combo = map {
            my $i = $_;
            [ grep { $base[ $i ] ne $_ } qw( A T C G ) ];
        } @$change_idx;

        # HACK: Set::CrossProduct doesn’t work with a 1-dimensional matrix
        push @base_combo, [ 0 ] if $exact_distance == 1;

        my $bases_iter = Set::CrossProduct->new( \@base_combo );
        my @neighbour = @base;
        while( my $new_bases = $bases_iter->get ) {
            @neighbour[ @$change_idx ] = @$new_bases;
            print @neighbour, "\n";
        }
    }
}

Updates: changed to use combinations vs variations and to grep out non-changes, so that it will produce no duplicates.

To see what’s going on, add the following line before the print:

$_ = "[$_]" for @neighbour[ @$change_idx ];
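For readers without the two CPAN modules at hand, here is a rough core-Perl sketch of the same idea. It is not the code above: it swaps the combinations and cross-product iterators for a recursive walk, and the sub name neighbours and its arguments are made up for illustration. Each position is either kept or replaced by a base that differs from the original, so every string within distance $d comes out exactly once.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): return every string that differs
# from $str in at least 1 and at most $d positions, over the bases A, T, C, G.
sub neighbours {
    my ( $str, $d ) = @_;
    my @alphabet = qw( A T C G );
    my @found;

    # Walk the string left to right. At each position either keep the
    # original base or, while the change budget lasts, substitute another.
    my $walk;
    $walk = sub {
        my ( $pos, $budget, $prefix, $changed ) = @_;
        if ( $pos == length $str ) {
            push @found, $prefix if $changed;    # skip the unchanged string
            return;
        }
        my $orig = substr $str, $pos, 1;
        $walk->( $pos + 1, $budget, $prefix . $orig, $changed );
        return unless $budget;
        for my $base ( grep { $_ ne $orig } @alphabet ) {
            $walk->( $pos + 1, $budget - 1, $prefix . $base, 1 );
        }
    };
    $walk->( 0, $d, '', 0 );
    undef $walk;    # break the closure's reference to itself
    return @found;
}

print "$_\n" for neighbours( 'AT', 1 );
```

For a five-letter string and $d = 2 this should give 5*3 + 10*9 = 105 strings, which is a handy sanity check against the module-based version.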

Makeshifts last the longest.

Re^2: Finding Neighbours of a String
by monkfan (Curate) on Mar 01, 2006 at 11:25 UTC
    Dear Aristotle,

    Sorry for coming back to you again. How can I extend or modify your invaluable code above so that it can handle an ambiguous string like this:
    $str = '[TCG]TTCG[AT]';
    The idea is exactly the same as in my OP, only this time the characters inside brackets [] are also considered as mismatch possibilities.

    Please kindly keep your original answer.

    Regards,
    Edward

      Ah, thanks for your clarification. That’s easy: change the split line to

      my @base = $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;

      which will parse the string into units of either a single letter or a bracketed sequence, and change the grep line to

      [ grep { $base[ $i ] !~ $_ } qw( A T C G ) ];

      so that the letter in question will be thrown out if it matches anywhere in a bracketed sequence.

      That’s all, you’re done.
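To make the two changes concrete, here is a small stand-alone sketch that runs just the parsing and the filtering on the example string. The sub names tokenize and substitutes are invented for this illustration; the regex and the grep are the ones given above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the new "split" line: break the string into
# units, each either a whole bracketed group or a single letter.
sub tokenize {
    my ($str) = @_;
    return $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;
}

# Hypothetical wrapper around the new "grep" line: keep only the bases that
# do NOT occur in the unit. The unit is the string being tested and the
# base letter is used as the pattern.
sub substitutes {
    my ($unit) = @_;
    return grep { $unit !~ $_ } qw( A T C G );
}

my @base = tokenize('[TCG]TTCG[AT]');
print join( ' ', @base ), "\n";    # prints: [TCG] T T C G [AT]

for my $unit (@base) {
    printf "%s => %s\n", $unit, join( ' ', substitutes($unit) );
}
```

Here the first unit, [TCG], leaves only A as a possible substitution, and the last unit, [AT], leaves C and G.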

      Makeshifts last the longest.

        Dear Aristotle,

        Sorry, there is a slight glitch here. I was working with your last modified code, shown below. It works fine except when the given string contains a bracketed position.
        #!/usr/bin/perl -w
        use strict;
        use Data::Dumper;
        use Carp;
        use Algorithm::Combinatorics qw( combinations );
        use Set::CrossProduct;

        my $str1 = '[TA]TTCGG';
        my $e    = 2;
        find_nb( $str1, $e );

        sub find_nb {
            my ( $str, $d ) = @_;
            my @base = $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;
            #my @base = split //, $str;
            for my $exact_distance ( 1 .. $d ) {
                my $change_idx_iter = combinations( [ 0 .. $#base ], $exact_distance );
                while ( my $change_idx = $change_idx_iter->next ) {
                    my @base_combo = map {
                        my $i = $_;
                        [ grep { $base[$i] !~ $_ } qw( A T C G ) ];
                        #[ grep { $base[$i] ne $_ } qw( A T C G ) ];
                    } @$change_idx;
                    push @base_combo, [0] if $exact_distance == 1;
                    my $bases_iter = Set::CrossProduct->new( \@base_combo );
                    my @neighbour = @base;
                    while ( my $new_bases = $bases_iter->get ) {
                        @neighbour[@$change_idx] = @$new_bases;
                        #$_ = "[$_]" for @neighbour[@$change_idx];
                        my $str = join( "", @neighbour );
                        print "$str\n";
                    }
                }
            }
            return;
        }
        Why doesn't my modification above produce the output I expect? The output should always be without brackets. Currently one of the entries appears like this: [TA]TTTTG. Instead, this kind of string needs to be expanded into separate strings:
        TTTTTG
        ATTTTG
        Is there anything I can do to fix it? I really hope to hear from you again, since your solution is very important to me.

        Here is my brute-force code that generates the result above.

        Regards,
        Edward
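One possible way to get bracket-free output, offered only as a sketch: expand each string after it has been joined, replacing every bracketed unit with each of its letters in turn. The sub name expand_ambiguous is made up here, and it assumes the same bracket syntax as the parsing regex above, with no nested brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string with bracketed alternatives into the
# list of all concrete strings it stands for.
sub expand_ambiguous {
    my ($str) = @_;
    my @units = $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;

    my @out = ('');
    for my $unit (@units) {
        ( my $letters = $unit ) =~ tr/[]//d;    # strip the brackets, if any
        my @choices = split //, $letters;

        # Append every choice for this unit to every prefix built so far.
        @out = map { my $prefix = $_; map { $prefix . $_ } @choices } @out;
    }
    return @out;
}

print "$_\n" for expand_ambiguous('[TA]TTTTG');
# prints:
# TTTTTG
# ATTTTG
```

In find_nb this would mean replacing the single print with something like print "$_\n" for expand_ambiguous($str); whether the expanded strings should also be de-duplicated across iterations is left open here.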

      What exactly do you mean by “also considered as mismatch possibilities?”

      Makeshifts last the longest.

        This is one example:
        # Both strings and candidate are always
        # the same length
        $str = '[TCG]TTCG[AT]';
        $candidate1 = ' T TTCG G'; # I manually aligned this

        # The number of mismatches for those strings would be: 1
        # Namely, only the last position gives a mismatch;
        # the first position is considered a match.

        Regards,
        Edward
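The matching rule in this example can be written down as a short check. This is only a sketch of the rule as described: the sub name count_mismatches is invented, and the candidate is assumed to be given without the alignment padding, one letter per unit of the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions at which $candidate disagrees
# with $pattern. A bracketed unit matches any one of the letters inside it.
sub count_mismatches {
    my ( $pattern, $candidate ) = @_;
    my @units   = $pattern =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;
    my @letters = split //, $candidate;
    die "pattern and candidate differ in length\n" unless @units == @letters;

    # A position is a mismatch when the candidate letter does not occur
    # anywhere in the corresponding unit.
    return scalar grep { index( $units[$_], $letters[$_] ) < 0 } 0 .. $#units;
}

print count_mismatches( '[TCG]TTCG[AT]', 'TTTCGG' ), "\n";    # prints 1
```

The first position counts as a match because T is one of the letters in [TCG]; the last counts as a mismatch because G is not in [AT].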
