Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Double check for positions

by bisimen (Acolyte)
on Oct 18, 2017 at 12:30 UTC ( [id://1201579]=perlquestion: print w/replies, xml ) Need Help??

bisimen has asked for the wisdom of the Perl Monks concerning the following question:

This sub works, but say this is my DNA: AGCTTCTTGCGCTTCTT and this is what i'm looking for: GCTTCTTGC

It will return 2. Which it correct, but since the next match blends into the first one, it dosen't detect it.

So it should return: 2 9

I guess I need to change the regular expression somehow...

sub match_positions { my ($regexp, $sequence) = @_; use strict; my @positions = ( ); while ( $sequence =~ m/$regexp/ig ) { push ( @positions, pos($sequence) - length($&) + 1 ); } return "@positions "; }

Thanks for any answers!

Replies are listed 'Best First'.
Re: Double check for positions
by choroba (Cardinal) on Oct 18, 2017 at 13:17 UTC
    To match overlapping regexes, use a look-ahead assertion:
    my $s = 'AGCTTCTTGCTTCTTGC'; # ~~~~~~~~~ # ~~~~~~~~~ while ($s =~ /(?=GCTTCTTGC)/g) { say pos $s; }

    There's something wrong with your input, though, there's no overlap and it matches just once even with the look-ahead. I modified it as shown in the sample.

    Another possibility is to use index:

    my $s = 'AGCTTCTTGCTTCTTGC'; my $pos = -1; while (-1 != ($pos = index $s, 'GCTTCTTGC', $pos + 1)) { say $pos; }
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Double check for positions
by hippo (Bishop) on Oct 18, 2017 at 12:50 UTC
    since the next match blends into the first one

    Actually it doesn't.

    use strict; use warnings; use Test::More tests => 2; my $pos = match_positions ('GCTTCTTGC', 'AGCTTCTTGCGCTTCTT'); is ($pos, '2 9 ', 'Matched'); # This fails because you have only one o +ccurrence. $pos = match_positions ('GCG', 'AGCGCGT'); is ($pos, '2 4 ', 'Matched'); sub match_positions { my ($regexp, $sequence) = @_; my @positions = ( ); while ( $sequence =~ m/$regexp/ig ) { my $newpos = pos($sequence) - length($&) + 1; push @positions, $newpos; pos($sequence) = $newpos; } return "@positions "; }

    Update: fixed the test count to include the second easier-to-follow example.

      This works, thanks!

      But, what do you mean by "I only have one occurrence"? Or explain it a little bit deeper...

      And, this line:

      pos($sequence) = $newpos;
      Why is it needed?

        I mean that your search pattern GCTTCTTGC only appears once in AGCTTCTTGCGCTTCTT. If you think it appears twice, you are mistaken.

        The pos reset is needed in case there are overlapping pattern matches (which is what I've taken your question to be about, despite the sample data). In that case the pos marker needs to be backtracked to one place after the start of the last match to cope with that.

Re: Double check for positions
by tybalt89 (Monsignor) on Oct 18, 2017 at 13:16 UTC
    #!/usr/bin/perl # http://perlmonks.org/?node_id=1201579 use strict; use warnings; my $haystack = 'AGCTTCTTGCTTCTTGC'; my $needle = 'GCTTCTTGC'; my @positions; push @positions, $-[0] + 1 while $haystack =~ /(?=$needle)/g; print "@positions\n";
Re: Double check for positions
by 1nickt (Canon) on Oct 18, 2017 at 12:41 UTC

    Hi, that's a FAQ:

    $ perldoc -q substring Found in /home/nick/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1/pod/pe +rlfaq4.pod How can I count the number of occurrences of a substring within a st +ring? ...


    The way forward always starts with a minimal test.
      I don't think this is what I'm looking for...
Re: Double check for positions
by Anonymous Monk on Oct 18, 2017 at 16:08 UTC
    The other monks have the regex solution covered, but here's the non-regex version.
    my $str = 'CTTCTTCTTCTT'; my $pat = 'CTTCTT'; my $pos = 0; while (($pos = index($str, $pat, $pos)) >= 0) { print $pos, "\n"; $pos++; } __END__ 0 3 6

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1201579]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-03-28 16:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found