PerlMonks  

stupid/simple mistake

by gogoglou (Beadle)
on Oct 18, 2011 at 12:38 UTC ( #932126=perlquestion )
gogoglou has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks, I have the following code, and the result that I get is 1, while it should be 3. Any idea what I am doing wrong? Also, is there an easy way to count the letters between each occurrence of the substring and store those counts in an array? Thanks in advance for your help.

#!/usr/bin/perl
use warnings;
use strict;

open (FILE,"sequence.txt");
my $substring = 'GATC';
my $i=0;
my $count;
my $sequence;

while ($sequence=<FILE>){
    foreach($sequence =~ /$substring/) {
        #print "malakas\n";
        #print "$count\n";
    }
    $count++;
}
print "There are $count negative numbers in the string";

Here is what's in the sequence.txt file: GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC

Re: stupid/simple mistake
by Caio (Acolyte) on Oct 18, 2011 at 12:50 UTC
    As it stands, I see the following: your code counts at most one match per line, so (I guess) that's why you are getting 1 instead of 3.

    As for counting the bases between the matches... I'd save the match positions, and then subtract them to get the actual distances ;)

    Also, this node (Re: Exact string matching) might help to solve the one-match-per-line problem, but it'll take longer =/

      Thanks everyone for the replies. I am really not very experienced in Perl, since I do not use it very much.

      As for counting the bases between the matches... I'd save the match positions, and then subtract them to get the actual distances ;)

      I can use index to do that, but then I need to add +3 to the first one, and then I want to store the distances in an array, so that in the end I can calculate the average distance by dividing the sum of the distances by the count. Any suggestions on how to do that?

      Thanks everyone
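A minimal sketch of the index-based approach discussed above, assuming the sample sequence from the question is already in a string. The names `@positions`, `$offset` and `$pos` are illustrative only and are not taken from any poster's code. It records the 0-based start of every occurrence, which is the raw material for computing distances later.

```perl
use strict;
use warnings;

my $sequence  = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
my $substring = 'GATC';

# Collect the 0-based start position of every occurrence.
my @positions;
my $offset = 0;
while ( ( my $pos = index( $sequence, $substring, $offset ) ) != -1 ) {
    push @positions, $pos;
    $offset = $pos + 1;    # resume the search just past this match start
}

print "positions: @positions\n";    # positions: 10 23 35
```

Advancing the offset by 1 also finds overlapping occurrences; advancing by `length $substring` instead would skip them. For 'GATC' the two choices give the same result, since the string cannot overlap itself.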

Re: stupid/simple mistake
by choroba (Abbot) on Oct 18, 2011 at 12:50 UTC
    You need the /g switch to your regex match:
    for ($sequence =~ /$substring/g) {
    and, of course, increment $count in the inner loop.
Re: stupid/simple mistake
by Ratazong (Prior) on Oct 18, 2011 at 12:51 UTC
    The $count++ is outside the foreach-loop ... so you are counting the lines of the file instead ... which seems to be 1.
Re: stupid/simple mistake
by toolic (Chancellor) on Oct 18, 2011 at 12:55 UTC
    To count the number of occurrences of GATC in your string (this is a FAQ):
    perldoc -q count
    use warnings;
    use strict;

    my $substring = 'GATC';
    my $count = 0;
    my $sequence = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
    while ($sequence =~ /$substring/g) { $count++ }
    print "count = $count\n";

    __END__

    count = 3
    See also: How do I compose an effective node title?
      while ($sequence =~ /$substring/g) { $count++ }
      You never learned how to add numbers without using your fingers? ;-)

      I don't understand the urge to count one-by-one -- and you're not the first poster in this thread to do so.

      while (<FILE>) { $count += () = /$substring/g; }
      ought to do it.
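For readers unfamiliar with the `() =` construct used above, here is a small sketch of why it counts matches. It assumes the sample sequence from the question; `@matches` is an illustrative name. A `/g` match in list context returns all matches, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $sequence  = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
my $substring = 'GATC';

# In list context, a /g match returns every match at once.
my @matches = $sequence =~ /$substring/g;

# Assigning that list to () and reading the assignment in scalar
# context yields the number of elements on the right-hand side.
my $count = () = $sequence =~ /$substring/g;

print scalar(@matches), " $count\n";    # 3 3
```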
        #!/usr/bin/perl
        use warnings;
        use strict;

        open (FILE,"sequence.txt");
        my $substring = 'GATC';
        my $i=0;
        my $count;
        my $sequence;
        my $result;
        my $number=0;
        my $string;
        my $offset = 0;
        my @results;

        while ($sequence=<FILE>){
            foreach($sequence =~ /$substring/g) {
                #print "malakas\n";
                $count += () = /$substring/g;
                $number++;
                my $result = index($sequence, $substring, $offset);
                while ($result != -1) {
                    push (@results, $result);
                    print "Found $substring at $result\n";
                    $offset = $result + 1;
                    $result = index($sequence, $substring, $offset);
                }
                #print "$count\n";
            }
            #print "$result\n";
        }
        foreach $result(@results){
            my $sum
        }
        print "There are $count GATCs";

        OK, now I have the positions in an array, but I need to calculate the distances between them, that is, the sum of result(1)-result(0) + result(2)-result(1) + ... and so on. I don't really know how to write that in code. Any ideas? Thanks in advance again!
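One possible way to turn an array of positions into distances, sketched under the assumption that `@results` already holds the start positions for the sample sequence (10, 23, 35). The names `@distances` and `@gaps` are illustrative. The first list is the start-to-start difference asked about here; the second subtracts the length of the substring to get the number of letters strictly between two matches.

```perl
use strict;
use warnings;
use List::Util qw( sum );

my @results   = ( 10, 23, 35 );    # start positions found for the sample sequence
my $substring = 'GATC';

# Difference between each position and the one before it.
my @distances = map { $results[$_] - $results[ $_ - 1 ] } 1 .. $#results;

# Letters strictly between two matches: subtract the match length.
my @gaps = map { $_ - length($substring) } @distances;

# Guard against division by zero when there are fewer than two matches.
if (@distances) {
    printf "distances: %s, average %.2f\n", "@distances", sum(@distances) / @distances;
    printf "gaps: %s, average %.2f\n",      "@gaps",      sum(@gaps) / @gaps;
}
```

Note that the sum of consecutive differences telescopes to the last position minus the first, so the average start-to-start distance could also be computed as `( $results[-1] - $results[0] ) / $#results`.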

Re: stupid/simple mistake
by TomDLux (Vicar) on Oct 18, 2011 at 15:55 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;
    use autodie;
    use List::Util qw( reduce );

    my $file = 'sequence.txt';
    my $goal = 'GATC';

    my $stats = process_file( $file, $goal );
    # do something with stats ...

    # ----------------------------------------
    # Subroutines
    #
    sub process_file {
        my ( $file, $goal ) = @_;

        open my $infh, '<', $file;
        my $stats = process_lines( $infh, $goal );
        close $infh;
        return $stats;
    }

    sub process_lines {
        my ( $infh, $goal ) = @_;

        my @stats;
        while ( my $line = <$infh> ) {
            chomp $line;
            my $linestats = process_one_line( $line, $goal );
            push @stats, $linestats || 0;
        }
        return \@stats;
    }

    sub process_one_line {
        my ( $line, $goal ) = @_;

        my @occurences;
        my $offset = 0;
        SEEK:
        while ( 1 ) {
            my $idx = index( $line, $goal, $offset );
            last SEEK if $idx == -1;    # no more occurences
            push @occurences, $idx;
            $offset = $idx + 1;         # step past this match, otherwise the loop never ends
        }
        return calc_avg_distance( \@occurences, length $goal );
    }

    sub calc_avg_distance {
        my ( $occurences, $len ) = @_;

        return unless $occurences and scalar @$occurences;

        my $start = shift @$occurences;
        my @distances;
        while ( defined( my $end = shift @$occurences ) ) {
            push @distances, ( $end - $start ) - $len;
            $start = $end;
        }
        return unless @distances;       # fewer than two matches: nothing to average
        my $sum = reduce { $a + $b } @distances;
        my $n = scalar @distances;
        return $sum / $n;
    }

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re: stupid/simple mistake
by rnaeye (Pilgrim) on Oct 18, 2011 at 17:02 UTC

    This works for me:

    my $substring = 'GATC';
    my $i=0;
    my $count;
    my $sequence;

    while ($sequence=<DATA>){
        foreach($sequence =~ /($substring)/g) {
            print "$1\n";
            $count++;
        }
    }
    print "There are $count negative numbers in the string\n";

    __DATA__
    GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC
Re: stupid/simple mistake
by Anonymous Monk on Oct 18, 2011 at 20:08 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $substring = 'GATC';

    while (<DATA>) {
        my (undef, @parts) = split /(?=$substring)/, $_, -1;
        pop @parts;
        my $distance = 0;
        $distance += length for @parts;
        printf "Average distance: %4.2f\n", $distance / @parts;
    }

    __END__
    GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC
Re: stupid/simple mistake
by jwkrahn (Monsignor) on Oct 19, 2011 at 00:31 UTC
    is there any easy way to count the letters between each time it finds the substring

    Perhaps this will give you some ideas on how to do that:

    $ perl -le'
    $_ = "GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC";
    my $substring = "GATC";
    print "length of string is ", length;
    my $start = 0;
    while ( /$substring/g ) {
        print qq/"$substring" found at $-[0] to $+[0], distance from last is /, $-[0] - $start;
        $start = $+[0];
        }
    print "distance from end is ", length() - $start;
    '
    length of string is 42
    "GATC" found at 10 to 14, distance from last is 10
    "GATC" found at 23 to 27, distance from last is 9
    "GATC" found at 35 to 39, distance from last is 8
    distance from end is 3
