stupid/simple mistake

gogoglou has asked for the wisdom of the Perl Monks concerning the following question:

Re: stupid/simple mistake by toolic (Bishop) on Oct 18, 2011 at 12:55 UTC
To count the number of occurrences of GATC in your string (this is a FAQ):

`perldoc -q count`

    use warnings;
    use strict;

    my $substring = 'GATC';
    my $count = 0;
    my $sequence = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
    while ($sequence =~ /$substring/g) { $count++ }
    print "count = $count\n";
    __END__
    count = 3

See also: How do I compose an effective node title?
Re^2: stupid/simple mistake by JavaFan (Canon) on Oct 18, 2011 at 13:05 UTC
`while ($sequence =~ /$substring/g) { $count++ }`

You never learned how to add numbers without using your fingers? ;-) I don't understand the urge to count one-by-one -- and you're not the first poster in this thread to do so. `while (<FILE>) { $count += () = /$substring/g; }` ought to do it.
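A minimal sketch of the idiom above, for readers who have not seen it: assigning the match to an empty list `()` and then reading that assignment in scalar context yields the number of matches. The helper name `count_matches` is hypothetical, and the `\Q...\E` quoting is an addition so that the substring is treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $substring in $text.
sub count_matches {
    my ($text, $substring) = @_;
    # A list assignment evaluated in scalar context returns the number of
    # elements on its right-hand side, i.e. the number of matches found by /g.
    my $count = () = $text =~ /\Q$substring\E/g;
    return $count;
}

print count_matches('GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC', 'GATC'), "\n";   # prints 3
```

Inside a `while (<FILE>)` loop the same expression can be added to a running total with `+=`, as in the reply above.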
Re^3: stupid/simple mistake by gogoglou (Beadle) on Oct 18, 2011 at 14:39 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    open (FILE,"sequence.txt");
    my $substring = 'GATC';
    my $i=0;
    my $count;
    my $sequence;
    my $result;
    my $number=0;
    my $string;
    my $offset = 0;
    my @results;
    while ($sequence=<FILE>){
        foreach($sequence =~ /$substring/g) {
            #print "malakas\n";
            $count += () = /$substring/g;
            $number++;
            my $result = index($sequence, $substring, $offset);
            while ($result != -1) {
                push (@results, $result);
                print "Found $substring at $result\n";
                $offset = $result + 1;
                $result = index($sequence, $substring, $offset);
            }
            #print "$count\n";
        }
        #print "$result\n";
    }
    foreach $result(@results){
        my $sum
    }
    print "There are $count GATCs";

OK, now I have the positions in an array, but I need to calculate the distances between them, which is the sum of result(1)-result(0) + result(2)-result(1) + ... etc. I don't really know how to write that in code. Any ideas? Thanks in advance again!
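One way the calculation asked about here could be written, as a sketch only: take the difference between each position and the one before it. The helper name `consecutive_distances` is hypothetical, and the sample positions 10, 23 and 35 are the GATC start positions in the sample sequence used elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: differences between consecutive start positions.
# Returns an empty list when there are fewer than two positions.
sub consecutive_distances {
    my @positions = @_;
    return map { $positions[$_] - $positions[$_ - 1] } 1 .. $#positions;
}

my @results   = (10, 23, 35);                       # start positions of GATC
my @distances = consecutive_distances(@results);    # (13, 12)

my $sum = 0;
$sum += $_ for @distances;

print "distances: @distances, sum: $sum\n";         # distances: 13 12, sum: 25
```

Note that the sum of consecutive differences always collapses to the last position minus the first (35 - 10 = 25 here), so the individual distances only matter if you want them separately or need an average.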
Re^4: stupid/simple mistake by Caio (Acolyte) on Oct 18, 2011 at 16:05 UTC
Re^4: stupid/simple mistake by JavaFan (Canon) on Oct 18, 2011 at 22:04 UTC
Re: stupid/simple mistake by choroba (Cardinal) on Oct 18, 2011 at 12:50 UTC
You need the /g switch to your regex match: `for ($sequence =~ /$substring/g) {` and, of course, increment `$count` in the inner loop.
Re: stupid/simple mistake by Ratazong (Monsignor) on Oct 18, 2011 at 12:51 UTC
The `$count++` is outside the foreach-loop ... so you are counting the lines of the file instead ... which seems to be 1.
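The original question's code is not shown in this thread, so the following is only a reconstruction of the behaviour described: incrementing once per line counts lines, while incrementing inside a `/g` loop counts matches. The array `@lines` is an assumption that stands in for the input file.

```perl
use strict;
use warnings;

my $substring = 'GATC';
# Stand-in for the file contents: a single line, as the replies suggest.
my @lines = ('GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC');

# Incrementing once per matching line effectively counts lines.
my $per_line = 0;
for my $line (@lines) {
    $per_line++ if $line =~ /$substring/;
}

# Incrementing inside a /g loop counts every match on every line.
my $per_match = 0;
for my $line (@lines) {
    $per_match++ while $line =~ /$substring/g;
}

print "per line: $per_line, per match: $per_match\n";   # per line: 1, per match: 3
```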
Re: stupid/simple mistake by Caio (Acolyte) on Oct 18, 2011 at 12:50 UTC
As I see it, your code counts at most 1 match per line, so (I guess) that's why you are getting 1 instead of 3. As for counting the bases between the matches... I'd save the match positions, and then subtract them to get the actual distances ;) Also, this node (Re: Exact string matching) might help to solve the 1-match-per-line problem, but it'll take longer =/
Re^2: stupid/simple mistake by gogoglou (Beadle) on Oct 18, 2011 at 13:26 UTC
Thanks everyone for the replies. I am really not very experienced in Perl, since I do not use it much.

"As for counting the bases between the matches... I'd save the match positions, and then subtract them to get the actual distances ;)"

I can use index to do that, but then I need to add +3 to the first one, and then I want to store them in an array, so in the end I can calculate the average distance by dividing the sum of the distances by the count. Any suggestions on how to do that? Thanks everyone.
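A minimal sketch of the average described here, assuming "distance" means the number of characters between the end of one match and the start of the next. The helper name `average_gap` is hypothetical, and `sum` comes from the core module List::Util.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: average number of characters between consecutive matches.
sub average_gap {
    my ($sequence, $substring) = @_;
    my $len = length($substring);

    # Collect the start position of every non-overlapping match.
    my @starts;
    my $offset = 0;
    while ((my $pos = index($sequence, $substring, $offset)) != -1) {
        push @starts, $pos;
        $offset = $pos + $len;          # resume searching after this match
    }
    return if @starts < 2;              # fewer than two matches: no gap to measure

    # Gap = next start minus previous start minus the length of the match itself.
    my @gaps = map { $starts[$_] - $starts[$_ - 1] - $len } 1 .. $#starts;
    return sum(@gaps) / @gaps;
}

print average_gap('GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC', 'GATC'), "\n";   # prints 8.5
```

Dividing by the number of gaps rather than the number of matches is a deliberate choice in this sketch: three matches have only two gaps between them.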
Re^3: stupid/simple mistake by Caio (Acolyte) on Oct 18, 2011 at 13:41 UTC
Well, for these I've got nothing of my own, but I offer these: Statistics::Descriptive and How do I retrieve the position of the first occurrence of a match?
Re: stupid/simple mistake by jwkrahn (Abbot) on Oct 19, 2011 at 00:31 UTC
"is there any easy way to count the letters between each time it finds the substring"

Perhaps this will give you some ideas on how to do that:

    $ perl -le'
    $_ = "GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC";
    my $substring = "GATC";
    print "length of string is ", length;
    my $start = 0;
    while ( /$substring/g ) {
        print qq/"$substring" found at $-[0] to $+[0], distance from last is /, $-[0] - $start;
        $start = $+[0];
        }
    print "distance from end is ", length() - $start;
    '
    length of string is 42
    "GATC" found at 10 to 14, distance from last is 10
    "GATC" found at 23 to 27, distance from last is 9
    "GATC" found at 35 to 39, distance from last is 8
    distance from end is 3
Re: stupid/simple mistake by TomDLux (Vicar) on Oct 18, 2011 at 15:55 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;
    use autodie;
    use List::Util qw( reduce );

    my $file = 'sequence.txt';
    my $goal = 'GATC';
    my $stats = process_file( $file, $goal );
    # do something with stats ...

    # ----------------------------------------
    # Subroutines
    #
    sub process_file {
        my ( $file, $goal ) = @_;
        open my $infh, '<', $file;
        my $stats = process_lines( $infh, $goal );
        close $infh;
        return $stats;
    }

    sub process_lines {
        my ( $infh, $goal ) = @_;
        my @stats;
        while ( my $line = <$infh> ) {
            chomp $line;
            my $linestats = process_one_line( $line, $goal );
            push @stats, $linestats || 0;
        }
        return \@stats;
    }

    sub process_one_line {
        my ( $line, $goal ) = @_;
        my @occurences;
        my ( $offset ) = ( 0 );
      SEEK:
        while ( 1 ) {
            my $idx = index( $line, $goal, $offset );
            last SEEK if $idx == -1;    # no more occurrences
            push @occurences, $idx;
            $offset = $idx + 1;         # move past this match to avoid looping forever
        }
        return calc_avg_distance( \@occurences, length $goal );
    }

    sub calc_avg_distance {
        my ( $occurences, $len ) = @_;
        return unless $occurences and scalar @$occurences;
        my $start = shift @$occurences;
        my @distances;
        while ( defined( my $end = shift @$occurences ) ) {
            push @distances, ( $end - $start ) - $len;
            $start = $end;
        }
        return unless @distances;       # a single occurrence has no distance
        my $sum = reduce { $a + $b } @distances;
        my $n = scalar @distances;
        return $sum / $n;
    }

As Occam said: Entia non sunt multiplicanda praeter necessitatem.
Re: stupid/simple mistake by rnaeye (Friar) on Oct 18, 2011 at 17:02 UTC
This works for me:

    my $substring = 'GATC';
    my $i=0;
    my $count;
    my $sequence;
    while ($sequence=<DATA>){
        foreach($sequence =~ /($substring)/g) {
            print "$1\n";
            $count++;
        }
    }
    print "There are $count negative numbers in the string\n";
    __DATA__
    GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC
Re: stupid/simple mistake by Anonymous Monk on Oct 18, 2011 at 20:08 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $substring = 'GATC';
    while (<DATA>) {
        my (undef, @parts) = split /(?=$substring)/, $_, -1;
        pop @parts;
        my $distance = 0;
        $distance += length for @parts;
        printf "Average distance: %4.2f\n", $distance / @parts;
    }
    __END__
    GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC

