Re: stupid/simple mistake
by toolic (Bishop) on Oct 18, 2011 at 12:55 UTC
To count the number of occurrences of GATC in your string (this is a FAQ):
perldoc -q count
use warnings;
use strict;
my $substring = 'GATC';
my $count = 0;
my $sequence = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
while ($sequence =~ /$substring/g) { $count++ }
print "count = $count\n";
__END__
count = 3
See also: How do I compose an effective node title?
|
while (<FILE>) {
$count += () = /$substring/g;
}
ought to do it.
|
#!/usr/bin/perl
use warnings;
use strict;
open (FILE, "sequence.txt");
my $substring = 'GATC';
my $i = 0;
my $count;
my $sequence;
my $result;
my $number = 0;
my $string;
my $offset = 0;
my @results;
while ($sequence = <FILE>) {
    foreach ($sequence =~ /$substring/g) {
        #print "malakas\n";
        $count += () = /$substring/g;
        $number++;
        my $result = index($sequence, $substring, $offset);
        while ($result != -1) {
            push (@results, $result);
            print "Found $substring at $result\n";
            $offset = $result + 1;
            $result = index($sequence, $substring, $offset);
        }
        #print "$count\n";
    }
    #print "$result\n";
}
foreach $result (@results) {
    my $sum
}
print "There are $count GATCs";
OK, now I have the positions in an array, but I need to calculate the distances between them, which is the sum of result(1) - result(0) + result(2) - result(1) + ... and so on.
I don't really know how to write that in code. Any ideas?
Thanks in advance again!
Re: stupid/simple mistake
by choroba (Cardinal) on Oct 18, 2011 at 12:50 UTC
You need the /g switch on your regex match: for ($sequence =~ /$substring/g) {
and, of course, you need to increment $count inside the inner loop.
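A minimal sketch of that fix, using the sample sequence posted elsewhere in this thread. The helper name count_matches is made up for illustration, and \Q...\E is added on the assumption that the substring should be matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $substring in $sequence.
sub count_matches {
    my ( $sequence, $substring ) = @_;
    my $count = 0;
    # In list context, /g returns every match, so the loop body runs once per match.
    for ( $sequence =~ /\Q$substring\E/g ) {
        $count++;
    }
    return $count;
}

my $sequence = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
print "There are ", count_matches( $sequence, 'GATC' ), " GATCs\n";
```

Without /g the match returns at most one result per line, which is why the count stays at 1 for a one-line file.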
Re: stupid/simple mistake
by Ratazong (Monsignor) on Oct 18, 2011 at 12:51 UTC
The $count++ is outside the foreach loop, so you are counting the lines of the file instead, which seems to be 1.
Re: stupid/simple mistake
by Caio (Acolyte) on Oct 18, 2011 at 12:50 UTC
As it stands, your code counts at most one match per line, so (I guess) that's why you are getting 1 instead of 3.
As for counting the bases between the matches, I'd save the match positions and then subtract them to get the actual distances ;)
Also, this node (Re: Exact string matching) might help to solve the one-match-per-line problem, but it'll take longer =/
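One way to sketch the "save the positions, then subtract" idea, assuming the gap between two matches means the number of bases between the end of one match and the start of the next. The helper names match_positions and gaps_between are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based start offsets of every match of $substring.
sub match_positions {
    my ( $sequence, $substring ) = @_;
    my @positions;
    while ( $sequence =~ /\Q$substring\E/g ) {
        push @positions, $-[0];    # $-[0] is the offset where the last match began
    }
    return @positions;
}

# Hypothetical helper: number of characters between consecutive matches.
sub gaps_between {
    my ( $len, @positions ) = @_;
    return map { $positions[$_] - $positions[ $_ - 1 ] - $len } 1 .. $#positions;
}

my $substring = 'GATC';
my $sequence  = 'GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC';
my @positions = match_positions( $sequence, $substring );
my @gaps      = gaps_between( length $substring, @positions );

print "positions: @positions\n";    # positions: 10 23 35
print "gaps: @gaps\n";              # gaps: 9 8
if (@gaps) {                        # avoid dividing by zero with fewer than two matches
    my $total = 0;
    $total += $_ for @gaps;
    printf "average gap: %.2f\n", $total / @gaps;    # average gap: 8.50
}
```

Subtracting the substring length gives the bases strictly between matches; drop that term if start-to-start distances are wanted instead.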
|
Thanks everyone for the replies. I am not very experienced in Perl, since I do not use it much.
"As for counting the bases between the matches... I'd save the match positions, and then subtract them to get the actual distances ;)"
I can use index to do that, but then I need to add +3 to the first one, and then I want to store them in an array so that in the end I can calculate the average distance by dividing the sum of the distances by the count. Any suggestions on how to do that?
Thanks everyone
Re: stupid/simple mistake
by jwkrahn (Abbot) on Oct 19, 2011 at 00:31 UTC
$ perl -le'
$_ = "GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC";
my $substring = "GATC";
print "length of string is ", length;
my $start = 0;
while ( /$substring/g ) {
print qq/"$substring" found at $-[0] to $+[0], distance from last is /, $-[0] - $start;
$start = $+[0];
}
print "distance from end is ", length() - $start;
'
length of string is 42
"GATC" found at 10 to 14, distance from last is 10
"GATC" found at 23 to 27, distance from last is 9
"GATC" found at 35 to 39, distance from last is 8
distance from end is 3
Re: stupid/simple mistake
by TomDLux (Vicar) on Oct 18, 2011 at 15:55 UTC
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use List::Util qw( reduce );    # reduce is not exported by default
my $file = 'sequence.txt';
my $goal = 'GATC';
my $stats = process_file( $file, $goal );
# do something with stats ...
# ----------------------------------------
# Subroutines
#
sub process_file {
    my ( $file, $goal ) = @_;
    open my $infh, '<', $file;    # autodie dies if the open fails
    my $stats = process_lines( $infh, $goal );
    close $infh;
    return $stats;
}
sub process_lines {
    my ( $infh, $goal ) = @_;
    my @stats;
    while ( my $line = <$infh> ) {
        chomp $line;
        my $linestats = process_one_line( $line, $goal );
        push @stats, $linestats || 0;
    }
    return \@stats;
}
sub process_one_line {
    my ( $line, $goal ) = @_;
    my @occurences;
    my $offset = 0;
    SEEK:
    while ( 1 ) {
        my $idx = index( $line, $goal, $offset );
        last SEEK if $idx == -1;    # no more occurrences
        push @occurences, $idx;
        $offset = $idx + 1;         # resume just past this match to avoid looping forever
    }
    return calc_avg_distance( \@occurences, length $goal );
}
sub calc_avg_distance {
    my ( $occurences, $len ) = @_;
    return unless $occurences and scalar @$occurences;
    my $start = shift @$occurences;
    my @distances;
    while ( defined( my $end = shift @$occurences ) ) {
        push @distances, ( $end - $start ) - $len;
        $start = $end;
    }
    return unless @distances;    # a single occurrence has no gaps to average
    my $sum = reduce { $a + $b } @distances;
    my $n = scalar @distances;
    return $sum / $n;
}
As Occam said: Entia non sunt multiplicanda praeter necessitatem (entities must not be multiplied beyond necessity).
Re: stupid/simple mistake
by rnaeye (Friar) on Oct 18, 2011 at 17:02 UTC
my $substring = 'GATC';
my$i=0;
my$count;
my $sequence;
while ($sequence=<DATA>){
foreach($sequence =~ /($substring)/g) {
print "$1\n";
$count++;
}
}
print "There are $count occurrences of $substring in the string\n";
__DATA__
GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC
Re: stupid/simple mistake
by Anonymous Monk on Oct 18, 2011 at 20:08 UTC
#!/usr/bin/perl
use warnings;
use strict;
my $substring = 'GATC';
while (<DATA>) {
my (undef, @parts) = split /(?=$substring)/, $_, -1;
pop @parts;
my $distance = 0;
$distance += length for @parts;
printf "Average distance: %4.2f\n", $distance / @parts;
}
__END__
GAGAGACCCCGATCGAGAGACCCGATCFGAGAVCTGATCCCC