Hi Perl Monks,
I tried to add a do-until loop, just after the base-counting step in the program below, to count the occurrences of a motif and to find the lengths between motifs, with the motif (2-100 letters) read from <STDIN>.
But I don't get correct results for the number of motifs or for the lengths between any two motifs, either in cmd or in the output text file. I initialized several variables, but in vain. I am hoping the Perl Monks can help me correct the program so that it counts motifs (regular expressions) and the lengths between motifs in a large string of about 299 MB. I want to get correct results in cmd as well as in the text file (with the length values listed vertically, one per line).
#!/usr/bin/perl
use strict;
use warnings;
if (! @ARGV) {
print <<HELP;
Usage:
> basecount.pl <bases file>
HELP
exit;
}
open my $dnaIn, '<', $ARGV[0] or die "Can't open bases file $ARGV[0]: $!\n";
my %counts;
my @baseList = qw(A T G C);
while (defined (my $line = <$dnaIn>)) {
chomp $line;
++$counts{$_} for grep {/\S/} split '', $line;
}
my $bases;
my $errors;
$bases += $_ for @counts{@baseList};
$errors += $_ for map {$counts{$_}} grep {! /[ATGC]/} keys %counts;
print "\n\n Total bases: $bases\n\n";
print join (', ', map {"$_= $counts{$_}"} @baseList), "\n";
print "Errors (N)= $errors\n" if $errors;
# In a loop, ask the user for a motif, search for the motif, and report if it
# was found. Exit if no motif is entered.
my $DNA=join('',@ARGV);
my $motif='';
do {
print "\n\nEnter a motif to count its number and lengths between motifs:\n";
$motif = <STDIN>;
chomp $motif;
# Look for the motif
if ( $DNA=~ / $motif/ ) {
print "I found the motif!\n\n";
} else {
print"I couldn\'t find it.\n\n";
}
# Count number of motifs and Count number of nt between two motifs
use 5.010;
my $string ="@ARGV";
# Remove whitespace
$string=~ s/\s//g;
my $count= () =$string=~ /$motif/g;
print "Number of motifs: $count.\n\n";
say "The inter-motif nt Lengths are:\n";
say length for split/$motif/,$string;
my @a=map length,split/$motif/,$string;
# Output to a text page
my $output="result.txt";
unless (open(RESULT,"> $output")){
print"Cannot open file\"$output\".\n\n";
exit;
}
print RESULT"\n\n Number of bases: $bases. Errors(N)=$errors.\n
Motif: $motif. Number of motifs: $count.\n\n The inter-motif nt Lengths are:\n\n @a";
close(RESULT);
}
until (my $motif =~ /^\s*$/ );
exit;
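Three things in the loop above appear to explain the wrong numbers. First, `$DNA` and `$string` are built from `@ARGV`, which holds the file name `t.txt` rather than the sequence, so the search runs over a 5-character string. Second, the pattern `/ $motif/` contains a leading space. Third, `my $motif` in the `until` condition declares a new, empty variable, so the loop always stops after one pass. The following is a minimal sketch of one possible correction, not a definitive implementation. The helper names `read_sequence`, `motif_gaps` and `motif_loop` are hypothetical, the motif is assumed to be a valid regular expression, and an in-memory filehandle stands in for `STDIN` so the example runs without typing anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the whole file and return its contents
# (not its name) as one string with all whitespace removed.
sub read_sequence {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open bases file $path: $!\n";
    local $/;                          # slurp mode: read everything at once
    my $seq = <$fh>;
    close $fh;
    $seq = '' unless defined $seq;     # empty file
    $seq =~ s/\s+//g;
    return $seq;
}

# Hypothetical helper: count non-overlapping matches of the motif and
# measure the stretches between them. The motif is used as a regex.
sub motif_gaps {
    my ($seq, $motif) = @_;
    my $count = () = $seq =~ /$motif/g;            # no stray space in the pattern
    my @gaps  = map { length } split /$motif/, $seq;
    return ($count, @gaps);
}

# Hypothetical helper: read motifs from $in until a blank line or end of
# input. The same $motif variable is read and tested; nothing is redeclared.
sub motif_loop {
    my ($seq, $in) = @_;
    my @report;
    while (defined(my $motif = <$in>)) {
        $motif =~ s/\s+//g;            # strip the newline and stray spaces
        last if $motif eq '';          # blank entry ends the loop
        my ($count, @gaps) = motif_gaps($seq, $motif);
        push @report, "Motif: $motif. Number of motifs: $count.";
        push @report, @gaps;           # one length per output line
    }
    return @report;
}

# Use the file given on the command line, or fall back to the t.txt sample.
my $seq = @ARGV
    ? read_sequence($ARGV[0])
    : 'ATGCCCGATATATATCCCNNNATATATGCGCATGCTGCT';

# Stand-in for the keyboard: the user "types" AT and then a blank line.
my $typed = "AT\n\n";
open my $in, '<', \$typed or die "Cannot open in-memory input: $!\n";
print "$_\n" for motif_loop($seq, $in);
```

For the sample sequence and the motif AT this reports 9 motifs and the lengths 0 5 0 0 0 6 0 0 4 6, one per line. Passing `\*STDIN` instead of `$in` would make the loop interactive. Note that `split` drops trailing empty fields, so a sequence that ends with the motif yields no final 0.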
My input file t.txt has the sequence:
ATGCCCGATATATATCCCNNNATATATGCGCATGCTGCT
Cmd output is as follows:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\user>cd d*
C:\Documents and Settings\user\Desktop>basecount.pl t.txt
"my" variable $motif masks earlier declaration in same scope at C:\Documents and Settings\user\Desktop\basecount.pl line 58.
Total bases: 36
A= 9, T= 11, G= 6, C= 10
Errors (N)= 3
Enter a motif to count its number and lengths between motifs:
AT
I couldn't find it.
Number of motifs: 0.
The inter-motif nt Lengths are:
5
Use of uninitialized value $motif in pattern match (m//) at C:\Documents and Settings\user\Desktop\basecount.pl line 27, <STDIN> line 1.
C:\Documents and Settings\user\Desktop>
Text Output is as follows:
Number of bases: 36. Errors(N)=3.
Motif: AT. Number of motifs: 0.
The inter-motif nt Lengths are:
5
The correct results in the result.txt file should look something like this:
Number of bases: 39. A=9; T=11; G=6; C=10; Errors(N)=3.
I found the motif.
Motif: AT. Number of motifs: 9.
The inter-motif nt Lengths are:
0
5
0
0
0
6
0
0
4
6
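For a sequence of about 299 MB, splitting the string into single characters or into a full list of fields can use a lot of memory. Below is a rough sketch, under stated assumptions, of a report writer that produces the layout shown above: bases are counted with `tr///`, and the lengths are printed one per line while scanning with `//g` and the `@-` and `@+` match-offset arrays. The helper names `base_counts` and `write_report` are hypothetical, everything that is not A, T, G or C is assumed to count as an error, and an in-memory filehandle stands in for result.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count each base with tr///, which scans the string
# without splitting it into characters. Takes a reference to avoid copying.
sub base_counts {
    my ($seq_ref) = @_;
    my %n;
    $n{A} = ($$seq_ref =~ tr/A//);
    $n{T} = ($$seq_ref =~ tr/T//);
    $n{G} = ($$seq_ref =~ tr/G//);
    $n{C} = ($$seq_ref =~ tr/C//);
    # Assumption: anything that is not A, T, G or C is reported as an error.
    $n{other} = length($$seq_ref) - $n{A} - $n{T} - $n{G} - $n{C};
    return %n;
}

# Hypothetical helper: write the report to the filehandle $out, with the
# inter-motif lengths listed vertically. Returns the number of motifs.
sub write_report {
    my ($seq_ref, $motif, $out) = @_;
    my %n     = base_counts($seq_ref);
    my $total = length $$seq_ref;

    # First pass: count the matches without building a list of them.
    my $count = 0;
    $count++ while $$seq_ref =~ /$motif/g;

    print {$out} "Number of bases: $total. A=$n{A}; T=$n{T}; G=$n{G}; "
               . "C=$n{C}; Errors(N)=$n{other}.\n";
    print {$out} $count ? "I found the motif.\n" : "I couldn't find it.\n";
    print {$out} "Motif: $motif. Number of motifs: $count.\n";
    print {$out} "The inter-motif nt Lengths are:\n";

    # Second pass: $-[0] is where the current match starts, $+[0] where it ends.
    my $prev_end = 0;
    while ($$seq_ref =~ /$motif/g) {
        print {$out} $-[0] - $prev_end, "\n";   # stretch before this match
        $prev_end = $+[0];
    }
    print {$out} $total - $prev_end, "\n";      # stretch after the last match
    return $count;
}

# Demo on the t.txt sample; $report stands in for the result.txt file.
my $seq    = 'ATGCCCGATATATATCCCNNNATATATGCGCATGCTGCT';
my $report = '';
open my $out, '>', \$report or die "Cannot open in-memory output: $!\n";
write_report(\$seq, 'AT', $out);
close $out;
print $report;
```

To write a real file, `$out` could be opened with `open my $out, '>', 'result.txt'` instead. Unlike `split`, this sketch always prints the final stretch, so a sequence that ends with the motif gets a trailing 0.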