I wish to use the substr() function to search for the particular motif.
substr is *not* designed for (nor capable of) searching for anything; so why are you specifying that particular function?
You've defined your IUPAC codes in terms of regex character classes; so why are you eschewing the regex engine?
Given your table, it is trivial to convert the IUPAC codes of a motif into a regex and use the regex engine to search your FASTA file:
my %IUPAC = (
    A => '[A]',
    C => '[C]',
    G => '[G]',
    T => '[T]',
    R => '[AG]',
    Y => '[CT]',
    M => '[AC]',
    K => '[GT]',
    W => '[AT]',
    S => '[GC]',
    B => '[CGT]',
    D => '[AGT]',
    H => '[ACT]',
    V => '[ACG]',
    N => '[ACGT]',
);
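my( $file, $motif ) = @ARGV;
die "usage: $0 fastafile motif\n" unless defined $motif;
$motif = uc $motif;
my @bad = grep !exists $IUPAC{ $_ }, split '', $motif;
die "Not IUPAC codes: @bad\n" if @bad;
my $re = join '', map $IUPAC{ $_ }, split '', $motif;
open my $fasta, '<', $file or die "Cannot open '$file': $!\n";
getc( $fasta ); ## discard first '>'
until( eof( $fasta ) ) {
    chomp( my $id = <$fasta> ); ## read ident
    my $seq = do{ local $/ = '>'; <$fasta> }; ## read up to the next '>'
    $seq =~ tr[\n>][]d; ## remove newlines and the '>'
    while( $seq =~ m[($re)]g ) {
        ## pass $1 and $id as arguments: a '%' in either would corrupt the format
        printf "Found: '%s' at '%s':%d\n", $1, $id, $-[0];
    }
}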
NB: The above is untested code typed directly into my browser.
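One thing to be aware of: a plain m//g match resumes searching after the end of the previous match, so occurrences of the motif that overlap an earlier hit are skipped. If you need those too, wrapping the pattern in a lookahead is one way to get them. What follows is only a minimal sketch under my own assumptions: motif_positions() is a hypothetical helper name, and the motif and sequence are made-up test data, not anything from your file.

```perl
use strict;
use warnings;

## Same IUPAC table as above, written compactly.
my %IUPAC = (
    A => '[A]',   C => '[C]',   G => '[G]',   T => '[T]',
    R => '[AG]',  Y => '[CT]',  M => '[AC]',  K => '[GT]',
    W => '[AT]',  S => '[GC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

## motif_positions() is a hypothetical helper, not part of the script above.
## It returns [ offset, matched text ] pairs, including overlapping hits.
sub motif_positions {
    my( $motif, $seq ) = @_;
    my $re = join '', map $IUPAC{ $_ }, split '', uc $motif;
    my @hits;
    ## the lookahead consumes nothing, so the next search starts one
    ## character further on instead of after the end of the match
    while( $seq =~ m[(?=($re))]g ) {
        push @hits, [ $-[0], $1 ];
    }
    return @hits;
}

## 'ARA' expands to [A][AG][A]; the made-up sequence holds two overlapping hits
printf "%d: %s\n", @$_ for motif_positions( 'ARA', 'CAAAGAT' );
```

For that input this reports AAA at offset 1 and AGA at offset 3; without the lookahead, only the first would be found because the second begins inside it.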