Instead of matching valid sequences, match invalid characters. Then use $-[0], which holds the zero-based offset where the last successful match started, to find the position of that match. (The @- array is documented in the "perlvar" manual page.)
use strict;
use warnings;
while (my $sequence = <DATA>) {
    chomp $sequence;
    if ($sequence =~ /[^ATCG]/) {
        warn "Sequence '$sequence' has invalid character after " . $-[0];
    }
    else {
        print "Valid sequence: '$sequence'\n";
    }
}
__DATA__
TAAGAACAATAAGAACAA
TAAGAACAATAAUAACAA
TAAGAACAATAAGAACAA
You don't need to split the sequence up into individual characters and process each one separately. That's slow.
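If you want every invalid character rather than just the first, the same idea still works without splitting the string. Here is a minimal sketch, assuming you are happy to collect the results in a list; the invalid_positions helper and the sample sequence are made up for illustration and are not part of the code above. It repeats the match with the /g modifier and reads $-[0] after each hit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a list of
# [offset, character] pairs, one for each character other than A, T, C or G.
sub invalid_positions {
    my ($sequence) = @_;
    my @found;
    # In scalar context, /g resumes from where the previous match ended.
    while ($sequence =~ /([^ATCG])/g) {
        # $-[0] is the zero-based offset where the current match starts.
        push @found, [ $-[0], $1 ];
    }
    return @found;
}

# Made-up sample sequence containing two invalid characters.
for my $hit (invalid_positions('TAAGAACAATAAUAACAX')) {
    my ($offset, $char) = @$hit;
    print "Invalid character '$char' at offset $offset\n";
}
```

For that sample sequence it reports 'U' at offset 12 and 'X' at offset 17. The regex engine does the scanning, so there is still no per-character loop in Perl code.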