in reply to
extracting open reading frame (ORF) from a FASTA file
Without giving you the explicit answers, here are some things to think about that you may or may not already know:
- Don't forget to reverse-complement the DNA sequence and search the reading frames on that strand as well. Look at the reverse and tr functions for some help with that.
- An ORF is only valid if the start and stop codons are "in-frame," which means that the length of the ORF is divisible by three with no remainder.
- Most definitions of a valid ORF will not allow any in-frame stop codons before the one that ends it. So, for example, the sequence start-start-stop is a valid ORF, but the sequence start-stop-stop is not. There are exceptions to this (there are always exceptions in biology), but I doubt they are relevant to what you're doing.
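If it helps to see those three points concretely, here's a rough sketch in Python rather than Perl (str.translate plays the role of tr, and the [::-1] slice plays the role of reverse). The names reverse_complement and is_valid_orf are made up for illustration, and I'm assuming ATG as the only start codon and TAA/TAG/TGA as the stop codons. It only checks a single candidate sequence; parsing the FASTA file and scanning the frames is still up to you.

```python
# Assumptions: ATG is the only start codon; TAA, TAG, TGA are the stops.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

# Translation table that swaps each base for its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the whole string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_valid_orf(candidate):
    """Check one candidate: in-frame, start first, stop last, no stop inside."""
    candidate = candidate.upper()
    # Need at least a start and a stop codon, and a length divisible by three.
    if len(candidate) < 6 or len(candidate) % 3 != 0:
        return False
    codons = [candidate[i:i + 3] for i in range(0, len(candidate), 3)]
    return (
        codons[0] == START
        and codons[-1] in STOPS
        and not any(codon in STOPS for codon in codons[1:-1])
    )
```

With this, is_valid_orf("ATGATGTAA") (start-start-stop) is true, is_valid_orf("ATGTAATAG") (start-stop-stop) is false, and a candidate that only shows up on the other strand, such as "TTACAT", passes once you run it through reverse_complement first.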
Good luck. :)