Re: Problem computing GC content
by Kenosis (Priest) on Jul 14, 2014 at 17:03 UTC
Here's another option:
Command-line usage: perl script.pl fastaFile [>outfile]
The last, optional argument is a shell redirection that sends the output to a file.
Sample FASTA record:
Output on that sample FASTA record:
Since every FASTA record begins with ">", the script sets Perl's input record separator ($/) to that character, so the file is read one FASTA record at a time. After each read, chomp removes the trailing record separator.
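The effect of setting $/ to ">" can be seen in isolation with a small sketch. This is not the original script; the sub name read_records and the two-record FASTA string are made up for the demo, and the data is read from an in-memory filehandle instead of a file:

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text, held in memory for the demo.
my $fasta = ">seq1 first\nACGT\nGG\n>seq2 second\nATAT\n";

# Collect the raw records produced when $/ is set to '>'.
sub read_records {
    my ($fh) = @_;
    local $/ = '>';            # each read stops at (and includes) the next '>'
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;         # chomp strips the trailing '>' (the value of $/)
        push @records, $record;
    }
    return @records;
}

open my $fh, '<', \$fasta or die "Cannot open in-memory string: $!";
my @records = read_records($fh);
close $fh;

for my $i ( 0 .. $#records ) {
    ( my $shown = $records[$i] ) =~ s/\n/\\n/g;    # make newlines visible
    print "$i: [$shown]\n";
}
```

This prints:

0: []
1: [seq1 first\nACGT\nGG\n]
2: [seq2 second\nATAT\n]

Because the file starts with ">", the very first read returns only that character, which chomp turns into an empty string. That empty "record" is why the script needs a test before parsing.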
Next, /\s/ checks whether the record contains anything to parse; if it doesn't (as with the first read, which returns only the leading ">" and is empty after chomp), the next record is read. A regex then captures the id and the sequence (seq). A substitution removes any newlines from the sequence so that its length can be counted accurately. (A FASTA sequence may span multiple lines; the script handles this.)
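Putting those steps together, here's a minimal sketch of the approach described above. It is a reconstruction, not Kenosis's original code: the sub name gc_report, the tab-separated "id, length, GC%" output format, and the use of tr/GCgc// to count G and C bases are all assumptions made for illustration:

```perl
use strict;
use warnings;

# Return one "id<TAB>length<TAB>GC%" line per FASTA record read from $fh.
# The output format is an assumption for this sketch.
sub gc_report {
    my ($fh) = @_;
    local $/ = '>';                          # one FASTA record per read
    my @lines;
    while ( my $record = <$fh> ) {
        chomp $record;                       # drop the trailing '>'
        next unless $record =~ /\s/;         # skips the empty first "record"
        # id = header line; seq = everything after the first newline
        my ( $id, $seq ) = $record =~ /^(.*?)\n(.*)/s or next;
        $seq =~ s/\s+//g;                    # remove newlines (and any \r or spaces)
        next unless length $seq;             # avoid dividing by zero
        my $gc = ( $seq =~ tr/GCgc// );      # count G and C bases
        push @lines, sprintf "%s\t%d\t%.2f",
            $id, length($seq), 100 * $gc / length($seq);
    }
    return @lines;
}

# Demo on hypothetical in-memory data; the first sequence spans two lines.
my $fasta = ">seq1 demo\nACGT\nGG\n>seq2 demo\nATAT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory string: $!";
my @report = gc_report($fh);
close $fh;
print "$_\n" for @report;
```

On the demo data this prints "seq1 demo", 6 and 66.67 on the first line and "seq2 demo", 4 and 0.00 on the second (tab-separated). To process a file named on the command line, as in the usage line above, one could open $ARGV[0] with open my $in, '<', $ARGV[0] or die $! and pass $in to gc_report instead of the in-memory handle.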
Hope this helps!