__DATA__
>i:/13414 Ccl9 (88)/sequencing/13414_fasta.Contig1
TTTCTCCGGCCCCCTCCTCCCGCGGGGGAAAAAACCCGGGGAGCAGTCGG
GCAGGGGTTTTTTGGTTTTTTCAAAATAAAAAGGGGTGCCCGTTGGGGGGcd
GGGGGGGTGCAGGTTTCAACCCCCCCCCCCAAAGAAAAAAAAATTTTGGG
GAATTTTTGGGGGGCTCCACCAGTTTTCGGGGTTTTTGGGCCTTTTCAGA
AGGTAGGTTGGACGCGGATTGGGCAATAAACCACCCCGCTTCATCGGATA
ATTTTCCCCGGCCGAAAAGGGCCGCGGGGCCGGTGGGCGGCCTTGGGTTT
>i:/13414 Ccl9 (88)/sequencing/13414_fasta.Contig2
TAAACCCAAGGCCCCCCAGGTAAAAAAAAAACCGGCCAGGGGGGGGGGGG
TAAAAAAAACCAAGTGTCACCCAGGGTGGAGATCCCCGGAAAAGGAAAAG
GGGGGTTTTTTATTCGAAACGGGGAAAACTTTCACAAAATTTTGGAAGAA
TCCCCTTTAATGTTTTCTTTTCAAAAGGGGGTAAAAAAACCACCTTTAAA
AAGAAGTCTACCTTGGGAAAAAATAATTTTTGGGAAAATTTAAAAATTGA
I want to split this file on the delimiter '>': print the lines between one '>' and the next '>', and continue until I have separate files like this.
file1:
>i:/13414 Ccl9 (88)/sequencing/13414_fasta.Contig1
TTTCTCCGGCCCCCTCCTCCCGCGGGGGAAAAAACCCGGGGAGCAGTCGG
GCAGGGGTTTTTTGGTTTTTTCAAAATAAAAAGGGGTGCCCGTTGGGGGGcd
GGGGGGGTGCAGGTTTCAACCCCCCCCCCCAAAGAAAAAAAAATTTTGGG
GAATTTTTGGGGGGCTCCACCAGTTTTCGGGGTTTTTGGGCCTTTTCAGA
AGGTAGGTTGGACGCGGATTGGGCAATAAACCACCCCGCTTCATCGGATA
ATTTTCCCCGGCCGAAAAGGGCCGCGGGGCCGGTGGGCGGCCTTGGGTTT
And file2:
>i:/13414 Ccl9 (88)/sequencing/13414_fasta.Contig2
TAAACCCAAGGCCCCCCAGGTAAAAAAAAAACCGGCCAGGGGGGGGGGGG
TAAAAAAAACCAAGTGTCACCCAGGGTGGAGATCCCCGGAAAAGGAAAAG
GGGGGTTTTTTATTCGAAACGGGGAAAACTTTCACAAAATTTTGGAAGAA
TCCCCTTTAATGTTTTCTTTTCAAAAGGGGGTAAAAAAACCACCTTTAAA
AAGAAGTCTACCTTGGGAAAAAATAATTTTTGGGAAAATTTAAAAATTGA
I am running into some trouble with this code:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
#
# split a fasta file into separate sequence files
#
open( my $seqs, "C:/Documents and Settings/mydir/13063_fasta.contigs");
open(my $seq_out,">C:/Documents and Settings/mydir/contig.fa" );
$/ = '\777'; # entire input to be read in one slurp
$seqs = <>; # read input, assigning to single string
while (<$seqs>){
    if($seqs =~ m/^(>[^>]+)/mg) { # match indiv. sequences by '>'s
        push(my @seqs,$1); # and store in array
    }
    for (my @seqs) {
        # only allow characters A-Z,a-z,0-9,'_','-', and '.' in names;
        # change if you're more liberal
        /^> *([\w\-\.]+)/ && (my $seq_name = $1);
        if ($seq_name) {
            open($seq_out,">$seq_name");
            print $seq_out "$_";
        }
        else {
            warn "couldn't recognise the sequence name in \n$_";
        }
    }
}
close($seqs);
close($seq_out);
Sorry for not being specific enough :-)
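
For comparison, here is a minimal sketch of one way the split could be done. It is not a drop-in fix of the script above, and several things in it are assumptions: the helper names split_fasta and write_records, the contig_N.fa output naming (numbered to mirror the file1/file2 layout described above), and the two command-line arguments. It sidesteps four problems in the posted script: '\777' in single quotes is a literal four-character string, so it does not slurp the file; $seqs = <> overwrites the filehandle; each "my @seqs" declares a fresh, empty array; and the name pattern would capture only "i" from headers such as ">i:/13414 Ccl9 (88)/...", so every record would be written to the same file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split FASTA text into one string per record. A record starts at a line
# beginning with '>' and runs up to (not including) the next such line.
sub split_fasta {
    my ($text) = @_;
    # Zero-width split before every line-initial '>'; keep only real records.
    return grep { /^>/ } split /^(?=>)/m, $text;
}

# Write each record of $in_path to its own numbered file in $out_dir.
# The contig_N.fa naming is an assumption, not taken from the original post.
sub write_records {
    my ($in_path, $out_dir) = @_;

    open(my $in, '<', $in_path) or die "Cannot open $in_path: $!";
    my $text = do { local $/; <$in> };    # undef $/ slurps the whole file
    close($in);

    my @written;
    my $n = 0;
    for my $record (split_fasta($text)) {
        $n++;
        my $out_path = "$out_dir/contig_$n.fa";
        open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
        print {$out} $record;
        close($out) or die "Cannot close $out_path: $!";
        push @written, $out_path;
    }
    return @written;
}

# Only run when given an input file and an existing output directory.
write_records(@ARGV) if @ARGV == 2;
```

It could be run as "perl split_fasta.pl 13063_fasta.contigs outdir" (the script name is hypothetical); the output directory must already exist. Numbered names are used because the headers contain spaces, colons, and slashes that are awkward in file names.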