Bio::SeqIO from BioPerl provides a powerful way of handling I/O on biological data files in FASTA format, or in any other format it supports. The script below creates one input object and three output objects. Each object holds the format to read or write and the name of the file to read from or write to. If an output file doesn't exist, it is created for you.
Using an appropriate BioPerl interface eliminates the need to write regexes to detect sequence identifiers and sequence strings, and it lets you move between different biological data formats by changing the format argument. That saves time: you can focus on the data manipulation itself rather than on parsing code.
#!/usr/local/bin/perl
#title "Compare hash with arrays and print"
use strict;
use warnings;
use Bio::SeqIO;

# sequence id => group (10, 20 or 30)
my %hash = (
    aw1  => 10,
    qs2  => 20,
    dd3  => 30,
    de4  => 10,
    hg5  => 30,
    dfd6 => 20,
    gf4  => 20,
    hgh5 => 30,
    hgy3 => 10,
);

my $file   = "Sample.fa";
my $file10 = "10.fa";
my $file20 = "20.fa";
my $file30 = "30.fa";

my $seq = Bio::SeqIO->new(-file => "<$file", -format => 'fasta');   # input object

# output objects
my $seqOut10 = Bio::SeqIO->new(-file => ">$file10", -format => 'fasta');
my $seqOut20 = Bio::SeqIO->new(-file => ">$file20", -format => 'fasta');
my $seqOut30 = Bio::SeqIO->new(-file => ">$file30", -format => 'fasta');

# write each sequence to the file that matches its group in %hash
while (my $seqIn = $seq->next_seq()) {
    for my $key (keys %hash) {
        if ($seqIn->id eq $key && $hash{$key} == 10) {
            $seqOut10->write_seq($seqIn);
        } elsif ($seqIn->id eq $key && $hash{$key} == 20) {
            $seqOut20->write_seq($seqIn);
        } elsif ($seqIn->id eq $key && $hash{$key} == 30) {
            $seqOut30->write_seq($seqIn);
        }
    }
}
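The script above compares every sequence id against every key in the hash. Since the ids are already the hash keys, a direct lookup should do the same job. The following is only a sketch of that variant, not a drop-in replacement: the names %group_of, group_for_id and split_by_group are made up for illustration, the mapping is shortened, and, unlike the script above, an output file is only created once a sequence is actually written to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical id => group mapping, same shape as %hash in the script above.
my %group_of = (aw1 => 10, qs2 => 20, dd3 => 30);

# Return the group for a sequence id, or undef if the id is not listed.
sub group_for_id {
    my ($id) = @_;
    return $group_of{$id};
}

# Split a FASTA file into one output file per group ("10.fa", "20.fa", ...).
sub split_by_group {
    my ($in_file) = @_;
    require Bio::SeqIO;    # loaded here so group_for_id works without BioPerl

    my $in = Bio::SeqIO->new(-file => "<$in_file", -format => 'fasta');
    my %out;               # group => Bio::SeqIO writer, created on first use

    while (my $seq = $in->next_seq) {
        my $group = group_for_id($seq->id);
        next unless defined $group;    # skip ids that are not in the mapping
        $out{$group} ||= Bio::SeqIO->new(
            -file   => ">$group.fa",
            -format => 'fasta',
        );
        $out{$group}->write_seq($seq);
    }

    my @groups = sort { $a <=> $b } keys %out;
    return @groups;        # groups that received at least one sequence
}

# Usage: perl split.pl Sample.fa
split_by_group($ARGV[0]) if @ARGV;
```

Keeping the writers in a hash means a new group only needs a new entry in the mapping, with no extra elsif branch.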