printing fasta sequences from Hash

Bijgom has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have a problem and I would greatly appreciate your wisdom to solve it. So, I am reading from a fasta file (containing protein seqeunces) while looping over and trying to print protein sequences. My problem is: When trying to print each protein sequence that I have in my hash, the sequences are broken down as follows:

Protein sequence:STARTXXXXX
Protein Sequence: XXXXXXXXX
Protein sequence: XXXXXXEND
[download]

Question: How could I avoid this? Here is my script:

use warnings;

my %database;
my $protein_id = "";
my $line = "";
my $datbase_filename="";
my $start=">DDB";
my $end="";
#input1: open the file containg the protein sequences.
print"Please enter the path to the fasta file of protein sequences \n"
+;

#put the file name into $filenam
$database_filename=<STDIN>;

#remove newline
chomp($database_filename);

#open the file containing protein sequences in fasta format
open(MYFILE1,"<$database_filename") or die "Unable to open $database_f
+ilename: $!\n";

while(<MYFILE1>) 

{    #remove newline
    chomp; 
    $line=$_;
    #print "Line i: $_\n"; #test
    
        if (($line=~m/^>DDB/) and
            ($line=~ m/to\s\d+$/))
            {
                $protein_id = $line;
                #print "protein ID: $protein_id\n";#test
            }
            
        else
            {
                $proteins{$protein_id} = $line;# associating the prote
+in sequence in my %proteins to its  ID
                
                print "sequence: $line\n";#test
                
            }
        
}
exit;
[download]

Many thanks in advance.

Comment on printing fasta sequences from Hash Select or Download Code

Replies are listed 'Best First'.
Re: printing fasta sequences from Hash by biohisham (Priest) on Feb 25, 2015 at 03:14 UTC
Your problem is that each fasta sequence spans multiple lines. Knowing that the fasta header is `">"` you can use that to your advantage by telling your program to split around the fasta separator and slurp all that is in between. Another neater solution is achievable through Bio::SeqIO which requires BioPerl installation on your system. So for the first case, slurping each fasta record the following code gets you there with few modifications, improtantly, note the default separator variable `$/`was locally modified to ">" `use strict; use warnings; use Data::Dumper; local $/=">"; while(my $fasta_record=<DATA>){ chomp $fasta_record; $fasta_record =~ s/(^>*.+\n)/$1: /; #add a space after the fasta +header $fasta_record =~ s/\n//g; # remove endlines print $fasta_record,"\n"; } __DATA__ >protein1 WWWWTCTG TTWTTTCT TTWWWC >protein2 WWWWTCTG TTWWWC TTWTTTCT >protein3 TTWWWC WWWWTCTG TTWTTTCT` [download] I prefer using a BioPerl module to do this kind of tasks since I can get more time to focus on the more important stuff of doing something or the other with the sequences rather than grapple with how to read them in Assuming that your sequences are in a fasta file called database_filename.fa `use strict; use warnings; use Bio::SeqIO; #initialize a bioperl object instance my $in = Bio::SeqIO->new(-file=> "database_filename.fa", -format=>"fas +ta"); while(my $seq = $in->next_seq){ #read the sequences in the bioperl +object print $seq->id," ",$seq->seq; #output,ID space-separated from sequ +ences print "\n"; }` [download] A 4 year old monk	[reply] [d/l] [select]
Re^2: printing fasta sequences from Hash by Bijgom (Initiate) on Feb 25, 2015 at 09:07 UTC
Many thanks bioisham for kindly replying to my post. I will indeed install BioPerl. Hope it will make things smother. Regards, B	[reply]
Re: printing fasta sequences from Hash by Laurent_R (Canon) on Feb 24, 2015 at 18:46 UTC
Please modify your post to put your code and data samples between opening `<code>` and closing `</code>` tags. This will make your post readable to human eye. Je suis Charlie.	[reply] [d/l] [select]
Re^2: printing fasta sequences from Hash by AnomalousMonk (Archbishop) on Feb 24, 2015 at 21:28 UTC
Bijgom: Please see How do I post a question effectively?, Writeup Formatting Tips, Markup in the Monastery, What shortcuts can I use for linking to other information? and Perl Monks Approved HTML tags. Give a man a fish: `<%-(-(-(-<`	[reply] [d/l]
Re: printing fasta sequences from Hash by kroach (Pilgrim) on Feb 24, 2015 at 21:04 UTC
A more readable version of the question: Hello Monks, I have a problem and I would greatly appreciate your wisdom to solve it. So, I am reading from a fasta file (containing protein seqeunces) while looping over and trying to print protein sequences. My problem is: When trying to print each protein sequence that I have in my hash, the sequences are broken down as follows: `Protein sequence:STARTXXXXX Protein Sequence: XXXXXXXXX Protein sequence: XXXXXXEND` [download] Question: How could I avoid this? Here is my script: use warnings; my %database; my $protein_id = ""; my $line = ""; my $datbase_filename = ""; my $start = ">DDB"; my $end = ""; #input1: open the file containg the protein sequences. print "Please enter the path to the fasta file of protein sequences\n" +; #put the file name into $filenam $database_filename = <STDIN>; chomp($database_filename); #remove newline #open the file containing protein sequences in fasta format open(MYFILE1, "<$database_filename") or die "Unable to open $database_filename: $!\n"; while (<MYFILE1>) { chomp; #remove newline $line = $_; print "Line i: $_\n"; #test if (($line =~ m/^>DDB/) and ($line =~ m/to\s\d+$/)) { $protein_id = $line; print "protein ID: $protein_id\n"; #test } else { # associating the protein sequence in my %proteins to its ID $proteins{$protein_id} = $line; print "sequence: $line\n"; #test } } exit; [download] Many thanks in advance.	[reply] [d/l] [select]
Re^2: printing fasta sequences from Hash by AnomalousMonk (Archbishop) on Feb 24, 2015 at 21:25 UTC
kroach: I appreciate you're trying to be helpful, but I think it would have been better if Bijgom had re-formatted the OP him- or herself. Give a man a fish: `<%-(-(-(-<`	[reply] [d/l]
Re^2: printing fasta sequences from Hash by Bijgom (Initiate) on Feb 24, 2015 at 23:03 UTC
Sorry guys for my mistake...And many thanks Kroach for editing my post...	[reply]


good chemistry is complicated, and a little bit messy -LW
	PerlMonks