Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Reading in Qual File Using Bio::SeqIO

by twaddlac (Novice)
on Jul 01, 2010 at 19:22 UTC ( #847595=perlquestion: print w/ replies, xml ) Need Help??
twaddlac has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!!

I am trying to read in a qual file and put it directly into a hash so that I can access the quality string via id. However, for some odd reason, I cannot get it to work. I am reading in fasta files and assigning them to hashes with no problem, but I cannot seem to get the quality file to work. Any suggestions? Below is my code and what my qual data looks like


$qual_obj = Bio::SeqIO->new(-file => "/home/Alan/Desktop/sequence_data +/$qual_file", -format => "qual"); my %qual_hash = (); while($qual_hash = $qual_obj->_next_qual){ $qual_hash{$qual_obj->id} = $qual_obj->seq; }

>FQ4HLCS01B2SNT rank=0025082 x=734.0 y=3335.0 length=53 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 4 +0 40 40 40 40 40 40 40 40 40 40 40 38 38 35 33 32 32 27 27 15 13 13 1 +3 11 14 11 17 17 17 >FQ4HLCS01BGPRG rank=0042480 x=483.0 y=1242.5 length=108 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 4 +0 40 40 40 40 40 40 40 40 40 40 40 40 39 39 39 39 39 39 39 39 39 39 3 +9 39 39 39 40 40 37 37 37 37 37 37 35 33 32 27 23 19 17 17 19 19 11 11 11 11 14 22 14 21 25 24 25 25 29 30 30 2 +8 28 28 28 28 28 16 16 11 11 11 11 11 17 16 12 12 17 16 18 17 22 16 1 +2 13 >FQ4HLCS01AY14Q rank=0050917 x=282.0 y=760.0 length=314 29 29 18 14 14 14 14 14 27 30 28 17 19 19 19 31 35 37 30 31 31 37 39 4 +0 40 40 40 40 40 40 40 40 40 40 40 39 39 39 40 40 40 40 40 40 40 40 3 +9 38 37 37 37 37 37 25 25 21 21 21 21 30 30 35 37 37 35 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 28 28 28 2 +8 37 37 37 37 30 30 23 23 23 32 37 37 37 37 37 37 37 37 37 37 37 37 3 +7 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 35 35 35 35 33 32 28 18 18 21 21 25 25 28 3 +3 33 33 32 32 32 33 33 33 30 25 25 21 21 21 25 25 32 32 35 37 37 37 3 +7 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 3 +7 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 3 +7 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 37 35 35 33 33 33 16 16 1 +7 17 17 27 11 19 19 22 21 21 23 23 23 23 23 24 23 19 23 27 27 19 19 1 +9 23 23 28 27 27 27 27 17 17 19 24 23 21 21 11 11 11 22 22 24 16 12 11 11 0 12 12

Let me know if you need any more info!! Thank you very much!!

Comment on Reading in Qual File Using Bio::SeqIO
Select or Download Code
Re: Reading in Qual File Using Bio::SeqIO
by chuckbutler (Vicar) on Jul 01, 2010 at 20:15 UTC

    May I ask, what is the value of $qual_file?

    Good luck. -c

      Ah, forgive me. That is just the file name of the actual data, stored in a text editor. However, I did find something interesting about the qual file that is being read in. When not specifying the format of the file being streamed in, the parser identified it as fasta and not qual. This file type seems to be qual, but I'm not certain. Hopefully that helps!


      Thank you!
Re: Reading in Qual File Using Bio::SeqIO
by biohisham (Priest) on Jul 01, 2010 at 22:13 UTC
    This qual format is similar to FastA but it has its own module interfaces at 'Bio::Seq::PrimaryQual' and Bio::SeqIO::qual, you are mixing between Bio::SeqIO and the others since Bio::SeqIO is the general wrapper for many other packages that intelligently handle many different formats each having their own peculiarities. It is still flexibly possible to interchange methods under such a generalized wrapper btw with some significant differences (read 'potential for things to go wrong!')...

    However BioPerl can gift you valuable hints were you adhering to and observing the mantra of:

    use strict; use warnings;
    at the top of your program and declaring your variable accordingly, this can save you needless tail-chasing.

    $qual_obj->seq is better written as $qual_obj->qual
    use strict; use warnings; use Bio::SeqIO::qual; my $file = "Qual.txt"; my $qual_obj = Bio::SeqIO::qual->new(-file =>"<$file",-format=>'qual', +); my %qual_hash ; while(my $qual = $qual_obj->_next_qual){ $qual_hash{$qual->id}= $qual->qual(); } use Data::Dumper; print Dumper(\%qual_hash);
    The above code's $qual_obj can still be initialized directly through Bio::SeqIO:
    use strict; use warnings; use Bio::SeqIO; my $file = "Qual.txt"; my $qual_obj = Bio::SeqIO->new(-file =>"<$file",-format=>'qual',);

    You can also read this file in as a FastA file since it has the '>' record identifier, however, each qual record would not be indexed in an array as is the case in the above example since reading the file as a FastA would slurp in all qual values into a single string that is treated as a protein sequence by BioPerl.

    #THIS DOESN'T READ THE FILE RIGHT.. my $file = "Qual.txt"; my $qual_obj = Bio::SeqIO->new(-file =>"<$file",-format=>'fasta',); my %seq_hash = (); while(my $seq= $qual_obj->next_seq){ $seq_hash{$seq->id} = $seq->seq; } use Data::Dumper; print Dumper(\%seq_hash);
    Kudos for providing examples of code and input :)...



    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
      Thank you very much @biohisham!! Reading in the qual like a fasta works like a charm!! :-)

      Well I have unfortunately encountered another error. I read the qual file in like a fastA file and put it into a hash and I can't write it like a qual file for some reason.. is there some sort of parsing that I have to do that the BioPerl modules won't do for me?

      --------------------- WARNING --------------------- MSG: seq doesn't validate, mismatch is 4040404040404040404040404040404 +040404040404040404040404040404040404040393737373737373737373737373737 +373737373535353535353030302828283030323235373737373737373737373737373 +737373737373737373737373737373535353737343232323232323230161515152427 +2626131313232313131317212121212119211618171918 --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Attempting to set the sequence to [404040404040404040404040404040 +404040404040404040404040404040404040404039373737373737373737373737373 +737373737353535353535303030282828303032323537373737373737373737373737 +373737373737373737373737373737353535373734323232323232323016151515242 +72626131313232313131317212121212119211618171918] which does not look +healthy STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/ +Root.pm:357 STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.8/Bio/Primary +Seq.pm:270 STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.8/Bio/Primary +Seq.pm:221 STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm:484 STACK: tagfinder.pl:227

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://847595]
Approved by AnomalousMonk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (10)
As of 2014-10-26 04:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (151 votes), past polls