PerlMonks  

Re: please help me!!

by robsv (Curate)
on Apr 11, 2002 at 18:15 UTC


in reply to please help me parse genbank DNA file

Have you looked into BioPerl? It will simplify parsing for you (especially for the sequence itself). Here's a program that gets the sequence and some other basic information:

    #!/usr/local/bin/perl -w
    use strict;
    use Bio::SeqIO;

    print "please type in the name of a file\n";
    # Strip the trailing newline, otherwise the file will not be found
    chomp(my $file = <STDIN>);

    my $seqio = Bio::SeqIO->new(-format => 'GenBank', -file => $file);

    # next_seq() returns one record at a time until the file is exhausted
    while (my $seqobj = $seqio->next_seq()) {
        printf "Sequence: %s\n", $seqobj->seq;
        # I'm not sure what you need other than the
        # sequence - here are some examples:
        printf "Display ID: %s\n", $seqobj->display_id;
        printf "Description: %s\n", $seqobj->desc;
        printf "Division: %s\n", $seqobj->division;
        printf "Accession: %s\n", $seqobj->accession;
    }

In your program, you're putting all of the non-sequence lines into @annotation. I'm not sure specifically which information you need (i.e. description, accession number, etc.), but those are all accessible through the "$seqobj" object. There are some examples in the code above; you'll find many more in the documentation.

This method also has the advantage of being able to handle multiple GenBank records per file.
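
To illustrate the multi-record point, here's a rough sketch that collects a few basic details from every record in a file. The subroutine name summarize_genbank and the hash keys are just names I made up for the example (they aren't part of BioPerl), and it assumes BioPerl is installed:

```perl
#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): read every record from an
# open filehandle and return one hash ref of basic details per record.
sub summarize_genbank {
    my ($fh) = @_;
    my $seqio = Bio::SeqIO->new(-format => 'GenBank', -fh => $fh);
    my @summaries;
    # Each call to next_seq() yields the next record, or undef at the end
    while (my $seqobj = $seqio->next_seq()) {
        push @summaries, {
            id        => $seqobj->display_id,
            accession => $seqobj->accession_number,
            length    => $seqobj->length,
        };
    }
    return @summaries;
}

# Usage: pass the GenBank file name as the first argument
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    for my $rec (summarize_genbank($fh)) {
        printf "%s\t%s\t%d bp\n", $rec->{id}, $rec->{accession}, $rec->{length};
    }
    close $fh;
}
```

Run against a file with several records, this should print one tab-separated line per record, with no extra code needed to find where each record ends.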

This is just a tiny portion of the functions available with BioPerl - it will also parse BLAST files, perform alignments, etc. If you're interested, you can grab the latest release from CPAN or from the BioPerl site. Hope this helps!

- robsv

