Re: please help me!!

by robsv (Curate)
on Apr 11, 2002 at 18:15 UTC

in reply to please help me parse genbank DNA file

Have you looked into Bioperl? It will simplify parsing for you (especially for the sequence itself). Here's a program that gets the sequence and some other basic information:
#!/usr/local/bin/perl -w
use strict;
use Bio::SeqIO;

my $seqobj;
print "please type in the name of a file\n";
my $file = <STDIN>;
chomp $file;    # drop the trailing newline, otherwise the file won't be found
my $seqio = Bio::SeqIO->new(-format => 'GenBank', -file => $file);
while ($seqobj = $seqio->next_seq()) {
    printf "Sequence: %s\n", $seqobj->seq;
    # I'm not sure what you need other than the
    # sequence - here are some examples:
    printf "Display ID: %s\n", $seqobj->display_id;
    printf "Description: %s\n", $seqobj->desc;
    printf "Division: %s\n", $seqobj->division;
    printf "Accession: %s\n", $seqobj->accession_number;
}
In your program, you're putting all of the non-sequence lines into @annotation. I'm not sure specifically which information you need (e.g. description, accession number, etc.), but those are all accessible through the $seqobj object. There are some examples in the code above; you'll find many more in the documentation.

This method also has the advantage of being able to handle multiple GenBank records per file.
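To make "multiple records per file" a bit more concrete: in a GenBank flat file each record ends with a line containing only "//", and that is the boundary next_seq() steps over for you. Below is a rough core-Perl sketch of the idea, only to illustrate the file layout. It is not how Bio::SeqIO is implemented; read_genbank_records is a made-up helper name and the two records are invented, heavily abbreviated sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return one string per
# GenBank record read from an already-open filehandle.
sub read_genbank_records {
    my ($fh) = @_;
    # Assumption: Unix line endings. A line holding only "//" ends a record,
    # so use it as the input record separator for this block.
    local $/ = "\n//\n";
    my @records;
    while (my $record = <$fh>) {
        next unless $record =~ /^LOCUS/m;    # ignore trailing junk or blank chunks
        push @records, $record;
    }
    return @records;
}

# Invented, abbreviated sample data with two records.
my $data = "LOCUS       DEMO1\nORIGIN\n        1 acgt\n//\n"
         . "LOCUS       DEMO2\nORIGIN\n        1 ttga\n//\n";

# Read from the in-memory string so the sketch needs no external file.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @records = read_genbank_records($fh);
close $fh;

printf "Found %d record(s)\n", scalar @records;
for my $record (@records) {
    my ($name) = $record =~ /^LOCUS\s+(\S+)/m;
    print "  $name\n";
}
```

For real work I'd still let Bio::SeqIO do this, since it also parses the fields inside each record for you.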

This is just a tiny portion of what BioPerl can do: it will also parse BLAST files, perform alignments, etc. If you're interested, you can grab the latest release from CPAN or from the BioPerl site. Hope this helps!

- robsv
