PerlMonks  

Separate/extract only annotations from a GenBank (gbk) file.

by bees.world (Initiate)
on Jun 29, 2011 at 12:25 UTC (#911948)
bees.world has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I need a simple Linux shell command to extract only the annotations (non-sequence data) from a GenBank file. I don't need the sequences at all. For example:

ORIGIN
        1 tcagaataaa cagacaaccc acagaatgtg agaaaatatt gcaaattat gcatctgaca
       61 aaggtctaat acccagcaat ctataaggaa ctcaaacaaa ttagcaagaa aaaaaatccc
      121 atgaaaaggt agacaaatga catgaataga cacttctcaa aataagatat ataaatagcc
//

I want to delete everything between ORIGIN and //. I just need the annotations. Please help.

Re: Separate/extract only annotations from a GenBank (gbk) file.
by Neighbour (Friar) on Jun 29, 2011 at 12:44 UTC
    Which part of your question pertains to Perl?
    Also, if you delete everything between ORIGIN and //, you will end up with nothing, so you could skip the data processing entirely and use cat /dev/null instead.

      Oh god, no. I only wrote the part I want to delete. There is a whole lot of data in a GenBank file. OK, let me write a sample:

      LOCUS       NW_927708              12387 bp    DNA     linear   CON 25-OCT-2010
      DEFINITION  Homo sapiens chromosome 2 genomic contig, alternate assembly
                  Hs_Celera 211000035800763, whole genome shotgun sequence.
      ACCESSION   NW_927708
      VERSION     NW_927708.1  GI:88954435
      DBLINK      Project: 16116
      KEYWORDS    WGS.
      SOURCE      Homo sapiens (human)
        ORGANISM  Homo sapiens
                  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                  Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
                  Catarrhini; Hominidae; Homo.
      REFERENCE   1  (bases 1 to 12387)
        AUTHORS   Istrail,S., Sutton,G.G., Florea,L., Halpern,A.L., Mobarry,C.M.,
                  Lippert,R., Walenz,B., Shatkay,H., Dew,I., Miller,J.R.
        TITLE     Whole-genome shotgun assembly and comparison of human genome
                  assemblies
        JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 101 (7), 1916-1921 (2004)
         PUBMED   14769938
      REFERENCE   2  (bases 1 to 12387)
        AUTHORS   Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J.,
                  Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A.
        TITLE     The sequence of the human genome
        JOURNAL   Science 291 (5507), 1304-1351 (2001)
         PUBMED   11181995
      COMMENT     REFSEQ INFORMATION: Features on this sequence have been produced
                  for build 37 version 2 of the NCBI's genome annotation [see
                  documentation]. The reference sequence is identical to
                  CH471348.1.
                  Assembly Name: Hs_Celera
                  The DNA sequence was produced by Celera Genomics. It is included
                  in the NCBI RefSeq collection as an alternative assembly to the
                  one produced by the Genome Reference Consortium. The original
                  whole genome shotgun project has the project accession
                  AADB00000000.2.
      FEATURES             Location/Qualifiers
           source          1..12387
                           /organism="Homo sapiens"
                           /mol_type="genomic DNA"
                           /db_xref="taxon:9606"
                           /chromosome="2"
           gap             7139..7188
                           /estimated_length=50
      ORIGIN
              1 tcagaataaa cagacaaccc acagaatgtg agaaaatatt tgcaaattat gcatctgaca
             61 aaggtctaat acccagcaat ctataaggaa ctcaaacaaa ttagcaagaa aaaaaatccc
            121 atgaaaaggt agacaaatga catgaataga cacttctcaa aataagatat ataaatagcc
            181 acaaacatat gaaaaaataa tcaacatcac taatcatcag gtaaatgcaa attaaaacca
            241 taatgagata ccaccttatc ccagccagaa tggccattat tagaaagtcc aaaaacaata
            301 gatgttggca tggatgtggt gaaaagggaa gagtttacac tgcgggcagg aatgtaaatt
      //

      REGARDING PERL: OK, I just need the substitution pattern to remove this sequence information, leaving everything else in place.
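
A minimal sketch of one way to do this from the shell, not taken from any reply in this thread. It assumes ORIGIN and // each start in column 1 of their own line, as in a standard GenBank flat file, and that the ORIGIN line itself should go while the // record terminator stays, so multi-record files keep their boundaries. The function name strip_sequence and the file names are made up for illustration.

```shell
# Hypothetical helper: strip_sequence is a name invented for this sketch.
# Reads GenBank text on stdin, writes annotation-only text on stdout.
strip_sequence() {
    # The address range /^ORIGIN/,/^\/\// selects each sequence block.
    # Inside that range, delete every line except the closing "//".
    sed '/^ORIGIN/,/^\/\//{/^\/\//!d;}'
}

# Usage (file names are placeholders):
#   strip_sequence < input.gbk > annotations.gbk
```

If losing the // line as well is acceptable, Perl's range operator gives a shorter variant: perl -ne 'print unless /^ORIGIN/ .. m{^//}' input.gbk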

Re: Separate/extract only annotations from a GenBank (gbk) file.
by Anonymous Monk on Jun 29, 2011 at 12:56 UTC
      Yeah, thanks. I thought I would get something quick here; it is quite urgent. Anyway, I am still searching. Thanks for the links.
Re: Separate/extract only annotations from a GenBank (gbk) file.
by duelafn (Priest) on Jul 01, 2011 at 17:54 UTC
      Hi Dean, the regex you gave is not working :( I have tried replacing ".." with other combinations of dots, but it still does not work. Can you suggest something?
