Network format for Gene Ontology

by matth (Monk)
on Jan 03, 2003 at 11:12 UTC

matth has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have recently been working with a network that has a flat file format something like:

    !autogenerated-by: DAG-Edit version 1.314
    !saved-by: gwg
    !date: Mon Dec 23 13:11:22 GMT 2002
    !version: $Revision: 2.649 $
    !type: % ISA Is a
    !type: < PARTOF Part of
    !note: file automatically generated by GO-Editor
    $Gene_Ontology ; GO:0003673
     <biological_process ; GO:0008150
      %behavior ; GO:0007610
       %adult behavior ; GO:0030534
        %adult behavior (sensu Insecta) ; GO:0008044
         %response to cocaine (sensu Insecta) ; GO:0008341 % response to cocaine ; GO:0042220
         %response to ethanol (sensu Insecta) ; GO:0045473 % response to ethanol ; GO:0045471
         %response to ether (sensu Insecta) ; GO:0045474 % response to ether ; GO:0045472
        %adult feeding behavior ; GO:0008343 % feeding behavior ; GO:0007631
         %adult feeding behavior (sensu Insecta) ; GO:0030535
        %adult locomotory behavior ; GO:0008344 % locomotory behavior ; GO:0007626
         %adult walking behavior ; GO:0007628
         %flight behavior ; GO:0007629
         %jump response ; GO:0007630
       %behavioral fear response ; GO:0001662
       %chemosensory behavior ; GO:0007635

The full flat file for this can be found here:

http://www.geneontology.org/ontology/process.ontology

And background info here:

http://www.geneontology.org/

Does anyone know whether this is a standard format that is used beyond Gene Ontology (GO)? If so, are there Perl modules that work with it? And does anyone have experience with the XML or MySQL versions of GO, which to my eye are not as clear? Maybe some generous people here have beautiful pieces of code that they would like to share by way of example.

Thank you Monks

Replies are listed 'Best First'.
Re: Network format for Gene Ontology
by aging acolyte (Pilgrim) on Jan 03, 2003 at 11:50 UTC
    matth

    You could do worse than check out the bioperl pages, either at CPAN or at Bioperl.

    Either site should lead you to Ewan Birney's Bio::OntologyIO::simpleGOparser or to some of the other work on handling GO files. The Bioperl community is very obliging and usually pretty handy at saving you from reinventing the wheel.

    Hope this helps

    A.A.

Re: Network format for Gene Ontology
by tfrayner (Curate) on Jan 03, 2003 at 14:37 UTC
    Hi,

    There is a Perl API available for working with MySQL GO databases in the go-dev CVS repository (the entire CVS tree can be downloaded in one go, although this gives you a lot of stuff you probably won't need). There is further documentation to be found at the Gene Ontology Software Group site.

    There are also a couple of scripts available on the GO FTP site which work with the flat file format. They may give you some idea of how to proceed, should you need to stick with flat files for whatever reason.

    Hope this helps,

    Tim

    Update: Turns out, the Perl API includes a module specifically for parsing GO flat files. Have fun :-)
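For anyone who does need to stay with the flat files, here is a minimal sketch in core Perl of what such a parser has to do. It is not the module from the Perl API, only an illustration under stated assumptions: that depth is given by one leading space per level, that the first character after the indentation is the relationship to the enclosing term (`%` and `<`, as declared in the file's own `!type:` lines), and that any further `% name ; GO:id` or `< name ; GO:id` chunks on a line name additional parents. The variable names (`%name`, `%parents`, `@stack`) are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines in the GO flat-file layout, taken from the question.
# ASSUMPTION: one leading space per level of depth; check this against
# the real file before relying on it.
my @lines = (
    '!type: % ISA Is a',
    '!type: < PARTOF Part of',
    '$Gene_Ontology ; GO:0003673',
    ' <biological_process ; GO:0008150',
    '  %behavior ; GO:0007610',
    '   %adult behavior ; GO:0030534',
    '    %adult feeding behavior ; GO:0008343 % feeding behavior ; GO:0007631',
    '   %chemosensory behavior ; GO:0007635',
);

my %rel_name = ( '%' => 'isa', '<' => 'partof' );
my ( %name, %parents );
my @stack;    # $stack[$depth] holds the ID of the last term seen at that depth

for my $line (@lines) {
    next if $line =~ /^!/ or $line !~ /\S/;    # skip header comments and blanks

    # Indentation, relationship character, and the rest of the line.
    # Lines that do not fit this shape are silently ignored in this sketch.
    my ( $indent, $type, $rest ) = $line =~ /^( *)([\$%<])(.*)$/ or next;
    my $depth = length $indent;

    # The first "name ; GO:id" is the term itself; later chunks that start
    # with "% " or "< " are additional parents.
    my ( $first, @extra ) = split /\s+(?=[%<]\s)/, $rest;
    my ( $term, $id ) = $first =~ /^\s*(.*?)\s*;\s*(GO:\d+)/ or next;
    $name{$id} = $term;

    # The enclosing term is the last one seen one level up.
    $parents{$id}{ $stack[ $depth - 1 ] } = $rel_name{$type}
        if $depth > 0 and exists $rel_name{$type};

    for my $chunk (@extra) {
        my ( $t, $pname, $pid ) = $chunk =~ /^([%<])\s*(.*?)\s*;\s*(GO:\d+)/
            or next;
        $name{$pid} = $pname unless defined $name{$pid};
        $parents{$id}{$pid} = $rel_name{$t};
    }

    $stack[$depth] = $id;
    $#stack = $depth;    # forget anything deeper than this line
}

# Print one "child relation parent" line per edge.
for my $id ( sort keys %parents ) {
    for my $pid ( sort keys %{ $parents{$id} } ) {
        print "$name{$id} ($id) $parents{$id}{$pid} $name{$pid} ($pid)\n";
    }
}
```

The parents are kept in a hash of hashes because the same term can turn up under more than one parent, so an edge seen twice is only stored once. Escaped characters, synonyms and secondary IDs are not handled here.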

      That module link is interesting (readmore below). It shows how GO can be parsed into XML format. Is this often done? Is the XML format here easy to work with? And can anyone give examples of code that uses this XML format? I expect that most people work with the flat files. I previously used simple but long Perl scripts to pick up network paths and then did the rest of the analysis in Excel.

      eg format: go-text

      for storing graphs and metadata:

          !version: $Revision: 1.19 $
          !date: $Date: 2002/12/11 01:15:01 $
          !editors: Michael Ashburner (FlyBase), Midori Harris (SGD), Judy Blake (MGD)
          $Gene_Ontology ; GO:0003673
           $cellular_component ; GO:0005575
            %extracellular ; GO:0005576
             <fibrinogen ; GO:0005577
              <fibrinogen alpha chain ; GO:0005972
              <fibrinogen beta chain ; GO:0005973

      this is the following file parsed with events turned directly into XML:

          <subgraph>
            <term>
              <acc>GO:0003673</acc>
              <name>Gene_Ontology</name>
              <is_root>1</is_root>
            </term>
            <term>
              <acc>GO:0005575</acc>
              <name>cellular_component</name>
              <rel>
                <type>isa</type>
                <obj>GO:0003673</obj>
              </rel>
            </term>
            <term>
              <acc>GO:0005576</acc>
              <name>extracellular</name>
              <rel>
                <type>isa</type>
                <obj>GO:0005575</obj>
              </rel>
            </term>
            <term>
              <acc>GO:0005577</acc>
              <name>fibrinogen</name>
              <rel>
                <type>partof</type>
                <obj>GO:0005576</obj>
              </rel>
            </term>
            <term>
              <acc>GO:0005972</acc>
              <name>fibrinogen alpha chain</name>
              <rel>
                <type>partof</type>
                <obj>GO:0005577</obj>
              </rel>
            </term>
            <term>
              <acc>GO:0005973</acc>
              <name>fibrinogen beta chain</name>
              <rel>
                <type>partof</type>
                <obj>GO:0005577</obj>
              </rel>
            </term>
          </subgraph>

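Once the relations are in a hash, however they were read, picking up network paths need not take a long script. The sketch below copies the child-to-parent pairs from the `<rel>` elements above into a hash by hand and walks from a term up to the root. The names `%parents`, `%name` and `paths_to_root` are made up for the example, and it assumes the graph has no cycles.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Child => [parents], copied by hand from the <rel> elements in the XML above.
my %parents = (
    'GO:0005575' => ['GO:0003673'],
    'GO:0005576' => ['GO:0005575'],
    'GO:0005577' => ['GO:0005576'],
    'GO:0005972' => ['GO:0005577'],
    'GO:0005973' => ['GO:0005577'],
);

my %name = (
    'GO:0003673' => 'Gene_Ontology',
    'GO:0005575' => 'cellular_component',
    'GO:0005576' => 'extracellular',
    'GO:0005577' => 'fibrinogen',
    'GO:0005972' => 'fibrinogen alpha chain',
    'GO:0005973' => 'fibrinogen beta chain',
);

# Return every path from $id up to a root, each as an array ref of IDs.
# ASSUMPTION: the graph is acyclic, so the recursion terminates.
sub paths_to_root {
    my ($id) = @_;
    my @up = @{ $parents{$id} || [] };
    return [$id] unless @up;    # a root: one path holding just itself

    my @paths;
    for my $pid (@up) {
        # Prepend this term to every path that starts at the parent.
        push @paths, map { [ $id, @$_ ] } paths_to_root($pid);
    }
    return @paths;
}

for my $path ( paths_to_root('GO:0005972') ) {
    print join( ' -> ', map { $name{$_} } @$path ), "\n";
}
# prints: fibrinogen alpha chain -> fibrinogen -> extracellular -> cellular_component -> Gene_Ontology
```

A term with several parents simply yields several paths, one per route to the root, which is where a DAG differs from a plain tree.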
        Is this often done?

        About once a month :-)

        I suspect that the XML format might be easier to work with than the flat files, but I must confess ignorance. You're quite correct that most of the work on the ontologies by the GO consortium itself uses the flat files with editors/browsers such as DAG-Edit and AmiGO. At present the flat files in the GO CVS are the most up-to-date version of the ontology (I think).

        Tim

        Disclaimer: I'm just an interested party with little to no inside experience; I don't speak for GO :-)

Re: Network format for Gene Ontology
by scain (Curate) on Jan 03, 2003 at 14:39 UTC
    I agree completely with our age-challenged monk above. As to the more general question (is this a standard format of some sort), I don't know the answer, but asking on the bioperl general mailing list (bioperl-l) might get you one. Go to http://bioperl.org/MailList.shtml for more info.

    Scott
    Project coordinator of the Generic Model Organism Database Project
