I'd like to agree with and extend the last comment by Starky. In the biological sciences we have many different types of data to describe: DNA sequence data, transcribed messenger RNAs, proteins, etc. Then we have a load of different types of experiments, such as measuring levels of transcription, levels of proteins in differing kinds of cells, and so on. We want to be able to compare all these differing types of data in a flexible way. Scientists need to mix and match data as they see fit to support their differing ideas and hypotheses. Finally, there are a bunch of different tools that we use. Some of these have reasonably common output formats; others have formats all their own. Some tools have been around for a very long time; some will be released tomorrow.

As you can probably appreciate, the ability to mix and match data and tools in a very flexible way is paramount in research. So XML and all its ilk are pretty useful to us.

So what I'm mainly seeing is the use of databases that each store one kind of data, e.g. a sequence database, a genome database, a transcription database, etc. Then there might be a series of annotation-based databases: comments on or analyses of the primary data. Rather than creating one big database, folk use DTDs to describe the relationships of the data in the different databases to each other, to create an XML output that can in turn be parsed and fed into differing combinations of analysis tools to support new and changing ideas. This approach allows greater flexibility in querying data, reduces the need to tinker with database schemas, and generally makes life easier.
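To illustrate the pattern, here is a minimal sketch in Python (not Perl, and not anyone's actual pipeline). The two dictionaries stand in for separate databases, and the gene IDs, values, element names and function names are all made up for the example. One function exports related records as a single XML document; a second, playing the role of a downstream tool, only needs to know the XML layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for two separate databases, keyed by a shared gene ID.
sequence_db = {"geneA": "ATGGCC", "geneB": "ATGTTT"}
expression_db = {"geneA": 12.5, "geneB": 0.8}

def export_xml(gene_ids):
    """Build one XML document that relates records from both sources."""
    root = ET.Element("genes")
    for gene_id in gene_ids:
        gene = ET.SubElement(root, "gene", id=gene_id)
        ET.SubElement(gene, "sequence").text = sequence_db[gene_id]
        ET.SubElement(gene, "expression").text = str(expression_db[gene_id])
    return ET.tostring(root, encoding="unicode")

def highly_expressed(xml_text, threshold):
    """A downstream 'tool' that reads the XML and knows nothing about the databases."""
    root = ET.fromstring(xml_text)
    return [
        gene.get("id")
        for gene in root.iter("gene")
        if float(gene.findtext("expression")) > threshold
    ]

xml_text = export_xml(["geneA", "geneB"])
print(highly_expressed(xml_text, 1.0))
```

Swapping in a different analysis means writing another reader of the same XML, without touching either database schema.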

So concerning your post, I would think that if I were working with a fairly simple system, I would be a lot less inclined to put in the effort to develop an XML-based data exchange system. If I were going to be working on something that I would like to be widely used by other groups, I would consider going to XML. If I were going to be working on a large project involving several databases, some of which were off site, and using a combination of local and remote tools, I would be using XML.

Having written all this, what I'm curious about is whether anyone has experience with trying to use XML in very large projects. Much of this work has been done on a relatively small scale so far. If you were going to be working with gigabyte or terabyte amounts of data, would XML scale well as a distribution method to pass data between different programs? For instance, a mass spectroscopy center would be generating several million data points daily, each data point having 10 to 20 keys and values. An expression center might generate similar amounts of data. You would need to store these data in databases and then schlep some or all of them to downstream programs for analysis. Would an XML-based data exchange mechanism cope well in this type of situation? What would be the drawbacks apart from bandwidth?
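To make the question concrete, here is a rough Python sketch of one way a downstream program might read such a feed: streaming the XML one data point at a time rather than loading the whole document. The element names (`points`, `point`, `k0`, `k1`, ...) and both function names are assumptions invented for the example, and the data are synthetic. It says nothing about how this would behave at terabyte scale; it only shows the shape of the approach.

```python
import io
import xml.etree.ElementTree as ET

def write_points(n, keys_per_point=10):
    """Serialize n synthetic data points, each with keys_per_point key/value pairs."""
    parts = ["<points>"]
    for i in range(n):
        fields = "".join(f"<k{j}>{i * j}</k{j}>" for j in range(keys_per_point))
        parts.append(f'<point id="{i}">{fields}</point>')
    parts.append("</points>")
    return "".join(parts).encode("utf-8")

def stream_sum(xml_bytes, key):
    """Sum one field across all points, discarding each point after it is read."""
    total = 0
    context = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "point":
            total += int(elem.findtext(key))
            root.clear()  # detach processed points so memory use stays flat
    return total

data = write_points(1000)
print(len(data), "bytes for 1000 points")
print(stream_sum(data, "k1"))
```

Even in this toy form, the repeated tag names make up a large share of each record, which is the bandwidth cost mentioned above.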

yet another biologist hacking perl....

In reply to Re: XML for databases?!?! Is it just me or is the rest of the world nutz? by MadraghRua
in thread XML for databases?!?! Is it just me or is the rest of the world nutz? by S_Shrum
