PerlMonks  

Re^5: I dislike object-oriented programming in general

by w-ber (Hermit)
on Oct 18, 2007 at 15:53 UTC (#645762)


in reply to Re^4: I dislike object-oriented programming in general
in thread I dislike object-oriented programming in general

The problem comes when they become hyped to the point that all data has to be stored in an RDBMS; and whenever anything isn't in a DB, the file must be XML.

Ah, but what about native XML databases? Now there's hype for you, and you don't ever need to store data in files again!

Talking about opaque data, take a look at the mzXML file format. It's a way to store several mass spectrometry runs in a single file, including the parameters of the mass spectrometer, any extra processing done on the data, and other sorts of metadata.

Mass spectrometry data is, in some sense, a sampled analog signal, so it consists of pairs of floating-point numbers: the first is the mass value (or rather the mass-to-charge ratio, but that is not important here) and the second is the intensity at that mass value.

Now, the designers of the format had more sense than to simply stuff this into the following rather straightforward XML:

<peaks> <peak><mass>100</mass><intensity>1240</intensity></peak> ... </peaks>

(That would be a nightmare!) Instead, they defined that everything except the peak data is structured metadata in the normal XML style, making a DOM tree, and the peak data itself is stored as a base-64 encoded string of IEEE floating-point numbers in network byte order. So, what you have in the file is, in the case of raw data, a few dozen kilobytes of metadata, and then 130 megabytes of binary junk that is completely opaque to any human being.
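To make the encoding concrete, here is a minimal sketch of packing and unpacking peak pairs the way the paragraph above describes: IEEE floats in network (big-endian) byte order, then base-64. The 32-bit width, the interleaved (mass, intensity) layout, and the function names are my assumptions for illustration; the post only says "IEEE floating point", so check the format's own documentation before relying on any of this.

```python
import base64
import struct

FLOAT_SIZE = 4  # assumption: 32-bit IEEE floats; the post does not state the width


def encode_peaks(peaks):
    """Pack (mass, intensity) pairs as big-endian floats, then base-64 encode."""
    flat = [value for pair in peaks for value in pair]
    raw = struct.pack(">%df" % len(flat), *flat)  # ">" means network byte order
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text):
    """Reverse encode_peaks: base-64 decode, then unpack into (mass, intensity) pairs."""
    raw = base64.b64decode(text)
    if len(raw) % (2 * FLOAT_SIZE) != 0:
        raise ValueError("payload is not a whole number of (mass, intensity) pairs")
    flat = struct.unpack(">%df" % (len(raw) // FLOAT_SIZE), raw)
    return list(zip(flat[0::2], flat[1::2]))


# The sample values are exactly representable in 32 bits, so they round-trip unchanged.
sample = [(100.0, 1240.0), (100.5, 87.25)]
encoded = encode_peaks(sample)
print(encoded)
print(decode_peaks(encoded))
```

The encoded string is what ends up as the opaque blob in the file; nothing about it is readable without a decoding step like the one above.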

While it is generally speaking laudable that people try to make common file formats for storing mass spectra -- as usual, all mass spectrometer manufacturers have their own file formats -- they could have just rolled their own file format without the burden of traversing DOM trees while parsing. I guess there were enough programmers in the bunch who were just thinking of the convenience of using standard XML parsing libraries...

--
print "Just Another Perl Adept\n";


Re^6: I dislike object-oriented programming in general
by BrowserUk (Pope) on Oct 18, 2007 at 16:26 UTC
    a few dozen kilobytes of metadata, and then 130 megabytes of binary junk that is completely opaque to any human being.

    Hmm. Let's see. 130 MB = 17,039,360 values (assuming double-precision IEEE). So, if that were formatted as ASCII, assuming the same double precision is required, it would take ~20 bytes per value, or about 325 MB, to render it to the same precision in a human-readable format. If we say one pair of values per line and 80 lines per page, that's ~200 reams, or 500 kg (1/2 tonne), of cheapish printer paper. And that's before you wrap them up in the verbosity of XML.
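The back-of-the-envelope figures above can be checked with a short sketch. The sheets-per-ream and weight-per-ream constants are my assumptions (500 single-sided sheets, roughly 2.5 kg for a ream of ordinary office paper); everything else comes straight from the numbers quoted in the post.

```python
MIB = 1024 * 1024

BYTES_PER_DOUBLE = 8        # double-precision IEEE
ASCII_BYTES_PER_VALUE = 20  # estimate quoted in the post
LINES_PER_PAGE = 80         # one (mass, intensity) pair per line
SHEETS_PER_REAM = 500       # assumption: standard ream, printed single-sided
KG_PER_REAM = 2.5           # assumption: ordinary office paper


def paper_estimate(data_mib=130):
    """Reproduce the post's estimate for printing data_mib MiB of doubles."""
    values = data_mib * MIB // BYTES_PER_DOUBLE
    ascii_mib = values * ASCII_BYTES_PER_VALUE / MIB
    pages = (values // 2) // LINES_PER_PAGE
    reams = pages / SHEETS_PER_REAM
    return {
        "values": values,
        "ascii_mib": ascii_mib,
        "pages": pages,
        "reams": reams,
        "kg": reams * KG_PER_REAM,
    }


for name, amount in paper_estimate().items():
    print(name, amount)
```

Under those assumptions the result lands a little over 200 reams and a little over half a tonne, which agrees with the rounded figures in the post.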

    Now, how long do you think it would take a human being to peruse that lot and pick out the anomalous pairing? And what value is there in having those values in a human-readable format if no one is ever going to read them?

    The point being, that to do anything meaningful with those volumes of data, it is necessary to use software.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Uh, sorry, I wasn't entirely clear. I didn't mean it makes sense to store some 10 or 20 million pairs of floating-point values in human-readable form. There is simply a curious discrepancy in putting binary data (albeit encoded in ASCII) into a file format that is supposed to be human-readable -- where readable does not really mean human-usable as such. Of course you need computers and software if you want to analyze millions of peaks.

      --
      print "Just Another Perl Adept\n";

        Okay, I get where you are coming from. Though there is something to be said for keeping metadata together with the data it describes, and there is the benefit of being able to read the metadata (using head or similar) without parsing the entire file.
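The "head or similar" idea can be sketched as reading only the first few kilobytes of the file and stopping at the start of the peak payload. The function name, the 64 KB limit, and the `<peaks` marker (borrowed from the example XML earlier in the thread) are assumptions for illustration, not something taken from the mzXML specification.

```python
import os
import tempfile


def read_header(path, limit=64 * 1024, marker=b"<peaks"):
    """Return the bytes before the first marker, reading at most limit bytes."""
    with open(path, "rb") as handle:
        head = handle.read(limit)  # never touches the bulk of the file
    end = head.find(marker)
    return head if end == -1 else head[:end]


# Small demonstration with a stand-in file: a little metadata, then a big payload.
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(b"<run><instrument>example</instrument>")
    tmp.write(b"<peaks>" + b"A" * 1000000 + b"</peaks></run>")
    demo_path = tmp.name

print(read_header(demo_path).decode("ascii"))
os.remove(demo_path)
```

If the marker does not appear within the limit, the sketch simply returns everything it read, so a caller would need to raise the limit or fall back to a real parser.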


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
