S_Shrum has asked for the wisdom of the Perl Monks concerning the following question:
This isn't a Perl question but more of a question about XML and its usage; however, it applies here... read on:
Maybe it's me... I'm reading a lot these days about XML and the great things that you can do with it FOR WEB CONTENT... but I'm seeing a large trend (here at perlmonks and abroad) toward generating XML databases and I keep thinking to myself, "Why?". I've played around with an XML database I created and thought, "Heeey, thaat's greeaat... but damn that's a lot of work to generate".
Look, every time you start a new 'record' you need to wrap each 'record' and 'field' in tags and then properly close them. That's a great deal of additional data that could simply be omitted if the data were placed into a standard database table.
So my question is... well... "Why do it?". I really haven't seen anything out there that really justifies using XML as a database (...just because you can do it doesn't necessarily mean you should...).
Please enlighten me if I'm wrong or jump on my soapbox (or both..it's a big box).
======================
Sean Shrum
http://www.shrum.net
Replies are listed 'Best First'.
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by mirod (Canon) on May 24, 2002 at 08:40 UTC | |
XML and databases include at least two different aspects. Does this answer your question?
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by mojotoad (Monsignor) on May 24, 2002 at 05:32 UTC | |
by dsheroh (Monsignor) on May 24, 2002 at 14:10 UTC | |
Hmm... Let's see... Looks like postgres (and, in my experience, most other databases) stores its data internally as plain text unless deliberately encrypted or compressed. Yeah, I'll give you that strings isn't likely to tell you the structure of the data, but getting at the content without going through the database engine is trivial.
by mojotoad (Monsignor) on May 24, 2002 at 14:52 UTC | |
by Abigail-II (Bishop) on May 28, 2002 at 19:26 UTC | |
And that's the same problem as losing the data format. Why on earth does XML have to be so verbose? It's just LISP; it just needs a lot more characters. Abigail
(jeffa) Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by jeffa (Bishop) on May 24, 2002 at 06:42 UTC | |
Consider making a FAQ: there is a very generic layout for a FAQ. My idea was to use an HTML::Template template file to describe the layout, and use an XML file to contain the content. Sure i could have used a database, but then i would have to write some sort of a front end to enter in the data - why not use XML to wrap my data instead? It seemed easier to me. Here is a stripped down version - it is not perfect and could probably use a revision, but it should be enough to demonstrate. As a matter of fact, i hear that you can even bypass having to use Perl to translate the XML directly to HTML - i'll hopefully learn more about that when my copy of Perl and XML arrives in the mail. (update - doh! i already knew about that - XSLT ;) - and so far that book is some very good reading) There are a total of three files - save them with the suggested names and run the .pl file to generate the final HTML. You can redirect the output to an .html file, or even modify the script to be a CGI script instead.
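jeffa's three files aren't reproduced above, but the approach he describes can be sketched roughly like this (the file names, element names, and template parameters here are invented for illustration, not his actual code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use HTML::Template;

# faq.xml is assumed to look like:
#   <faq>
#     <entry><question>...</question><answer>...</answer></entry>
#     ...
#   </faq>
# ForceArray keeps a FAQ with only one entry from collapsing to a hash.
my $faq = XMLin( 'faq.xml', ForceArray => ['entry'] );

# faq.tmpl walks the entries with a <TMPL_LOOP NAME="ENTRIES"> block.
my $tmpl = HTML::Template->new( filename => 'faq.tmpl' );
$tmpl->param( ENTRIES => [
    map { { QUESTION => $_->{question}, ANSWER => $_->{answer} } }
        @{ $faq->{entry} }
] );
print $tmpl->output;    # redirect to an .html file, or run as a CGI
```

The XML file is the "database"; adding a FAQ entry is just a matter of editing the file, with no front end needed.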
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by ajt (Prior) on May 24, 2002 at 08:22 UTC | |
I used to work with DynaBase, and it stored XML in an eXcelon ObjectStore DB, indexed to the tag level. Basically we invented DOM, and held every document in one massive DOM tree in the DB. We could store any kind of structured data you could imagine in a huge XML tree, and find it using a version of search that was XPath-aware. If you wanted to find the word foo in every bar tag with attributes of baz, it could trawl through 2Gb of data faster than you could say Oracle. HOWEVER, if you used it to hold tabular data it would grind to a shuddering halt. It was slow to index, slow to search and slow to extract data from. If you are building a web content management system using human-generated, arbitrarily structured content, and you need good searching tools, then an XML database is the best way to do it. If you want to hold anything with a predictable structure then a relational DB is the way to go. The right tools for the right job! My humble 2p.
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by Chmrr (Vicar) on May 24, 2002 at 07:59 UTC | |
One of the neater properties that XML has is the aforementioned portability. That's pretty cool, but most databases allow one to export data from them without tremendous difficulty. To me, the real reason is when you have data that doesn't fit the standard table format. A recent project of mine was to migrate an existing set of static HTML pages into a template-driven, dynamic set of pages. The interesting property of the existing information was that it had content at varying levels; that is, section 1.1 had some information, while in other areas it nested as deep as 4.1.1.1.1. In addition, any section could have a quiz and/or workbook associated with it. This lent itself, overall, to a structure which would be hard to implement efficiently with standard tables. It proved much easier, conceptually, to plonk all of the data in one XML file, read it at startup, and just grab the needed data out of the data structure thus created. Generally, though, I would agree with you -- most databases are optimized for getting data in and out, and accessing data fast when need be. When the data structure gets hairy, though, I may ask you to pass the XML, please. perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'
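The unevenly nested content described above is awkward to model in flat tables but falls naturally out of XML. A hypothetical fragment (element names invented here, not from the actual project) might look like:

```xml
<section id="1">
  <content>Introduction material at the top level ...</content>
  <quiz href="quiz-1.html"/>
  <section id="1.1">
    <content>Some information one level down ...</content>
    <section id="1.1.1">
      <content>Deeper still; nesting depth varies per branch ...</content>
      <workbook href="wb-1.1.1.html"/>
    </section>
  </section>
</section>
```

Sections nest to whatever depth each branch needs, and a quiz or workbook can hang off any level, which is exactly the shape a fixed-column table struggles with.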
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by Aristotle (Chancellor) on May 24, 2002 at 05:36 UTC | |
I shan’t say much here, because personally I don’t understand the point in using XML for plain ole tabular data either. I do see where XML is great when your data is deeply structured and you’re dealing with arbitrary element trees. Then XML is great. But for tabular data? Storage in a relational database and transport via CSV with column headers fill my needs quite nicely. All else is just buzzword hype, I feel. “Ooooh look ma’, XML!!” Makeshifts last the longest.
by Stegalex (Chaplain) on May 24, 2002 at 12:42 UTC | |
I think you will find that you can't ever completely trust the person who is sending you the flat file to structure the file according to the proper business rules. ~~~~~~~~~~~~~~~ I like chicken. | [reply] |
by dsheroh (Monsignor) on May 24, 2002 at 13:53 UTC | |
(If you reply with something along the lines of "But it's easier to write a DTD and feed it through a generic validator than it is to roll your own CSV validator", please provide concrete evidence to back this up. Based on the few examples I've seen, DTDs seem to be a programming language unto themselves, no less complex (and far more verbose) than the level of perl code that would be required for this task.) | [reply] |
by mirod (Canon) on May 24, 2002 at 15:06 UTC | |
by Aristotle (Chancellor) on May 24, 2002 at 15:08 UTC | |
by Stegalex (Chaplain) on May 24, 2002 at 15:22 UTC | |
by dsheroh (Monsignor) on May 24, 2002 at 15:34 UTC | |
by one4k4 (Hermit) on May 24, 2002 at 14:14 UTC | |
_14k4 - perlmonks@poorheart.com (www.poorheart.com) | [reply] |
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by arunhorne (Pilgrim) on May 24, 2002 at 09:43 UTC | |
My feeling is that the big point of XML is its homogeneous nature. By storing data in an XML database you instantly ensure that it will be orders of magnitude easier to access it from any device (small PDA to Big Blue)... and in this age that is a big issue. By using an XML database one ensures that a single XML library - e.g. Xerces - is all any device needs to retrieve data from the database. Let's face it... Oracle bindings for PDAs!? In addition to this, XML is plain text and therefore transportable over traditional protocols such as HTTP without further drivers. Some might argue that using XML for databases is overkill and a waste of space... however, I need only point to the falling cost of storage space and processing power. Granted, for large databases it may be the case that indexing over an XML file could become prohibitive - particularly due to the ability of the user to arbitrarily modify data. As such it may be the case that an XML interface should be provided to a database backed by an object-oriented database (the semantics of the relational model find it increasingly hard to capture deeply structured XML). It's just such a shame Google opted for a SOAP API rather than pure XML for remote access :( ____________Arun
Re: XML for databases?!?! YES!!! With XML, XSL, and SAXON!
by Mission (Hermit) on May 24, 2002 at 13:10 UTC | |
The ability to template web content (content management systems) is becoming a huge business. The concept is to separate your content (in this case XML) from the style (CSS) and from your design (a template). Now the templating can be done with HTML::Template using XML::Parser or XML::Simple to extract the data, but I found a quicker way, and it was built into XML. The template that you create uses XSL (Extensible Stylesheet Language), which is a natural match for XML, applying HTML to its content based upon your XSL template. For any of you who have created TMPL files with HTML::Template, XSL is almost identical, but you don't have to parse the XML and THEN walk the data through the template!!! Although that discovery was neat, it didn't help much, since you still just viewed the XML in a browser window that supported XML, which then automatically applied the XSL to it for display. If by chance you didn't have a browser that could view the XML, then you were out of luck. It was at this time that I thought about going back to HTML::Template, but then I discovered another of XML's tools... SAX (Simple API for XML). Actually, if you do a search for SAXON, it is a small program that is an interface to SAX. Essentially you run: saxon -o myhtml.html myxml.xml myxsl.xsl which can be run as a system command from Perl, so there is no issue. You can throw an output (-o filename.html) to make an HTML file, then hand the program the XML and XSL files. The benefit is that now everything is separate and I've preserved my original data. I no longer have to walk back through my HTML trying to find my XML, and I didn't have to parse the XML myself. The XSL simply is a faster process than parsing and doing the HTML::Template. For more information on the basics of XSL go here: http://www.w3schools.com/xsl/default.asp. For more information on SAXON go here: http://saxon.sourceforge.net/ BTW: XSL is markup, but you will see mention of XSLT as well...
it's simply XSL Transformations, which is essentially the processing of the files. (Just to clear up any confusion.) - Mission
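For the curious, a minimal stylesheet of the sort saxon would apply might look like the following (the faq/entry structure is an invented example, not Mission's actual files):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Match the document root and wrap everything in an HTML skeleton -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Pull each entry out of the XML and mark it up as HTML -->
        <xsl:for-each select="faq/entry">
          <h2><xsl:value-of select="question"/></h2>
          <p><xsl:value-of select="answer"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The for-each/value-of pair plays the same role a TMPL_LOOP would in HTML::Template, but the processor walks the XML for you.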
by ajt (Prior) on May 24, 2002 at 13:39 UTC | |
However, from a Perl programmer's perspective, you don't need to use an external standalone XSL-T engine such as Saxon or Xalan (good though they both are). You can get your own application to do it itself, directly or via a library; this is how Cocoon, AxKit and many others work. From the perspective of a Perl user you can use Matt's excellent AxKit framework, or his XML::LibXSLT module directly from within your Perl code. I use XML::LibXML to manipulate XML files, template them with XSL-T, and save the output as HTML files! See Mega XSLT Batch job - best approach? (in answer to Tilly's question: in testing on a 1GHz Linux box, from one 1Mb XML file I was able to create over 2000 HTML pages, and associated folders, in under 30 seconds!) If used right XML is a very good tool, just remember it's not right for everything, no matter what some people say! Another humble 2p
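The in-process transform ajt describes can be sketched like this (file names reuse the ones from the saxon one-liner earlier in the thread; error handling is kept minimal for brevity):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new;
my $xslt   = XML::LibXSLT->new;

# Parse the source document and the stylesheet, then transform
# entirely in-process -- no external saxon/xalan invocation needed.
my $source     = $parser->parse_file('myxml.xml');
my $style_doc  = $parser->parse_file('myxsl.xsl');
my $stylesheet = $xslt->parse_stylesheet($style_doc);

my $result = $stylesheet->transform($source);

open my $out, '>', 'myhtml.html' or die "can't write myhtml.html: $!";
print $out $stylesheet->output_string($result);
close $out;
```

Because the parsed stylesheet object can be reused, transforming thousands of documents in one run (as in the batch job mentioned) avoids paying the stylesheet-compilation cost each time.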
by S_Shrum (Pilgrim) on May 24, 2002 at 18:43 UTC | |
Don't get me wrong... but all of what you mentioned can be done from flat files... heck, even I wrote a script that allows me to multi-template & table my data from flat files (without the bulk of additional record and field tags). This is really only a concern when dealing with large data sources. The XML markup is a p.i.t.a. and incredibly redundant. I'll check the page on Saxon out... no guarantees that I'll convert though. ======================
by ajt (Prior) on May 24, 2002 at 20:03 UTC | |
If your source file is an XML file, you can have any kind of XML file you want, and you can even verify it against an XML Schema or DTD (take your pick) should you want to. You can make a range of XSL-Templates to convert the XML file you have into another XML file. The transformation always gives you a well-formed new XML (or XHTML) file. You can have a range of XSL-Templates for a range of devices: web browser, WAP phone, another server. If you want you can also use XSL-FO and generate a PDF file should you wish. Should you want, you can pipeline one XML document into one round of templating after another, as each step always generates well-formed XML. XML, XSL-T, XSL-FO, DTDs, and XML Schemas are all public standards; you can use a range of tools, on different operating systems, in different languages, and most of the time things behave the way you expect them to. Using one XML file and one XSL-template I can generate one HTML file on my Linux box and NT box, using Saxon, Xalan, LibXSLT or even MS-XML; it's pretty predictable. One of the key strengths of XML is that the coding and the content/templates are kept apart. By using XML, you can use any code you want, and any content, and it works! As your team gets bigger it means that the coders code and the mark-up people mark up. There is a good thread on templating versus XSL-T here: XSLT vs Templating? where people put many sound arguments for both sides. Matt in particular, as a strong advocate of XML, says many things that I could add here. I'm not saying that XML is the best solution, just that it's one solution, and for larger, more complex applications, its focus on flexible structures and its wide-scale use does make it the tool of choice. Update: Typos fixed, and XSLT link added
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by webfiend (Vicar) on May 24, 2002 at 14:14 UTC | |
Using XML for broad, general databases doesn't make much sense, no. It's just another wheel being reinvented - in a fairly awkward way, too. I can see it helping in narrow contexts, however. If your application only needs to store a particular kind of data, then a full-tilt relational database might be a little bit of overkill. The FAQ generation that was mentioned before is a perfect example. On my own, I've used XML to store configuration details for a meta-search tool, news items for a small site, and even a simple guestbook CGI. I prefer XML over CSV because of structure issues. I am a geek of Very Small Brain, and I like it when my data storage is self-documenting. Of course I document my own CSV files every time - it would be wrong not to :-) But there have been too many times where I examine a client's data files and find a jumble of cryptic, undocumented fields. You can still find bad or no documentation in an XML file, but generally I've had good luck coming across clearly named tags. Even in the worst cases, I've been able to figure out quite a bit from the structure of the markup. (I'm not sure what 'DO4' is, but it doesn't have anything to do with the address, because that's way over in this other element.) Of course, if your project has a lot of data with intricate relations that needs to scale way waay up, then you're back to relational databases. XML is convenient for data storage, yes, but it is not the best tool in all cases. So yeah - XML as a database for small, very specific sets makes sense to me. And yes - the world is nutz. "All you need is ignorance and confidence; then success is sure." -- Mark Twain
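The self-documentation point is easiest to see side by side. The record below is invented for illustration (reusing the mysterious 'DO4' from the post):

```xml
<!-- The same record as the undocumented CSV line
         Smith,DO4,42,1313 Mockingbird Ln
     but with the structure carried in the markup itself: -->
<customer>
  <name>Smith</name>
  <code>DO4</code>
  <visits>42</visits>
  <address>1313 Mockingbird Ln</address>
</customer>
```

Even without knowing what `code` means, the XML version tells you which field is which; the CSV line tells you nothing without external documentation.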
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by inblosam (Monk) on May 24, 2002 at 07:48 UTC | |
But that's just my opinion. Michael Jensen michael at inshift.com http://www.inshift.com | [reply] |
by S_Shrum (Pilgrim) on May 24, 2002 at 19:02 UTC | |
===This part to inblosam=== But then again, isn't a tabled database "simply organized and parsed", not to mention smaller in size due to the omission of the redundant record and field markup? I could take that a step further and say that tabled data loads faster on large data sources (disk to memory) as a result vs. XML data sources. ===This part for everyone else=== Everyone so far has been "portability-this" and "parsable-that". This is not a question of how one can deal with XML data sources but rather WHY one would choose such a format over a tabular format (so far I really haven't seen a reason that I couldn't apply to tabled data sources). Portability, converting, and parsing are not ADVANTAGES over tabular databases, as the same can be said for tabled data. Once again: just because you can doesn't mean you should. Give me some XML database PROs that CANNOT be applied to tabular databases (flat files, etc.) ======================
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by Starky (Chaplain) on May 24, 2002 at 18:09 UTC | |
The answer to this question, as is the answer to so many technical questions, is, "It depends." The main advantages I've found to XML in practice are: There you have it. To me, it's not much more complicated than that. If your data only needs treatment by in-house developers who understand the schema and know a few things about SQL, then it's far more trouble than it's worth. If the data is hierarchical and needs to be represented in HTML and other kinds of documents in a variety of ways, or if diverse tools need to exchange data, then it is a very nice tool. Those are the business / technical considerations. The personal considerations for me have been that, like SQL, XML is something that I know I will use time and time again in a variety of situations. So when I had an opportunity to learn to use it, I jumped on it. So while you may or may not decide that it fits your business needs, your human capital will be that much more valuable if you become comfortable with it sooner rather than later. Hope this helps :-)
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by rucker (Scribe) on May 24, 2002 at 16:39 UTC | |
1. You get the information in XML (perhaps via SOAP), and you use the XML (perhaps with XSLT) for your application, and you need to store it. In this case, you save the trouble of converting it twice (XML->RDBMS->XML). 2. You have no control over the data you are receiving. Today, you might get data for a, b, and c, but tomorrow it might have a, c, x, y, and z. Also, you need to store that additional information in a way you can "easily" use. 3. Your application is simple, but the data is complex (yet easily fits into XML). Why spend a lot of time creating a complex RDBMS schema when an XML database could handle the job? Also note that (in my experience), reduced development effort is usually worth additional overhead. Even though this is obvious, I'll say it anyway: you have to weigh the trade-off between system overhead and development effort in light of specific project requirements. If we were strictly concerned with system overhead, we wouldn't be using perl at all... or XML :) Rucker
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by mpeppler (Vicar) on May 24, 2002 at 17:01 UTC | |
For example I just read an interesting article on FpML - a specification for data exchange for Over The Counter financial instruments between banks (swaps, FRAs, etc). The problem here is that because these instruments aren't traded on an exchange the instruments aren't standardized, and have different behavior and characteristics depending on who the participants in the deal are. With an XML-based system you can store the information regarding these instruments without having to re-define your database every other day because some smart trader has found a new way of doing a particular trade... This doesn't solve all the problems of course - you still have to interpret the data correctly to perform appropriate accounting/trade reporting/confirmation/etc., but it at least enables the database to store the information without recoding it. Michael | [reply] |
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by MadraghRua (Vicar) on May 24, 2002 at 21:45 UTC | |
As you can probably appreciate, the ability to mix and match data and tools in a very flexible way is pretty paramount in research, so XML and all its ilk are pretty useful to us. What I'm mainly seeing is the use of databases that store one kind of data, e.g. a sequence database, a genome database, a transcription database, etc. Then there might be a series of annotation-based databases - comments or analyses of the primary data. Rather than creating one big database, folk use DTDs to describe the relationships of the data in the different databases to each other, to create an XML output that can in turn be parsed and fed into differing combinations of analysis tools to support new and changing ideas. This approach allows greater flexibility in querying data, reduces the need to tinker with database schemas so much and generally makes life easier. So concerning your post: if I were working with a fairly simple system, I would be a lot less inclined to put in the effort to develop an XML-based data exchange system. If I were going to be working on something that I would like to be widely used by other groups, I would consider going to XML. If I were going to be working on a large project involving several databases, some of which were off-site, and using a combination of local and remote tools, I would be using XML. Having written all this, what I'm curious about is whether anyone has experience with trying to use XML in very large projects, since much of this work has been done on a relatively small scale so far. If you were going to be working with gigabyte or terabyte amounts of data, would XML scale well as a distribution method to pass data between different programs? For instance, a mass spectroscopy center would be generating several million data points daily, each data point having 10 to 20 keys and values. An expression center might generate similar amounts of data.
You would need to store these data in databases and then schlep some or all of it to downstream programs for analysis. Would an XML-based data exchange mechanism cope well in this type of situation? What would be the drawbacks apart from bandwidth?
MadraghRua | [reply] |
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by tmiklas (Hermit) on May 24, 2002 at 16:35 UTC | |
IMHO if I wanted to have an XML database (in my meaning of database) - with unique keys, indexes, etc. - I would have a lot of code to write. Sure - I can do that, but is it worth my time?! I don't think so... I'll use some SQL database then. Anyway - if we are talking about a simple database (how about DB?!), then you can always use plain-text ASCII files with fields of some format. Which one is faster - reading the whole database checking for specified conditions, or loading everything into memory and then checking/selecting/whatever? I don't know, but I do know that XML is the best commonly used glue to exchange data of any format! It's simple, it's universal, but writing an XML database larger than a few records is a mistake (IMvHO). Greetz, Tom.
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by mattr (Curate) on May 26, 2002 at 11:26 UTC | |
Reasons to use XML:
- to separate design from code from style in a big web project
- to let you (maybe) easily do a mobile interface in future, for smaller projects too
- to handle changing hierarchical data structures
- to quickly search tree-based documents with a node-aware search paradigm (XPath), which does rock
- to maximize interoperability if you are sending lots of data to another party, i.e. data glue. E-commerce transactions made this kind of interchange format a holy grail some years ago.
- to process ML-based data handed to you, including programming with a strong tree metaphor
- to drop tabular data into an XML db you're stuck with
- to work with cognitive-science relational/hierarchical semantic data like grammar trees (thinking of the hypernym tree in Lingua::WordNet)
- ditto, to work with data from cognitive science that can only meaningfully be represented or accessed in a tree-based paradigm, for example statements in predicate calculus in the OpenCyc AI project. The huge knowledge base is a morass of interrelated assertions which are themselves nested logical statements. Horrible, wonderfully neat stuff. See the Java XML API for it.
Yggdrasil, for example, is a neat-looking XML-based database - that is, it is supposed to represent data internally as tree-structured data, which would make it very good for certain applications and bad for others. I wish I had a good problem that needed me to use it. Actually I do have some hierarchical data, but it is shallow enough to use serialized objects in an ordinary object store. As for data interoperability, consider genome processing, which seems to be the new benchmark for large projects with changing definitions of data that would otherwise drive you insane. A poster above mentioned use of XML in that case, at least for medium-sized projects. A different paradigm (BoulderIO, see bio.perl.org) seems to be popular, which allows differently defined structured data sets to be processed in a pipeline system.
It would seem that implementing too much XML too deep in your system could be real bad unless everything is XML-based. But used as a way to share schemas, it could be fantastic. One thing I can say for sure is that I have seen some very slow XML processing systems, so display speed is a big issue for me. In particular I know of one server which uses XML to reformat HTML files for different browsers, which the developers are considering redeveloping in C++ since Java was too slow (or maybe it was incompetently developed; I haven't seen the code myself). So you need to do a tradeoff, possibly. My guess is that initiatives like Sleepycat's will make those kinds of products easier to develop. The other thing is that you may have to spend a lot of time on interface and manuals if you are going to be handing XML tools to end-users, since their understanding of it and its usability will be directly proportional to what they get out of it. I've written an introduction to XPath for end-users, which was not easy to do, and I've also seen the user interface and XPath search capabilities be major competitive points in the software.
Re: XML for databases?!?! Is it just me or is the rest of the world nutz?
by Dr. Mu (Hermit) on May 25, 2002 at 17:10 UTC | |
What I was hoping to achieve was a portable, readable file capable of containing a complex Perl data structure, that I could slurp into memory, manipulate, and write back out. XML::Simple is the putative answer to these goals. It was pretty simple, but not brain-dead simple. Some of the arrays embedded in my hash have only one element. Apparently XML, by itself, is unable to distinguish between a single-element array and a scalar, so there is no a priori one-to-one correspondence between plain XML and a Perl data structure. XML::Simple gives you a way to force certain elements to be arrays when the XML file is read, but this amount of finagling was contrary to my objectives. Would I use XML again? Probably not as a general-purpose embodiment of a Perl data structure. For that, I would look around for another format, another module -- or write my own. But my mind is still open for other applications. With this much smoke and heat, there's gotta be a fire somewhere! | [reply] |
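The single-element ambiguity described above is easy to reproduce, and XML::Simple's ForceArray option is the "finagling" in question (the element names here are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<config><server><name>alpha</name></server></config>';

# Without ForceArray, a single <server> element comes back as a
# plain hash ref: $plain->{server} is { name => 'alpha' } ...
my $plain = XMLin($xml);

# ... so code written for the many-element case breaks. Forcing it:
my $fixed = XMLin( $xml, ForceArray => ['server'] );

# Now $fixed->{server} is [ { name => 'alpha' } ] whether the file
# contains one <server> element or many, so this always works:
for my $server ( @{ $fixed->{server} } ) {
    print "server: $server->{name}\n";
}
```

XML alone carries no hint that `<server>` is a list, which is exactly why the round trip from a Perl data structure is not one-to-one without this kind of hand-holding.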
by mt2k (Hermit) on May 25, 2002 at 22:12 UTC | |
Yes of course, updating the "database" can be a heck of a time, especially via a web server (i.e. a CGI script). Only one process can update the file at a time, otherwise you get scrambled files, race conditions appear, etc. etc. This means that you must lock a "lock file" before dealing with the data file, so that only one process can access the script at a time. This works great for low-traffic sites, but have more than one access to the script every second and serving time slows down big time, as each new request has to "get in line" to have access to the database.
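The lock-file dance looks roughly like this in Perl (file names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Serialize writers through a separate lock file so that only one
# CGI process rewrites the XML "database" at a time.
open my $lock, '>', 'data.xml.lock'
    or die "can't open lock file: $!";
flock( $lock, LOCK_EX )
    or die "can't get exclusive lock: $!";   # blocks until free

# ... read data.xml, modify the data, write it back out ...

flock( $lock, LOCK_UN );
close $lock;
```

Every writer blocks in flock until the previous one finishes, which is precisely the "get in line" queueing that makes this approach collapse under more than about one update per second.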