PerlMonks  

Re: Re: XML Module Recommendations

by Anonymous Monk
on Jan 26, 2003 at 06:22 UTC ( #229945=note )


in reply to Re: XML Module Recommendations
in thread XML Module Recommendations

Hi, thanks for the excellent reply :)

A few questions...

You said DOM isn't appropriate for general purpose XML transformation - what if I'm just extracting data into a different structure, not necessarily translating it to XHTML or whatever? Also - the LibXML documentation says "This module is an interface to the gnome libxml2 DOM parser (no SAX parser support yet), and the DOM tree." So is it still acceptable in your opinion?

One of the problems I've had in the past is extracting data from a doc in which different elements share the same tag name. For example:

<website>
  <name>Perlmonks</name>
  <rating>10/10</rating>
  <people>
    <name>Anonymous Monk</name>
  </people>
</website>

How would I differentiate between the name inside the people tag and the website name? More of an XML question, but I'm also looking for a module that makes this really easy.

Another thing I'd like to do easily: go through the XML file and pick out certain fields and compare them between multiple entries. For example, get the name and rating of each website so I can pick out everyone with a 10. This seems like it should be trivial (as it is with SQL) but the examples I've seen so far don't always seem so simple.

Also - are there XML::Twig-like interfaces for other languages? Thanks :)

Re: Re: Re: XML Module Recommendations
by mirod (Canon) on Jan 26, 2003 at 07:58 UTC

    The DOM is still dangerous when extracting information, unless you are very cautious. The main problem is with navigation methods like getFirstChild: you just cannot use them without wrapping them in your own method. The first child of an element can be a lot of unexpected things: the line return after the element start tag, a comment, a processing instruction... and maybe even the next element. The addition of XPath in XML::LibXML makes it much safer by letting you do $elt->findnodes('people'), which gives you the list of people elements that are children of $elt.
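
    To make the first-child pitfall concrete, here is a minimal sketch, assuming XML::LibXML is installed. The sample document and variable names are made up for illustration; only load_xml, documentElement, firstChild and findnodes come from the module.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document, pretty-printed on purpose so that
# whitespace and a comment sit between the tags.
my $xml = <<'XML';
<website>
  <!-- a comment -->
  <name>Perlmonks</name>
  <people>
    <name>Anonymous Monk</name>
  </people>
</website>
XML

my $doc  = XML::LibXML->load_xml(string => $xml);
my $root = $doc->documentElement;

# The first child is the whitespace text node that follows <website>,
# not the <name> element one might expect.
my $first = $root->firstChild;
print "firstChild is a text node\n" if $first->isa('XML::LibXML::Text');

# An XPath expression relative to $root returns only element children
# named "people", whatever whitespace or comments surround them.
my @people = $root->findnodes('people');
print scalar(@people), " people element(s) found\n";
```

    The XPath call does not care how the document is indented, which is why it is the safer way to navigate.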

    As for differentiating between tags with the same name but in different contexts, XML modules will give you access to the context stack, so it will not be a problem. For example, in Twig you can have handlers on website/name or on people/name; in XML::LibXML you would similarly use XPath to get the elements you want.
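
    A rough sketch of the Twig approach, assuming XML::Twig is installed. It uses the sample document from the question; the array names are hypothetical. Each handler fires only for a name element whose parent matches the path.

```perl
use strict;
use warnings;
use XML::Twig;

# The example document from the question, on one line.
my $xml = '<website><name>Perlmonks</name><rating>10/10</rating>'
        . '<people><name>Anonymous Monk</name></people></website>';

my (@site_names, @people_names);

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called for <name> elements directly under <website>.
        'website/name' => sub {
            my ($t, $elt) = @_;
            push @site_names, $elt->text;
        },
        # Called for <name> elements directly under <people>.
        'people/name' => sub {
            my ($t, $elt) = @_;
            push @people_names, $elt->text;
        },
    },
);
$twig->parse($xml);

print "site:   $_\n" for @site_names;
print "person: $_\n" for @people_names;
```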

    In fact the XML equivalent of SQL is XPath (at least within a single document, XML Query deals with collections of documents). A nice resource for XML-related tutorials is Zvon.org, they have a good XPath tutorial.
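
    As an illustration of XPath playing the role of a SQL query, here is a small sketch, assuming XML::LibXML is installed. The sites wrapper element and the second website entry are invented for the example; the predicate in square brackets acts like a WHERE clause.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document holding several <website> entries.
my $xml = <<'XML';
<sites>
  <website><name>Perlmonks</name><rating>10/10</rating></website>
  <website><name>Example Site</name><rating>7/10</rating></website>
</sites>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Roughly: SELECT name FROM website WHERE rating = '10/10'
my @top = map { $_->textContent }
          $doc->findnodes('//website[rating="10/10"]/name');

print "$_\n" for @top;
```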

    XML::Twig is Perl-only. Note that if you don't want to use Perl you can always use XSLT: there are plenty of XSLT processors around, some of which can even be called from Perl.

    One last question, especially in light of a recent thread: it seems to me that you are dealing with data, and doing the kind of processing that a database does very well. So why are you using XML at all? Couldn't you just model your data into tables and use a DB? There are several portable alternatives that support the kind of processing you seem to be looking for.

      it seems to me that you are dealing with data, and doing the kind of processing that a database does very well. So why are you using XML at all? Couldn't you just model your data into tables and use a DB? There are several portable alternatives that support the kind of processing you seem to be looking for.

      I would, but the code needs to be run on many different systems and while I can ensure Perl is installed, I would have a lot of trouble ensuring my database of choice would be.

      Thanks for all the suggestions though, I should be able to find something appropriate now :)

        Have you seen DBD::SQLite yet? It's not just a driver, but a self-contained database engine (based on the SQLite library) that doesn't require a daemon at all.
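
        A minimal sketch of what that could look like, assuming DBI and DBD::SQLite are installed. The table layout and sample rows are made up; an in-memory database is used here so nothing touches the disk, but a file name works the same way.

```perl
use strict;
use warnings;
use DBI;

# No server to start: the whole engine lives inside the driver.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema for the website data from the question.
$dbh->do('CREATE TABLE website (name TEXT, rating INTEGER)');

my $ins = $dbh->prepare('INSERT INTO website (name, rating) VALUES (?, ?)');
$ins->execute(@$_) for ['Perlmonks', 10], ['Example Site', 7];

# Pick out every site rated 10.
my $names = $dbh->selectcol_arrayref(
    'SELECT name FROM website WHERE rating = ?', undef, 10);

print "$_\n" for @$names;
```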

        Makeshifts last the longest.

Re: Re: Re: XML Module Recommendations
by tomhukins (Curate) on Jan 26, 2003 at 20:06 UTC
    The LibXML documentation says "This module is an interface to the gnome libxml2 DOM parser (no SAX parser support yet), and the DOM tree." So is it still acceptable in your opinion?

    It doesn't matter whether anyone else finds this restriction acceptable or not. You need to determine whether this may cause problems for you or not and compare those problems to the benefits you gain from using this code. It depends on your situation.

    Any modern Unix should run libxml2 and some come with it installed or as part of their package system. If you run Windows, PPMs exist for ActivePerl. Windows binaries of libxml2 exist, if you use other versions of Perl on Windows.

    So, code that uses XML::LibXML should run on Unix, including Mac OS X, and Windows. If you need to port your code to other platforms, investigate each platform and see if XML::LibXML runs on it.
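
    One simple way to investigate a given machine is to try loading the module at run time. The following is only a sketch using core Perl; the variable name is hypothetical, and what to do when the module is missing is left open.

```perl
use strict;
use warnings;

# Try to load XML::LibXML without dying if it is not installed.
my $have_libxml = eval { require XML::LibXML; 1 } ? 1 : 0;

if ($have_libxml) {
    print "XML::LibXML ", XML::LibXML->VERSION, " is available\n";
}
else {
    print "XML::LibXML is not installed on this system\n";
}
```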
