PerlMonks  

Which is the Best Perl XML Tool?

by sierrathedog04 (Hermit)
on Jan 18, 2001 at 02:58 UTC ( [id://52633]=perlquestion )

sierrathedog04 has asked for the wisdom of the Perl Monks concerning the following question:

My project uses XML::Parser and XML::DOM to do CGI on an Apache server.

I see that apache.org now has a win32 Perl front-end for their Xerces XML parser and is promising one for Red Hat shortly. It seems to me that XML::Parser is the standard for processing XML in Perl, but Xerces is or will be the standard for processing XML in CGI.

So which one should I use to process XML using both CGI and Perl? Should I start learning Xerces, or will its Perl port only be an also-ran for us Perl XML'ers?

Replies are listed 'Best First'.
Re: Which is the Best Perl XML Tool?
by mirod (Canon) on Jan 18, 2001 at 03:17 UTC

    Could you expand on "Xerces is or will be the standard for processing XML in CGI"? I don't understand this sentence.

    I also don't know the Xerces Perl wrapper very well.

    What I know is that I _HATE_ the DOM. Not only is it clumsy and verbose, I also think it leads to insecure programming (a well placed comment in the XML can usually crash a DOM program). So I guess you have my position on Xerces ;--)

    Another popular way to generate HTML from XML, using XSLT and XPath-based Perl scripts, is Matt Sergeant's AxKit; you might want to have a look at it.

      What I know is that I _HATE_ the DOM. Not only is it clumsy and verbose, I also think it leads to insecure programming (a well placed comment in the XML can usually crash a DOM program). So I guess you have my position on Xerces ;--)

      Doesn't Xerces validate the XML document? If a document is validated can it still cause a DOM program to crash?

      I do not deny that DOM processing has very high overhead compared to straight XML::Parser, especially with very large XML documents, but that is the nature of the DOM. The tree concept of the data is easy to understand but costly, in terms of resources, to realize.

      TIA for your comments,

      fongsaiyuk

        If a document is validated can it still cause a DOM program to crash?

        The problem is that, IMHO, the DOM offers only one safe method to select elements: getElementsByTagName. It always behaves properly, in a DWIM way. All the navigation functions, such as getFirstChild, getLastChild, getPreviousSibling and getNextSibling, return a node. Now what if that node is a comment? How many of the DOM scripts out there check, every time they use one of those methods, that the result is really what they expected, usually an element? Remember that even if the DTD says that a dt is always followed by a dd, there can be any number of comments and processing instructions in between. Who codes defensively enough to systematically write this?

        my $dd = $dt->getNextSibling;   # may be a comment, PI or whitespace text
        $dd = $dd->getNextSibling while $dd && $dd->getNodeType != ELEMENT_NODE;

        Hence my guess that a well placed comment can probably wreak havoc in most of the DOM code around (and certainly in most of my own tries at taming the DOM). Practically I suspect people code for a subset of XML, one that excludes comments and processing instructions. This is dangerous for the exact same reasons I described in On XML parsing.
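
        As a rough illustration of the point above, here is a minimal sketch, assuming XML::DOM is installed. The helper name next_element_sibling and the sample dl markup are made up for this example; it only shows how a comment placed between a dt and its dd changes what getNextSibling returns.

```perl
use strict;
use warnings;
use XML::DOM;    # exports node-type constants such as ELEMENT_NODE

# Hypothetical helper: return the next sibling that is an element,
# skipping comments, processing instructions and text nodes.
sub next_element_sibling {
    my ($node) = @_;
    my $sib = $node->getNextSibling;
    $sib = $sib->getNextSibling
        while $sib && $sib->getNodeType != ELEMENT_NODE;
    return $sib;    # undef when no element follows
}

# Valid against a DTD that says "dt is followed by dd",
# yet a comment sits between the two elements.
my $xml = '<dl><dt>term</dt><!-- note --><dd>definition</dd></dl>';
my $doc = XML::DOM::Parser->new->parse($xml);
my $dt  = $doc->getElementsByTagName('dt')->item(0);

my $naive_name = $dt->getNextSibling->getNodeName;        # whatever comes next
my $safe_name  = next_element_sibling($dt)->getNodeName;  # next element only

print "naive sibling: $naive_name\n";
print "safe sibling:  $safe_name\n";
```

        Code that assumes the naive sibling is the dd would call element methods on a comment node, which is the kind of failure described above.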

      Someone asked why I think Xerces will become the standard way to do CGI.

      I recently took a class in XML from Northern Virginia Community College. The instructor seemed very knowledgeable in most areas (other than the DOM). He mentioned the Java XML parser built into the Apache Xerces project as being very advanced.

      I visited the Apache Xerces site and it does seem that Xerces is an advanced project. The tie-in with Apache must help a lot, because a lot of us at work do CGI on an Apache server.

      I wonder whether mere roll-your-own programming is going to be enough. With the Apache people running a full-blown effort to develop XML tools for Apache it seems to me that soon most non-Perl people doing Apache CGI work would want to use these tools.

      That is what I meant.

(fongsaiyuk) Re: Which is the Best Perl XML Tool?
by fongsaiyuk (Pilgrim) on Jan 18, 2001 at 09:36 UTC
    maybe you could expand a bit on your thought: My project uses XML::Parser and XML::DOM to do CGI

    huh?

    If you are rendering XML to HTML and want to use perl, like mirod says, check out AxKit.

    If you are just using XML for config file information, you could just stick with XML::Parser and skip the whole Xerces mess. Honestly, I've been watching for the Xerces Perl thing and they've been talking about the "coming soon for RedHat 6.0" for quite a while. I wonder at its state.
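
    For the config file case, a minimal sketch with plain XML::Parser handlers might look like the following. The config layout (option elements with a name attribute) and the variable names are assumptions made up for this example, not anything from the original project.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical config format, invented for this sketch.
my $xml = <<'XML';
<config>
  <!-- no Comment handler is registered, so this is simply skipped -->
  <option name="host">localhost</option>
  <option name="port">8080</option>
</config>
XML

my %config;     # option name => value
my $current;    # name of the option being read, if any

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag, %attr) = @_;
            if ($tag eq 'option' && defined $attr{name}) {
                $current = $attr{name};
                $config{$current} = '';
            }
        },
        Char => sub {
            my ($expat, $text) = @_;
            # Character data may arrive in several chunks, so append.
            $config{$current} .= $text if defined $current;
        },
        End => sub {
            my ($expat, $tag) = @_;
            undef $current if $tag eq 'option';
        },
    },
);
$parser->parse($xml);

print "$_ = $config{$_}\n" for sort keys %config;
```

    Because only the events of interest have handlers, comments and stray whitespace never reach the code that builds the hash.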

    For XML with Perl, you should check out davorg's book, "Data Munging with Perl". He devotes a chapter to Perl and XML. It's worth a read, as this is, I believe, one of the first books to cover XML with Perl and give examples.

    IMHO, Java still has a bit of an edge when it comes to XML processing. The great tools at CPAN are rapidly closing that gap though.

    Good Luck!

    fongsaiyuk

      maybe you could expand a bit on your thought: My project uses XML::Parser and XML::DOM to do CGI

      We have an online database update application. The boss knows the client is very interested in XML. He suggested I write a new screen that was needed and use XML to do it.

      My preference would have been to assume that the user had an XML-compliant browser such as IE5. However, we are required to target our code for IE4. So I:

      1. Use javascript on the client side to format the update data as one long string of XML and submit that.
      2. Use XML::DOM to parse the XML string and submit it to the database. SQL queries are coded so that they return valid XML.
      3. Return this database-generated XML if the user has checked a box on his form requesting XML output. Otherwise, use XML::DOM to parse this database-generated XML.
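
      Step 2 above could be sketched roughly as follows, assuming XML::DOM is installed. The payload layout (an update element holding field elements), the helper text_of and the hash %update are all hypothetical; the actual format used by the application is not shown in this thread.

```perl
use strict;
use warnings;
use XML::DOM;    # exports node-type constants such as TEXT_NODE

# Hypothetical XML string as the client-side JavaScript might submit it.
my $submitted = '<update table="staff">'
              . '<field name="id">42</field>'
              . '<!-- client note -->'
              . '<field name="phone">555-0100</field>'
              . '</update>';

# Hypothetical helper: concatenate only the text children of an element,
# ignoring comments and processing instructions.
sub text_of {
    my ($elem) = @_;
    return join '',
        map  { $_->getData }
        grep { $_->getNodeType == TEXT_NODE }
        $elem->getChildNodes;
}

my $doc = XML::DOM::Parser->new->parse($submitted);

# Select by tag name rather than walking siblings, so the comment
# between the two fields cannot end up in the result.
my %update;
for my $field ($doc->getElementsByTagName('field')) {
    $update{ $field->getAttribute('name') } = text_of($field);
}

print "$_ => $update{$_}\n" for sort keys %update;
```

      The resulting hash is the sort of structure one would then bind to SQL placeholders; the database side is left out here because nothing about it is given in the post.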

      I should have used XSLT to transform the XML into HTML. However, last December the XSLT module on www.cpan.org seemed not to work. I want to thank the Dutch university students who appeared to be writing it, but it simply was not ready.

      Since then a new XML::XSLT module has appeared. I have not looked into it.

      So that is what I meant by using XML to do CGI. Did we need to use XML? Not really. However, I believe that there are advantages to doing so. For instance, since our application offers the option of returning XML, it would be easy to write a client-side add-on to do more complex processing on the client side.
