
CDATA in an XML file for parsing.

by nisha (Sexton)
on Jan 02, 2006 at 10:42 UTC
nisha has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I need some help with XML parsing. I am using XML::Simple and Data::Dumper, but the XML I am working with has two further XML files embedded in it using CDATA. I don't know how to go about modifying the values in this case, because the sections inside the CDATA cannot be read. Please guide me on this, or tell me if there is some other module that can help me achieve it. Thanks, Nisha

Replies are listed 'Best First'.
Re: CDATA in an XML file for parsing.
by Aristotle (Chancellor) on Jan 02, 2006 at 11:28 UTC

    You should pay no attention to the fact that it’s in CDATA. All that means is that angle brackets should be considered to be escaped; CDATA has the same semantics as plain text included in the document with angle brackets written as &lt; and &gt;.

    In other words the section in CDATA will appear as a simple string, by design. That’s what CDATA is for. You need to get this string, and then in turn feed it to an XML parser.
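
    A minimal sketch of that two-step parse, assuming XML::LibXML is installed. The element names (config, payload, settings, timeout) are made up for illustration and are not taken from the original question:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical outer document: <payload> carries a second XML document as CDATA.
my $outer_xml = <<'XML';
<config>
  <payload><![CDATA[<settings><timeout>30</timeout></settings>]]></payload>
</config>
XML

my $outer = XML::LibXML->load_xml(string => $outer_xml);
my ($payload) = $outer->findnodes('/config/payload');

# The CDATA section arrives as an ordinary string.
my $inner_text = $payload->textContent;

# Feed that string to a parser of its own and change a value.
my $inner = XML::LibXML->load_xml(string => $inner_text);
my ($timeout) = $inner->findnodes('/settings/timeout');
$timeout->removeChildNodes;
$timeout->appendText('60');

# Serialise the inner root element and store it back as CDATA.
my $new_inner = $inner->documentElement->toString;
$payload->removeChildNodes;
$payload->appendChild($outer->createCDATASection($new_inner));

print $outer->toString;
```

    The same idea should carry over to XML::Simple: the embedded document shows up as a plain string value in the parsed structure, and that string can be handed to a second parse.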

    Of course, embedding escaped markup in XML is a bad idea (see even more on that); if you have any say in the design of the format, you should ask that it not be done.

    Makeshifts last the longest.

Re: CDATA in an XML file for parsing.
by esskar (Deacon) on Jan 02, 2006 at 11:11 UTC
Re: CDATA in an XML file for parsing.
by BaldPenguin (Friar) on Jan 02, 2006 at 21:46 UTC
    Stick with XML::LibXML. While it may be easy to say "don't do that" about CDATA and embedded markup, I believe that is impractical. I have many stylesheets and XML pages that dynamically generate HTML, including JavaScript sections, based upon the XML nodes. Even if you have control over how the XML files are created, it may not be prudent to remove all embedded tags. It is in that regard that the ability to include embedded tags in CDATA sections was created/allowed (in my opinion anyway).

    Everything I've learned in life can be summed up in a small perl script!

      The point is that <![CDATA[<foo>]]> and &lt;foo&gt; mean exactly the same thing. If you need to treat them differently, some piece of software in your chain is broken. (Yes, that means serving XHTML as HTML is broken.)

      CDATA is a shortcut for when text contains a lot of literal angle brackets and carries no further meaning.
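
      A quick way to check the equivalence for yourself, sketched with XML::LibXML (the wrapper element name a is arbitrary):

```perl
use strict;
use warnings;
use XML::LibXML;

# The same character data, written once as CDATA and once with entity escapes.
my $with_cdata   = XML::LibXML->load_xml(string => '<a><![CDATA[<foo>]]></a>');
my $with_escapes = XML::LibXML->load_xml(string => '<a>&lt;foo&gt;</a>');

my $cdata_text  = $with_cdata->documentElement->textContent;
my $escape_text = $with_escapes->documentElement->textContent;

# Both forms hand the application the identical string.
print "same text\n" if $cdata_text eq $escape_text;
```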

      Makeshifts last the longest.

        I do not disagree that they are the same, and I have no intention of treating them differently. However, your comment about some piece of software in the chain being broken caught my attention. One of the main uses I have for XML is actually XSLT transforms. In that code I have in the past used blocks of data from a database that contain XML markup, in a sort of basic content management schema. What would your suggestion be for obtaining that content and inserting it into the XML document before it goes to the transformer?
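
        One possible approach to this, offered only as a sketch: if the stored content is well-formed, it could be parsed as a fragment and appended as real nodes rather than as escaped text. This assumes XML::LibXML; the database row and the page/body element names are stand-ins invented for the example:

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;

# Stand-in for a block of markup fetched from the database (hypothetical).
my $db_content = '<p>Hello <b>world</b></p><p>Second paragraph</p>';

my $doc = $parser->parse_string('<page><title>Demo</title><body/></page>');
my ($body) = $doc->findnodes('/page/body');

# parse_balanced_chunk dies when the markup is not well-formed,
# so trap that and fall back to inserting the content as plain text.
my $fragment = eval { $parser->parse_balanced_chunk($db_content) };
if ($fragment) {
    $body->appendChild($fragment);    # real <p> elements under <body>
}
else {
    $body->appendText($db_content);   # serialised as escaped text
}

print $doc->toString;
```

        With the content attached as nodes, a stylesheet can match on the p and b elements directly, and a broken block is caught at insertion time instead of inside the transformer.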

        Everything I've learned in life can be summed up in a small perl script!
Re: CDATA in an XML file for parsing.
by dimar (Curate) on Jan 02, 2006 at 23:29 UTC

    You will do well to consider the admonition made earlier by Aristotle. If you find yourself tempted to reach for the 'CDATA' or 'entity-escaping' key-combination in your text-editor or IDE, do not do it unless and until you have given serious consideration to alternatives.

    Such as:
    • apply base64 encode and base64 decode
    • supply a link to the external (non-well-formed) resource
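
    The base64 alternative might look roughly like this, assuming MIME::Base64 (core) and XML::LibXML. The envelope and payload names and the encoding attribute are invented for the example:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use XML::LibXML;

# Content to embed; it does not need to be well-formed for this to work.
my $embedded = '<inner><note>a &amp; b</note></inner>';

# An empty second argument suppresses the line breaks encode_base64 adds.
my $encoded = encode_base64($embedded, '');

# The base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no XML escaping.
my $outer_xml = qq{<envelope><payload encoding="base64">$encoded</payload></envelope>};

# The receiving side parses the envelope and decodes the payload.
my $doc     = XML::LibXML->load_xml(string => $outer_xml);
my $decoded = decode_base64($doc->findvalue('/envelope/payload'));

print "round trip ok\n" if $decoded eq $embedded;
```

    Note that encode_base64 works on bytes, so non-ASCII content would need to be encoded (for example to UTF-8) before this step.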

    If you have control over the generation of the content, there is no excuse why you should not at least *consider* the alternatives.

    If you do *not* have control over the entire content, it is all the more reason to consider the woes of naively tossing around CDATA and escaping.

    • What if the content itself contains a 'CDATA' section that is intended to be displayed as an example of how to make a 'CDATA' section?
    • What if the content contains (ampersand)nbsp; that is intended to demonstrate the symbol used to represent blank space (and not intended to be rendered as an actual blank space)?
    • What if the content contains typos that just coincidentally happen to look like codes in your escaping mechanism?

    There are many, many reasons why escaping and CDATA are often a bad way to go. This badness is exactly why perl has 'quotelike operators' and why MIME has 'multipart boundary delimiters'. XML has neither of these, so people often resort to CDATA and escaping, even when that is *not* the best (or even a good) way to go.


      Note that these “What if” questions all have unambiguous answers; any ambiguities result from buggy software, not from the XML spec. The reason for avoiding escaped markup within CDATA is not that it makes the data ambiguous, but that embedded markup causes many difficulties for processing and throws the well-formedness guarantee you get from XML out of the window.
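
      To illustrate, here is a small round-trip sketch with XML::LibXML. The three sample strings are made-up stand-ins for the “What if” cases, and the examples/example element names are arbitrary:

```perl
use strict;
use warnings;
use XML::LibXML;

# Strings that look like markup but are meant as literal text.
my @awkward = (
    'Write <![CDATA[ ... ]]> to open a CDATA section',
    'Use &nbsp; for a non-breaking space',
    'a stray &lt without its semicolon',
);

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('examples');
$doc->setDocumentElement($root);

for my $text (@awkward) {
    my $el = $doc->createElement('example');
    $el->appendText($text);    # stored as literal text, escaped on output
    $root->appendChild($el);
}

# Serialise, parse again, and read the text back.
my $reparsed = XML::LibXML->load_xml(string => $doc->toString);
my @got = map { $_->textContent } $reparsed->findnodes('/examples/example');

print "all strings survived\n" if join("\0", @got) eq join("\0", @awkward);
```

      As long as the text goes in through the DOM rather than by string concatenation, the serialiser does the escaping and the parser undoes it, so each string comes back exactly as written.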

      Makeshifts last the longest.
