PerlMonks  

LibXML doesn't encode single or double quotes

by randonpurcell (Initiate)
on Nov 28, 2011 at 21:15 UTC (#940471, perlquestion)
randonpurcell has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I'm stuck. I've used LibXML to create a number of XML documents before, but I'm onto something new now.

I've created a valid XML document, using LibXML. All is well. Now, some nodes have text that includes the big five, &<>"'. When I output the XML using toString or toFile I see that the module has graciously encoded the &'s, <'s, and >'s to their respective entity names for me. Great!

Problem is, it doesn't seem to touch single quotes or double quotes. I've searched and searched. I can't seem to find a solution. I need them converted to entity names.

If I need to, I'll use HTML::Entities, but I was hoping to let LibXML handle it all for me...also, I should note, I really need entity names, not numbers, and I couldn't find a way to make HTML::Entities give me the name for single quotes (always uses entity number instead...probably because of IE).

Anyway, your help is most appreciated!

Re: LibXML doesn't encode single or double quotes
by runrig (Abbot) on Nov 28, 2011 at 21:51 UTC
    I believe encoding a quote is only required when it appears in an attribute that is quoted with the same type of quote. Or at least that would make sense.
Re: LibXML doesn't encode single or double quotes
by grantm (Parson) on Nov 28, 2011 at 21:53 UTC

    Why do you think single and double quotes need to be escaped?

    It is necessary to escape a quote character when it is included in an attribute value delimited by the same type of quote. e.g.:

    <person name="Michael &quot;Mickey&quot; Mouse"></person>

    XML::LibXML will escape the quotes in this case.

    But apart from that case, it's perfectly safe to use unescaped quotes in the text body of XML/HTML elements.
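    The attribute-versus-text rule is a property of XML itself, so it can be seen with any serializer. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` as a stand-in (an assumption made for illustration; it is not XML::LibXML, and the element and sample strings are made up). The double quotes inside the attribute value are escaped because the serializer delimits attributes with double quotes, while the quotes in the text body are left alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: one attribute and one text body, both containing quotes.
person = ET.Element("person", {"name": 'Michael "Mickey" Mouse'})
person.text = 'He said "hi" & it\'s fine'

# Serialize to a str rather than bytes.
xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
```

    Only the ampersand in the text body is rewritten; the quotes there need no escaping for the document to stay well-formed.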

      Oh, sorry. I left out a small, crucial detail. The clients that will be parsing this XML expect quotes and single quotes to be in entity form. In fact, it is a requirement, and I have no say in that.
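      If the requirement really cannot be changed, one possible workaround is to escape the text yourself at the point where the serialized string is written. The sketch below uses Python's standard `xml.sax.saxutils.escape` purely to illustrate the idea; the helper name `escape_text_with_named_quotes` and the sample string are hypothetical, and this is not an XML::LibXML feature. Note that the result is already-escaped markup: handing it back to an XML library as node text would escape the ampersands a second time.

```python
from xml.sax.saxutils import escape

# Extra replacements applied after the default &, <, > escaping.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_text_with_named_quotes(text):
    """Escape all five predefined XML entities by name (hypothetical helper)."""
    return escape(text, QUOTE_ENTITIES)

sample = escape_text_with_named_quotes("Tom & \"Jerry\" aren't <here>")
print(sample)
```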
        "The clients that will be parsing this XML expect quotes and single quotes to be in entity form."

        How would they even know? Any compliant XML parsing library will decode the entities and pass the actual character up to the application layer. Getting the XML content with entities in their raw form would be a bug that could only cause downstream pain. If that really is a requirement then I feel your pain.
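        A minimal sketch of that point, using Python's standard `xml.etree.ElementTree` as an example of a compliant parser (the `note` element and its text are made up for illustration): the escaped and unescaped spellings reach the application as the same string.

```python
import xml.etree.ElementTree as ET

# The same text, once with named entities and once with literal quotes.
escaped = ET.fromstring("<note>&quot;Mickey&quot; isn&apos;t here</note>")
literal = ET.fromstring("<note>\"Mickey\" isn't here</note>")

# After parsing, the application cannot tell which form was in the file.
same_text = escaped.text == literal.text
print(same_text)
```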

Re: LibXML doesn't encode single or double quotes
by ikegami (Pope) on Nov 28, 2011 at 23:07 UTC
    There's no reason for single quotes and double quotes in text nodes to be escaped. IIRC, even > doesn't need to be escaped.
      > must be escaped when it is part of the sequence ]]> in text, unless that sequence ends a CDATA section, though (I was bitten by this recently).
        Weird, but true. It's a rule that exists solely for compatibility with SGML.
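        A small sketch of the rule, again using Python's standard `xml.etree.ElementTree` (backed by expat) as a stand-in parser; `is_well_formed` is a hypothetical helper written for this illustration. A bare `>` in text should parse, while a bare `]]>` outside a CDATA section should be rejected.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

bare_gt_ok = is_well_formed("<a>1 > 0</a>")          # lone > in text
raw_ok = is_well_formed("<a>x ]]> y</a>")            # ]]> in plain text
escaped_ok = is_well_formed("<a>x ]]&gt; y</a>")     # same text, > escaped
cdata_ok = is_well_formed("<a><![CDATA[x]]></a>")    # ]]> closing CDATA
print(bare_gt_ok, raw_ok, escaped_ok, cdata_ok)
```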
