PerlMonks  

Re^2: Repair malformed XML

by Anonymous Monk
on Feb 04, 2005 at 11:38 UTC


in reply to Re: Repair malformed XML
in thread Repair malformed XML

I don't think your algorithm works. Yes, it will create a well-formed XML document, but that's not the same as repairing the document. Consider the following piece of (X)HTML:
<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </P>
The </SPAN> tag is missing. Your algorithm will place it right in front of the </P>. That repairs the document to well-formedness (and, in the case of (X)HTML, even to a valid document). But you don't know whether the </SPAN> really belongs there. Perhaps only the 'bar' was supposed to be inside the SPAN. Or maybe the first EM element belonged inside it, but not the second. Or perhaps the document uses a special DTD that doesn't allow EM to appear inside SPAN, in which case placing </SPAN> before </P> would be very wrong.
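To make the ambiguity concrete, here is a rough Python sketch. It assumes the algorithm under discussion is the usual stack-based one (close whatever is still open when a non-matching close tag arrives); the parent post's actual code is not shown here, so `naive_repair` is a hypothetical stand-in, and it ignores self-closing tags, comments and CDATA.

```python
import re
import xml.etree.ElementTree as ET

# Matches a simple open or close tag; self-closing tags are NOT handled.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def naive_repair(markup):
    """Hypothetical stack-based repair: close open elements as late as possible."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(markup):
        out.append(markup[pos:m.start()])   # text before this tag
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close everything opened after `name`, then `name` itself.
            while stack[-1] != name:
                out.append("</%s>" % stack.pop())
            stack.pop()
            out.append(m.group(0))
        # A close tag with no matching open tag is silently dropped.
    out.append(markup[pos:])
    while stack:                            # close what is still open at EOF
        out.append("</%s>" % stack.pop())
    return "".join(out)

broken = "<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </P>"
repaired = naive_repair(broken)
print(repaired)
# <P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </SPAN></P>

# Other placements of </SPAN> are just as well-formed as the one chosen above.
candidates = [
    repaired,
    "<P> foo <SPAN> bar </SPAN> baz <EM> qux </EM> <EM> quux </EM> </P>",
    "<P> foo <SPAN> bar baz <EM> qux </EM> </SPAN> <EM> quux </EM> </P>",
]
for c in candidates:
    ET.fromstring(c)   # raises ParseError if c is not well-formed
```

All three candidates parse, and a well-formedness check cannot tell you which one the author meant.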

If you have no way of verifying that the result is correct - heck, you can't even verify whether the resulting document is syntactically valid - I'd advise you to leave the document as is. Then even the most basic check (for well-formedness) will flag the document as incorrect. Otherwise, you end up with a document that appears to be correct, but you have no way of knowing whether it is. Of course, that raises the question: if you don't have the DTD, how useful is the document, and why is it being considered for "repair"?
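For what it's worth, here is what "the most basic check" could look like, sketched in Python with the standard library's xml.etree.ElementTree. The helper name `is_well_formed` is made up for illustration, and the "guessed" string is just one arbitrary repair, not a recommended one.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# The document left as is: the missing </SPAN> gets caught.
broken = "<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </P>"
print(is_well_formed(broken))    # False

# One guessed repair: passes the check, but nobody knows if it is right.
guessed = "<P> foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </SPAN></P>"
print(is_well_formed(guessed))   # True
```

The unrepaired document fails loudly, which is exactly the signal you lose once a guess has been written back into the file.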

Replies are listed 'Best First'.
Re^3: Repair malformed XML
by Anonymous Monk on Jun 23, 2016 at 21:13 UTC
    Hi, I found this conversation very interesting. Did you have any further thoughts on the repair problem (without having a DTD)?
