Not quite. If the DTD tells you, for example, that element a may contain elements b, c, or d, and that b can contain e and f, then if it looks like element a contains one b and two e's, you can be pretty sure that the b was closed improperly (if at all), and that the e's should be in b.
There are still many possibilities for confusion. But a heuristic that started with the DTD could do quite a good job. I'm not going to pretend it would be easy and/or fun ... but in theory the DTD may hold enough information to do a good job - and, if the DTD does not allow overlaps (such as a and b both allowing d's, so that a d can be either a child or a grandchild of a), you may even be able to do a perfect job.
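To make the idea concrete, here is a rough sketch in Python of what such a DTD-driven heuristic might look like. Everything in it is an assumption for illustration: the `ALLOWED` table stands in for the content model described above (a may contain b, c, d; b may contain e, f), and `repair` is a made-up name, not part of any real parser. It only handles "who may contain whom", not ordering or counts.

```python
# Hypothetical content model: which children each element may contain.
ALLOWED = {
    "a": {"b", "c", "d"},
    "b": {"e", "f"},
    "c": set(), "d": set(), "e": set(), "f": set(),
}

def repair(tokens):
    """Build a tree from ("open", name) / ("close", name) tokens,
    inferring missing close tags from ALLOWED.
    Returns a list of (name, children) tuples."""
    root = ("#root", [])
    stack = [root]
    for kind, name in tokens:
        if kind == "open":
            # Implicitly close open elements that cannot contain this child.
            while len(stack) > 1 and name not in ALLOWED[stack[-1][0]]:
                stack.pop()
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            # On a close tag, pop up to and including the matching element.
            # A close tag with no matching open element is ignored.
            if name in [n for n, _ in stack]:
                while stack.pop()[0] != name:
                    pass
    return root[1]

# <a><b><e></e><e></e></a> with the </b> missing
tokens = [("open", "a"), ("open", "b"),
          ("open", "e"), ("close", "e"),
          ("open", "e"), ("close", "e"),
          ("close", "a")]
tree = repair(tokens)
```

With no overlap in the content model, both e's end up inside b. If b also allowed d, a d following the e's would silently stay inside b, which is exactly the ambiguity mentioned above.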
| [reply] |
Ah, I see. You mean (X)HTML, and 'a' is P, 'b' is SPAN, 'c' is CODE, 'd' is SMALL, 'e' is EM and 'f' is STRONG. So, now you encounter:
<P>foo <SPAN> bar baz <EM> qux </EM> <EM> quux </EM> </P>
Now, assuming tags aren't to be inserted inside words, I still can find five places to put in </SPAN>: before 'bar', between 'bar' and 'baz', after 'baz', between the two EM elements, and before the </P> tag.
Now, if you have a DTD that says that the only possible content of a 'b' is exactly two 'e's, you know where the missing closing tag should have been.
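A small Python sketch of that difference, under loud assumptions: element names are single letters, content models are written as regular expressions over the sequence of child names, and `valid_splits` is a hypothetical helper, not a real DTD validator. It tries every position for the missing close tag of b and keeps the ones both content models accept.

```python
import re

def valid_splits(items, b_model, a_rest_model):
    """Return every k such that items[:k] can be the content of b
    and items[k:] can follow b as further children of a."""
    hits = []
    for k in range(len(items) + 1):
        inside = "".join(items[:k])
        after = "".join(items[k:])
        if re.fullmatch(b_model, inside) and re.fullmatch(a_rest_model, after):
            hits.append(k)
    return hits

# Children seen between the unclosed <b> and </a> (assumed example).
items = ["e", "e", "d"]

# Loose model: b may hold any mix of e, f, d; a may hold b, c, d.
loose = valid_splits(items, r"[efd]*", r"[bcd]*")

# Strict model: b is exactly two e's.
strict = valid_splits(items, r"ee", r"[bcd]*")
```

Under the loose model two positions survive (after the second e, or after the d); under the strict model only one does, so the close tag can be placed with certainty.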
Note also that if you have a DTD where you can always unambiguously deduce where a missing closing tag should have gone, the closing tag is redundant - and if it were an SGML DTD instead of an XML DTD, the closing tag would have been optional. (And that would have solved the problem instantly - the document would be conforming.)