I would start with a well-formed document of the same type, taken from another source, that contains one or two examples of each element type my API can produce. I'd trim this document by hand to remove excessive duplication, then check that it is still well-formed. The job of my test application would be to reproduce this document.
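For the "check it is still well-formed" step, a quick sketch in Python is to attempt a parse and treat a parse error as "not well-formed". The choice of Python and of the standard library's xml.etree.ElementTree is my assumption; the thread doesn't name a language or parser, and is_well_formed is a hypothetical helper. Note this checks well-formedness only, not validity against any schema or DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<doc><item id='1'/></doc>"))  # True
    print(is_well_formed("<doc><item id='1'></doc>"))   # False: <item> never closed
```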
The test would not be a simple textual comparison, because the layout of two functionally identical XML documents (indentation, line breaks, attribute order) can vary enormously without either of them being wrong. So I would use one of the XML parsing modules that produces a dumpable hierarchical object, and the test would compare the dumped parse trees (sorted where applicable).
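One way to turn a parse tree into a layout-independent dump is sketched below in Python, using the standard library's xml.etree.ElementTree rather than whichever module the original has in mind. It assumes leading/trailing whitespace in text is layout rather than content, always sorts attributes (their order is never significant in XML), and sorts sibling elements only when asked, since in most formats element order does matter. The names canonical_dump and _dump are hypothetical.

```python
import xml.etree.ElementTree as ET


def _dump(elem: ET.Element, depth: int, sort_children: bool) -> str:
    """Render one element and its subtree as indented text lines."""
    indent = "  " * depth
    # Attribute order is insignificant in XML, so always sort attributes.
    attrs = " ".join(f'{k}="{v}"' for k, v in sorted(elem.attrib.items()))
    # Assumption: leading/trailing whitespace is layout, not content.
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    head = f"{indent}<{elem.tag}> attrs=[{attrs}] text={text!r} tail={tail!r}"
    children = [_dump(child, depth + 1, sort_children) for child in elem]
    if sort_children:
        # Only appropriate for formats where sibling order carries no meaning.
        children.sort()
    return "\n".join([head] + children)


def canonical_dump(xml_text: str, sort_children: bool = False) -> str:
    """Parse xml_text and return a layout-independent, diff-friendly dump."""
    return _dump(ET.fromstring(xml_text), 0, sort_children) + "\n"


if __name__ == "__main__":
    compact = "<doc><item id='1' class='x'>Hello</item></doc>"
    pretty = """<doc>
  <item class="x" id="1">
    Hello
  </item>
</doc>"""
    print(canonical_dump(compact) == canonical_dump(pretty))  # True
    print(canonical_dump(pretty), end="")
```

Each element becomes one line of the dump, so a line-based diff of two dumps points directly at the element that differs.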
I'd use diff (the program rather than the module) to compare the dumped trees, combining two passes in a shell one-liner with || (diff exits with a non-zero status when the files differ, so || runs the second pass only on a mismatch). The first pass would use the -q switch just to report same/different. If it fails, the second pass would rerun diff with the -y switch (side-by-side output) piped to more* to present the differences on screen, so that I could inspect them and decide what corrections or additions to make next.
*I'd probably use windiff for the second pass instead; its visualisation is easier to digest.
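The same two-pass idea can be mirrored inside Python if you'd rather not shell out. This is a swapped-in stand-in for the diff pipeline, not the author's setup: a plain string equality check plays the role of diff -q, and the standard library's difflib.unified_diff plays the role of diff -y (its output is unified rather than side-by-side). compare_dumps is a hypothetical name.

```python
import difflib


def compare_dumps(expected: str, actual: str) -> tuple[bool, str]:
    """Two-pass comparison of two dump strings.

    Pass 1 is a cheap same/different check (the role of diff -q).
    Pass 2 runs only on a mismatch and builds a readable report
    (the role of diff -y, though this format is unified, not side-by-side).
    """
    if expected == actual:
        return True, ""
    report = "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
    return False, report


if __name__ == "__main__":
    same, report = compare_dumps("<doc>\n  <item>\n", "<doc>\n  <entry>\n")
    if not same:
        print(report, end="")
```

For something closer to windiff's view, difflib.HtmlDiff().make_file(fromlines, tolines) renders the same two line lists as a side-by-side HTML table.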
You can set up the test before you start developing the API, and it gives you direct feedback as you go along: what you have done so far, what corrections you need to make, and what needs to be done next.
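Written before the API exists, the test simply fails until the API produces an equivalent document. A minimal pytest-style sketch, assuming Python, is below; REFERENCE stands in for the trimmed reference document and build_document for the API under development, both hypothetical. For brevity it canonicalises with the standard library's C14N 2.0 support (xml.etree.ElementTree.canonicalize, Python 3.8+), which writes attributes in sorted order and, with strip_text=True, strips layout whitespace, instead of the dump-and-diff approach.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the trimmed, hand-checked reference document.
REFERENCE = """<doc>
  <item class="x" id="1">Hello</item>
</doc>"""


def build_document() -> str:
    """Hypothetical stand-in for the API under development."""
    return '<doc><item id="1" class="x">Hello</item></doc>'


def canon(xml_text: str) -> str:
    # C14N 2.0 writes attributes in sorted order; strip_text=True strips
    # leading/trailing whitespace from text, so indentation doesn't matter.
    return ET.canonicalize(xml_text, strip_text=True)


def test_output_matches_reference():
    # Fails until build_document() yields a document equivalent to REFERENCE.
    assert canon(build_document()) == canon(REFERENCE)


if __name__ == "__main__":
    test_output_matches_reference()
    print("reference reproduced")
```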