I'm taking a roughly similar approach now. Previously I had many tests like this:
is_xml $result, $expected, '... and we should receive the correct XML';
Because I generated $expected with a different algorithm (to make sure I wasn't just reusing the same buggy code), I had to constantly maintain two separate sets of code that did the same thing. It was very painful. Now I'm doing this:
is_well_formed_xml $result, '... and we should receive well-formed XML';
To verify that the output is actually correct, I'm adding more high-level integration tests. It's an annoying trade-off but, like the one you're suggesting, a reasonable one.