While we're suggesting modules, I'd also point to libxml2 and its associated utilities, which are probably already installed if you have a recentish Linux installation, and are available through that link if not. It also has an associated Perl module, XML::LibXML. The bonus is that, once you install that stuff, you can process the resulting XML with Perl. The drawback compared to the tidy-based approach is that the libxml2 code is more generic, so you'd have to do some work to get DOCTYPE lines to come out correctly; on the other hand, libxml2 also has a wider area of application.
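For what it's worth, here is a rough sketch of what the XML::LibXML route might look like. The sample HTML string and the choice of the XHTML 1.0 Strict DOCTYPE are just assumptions for illustration; swap in whatever your documents need. It parses sloppy HTML with the libxml2 HTML parser, sets the DOCTYPE by hand (the extra work mentioned above), serializes the tree as XML, and then queries it with XPath.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical tag-soup input: note the unclosed <p> and <br>.
my $html = '<html><head><title>Test</title></head>'
         . '<body><p>Hello<br>world</body></html>';

# Parse with libxml2's HTML parser; recover => 1 lets it carry on past
# markup errors, suppress_errors keeps it from complaining on STDERR.
my $doc = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# libxml2 won't pick an XHTML DOCTYPE for you, so drop whatever internal
# subset the parser attached (if any) and create the one you want.
# Assumption: XHTML 1.0 Strict is the target here.
$doc->removeInternalSubset;
$doc->createInternalSubset(
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
);

# Serialize the tree as XML (toStringHTML would give HTML instead).
print $doc->toString(1);

# Once it's a DOM, the usual XPath tools apply.
print $_->textContent, "\n" for $doc->findnodes('//p');
```

Note that this only fixes up the DOCTYPE line; it doesn't add the XHTML namespace or otherwise guarantee the output validates against that DTD, so check the result against your own requirements.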
If not P, what? Q maybe?