I see tidy.exe is giving me this message:
Character codes 128 to 159 (U+0080 to U+009F) are not allowed in HTML;
even if they were, they would likely be unprintable control characters.
Tidy assumed you wanted to refer to a character with the same byte value in the
specified encoding and replaced that reference with the Unicode equivalent.
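As far as I can tell, the replacement that message describes amounts to something like the rough Python sketch below. It assumes the encoding Tidy falls back to is Windows-1252, which is my assumption and not something the message states. The function name `fix_c1_references` is made up for illustration, and this is not Tidy's actual code.

```python
import re

def fix_c1_references(html: str) -> str:
    """Rewrite numeric character references in the range 128..159 as the
    character that byte value stands for in Windows-1252 (assumed encoding)."""
    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        if 128 <= code <= 159:
            try:
                # Treat the number as a single cp1252 byte and decode it.
                return bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # A few byte values (e.g. 129) are undefined in cp1252;
                # leave those references untouched.
                return match.group(0)
        return match.group(0)

    # Only decimal references such as &#146; are handled in this sketch.
    return re.sub(r"&#(\d+);", replace, html)

print(fix_c1_references("it&#146;s"))
```

So a reference like `&#146;` (byte 0x92) would come out as the right single quotation mark U+2019, which matches the "Unicode equivalent" wording in the message.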
Here's the very top of the original (pre-tidy'd) HTML file (from our friend the facebook):
<html class=" videoCallEnabled" id="facebook" lang="en"><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta charset="utf-8"><script>CavalryLogger=false;window._script_path = "\/home.php";window._EagleEyeSeed="Nq0j";</script><noscript> <meta http-equiv="refresh" content="0; URL=/?_fb_noscript=1" /> </noscript>
<meta name="robots" content="noodp,noydir">
... followed by loads of scripts and stylesheets.
The output from your command above, run on the html file, produced thousands of characters such as:
Not sure if you're looking for anything in particular. Thanks for your help, Scott