Heh, can't say I follow all that. I save the FB page as HTML from Firefox and run tidy on it to make it XHTML. I'm doing all this on Windows 7, so I have no idea how or where it's being encoded. Tidy does allow various encodings, but I seem to be getting wonky results no matter what I set it to.
Anyway, I tried running uniquote on a text file (test.txt) containing only this string:
sous réserve
Here's what I got:
> perl -nle 'print if /\P{ASCII}/' test.txt | uniquote.pl -vE cp1252
Can't find string terminator "'" anywhere before EOF at -e line 1.
Not sure what that means... I'd appreciate the help.