pfaut has asked for the wisdom of the Perl Monks concerning the following question:
Is it possible to use Perl to fix broken HTML character encoding?
I am downloading RSS data from a site whose feed appears to have been generated by a broken program. The feed claims to be UTF-8, but I believe it should have been ISO-8859-1. I see things in the text stream that look like ’, which should translate to an apostrophe. I think something grabbed the bytes, converted them to HTML entities, and then claimed the result was UTF-8. I don't know enough about character encoding, or about the Perl modules that manipulate encodings, to figure out how to convert this back to something that displays correctly in a browser.
I've already complained to the site admins, but they haven't fixed the RSS generator yet and I don't suppose they will any time soon.
90% of every Perl application is already written. ⇒ dragonchild
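If the guess in the question is right, and each byte of a UTF-8 sequence was written out as its own HTML entity, the damage can be reversed by turning every run of such entities back into bytes and decoding those bytes as UTF-8. The following is only a minimal sketch under that assumption. The sample input `&acirc;&#128;&#153;` is an assumed form of the broken apostrophe, not something confirmed from the feed. The names `fix_entity_bytes`, `repair_run`, `entity_to_byte`, and `%NAMED_BYTE` are made up for illustration, and the entity table is deliberately tiny; the CPAN module HTML::Entities has a complete one. Only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical, deliberately tiny table of named entities that stand for
# Latin-1 characters. A real feed would need the full set.
my %NAMED_BYTE = (Acirc => 0xC2, Atilde => 0xC3, acirc => 0xE2, atilde => 0xE3);

# Map one entity to the single high byte (0x80..0xFF) it stands for.
# Returns undef explicitly so the result keeps its slot in list context.
sub entity_to_byte {
    my ($entity) = @_;
    my $code;
    if    ($entity =~ /^&#[xX]([0-9a-fA-F]+);$/) { $code = hex $1 }
    elsif ($entity =~ /^&#([0-9]+);$/)           { $code = $1 }
    elsif ($entity =~ /^&(\w+);$/)               { $code = $NAMED_BYTE{$1} }
    return undef unless defined $code && $code >= 0x80 && $code <= 0xFF;
    return chr $code;
}

# Try to repair one run of adjacent entities. The run is returned unchanged
# unless every entity maps to a high byte and the bytes are valid UTF-8.
sub repair_run {
    my ($run) = @_;
    my @bytes = map { entity_to_byte($_) } $run =~ /&#?\w+;/g;
    return $run if grep { !defined } @bytes;
    my $octets = join '', @bytes;
    my $chars  = eval { Encode::decode('UTF-8', $octets, Encode::FB_CROAK()) };
    return defined $chars ? $chars : $run;
}

# Repair every run of two or more adjacent entities in the text.
sub fix_entity_bytes {
    my ($text) = @_;
    $text =~ s{((?:&#?\w+;){2,})}{ repair_run($1) }ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print fix_entity_bytes('It&acirc;&#128;&#153;s here'), "\n";
```

With the assumed input above, the three entities stand for the bytes E2 80 99, which decode to U+2019, the right single quotation mark. Runs that are not valid UTF-8, such as `&#233;&#233;`, and ordinary entities such as `&amp;` are left untouched, so text that was correctly encoded in the first place should pass through unchanged.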
Replies are listed 'Best First'.
Re: Fixing broken character encoding
by moritz (Cardinal) on Jul 26, 2012 at 04:21 UTC
Re: Fixing broken character encoding
by Anonymous Monk on Jul 26, 2012 at 03:02 UTC
by Anonymous Monk on Jul 26, 2012 at 04:04 UTC
by pfaut (Priest) on Jul 26, 2012 at 10:25 UTC