<?xml version="1.0" encoding="windows-1252"?>
<node id="983757" title="Fixing broken character encoding" created="2012-07-25 21:14:55" updated="2012-07-25 21:14:55">
<type id="115">
perlquestion</type>
<author id="217641">
pfaut</author>
<data>
<field name="doctext">
&lt;p&gt;Is it possible to use Perl to fix broken character encoding in HTML?&lt;/p&gt;

&lt;p&gt;I am downloading RSS data from a site, and it appears to have been generated by a broken program. The feed claims to be UTF-8, but I believe it should have been ISO-8859-1. I see sequences in the text stream like &lt;c&gt;&amp;acirc;&amp;#128;&amp;#153;&lt;/c&gt; where there should be an apostrophe. I think something took the raw bytes, converted them to HTML entities, and then labeled the result as UTF-8. I don't know enough about character encodings, or about the Perl modules for converting between them, to work out how to turn this back into something that displays correctly in a browser.&lt;/p&gt;
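
&lt;p&gt;A minimal sketch of the suspected round trip, under some assumptions. It assumes the numeric entities are decoded as plain code points, so that &lt;c&gt;&amp;acirc;&amp;#128;&amp;#153;&lt;/c&gt; becomes the three characters U+00E2, U+0080 and U+0099. The byte values 0xE2 0x80 0x99 are exactly the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK). That suggests UTF-8 bytes were read as ISO-8859-1 before being turned into entities. Re-encoding those characters as Latin-1 and then decoding the bytes as UTF-8 should undo that one step. The helper name &lt;c&gt;fix_double_encoding&lt;/c&gt; is hypothetical. The entities would still need decoding first (for example with &lt;c&gt;decode_entities&lt;/c&gt; from HTML::Entities), which this sketch leaves out.&lt;/p&gt;

&lt;c&gt;
```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: undo one round of "UTF-8 bytes read as Latin-1".
sub fix_double_encoding {
    my ($text) = @_;
    # Each character of the garbled text holds one original byte (0-255),
    # so encoding it as Latin-1 recovers the raw UTF-8 byte string.
    my $bytes = encode('latin1', $text);
    # Decoding those bytes as UTF-8 yields the intended characters.
    return decode('UTF-8', $bytes);
}

# The characters the three entities decode to: U+00E2, U+0080, U+0099
my $garbled = join '', map { chr($_) } (0xE2, 0x80, 0x99);
my $fixed   = fix_double_encoding($garbled);
printf "U+%04X\n", ord $fixed;    # prints U+2019
```
&lt;/c&gt;

&lt;p&gt;Plain ASCII passes through unchanged. Text that was never double-encoded, such as a genuine é, would be damaged, because the lone byte 0xE9 is not valid UTF-8. So the conversion should only be applied to content that actually shows the problem.&lt;/p&gt;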

&lt;p&gt;I've already complained to the site admins, but they haven't fixed the RSS generator yet, and I don't expect them to any time soon.&lt;/p&gt;

&lt;div class="pmsig"&gt;&lt;div class="pmsig-217641"&gt;
&lt;center&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;font size="-1"&gt;90% of every Perl application is already written. [id://246498|&amp;#8658;]&lt;/font&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;&lt;font size="-1"&gt;&lt;i&gt;[dragonchild]&lt;/i&gt;&lt;/font&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/center&gt;

&lt;/div&gt;&lt;/div&gt;</field>
</data>
</node>
