Re: Fixing broken character encoding

by Anonymous Monk
on Jul 26, 2012 at 03:02 UTC


in reply to Fixing broken character encoding

This is fixable if the text has been mangled only once (a single mis-encoding step); after several rounds of mangling, recovery can be impossible.
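To illustrate the single-step case, here is a minimal sketch using only the core Encode module. It assumes the damage was "UTF-8 bytes read as ISO-8859-1, then written out as UTF-8 again"; that assumption and the helper name undo_double_encoding are mine, not something taken from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Die on malformed input, and leave the caller's string untouched.
use constant STRICT => Encode::FB_CROAK | Encode::LEAVE_SRC;

# Original text: one character, U+2019 RIGHT SINGLE QUOTATION MARK.
my $original = "\x{2019}";

# Simulate one mis-step: the UTF-8 bytes are wrongly read as
# ISO-8859-1 and then encoded to UTF-8 a second time.
my $good_bytes   = encode('UTF-8', $original);    # E2 80 99
my $broken_bytes = encode('UTF-8', decode('ISO-8859-1', $good_bytes));

# Hypothetical helper: run that one step backwards.
sub undo_double_encoding {
    my ($bytes) = @_;
    my $chars  = decode('UTF-8', $bytes, STRICT);       # the mis-read characters
    my $octets = encode('ISO-8859-1', $chars, STRICT);  # the original octets
    return decode('UTF-8', $octets, STRICT);            # the intended text
}

my $repaired = undo_double_encoding($broken_bytes);

printf "broken:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $broken_bytes;
printf "repaired: U+%04X\n", ord $repaired;
# prints:
#   broken:   C3 A2 C2 80 C2 99
#   repaired: U+2019
```

If the wrong step used a different charset (Windows-1252 is a common one), the encode call in the helper has to name that charset instead, and the strict checks will die rather than silently produce more garbage when the guess is wrong.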

IIRC, http://validator.w3.org/ can help (Bundle::W3C::Validator).

So can these modules:
HTML::Encoding - Determine the encoding of HTML/XML/XHTML documents
Encode::Detective - detect a data encoding
Encoding::FixLatin - takes mixed encoding input and produces UTF-8 output
Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the correct one

But you ought to post a minimal HTML sample.

I figure it ought to be as simple as parsing the file, decoding the entities, treating the resulting string as octets, and deciding which charset those octets are in.
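The last step, deciding the charset of a run of octets, can be sketched with core Encode alone. This is only a rough heuristic under stated assumptions: the helper name guess_and_decode is hypothetical, and cp1252 as the fallback is my guess at a typical legacy charset, not something known about the original data. Parsing the file and decoding entities are not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: choose between UTF-8 and one fallback charset.
sub guess_and_decode {
    my ($octets, $fallback) = @_;
    $fallback //= 'cp1252';    # assumption: a common legacy Western charset

    # Strict UTF-8 decoding dies on malformed sequences, so success is
    # reasonable evidence that the octets really are UTF-8.
    my $text = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return ('UTF-8', $text) if defined $text;

    # Otherwise fall back to the legacy charset.
    return ($fallback, decode($fallback, $octets));
}

# The same character, once as UTF-8 octets and once as a cp1252 octet.
for my $octets ("\xE2\x80\x99", "\x92") {
    my ($charset, $text) = guess_and_decode($octets);
    printf "%-6s -> U+%04X\n", $charset, ord $text;
}
# prints:
#   UTF-8  -> U+2019
#   cp1252 -> U+2019
```

Pure ASCII input is reported as UTF-8 here, which is harmless. For real documents, a detector such as Encode::Detective looks at far more evidence than this two-way test.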


Re^2: Fixing broken character encoding
by Anonymous Monk on Jul 26, 2012 at 04:04 UTC

    This could work:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw' dd ';
    use HTML::Entities qw' encode_entities decode_entities ';
    use Encode qw' encode decode ';
    use Encode::Detective qw' detect ';

    my $odata = my $str = '’';
    decode_entities($str);
    dd $str;
    dd encode_entities($str);

    my $encoding = detect($str);
    dd $encoding;

    $str = decode( 'UTF-8', $str );
    dd $str;
    dd encode_entities($str);
    __END__
    "\xE2\x80\x99"
    "’"
    "UTF-8"
    "\x{2019}"
    "’"

      That's showing some promise. Thank you.

      90% of every Perl application is already written.
      dragonchild
