http://www.perlmonks.org?node_id=592810


in reply to HTML::Strip and UTF8 -- is there some way I can just skip all the "UTF8 only" entities?

Answering my own question (partially), I think I have to do something along the lines of:

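use strict;
use warnings;
use Encode;    # encode() and decode_utf8() live in Encode, not Encode::Encoder

# U+2019 (right single quote) as UTF-8 bytes, as it would arrive from a UTF-8 page
my $utf8String   = "\xE2\x80\x99";
my $latin1String = latin1ify($utf8String);
print "$latin1String\n";

sub latin1ify {
    my $string = shift;
    $string = "" unless defined $string;    # keeps "0" intact, unlike shift || ""
    # UTF-8 bytes -> characters -> Latin-1; unmappable characters become "?"
    return Encode::encode("iso-8859-1", Encode::decode_utf8($string));
}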

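This prints "?", because U+2019 has no ISO-8859-1 equivalent and encode() substitutes a question mark for it. I could then strip the question marks, although that would also remove any genuine ones in the text.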
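Rather than substituting "?" and stripping afterwards, it may be possible to drop the unmappable characters in a single step. The sketch below assumes Encode 2.12 or later, where the CHECK argument of encode() may be a code reference: it is called with the ordinal of each character the target encoding cannot represent, and its return value is used in that character's place. Returning an empty string simply skips the character. The helper name latin1_skip is hypothetical, chosen for illustration only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 octets, returns ISO-8859-1 octets.
# Characters with no Latin-1 equivalent are dropped instead of becoming "?".
sub latin1_skip {
    my ($octets) = @_;
    $octets = '' unless defined $octets;

    # Turn the UTF-8 bytes into a Perl character string.
    my $chars = decode('UTF-8', $octets);

    # With a code reference as CHECK, encode() calls it with the ordinal of
    # each unmappable character and inserts whatever it returns (here: nothing).
    return encode('iso-8859-1', $chars, sub { '' });
}

# "café ’quoted’" as UTF-8 bytes: é maps to Latin-1 0xE9, the curly quotes are skipped.
my $in  = "caf\xC3\xA9 \xE2\x80\x99quoted\xE2\x80\x99";
my $out = latin1_skip($in);
print "$out\n";
```

An equivalent approach that does not rely on the code-reference fallback is to decode first, delete everything above U+00FF with `s/[^\x00-\xFF]//g`, and then encode the remaining characters to ISO-8859-1.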

But I have to go now, so I'll finish this another time.
