PerlMonks  

Re: HTML::Strip and UTF8 -- is there some way I can just skip all the "UTF8 only" entities?

by tphyahoo (Vicar)
on Jan 03, 2007 at 18:45 UTC (#592810)


in reply to HTML::Strip and UTF8 -- is there some way I can just skip all the "UTF8 only" entities?

Answering my own question (partially), I think I have to do something along the lines of

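    use strict;
    use warnings;
    use Encode;

    # "\x{2019}" (right single quotation mark) is already a decoded character
    # string, so it needs no decode_utf8 step; recent versions of Encode
    # refuse to decode a string that contains wide characters.
    my $utf8String   = "\x{2019}";
    my $latin1String = latin1ify($utf8String);
    print "$latin1String\n";

    # Encode to ISO-8859-1; characters with no Latin-1 equivalent become "?".
    sub latin1ify {
        my $string = shift;
        $string = "" unless defined $string;
        return Encode::encode( "iso-8859-1", $string );
    }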

which prints "?" in place of the character that has no Latin-1 equivalent. The remaining step would be to strip those question marks.
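One catch with stripping "?" afterwards is that it would also remove any genuine question marks in the text. A possible alternative, sketched below, is to delete the characters outside the Latin-1 range before encoding, so no substitute "?" is ever produced. This is only a sketch: the helper name latin1_drop_unmappable is made up for illustration, and it assumes the input is already a decoded character string rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of HTML::Strip or Encode): drop every
# character that has no Latin-1 equivalent, then encode what is left.
# Assumes $string holds decoded characters, not raw UTF-8 octets.
sub latin1_drop_unmappable {
    my $string = shift;
    $string = "" unless defined $string;

    # ISO-8859-1 covers exactly code points 0x00-0xFF; delete everything else.
    $string =~ s/[^\x{00}-\x{FF}]//g;

    return Encode::encode( "iso-8859-1", $string );
}

# The curly apostrophe is removed; the genuine question mark survives.
print latin1_drop_unmappable("Who\x{2019}s there?"), "\n";
```

With this approach "Who\x{2019}s there?" comes out as "Whos there?", whereas encode-then-strip would also eat the final question mark.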

But I have to go now, so I'll finish this another time.

