Re^2: HTML::Entities and Unicode quotes

by ikegami (Pope)
on Aug 20, 2011 at 06:22 UTC (#921382)


in reply to Re: HTML::Entities and Unicode quotes
in thread HTML::Entities and Unicode quotes

The internal storage format (returned by is_utf8) has absolutely nothing to do with this.
    use Encode;
    use HTML::Entities;

    my $str = "\xe2\x80\x9cquotes\xe2\x80\x9d";

    utf8::downgrade($str);
    print Encode::is_utf8($str) ? 1 : 0, " ", encode_entities($str), "\n";

    utf8::upgrade($str);
    print Encode::is_utf8($str) ? 1 : 0, " ", encode_entities($str), "\n";

    0 “quotes”
    1 “quotes”
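
A minimal sketch of the same point, assuming Encode and HTML::Entities are installed. It is not from the original post, and the variable names are illustrative. It checks that the entity output is identical for both internal storage formats, and that what does change the output is decoding the UTF-8 bytes into characters first.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# UTF-8 encoded bytes: a left and a right double quotation mark around "quotes".
my $bytes = "\xe2\x80\x9cquotes\xe2\x80\x9d";

# Same string value, two internal storage formats.
my $down = $bytes;
utf8::downgrade($down);
my $up = $bytes;
utf8::upgrade($up);

# The entity output should not depend on the storage format.
my $same = encode_entities($down) eq encode_entities($up);

# Decoding turns the six quote bytes into two characters (U+201C and U+201D),
# which HTML::Entities knows by name.
my $chars = decode('UTF-8', $bytes);
my $html  = encode_entities($chars);

print "same output for both storage formats: ", ($same ? "yes" : "no"), "\n";
print "after decoding: $html\n";
```

The decode step is the part that matters for the original question: the undecoded string holds twelve bytes treated as twelve characters, while the decoded one holds eight characters, two of which have named entities.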

Replies are listed 'Best First'.
Re^3: HTML::Entities and Unicode quotes
by Your Mother (Bishop) on Aug 20, 2011 at 16:41 UTC

    It was just to show what the natural state of the strings was assumed to be by perl. You artificially flipped the switch on/off—no decoding or encoding. You also showed code using the functions of utf8 which is probably a bad example to set. You know exactly what you’re doing but someone who doesn’t sees a top monk using it they think, oh, that must be a good idea, I’ll use upgrade and downgrade to “fix” my encodings too.

      You know exactly what you’re doing but someone who doesn’t sees a top monk using it they think, oh, that must be a good idea

      You're making my point for me. is_utf8 should never ever ever ever be used. So what are you doing using it, especially to someone who thinks it might be ok to use it?

      You artificially flipped the switch on/off—no decoding or encoding.

      No, I didn't. The UTF8 flag does not indicate any such thing.

      Not only are you showing a function you shouldn't be using, you're showing how to use it incorrectly.

      You also showed code using the functions of utf8 which is probably a bad example to set.

      There's nothing wrong with the utf8:: module. There's something wrong with the is_utf8 function, though. is_utf8 should never ever ever ever be used.

      On the other hand, there's absolutely no problem using upgrade and downgrade. First, they're not supposed to have any effect whatsoever. Secondly, they are required to work around bugs in Perl and XS modules.

      You're the one setting the bad example. I just had to use advanced functions to show that.

      they think, oh, that must be a good idea, I’ll use upgrade and downgrade to “fix” my encodings too.

      Good! You seem to be implying that's bad, but if upgrade or downgrade have an effect, they are the correct tool to use. They'll only make a difference when faced with a bug, and they're the only tool that will help in that situation.
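
The claim that upgrade and downgrade are not supposed to have any visible effect can be sketched as follows. This is an illustration under the assumption of a stock perl with no extra modules, not code from the thread, and the variable names are made up. It compares a string with itself after each call, and also shows the one case where downgrade cannot succeed.

```perl
use strict;
use warnings;

my $orig = "caf\xe9";    # four characters, all code points below 256
my $copy = $orig;

# Change the internal storage format only; the string value must stay the same.
utf8::upgrade($copy);
my $equal_after_upgrade = ($copy eq $orig) && (length($copy) == 4);

# Switch the storage format back again.
utf8::downgrade($copy);
my $equal_after_downgrade = ($copy eq $orig) && (length($copy) == 4);

# A string with a code point above 255 cannot be downgraded. With a true
# second argument, utf8::downgrade returns false instead of dying.
my $wide       = "\x{201c}quotes\x{201d}";
my $downgraded = utf8::downgrade($wide, 1) ? 1 : 0;

print "equal after upgrade:   ", ($equal_after_upgrade   ? "yes" : "no"), "\n";
print "equal after downgrade: ", ($equal_after_downgrade ? "yes" : "no"), "\n";
print "wide string downgraded: $downgraded\n";
```

If code behaves differently before and after one of these calls on a string like `$copy`, that points to a bug in whatever consumes the string, which is the situation the post describes.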

        Excepting the demonstration of what perl believed the strings to be, my answer’s code content was essentially identical to yours.

        The Re^15-deep feuds about this sort of thing in the past are suddenly less mysterious.
