
Re^4: Getting mad with CGI::Application and utf8

by Juerd (Abbot)
on Feb 26, 2008 at 20:47 UTC ( [id://670392] )


in reply to Re^3: Getting mad with CGI::Application and utf8
in thread Getting mad with CGI::Application and utf8

There is also utf8::is_utf8, but I somehow suspect that the results might be subtly different [from what Devel::Peek reports]

They're not. However, is_utf8 is to be avoided because it's too easy to use it when you shouldn't. In general, you should not be looking at the state of the UTF8 flag unless you're a Perl developer, or wish to learn about Perl's guts. If you do want to learn, study the IOK, NOK, and POK flags first, and then treat the UTF8 flag as if it were called UOK.

Re^5: Getting mad with CGI::Application and utf8
by moritz (Cardinal) on Feb 27, 2008 at 08:42 UTC
    They're not. However, is_utf8 is to be avoided because it's too easy to use it when you shouldn't be doing that. In general, you should not be looking at the state of the UTF8 flag unless you're a Perl developer, or wish to learn about Perl's guts.

    So as the average John Doe Perl hacker, what should I use to find out if a certain module or sub returns text strings or binary strings?

    Very often that's only poorly documented, or not at all, and I don't think that "read the source code" is good advice either.

      So as the average John Doe Perl hacker, what should I use to find out if a certain module or sub returns text strings or binary strings?

      Warning: culture shock ahead.

      From perlunifaq:

      How can I determine if a string is a text string or a binary string?

      You can't. Some use the UTF8 flag for this, but that's misuse, and makes well behaved modules like Data::Dumper look bad. The flag is useless for this purpose, because it's off when an 8 bit encoding (by default ISO-8859-1) is used to store the string.

      This is something you, the programmer, has to keep track of; sorry. You could consider adopting a kind of "Hungarian notation" to help with this.
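The FAQ's point about the flag can be seen in a minimal sketch. The variable names are illustrative only; utf8::is_utf8 and utf8::upgrade are core functions. Both strings below hold the same four characters and compare equal, yet the flag differs, so the flag says nothing about text versus binary:

```perl
use strict;
use warnings;

# A text string whose characters all fit in 8 bits (ISO-8859-1 range).
my $text = "caf\x{e9}";      # "cafe" with e-acute, 4 characters

# The same string, with Perl's internal representation switched to UTF-8.
my $upgraded = $text;
utf8::upgrade($upgraded);

printf "original: flag %s, length %d\n",
    (utf8::is_utf8($text)     ? "on" : "off"), length $text;
printf "upgraded: flag %s, length %d\n",
    (utf8::is_utf8($upgraded) ? "on" : "off"), length $upgraded;
printf "strings are %s\n", $text eq $upgraded ? "equal" : "different";
```

The flag only records which internal storage format Perl picked; the string's value is the same either way.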

      There is no way to determine whether a string is binary or text. Every operation (including your own subroutines) should handle a single mode: either text or binary. If you want to handle both kinds of string, and for any reason need to know the difference between bytes and characters with the same ordinal values, you will have to specify multiple routines, or a way to indicate that a certain string is binary rather than text.
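A rough sketch of what "one mode per routine" can look like, assuming a hypothetical bin_/txt_ naming convention in the spirit of the "Hungarian notation" suggestion. The subroutine names are made up for illustration; decode comes from the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical convention: bin_ marks byte strings, txt_ marks text strings.
my $bin_input = "caf\xc3\xa9";                 # UTF-8 encoded bytes (5 bytes)
my $txt_input = decode('UTF-8', $bin_input);   # decoded text (4 characters)

# Text-only routine: counts letters, so it must be given decoded text.
sub count_letters {
    my ($txt) = @_;
    my $n = () = $txt =~ /\p{L}/g;
    return $n;
}

# Binary-only routine: renders each byte as two hex digits.
sub hex_dump {
    my ($bin) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bin;
}

print count_letters($txt_input), "\n";   # 4
print hex_dump($bin_input), "\n";        # 63 61 66 c3 a9
```

Neither routine inspects its argument to guess the mode; the caller decides by choosing which routine to call and by decoding at the boundary.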

      Just an advance warning: you may want to argue that this is a stupid concept, but eventually you'll have to accept that Perl just works like this. I personally think the model is well thought through.

      See also this journal post and the discussion tree that follows it. I plan to release a module called BLOB that lets you (and everyone else) flag a string as "this is binary, not text".

      Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

        Let me repeat the question: I get some data from a foreign Perl module (let's say a file parser), and that module doesn't document what it returns.

        But other parts of the code have to deal with text strings (for example because they query unicode properties).

        What should I do? I only need to know that once, at write/debug time.

        My current approach is to try to get some data with high codepoints (outside latin-1 range) out of the foreign module, and check with Devel::Peek or utf8::is_utf8 if that stupid flag is set.

        Is there a better, more reliable approach? And is that really an abuse?
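For a one-off, debug-time check like the one described above, a possible variation (not an answer given in this thread) is to look at the returned characters rather than the flag. The two parse_* subs are hypothetical stand-ins for the undocumented foreign module, and looks_decoded is an illustrative helper. It relies on the fact that a byte string can never contain an ordinal above 0xFF:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-ins for a foreign parser with undocumented return values.
sub parse_returning_bytes { my ($raw) = @_; return $raw }
sub parse_returning_text  { my ($raw) = @_; return decode('UTF-8', $raw) }

# True if the string contains at least one character above 0xFF,
# which can only happen if the data was decoded into characters.
sub looks_decoded {
    my ($str) = @_;
    return $str =~ /[^\x00-\xff]/ ? 1 : 0;
}

# Test input: the euro sign U+20AC as three UTF-8 encoded bytes.
my $raw = "\xe2\x82\xac";

for my $parser (\&parse_returning_bytes, \&parse_returning_text) {
    my $out = $parser->($raw);
    printf "length=%d decoded=%s\n", length($out),
        looks_decoded($out) ? "yes" : "no";
}
```

This only works when the test input contains a character outside the latin-1 range, and a "no" result is not proof that the data is binary. It could also be text that happens to fit in 8 bits.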
