PerlMonks  

_utf8_on in taint mode

by Sixtease (Friar)
on Nov 10, 2007 at 13:49 UTC ( [id://650056]=perlmeditation )

I started using Perl for CGI very recently (my admin finally enabled CGI support; I had to suffer under the burden of PHP till now). I made a small DBI-driven application, and since I haven't bothered configuring Postgres or MySQL yet, I use DBD::CSV. My mother tongue is Czech, so I use a lot of diacritics, and thus everything I ever write is in utf8.

However, I couldn't find a way to tell DBI to open the CSV file in utf8 mode, so everything that went into or came out of the database was handled incorrectly. Not knowing that I was doing anything wrong, I called _utf8_on on everything that came from the database and _utf8_off on everything that went into it.

When I turned taint mode on, _utf8_on simply stopped having any effect. I don't know why. A little PerlMonks Super Searching and perldoc'ing let me rediscover the truth:

To interpret a string of (already utf8-encoded) bytes as utf8, use Encode::decode('utf8', $string). The other way around, to make my utf8 strings slip into the non-utf8 database stream uncrippled, I have to use encode('utf8', decode('utf8', $string)).
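For readers who want to see the round trip, here is a minimal sketch. It assumes the data arrives as raw UTF-8 bytes; the sample word and the variable names are made up for illustration. It uses the strict 'UTF-8' encoding name rather than the lax 'utf8' spelled out in the post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a file or a CGI parameter:
# the Czech word "Čeština" encoded as UTF-8 (illustrative sample data).
my $bytes = "\xC4\x8Ce\xC5\xA1tina";

# decode() turns UTF-8 bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# encode() turns the characters back into UTF-8 bytes for a byte-oriented sink.
my $out = encode('UTF-8', $chars);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
print  "round trip ", ($out eq $bytes ? "preserved" : "changed"), " the bytes\n";
```

Unlike _utf8_on, which only flips an internal flag without looking at the data, decode() actually examines the bytes, so the character count comes out shorter than the byte count whenever there are diacritics.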

Why _utf8_on doesn't work in taint mode is still a mystery to me, but at least it made me learn cleaner ways. :-)

Update: I started wondering why I need to encode(decode()) and it's because I'm handling user input, not my own utf8 strings as I said.

Replies are listed 'Best First'.
Re: _utf8_on in taint mode
by dragonchild (Archbishop) on Nov 10, 2007 at 18:04 UTC
    The problem with setting the utf8 mode on the filehandle isn't in DBD::CSV, but in DBD::File. Specifically in the open_table() method of the DBD::File::Statement package. You might want to send a message to the DBI mailing list or email Tim Bunce (DBI maintainer) directly. You will probably also want to cc Jeff Zucker (DBD::CSV maintainer) so that he's aware of the issue.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      You will probably also want to cc Jeff Zucker

      ...or send a private message to jZed ;)

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: _utf8_on in taint mode
by theorbtwo (Prior) on Nov 11, 2007 at 13:21 UTC

    Using a function which has a name beginning with an underscore should be your first clue that you're doing something wrong.

    OTOH, using binmode($fh, ':utf8'); and not realizing you're letting in things that aren't utf8 and will make your program potentially blow up at some undetermined later point (or just behave strangely) is *far* too easy a mistake to make (and one that I've made more than once, thinking back).
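One way to guard against that is to reject malformed input at the door. The sketch below is only an illustration: decode_utf8_strict is a hypothetical helper name, not something from Encode, and the sample byte strings are made up. It relies on Encode's FB_CROAK check flag, which makes decode() die on malformed data. On filehandles, the ':encoding(UTF-8)' layer similarly checks what it reads, whereas ':utf8' merely marks the data as utf8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns the decoded character string,
# or undef if $bytes is not well-formed UTF-8.
sub decode_utf8_strict {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $chars;
}

# "žluťoučký" as UTF-8 bytes (illustrative sample data).
my $good = decode_utf8_strict("\xC5\xBElu\xC5\xA5ou\xC4\x8Dk\xC3\xBD");

# The same start followed by a stray 0xFF byte, which is never valid in UTF-8.
my $bad = decode_utf8_strict("\xC5\xBElu\xFF");

print defined $good ? "good input accepted\n" : "good input rejected\n";
print defined $bad  ? "bad input accepted\n"  : "bad input rejected\n";
```

Failing early like this means the program notices bad user input at the point where it arrives, instead of at some later point where the cause is harder to trace.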

Node Type: perlmeditation [id://650056]
Approved by McDarren