PerlMonks  

Re^2: UTF8 related proof of concept exploit released at T-DOSE

by Juerd (Abbot)
on Oct 14, 2007 at 23:06 UTC


in reply to Re: UTF8 related proof of concept exploit released at T-DOSE
in thread UTF8 related proof of concept exploit released at T-DOSE

I would think that anyone writing a script that uses the "-T" flag, and expects to handle utf8 data from a tainted source, would prefer to read such input as ":raw", and always use Encode::decode() to convert it to perl-internal utf8 form.

Why go through that trouble if ":encoding(UTF-8)" does exactly the same thing, the same safe way, only with less code?

Using :raw with decode is exactly as safe as using :encoding(UTF-8), because it literally does the same things internally, only through a different wrapper :)
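As a rough illustration of that equivalence, the sketch below reads the same bytes both ways and compares the results. The sample string and the variable names are made up for the example, and an in-memory filehandle stands in for whatever tainted source the script would really read from.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample input: the UTF-8 bytes for "caf\x{E9}\n".
my $bytes = "caf\xC3\xA9\n";

# Approach 1: read raw bytes, then decode explicitly.
open my $raw_fh, '<:raw', \$bytes or die "open: $!";
my $raw_line = <$raw_fh>;
close $raw_fh;
my $via_decode = decode('UTF-8', $raw_line);

# Approach 2: let the PerlIO layer decode while reading.
open my $enc_fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $via_layer = <$enc_fh>;
close $enc_fh;

# Both strings should now hold the same characters.
print $via_decode eq $via_layer ? "same\n" : "different\n";
```

This only exercises well-formed input; it says nothing about how each route reports malformed bytes.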

Now, :utf8 is unsafe (when reading), but this has nothing to do with taint mode. Of course, in the contrived example in the root node, an informed careful programmer would have done two things differently: they would have used :encoding and they would not have used \w. The scary part, however, is that many careful programmers don't know that what they're doing is dangerous!

Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Replies are listed 'Best First'.
Re^3: UTF8 related proof of concept exploit released at T-DOSE
by graff (Chancellor) on Oct 15, 2007 at 08:57 UTC
    Why go through that trouble if ":encoding(UTF-8)" does exactly the same thing, the same safe way, only with less code?

    If it is sufficient that the app simply never gets to see a malformed byte sequence (or anything following a malformed character) when reading from a source that is expected to be utf8, you're right -- better to handle it via the ":encoding(utf8)" layer in PerlIO.

    But if there's any need to diagnose the nature of the malformedness, or to recover any amount of usable data following a bad byte sequence within a given input record, then the extra steps involving "decode('utf8',$string,...)" are the only way to do that, I think.

      Using warnings takes care of most, but indeed if you want to catch it and do anything special with it, the extra step is the easiest way. Good point.
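The "extra step" discussed above might look something like the following sketch. It assumes a record with one stray byte in the middle; the record contents and variable names are invented for the example, and it uses the strict 'UTF-8' encoding name rather than 'utf8'. Encode's FB_CROAK check makes decode() die so the problem can be inspected, while the default check substitutes U+FFFD and keeps going, so the data after the bad byte is still usable.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical record: valid text, one stray byte (0xFF), then more valid text.
my $record = "good \xFF more\n";

# Diagnose: with FB_CROAK, decode() dies and $@ describes the malformed input.
# LEAVE_SRC keeps decode() from modifying $record.
my $ok = eval {
    decode('UTF-8', $record, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
my $diagnosis = $ok ? '' : $@;

# Recover: with the default check, malformed bytes become U+FFFD
# and decoding continues to the end of the record.
my $recovered = decode('UTF-8', $record);

print "problem: $diagnosis" if $diagnosis ne '';
print "text after the bad byte survived\n" if index($recovered, 'more') >= 0;
```

Whether to croak, substitute, or skip is a policy decision for the application; the point is only that decode() exposes the choice, whereas the PerlIO layer makes it for you.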
