Re: UTF8 related proof of concept exploit released at T-DOSE

by graff (Chancellor)
on Oct 14, 2007 at 22:26 UTC


in reply to UTF8 related proof of concept exploit released at T-DOSE

Given that the exploit relies on using byte sequences that cannot be interpreted as valid utf8 strings, I would think that anyone writing a script that uses the "-T" flag, and expects to handle utf8 data from a tainted source, would prefer to read such input as ":raw", and always use Encode::decode() to convert it to perl-internal utf8 form.

And in doing so, it would usually be prudent to do it like this (adapting the sample code given in the OP):

    #!/usr/bin/perl -T
    use strict;
    use Encode;

    %ENV = ( PATH => '/usr/bin' );

    open my $filehandle, "< :raw", "test.bin" or die $!;
    my $word = readline $filehandle;

    eval { $word = decode( "utf8", $word, Encode::FB_CROAK ) };
    if ( $@ ) {
        warn "unusable input from test.bin\n";
    }
    else {
        my ($untainted) = $word =~ /^(\w+)$/;
        if ($untainted) {
            # It passed the regex, so it is "safe".
            system "echo $untainted";
        }
    }

Re^2: UTF8 related proof of concept exploit released at T-DOSE
by Juerd (Abbot) on Oct 14, 2007 at 23:06 UTC

    I would think that anyone writing a script that uses the "-T" flag, and expects to handle utf8 data from a tainted source, would prefer to read such input as ":raw", and always use Encode::decode() to convert it to perl-internal utf8 form.

    Why go through that trouble if ":encoding(UTF-8)" does exactly the same thing, the same safe way, only with less code?

    Using :raw with decode is exactly as safe as using :encoding(UTF-8), because it literally does the same things internally, only through a different wrapper :)
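To make the comparison concrete, here is a minimal sketch that reads the same file both ways and checks that the results agree. The temporary file and its contents are made up for illustration, and only well-formed input is exercised; how each variant reacts to malformed bytes is not shown here.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Illustrative sample: "h", U+00E9 encoded as two bytes, "llo", newline.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "h\xC3\xA9llo\n";
close $tmp_fh or die "close: $!";

# Variant 1: let the PerlIO layer decode while reading.
open my $in, '<:encoding(UTF-8)', $path or die "open: $!";
my $via_layer = <$in>;
close $in;

# Variant 2: read raw octets, then decode explicitly.
open my $raw, '<:raw', $path or die "open: $!";
my $octets = <$raw>;
close $raw;
my $via_decode = decode('UTF-8', $octets, Encode::FB_CROAK);

print "same result\n" if $via_layer eq $via_decode;
```

Both variables end up holding six characters (five letters plus the newline), not the seven bytes stored in the file.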

    Now, :utf8 is unsafe (when reading), but this has nothing to do with taint mode. Of course, in the contrived example in the root node, an informed careful programmer would have done two things differently: they would have used :encoding and they would not have used \w. The scary part, however, is that many careful programmers don't know that what they're doing is dangerous!
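A small sketch of one reading of the \w remark, assuming the concern is that \w matches far more than ASCII letters once the data has been decoded. The helper name untaint_ascii_word is hypothetical, and the explicit character class is just one possible replacement, not necessarily what was meant.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word if it is plain ASCII, else undef.
sub untaint_ascii_word {
    my ($input) = @_;
    my ($word) = $input =~ /\A([A-Za-z0-9_]+)\z/;
    return $word;
}

# Three Cyrillic letters, as they would look after decoding.
my $cyrillic = "\x{43F}\x{440}\x{438}";

print "\\w accepts the Cyrillic word\n" if $cyrillic =~ /\A\w+\z/;
print "ASCII class rejects it\n" unless defined untaint_ascii_word($cyrillic);
print "ASCII class accepts 'hello'\n" if defined untaint_ascii_word('hello');
```

Note that \z, unlike $, does not tolerate a trailing newline, so input read with readline would need a chomp first.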

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

      Why go through that trouble if ":encoding(UTF-8)" does exactly the same thing, the same safe way, only with less code?

      If it is sufficient that the app simply never gets to see a malformed byte sequence (or anything following a malformed character) when reading from a source that is expected to be utf8, you're right -- better to handle it via the ":encoding(utf8)" layer in PerlIO.

      But if there's any need to diagnose the nature of the malformedness, or to recover any amount of usable data following a bad byte sequence within a given input record, then the extra steps involving "decode('utf8',$string,...)" are the only way to do that, I think.
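As a rough sketch of that kind of diagnosis and recovery, the loop below uses Encode::FB_QUIET, which makes decode() return whatever decoded cleanly and leave the unprocessed remainder in the source variable. The helper name recover_utf8 and the sample input are hypothetical; skipping exactly one byte per failure is a simple policy chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode as much of $octets as possible and report
# the offset and value of every byte that had to be skipped.
sub recover_utf8 {
    my ($octets) = @_;          # private copy; decode() will consume it
    my $total = length $octets;
    my $text  = '';
    my @bad;

    while (length $octets) {
        # FB_QUIET: return what decoded cleanly, leave the rest in $octets.
        $text .= decode('UTF-8', $octets, Encode::FB_QUIET);
        last unless length $octets;

        # The first remaining byte is where decoding stopped; record and drop it.
        my $offset = $total - length $octets;
        my $byte   = substr($octets, 0, 1, '');
        push @bad, { offset => $offset, byte => sprintf('0x%02X', ord $byte) };
    }
    return ($text, \@bad);
}

my ($text, $bad) = recover_utf8("abc\xFFdef\n");
print "recovered: $text";
print "bad byte $_->{byte} at offset $_->{offset}\n" for @$bad;
```

For the sample input, the text on either side of the stray 0xFF byte is kept, and the byte is reported at offset 3.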

        Using warnings takes care of most, but indeed if you want to catch it and do anything special with it, the extra step is the easiest way. Good point.
