http://www.perlmonks.org?node_id=660700


in reply to Re^3: somethign wrong with the sumbit
in thread somethign wrong with the sumbit

First of all, I am VERY GLAD to have solved the problem myself two days ago, before I read your last post.
It looks like it wasn't an encoding problem at all. I'll tell you later on what I did.
Your explanation today was very insightful and helped me understand even more about encodings.
I also managed to run your test CGI script and saw that the value before submission and the value returned were the same, so the browser did indeed return the user's selection intact, exactly the same as the original. What I tried two days ago was this:
print header( -charset=>'utf8' );

my $article = param('select') || "Αρχική Σελίδα!";

my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt";
my @menu_files = map m{([^/]+)\.txt}, @files;
Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files;

if ( param('select') ) {    #If user selected an item from the drop down menu
    $article = decode( 'utf8', $article );
    unless ( grep /^\Q$_\E$/, @menu_files )    #Unless user selection doesn't match one of the valid filenames within @display_files
    {
        ......
But as a result I got this: Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 182.
Line 182 seems completely unrelated to "decode()" and I have no idea why Perl refers to it. Obviously the problem was on line 35 of my script, which is this one: $article = decode( 'utf8', $article );
At the time I had no clue what that error meant, but after your reply today I now know that I was running "decode()" on a string that already had the utf8 flag set and contained a wide character, and, as you said, Perl returns an error in that case.
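The error described above can be reproduced in isolation. This is only a minimal sketch, assuming a reasonably recent Encode; the variable names ($chars, $result, $error) are made up for illustration and are not from index.pl:

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string holding Greek letters (code points above 255),
# i.e. something that has already been decoded once.
my $chars = "\x{391}\x{3C1}\x{3C7}";

# decode() expects raw octets. Handing it wide characters makes Encode
# croak, so the call is wrapped in eval to capture the message.
my $result = eval { decode('UTF-8', $chars) };
my $error  = $@;

print "decode failed: $error" if $error;
```

The message names a line inside Encode because that is where the check lives; the call that triggered it is still the decode() line in the calling script.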


But what does that error tell us now? If my thinking is correct, it tells us that the parameter the script (index.pl) got back from the browser was already utf8-flagged!
Why, you ask? Because this line of code, Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files; has created an array full of well-defined 'utf8 flagged' items, since the Perl script itself created this array. So when the user selects one of them and submits it, the browser grabs this 'utf8 flagged' item and sends it back to the script UNTOUCHED, as the error above proves; otherwise we wouldn't have got this error. That also confirms what you said in a previous post on this thread: a browser should not alter a string in any way (not even in its encoding).

So now we DO know for sure that the browser isn't sending the string back malformed in any way, because if it were, this line of code: $article = decode( 'utf8', $article ); would have had no problem running, perhaps because the browser might have removed the "internal utf8 flag" Perl uses to mark "utf8" data. Do you agree with this logic, or have I misunderstood?

If the above is TRUE (the original and returned strings are identical), then no conversion has to be made, neither encoding nor decoding. My script now works as intended with no change of encodings. Here is the code:

print header( -charset=>'utf8' ); my $article = param('select') || "&#913;&#961;&#967;&#953;&#954;&#942; + &#931;&#949;&#955;&#943;&#948;&#945;!"; my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt"; my @menu_files = map m{([^/]+)\.txt}, @files; Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files; if ( param('select') ) { #If user selected an item from the drop dow +n menu #No alternation to utf8 encoding or decoding is needed here....the ret +urned value is consisted of utf8 flag and contains wide characters as + the original unless ( grep /^\Q$_\E$/, @menu_files ) #Unless user selection do +esn't match one of the valid filenames within @display_files { if( param('select') =~ /\0/ ) { $article = "*Null Byte Injection* attempted & logged!"; print br() x 2, h1( {class=>'big'}, $article ); } if( param('select') =~ /\/\.\./ ) { $article = "*Backwards Directory Traversal* attempted & logge +d!"; print br() x 2, h1( {class=>'big'}, $article ); } $select = $db->prepare( "UPDATE guestlog SET article=?, date=?, +counter=counter+1 WHERE host=?" ); $select->execute( $article, $date, $host ); exit 0; } Encode::from_to($article, 'utf8', 'ISO-8859-7'); #Convert user sel +ected filename to greek-iso so it can be opened open FILE, "<$ENV{'DOCUMENT_ROOT'}/data/text/$article.txt" or die $ +!; local $/; $data = <FILE>; close FILE; Encode::from_to($article, 'ISO-8859-7', 'utf8'); #Convert user sel +ected filename back to utf8 before inserting into db $select = $db->prepare( "UPDATE guestlog SET article=?, date=?, cou +nter=counter+1 WHERE host=?" ); $select->execute( $article, $date, $host ); } else {
The only thing I corrected was the $data variable, before sending the contents of the file to the JavaScript:
for ($data) {    #Replace special chars like single & double quotes to its literally values
    s/\n/\\n/g;
    s/'/\\'/g;
    s/"/\"/g;
    tr/\cM//d;
}
because single and double quotes were being incorrectly interpreted as special characters. If you visit my page now, http://nikos.no-ip.org, and test it by selecting something, you'll notice it works normally.
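For readers following along, here is a rough sketch of the same escaping idea as a standalone helper. The name js_escape is hypothetical and not part of the script above. Note that, as posted, s/"/\"/g seems to leave double quotes unchanged, because \" in the replacement is just a double quote; the sketch assumes the intent was to put a backslash in front of each quote, and it also escapes existing backslashes first:

```perl
use strict;
use warnings;

# Hypothetical helper: escape text so it can sit inside a quoted
# JavaScript string literal.
sub js_escape {
    my ($text) = @_;
    $text =~ tr/\cM//d;      # drop carriage returns
    $text =~ s/\\/\\\\/g;    # backslashes first, so later escapes survive
    $text =~ s/'/\\'/g;      # single quotes
    $text =~ s/"/\\"/g;      # double quotes
    $text =~ s/\n/\\n/g;     # a real newline becomes the two characters \n
    return $text;
}

print js_escape(qq{He said "hi"\r\nit's fine}), "\n";
```

Doing the backslash substitution before the others matters: otherwise the backslashes added for the quotes would themselves be doubled.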

Also, your last suggestion still doesn't work:

print header( -charset=>'utf8' );

my $article = param('select') || "Αρχική Σελίδα!";

my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt";
my @menu_files = map m{([^/]+)\.txt}, @files;
Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files;

if ( param('select') ) {    #If user selected an item from the drop down menu
    $article = encode( 'utf8', $article );
    unless ( grep /^\Q$_\E$/, @menu_files )    #Unless user selection doesn't match one of the valid filenames within @display_files
    {
        ......
I get this error: Invalid argument at D:\www\cgi-bin\index.pl line 57. Line 57 is the right line this time, the one trying to open FILE, "<$ENV{'DOCUMENT_ROOT'}/data/text/$article.txt" or die $!; The encoding must have messed the variable up somehow....

ps1: Your test cgi script required me to turn taint mode(-T) off in order to run

ps2: I don't yet understand what the difference is between $article = encode( 'utf8', $article ); and $article = decode( 'utf8', $article );

ps3: I can't run the one-liners: I get Can't find string terminator "'" anywhere before EOF at -e line 1. I tried switching single quotes for double quotes but I am still getting errors.

Re^5: somethign wrong with the sumbit
by graff (Chancellor) on Jan 07, 2008 at 06:32 UTC
    Also, your last suggestion still doesn't work:
    ...
    if ( param('select') ) {    #If user selected an item from the drop down menu
        $article = encode( 'utf8', $article );
    ...

    The thing that I find astonishing here is that this snippet is NOT what I was suggesting. Look again at my previous reply and focus carefully on the line that has the comment "## ADD THIS LINE". Can you see the difference between the code I suggested and your failed attempt that I quoted just now? It's an important difference.

    My suggestion was to create a new utf8 string by decoding the value returned by "param('select')", so that you could compare this utf8 version of the parameter to the contents of the utf8 filename array. What you did instead was something else entirely, and quite brainless.
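    The idea of decoding first and comparing afterwards can be sketched roughly as follows. This is not the exact line from the earlier reply (which is not quoted here); it is only an illustration, and @raw_names, $raw_param and $selected are hypothetical stand-ins for the glob() results and the value of param('select'):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-ins: in a real script these octets would come from
# glob() (ISO-8859-7 file names) and param('select') (UTF-8 from the form).
my @raw_names = ( encode('ISO-8859-7', "\x{391}\x{3C1}\x{3C7}") );
my $raw_param = encode('UTF-8', "\x{391}\x{3C1}\x{3C7}");

# Decode each input exactly once, so every comparison is made between
# character strings rather than between differently encoded octets.
my @menu_files = map { decode('ISO-8859-7', $_) } @raw_names;
my $selected   = decode('UTF-8', $raw_param);

# Exact comparison against the list of known names.
my $is_valid = grep { $_ eq $selected } @menu_files;

print $is_valid ? "valid selection\n" : "unknown selection\n";
```

    The point of the sketch is the direction of the conversion: external octets are decoded on the way in, and only encoded again when something has to leave the program (a file name for open, a page for the browser).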

    The evidence is pointing more heavily to the conclusion that you are a troll, trying some novel techniques to waste everyone else's time and get people angry. Why else would you make up something stupid that obviously won't work, and assert that this is what I suggested you should do? If you are not a troll, then you are simply incompetent beyond belief. Either way, if this is what you do when people try to help you, people will stop trying, and simply won't take you seriously anymore. Personally, I'm already laughing out loud at everything you post.

    (update: In fact this last reply of yours is really hilarious. It's like you are the Three Stooges, all by yourself! But then, why do I keep replying? Good question... I guess sometimes it's good for a laugh, and sometimes your approach to trolling falls flat, and actually sparks some useful explanations that might be helpful to others, even though it does no good at all to give the information to you.)

    ps1: Your test cgi script required me to turn taint mode(-T) off in order to run

    So your conclusion is that your particular perl/web-server installation is unable to run anything with taint-checks turned on, and in order to make things work, you make sure taint-checks are turned off... Thanks for letting us know (and thanks also for the link to your web site) -- that's very helpful information for everyone who reads PerlMonks. (update: I just noticed... that site doesn't seem to be working at the moment. Maybe Nik pulled the plug on it? Or tripped over the power cord, or when the cream pie missed his face it hit the motherboard. I don't suppose someone would have hacked it already..)

    ps2: I don't yet understand what the difference is between $article = encode( 'utf8', $article ); and $article = decode( 'utf8', $article );

    I always have to think twice about the names myself -- here's how I keep them straight: think of "perl internal utf8" as "normal" and everything else as "coded" (like encrypted to keep it mysterious and secret and obscure); in order to turn a perl-internal utf8 string into one of these mysterious external strings, you have to encode (like encryption), and to make one of those mysterious external strings readable as perl-internal utf8, you have to decode (like decryption) -- and the Encode module is your "secret encoder/decoder" tool, your "Enigma machine". Just remember: "encode" returns something that is external (not flagged as perl-internal utf8); "decode" returns perl-internal utf8 (except when you pass it something that is already perl-internal utf8, which causes it to throw an error).
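    A small round trip may make the two directions easier to see. This is just a sketch under the assumption that the external encoding is UTF-8; the variable names are made up for the example:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string: three Greek letters, written as code points.
my $chars = "\x{391}\x{3C1}\x{3C7}";

# encode(): perl-internal characters -> external octets.
my $octets = encode('UTF-8', $chars);

# decode(): external octets -> perl-internal characters again.
my $round_trip = decode('UTF-8', $octets);

# length() counts characters for $chars and bytes for $octets.
printf "characters: %d, octets: %d\n", length($chars), length($octets);
print  "round trip matches\n" if $round_trip eq $chars;
```

    Each Greek letter here takes two bytes in UTF-8, which is why the octet string is longer than the character string even though both represent the same text.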

    ps3: I can't run the one-liners: I get Can't find string terminator "'" anywhere before EOF at -e line 1. I tried switching single quotes for double quotes but I am still getting errors.

    That's because you are using the "standard" MS-DOS Prompt shell (command.exe or cmd.exe). Try using a unix-style shell instead (bash.exe). It's available for windows from numerous sources (cygwin is probably the most popular), and it fully supports unix-style quotes and escapes for command lines. No doubt, this advice will open up whole new worlds of potential errors you can make -- have fun with that, but don't post those problems here, because they wouldn't be perl questions.
