http://www.perlmonks.org?node_id=840156

Perobl has asked for the wisdom of the Perl Monks concerning the following question:

I have been asked to update a script that I did not write. The script is for form processing and the section I'm sharing is basically used to evaluate user data submitted as key-value pairs. I'm having trouble seeing any tangible purpose in four lines of code below (where $value is evaluated):
read (STDIN, $temp, $ENV{'CONTENT_LENGTH'}); @pairs = split(/&/, $temp); foreach $item(@pairs) { ($key, $value) = split(/=/, $item, 2); $value =~ tr/+/ /; $value =~ s/%(..)/pack("c", hex($1))/ge; $value =~ s/\t/ /g; $value =~ s/\0//g; }
I understand what each line does ... for example ...
$value =~ tr/+/ /;
performs transliteration on $value and thus translates characters like !, @, #, etc. to their hex equivalents ... that is, # becomes %23.
$value =~ s/%(..)/pack("c", hex($1))/ge;
The line above then takes the two "hex" digits that make up the string and packs them into their hex equivalent.
$value =~ s/\t/ /g;
This code removes tabs?
$value =~ s/\0//g;
And this code removes nulls? I'm missing the point of these four lines of code used together. It doesn't seem like they're really protecting anything??? Am I missing something? Thanks for your help!

Replies are listed 'Best First'.
Re: Need help understanding some code ...
by JavaFan (Canon) on May 15, 2010 at 21:59 UTC
    I'm missing the point of these four lines of code used together.
    So do I. Those four lines modify $value, but then $value isn't used. In the next interation, $value is set to something else. At best, some unshown code uses $value, but then it will only have the (modified) value of the last pair.
Re: Need help understanding some code ...
by Crackers2 (Parson) on May 15, 2010 at 23:39 UTC

    For completeness sake...

    I understand what each line does ... for example ...
    $value =~ tr/+/ /;
    performs transliteration on $value and thus translates characters like !, @, #, etc. to their hex equivalents ... that is, # becomes %23.

    No, all it does is replace plus characters with spaces.

    $value =~ s/%(..)/pack("c", hex($1))/ge;
    The line above then takes the two "hex" digits that make up the string and packs them into their hex equivalent.

    And this does the opposite of what you thought the first line does, i.e. it replaces things like %23 with # etc. (Perhaps that's what you were saying but I couldn't quite tell from your wording)

Re: Need help understanding some code ...
by Anonymous Monk on May 15, 2010 at 19:03 UTC
      Anonymonk:
        ...or perhaps a wildly misguided attempt to untaint the input?

      I might buy that notion, too, except for one thing: In the code we can see, we assign input to $value: transmute it (but not into gold); and abandon it without using it.

      I wholeheartedly agree with your recommendation to use CGI or its relatives, but I don't think the code we see is doing any "parsing/decoding CGI" unless useless inconvenience to a whole bunch of electrons satisfies your definition. Rather, I have to go with OP's interpretation: lines 4-8 are useless; a longwinded NoOp; don't do anything useful.
            unless the snippet shown somehow misrepresents a bigger picture.

      Oops. That's what JavaFan says below, and more succinctly! ++ JavaFan.

      Update: Fixed the misquote, "parsing decoding CGI"
      And later, clarified para 2.

Re: Need help understanding some code ...
by AnomalousMonk (Archbishop) on May 15, 2010 at 23:54 UTC
     $value =~ tr/+/ /;
    performs transliteration on $value and thus translates characters like !, @, #, etc. to their hex equivalents ... that is, # becomes %23.

    No. Assuming there's any point to the code in the first place (see JavaFan's reply), the tr/// transliteration operator does limited character substitution.
         $value =~ tr/+/ /;
    transliterates  + plus signs into spaces.

    >perl -wMstrict -le "my $value = '!@#+&*'; $value =~ tr/+/ /; print qq{'$value'}; " '!@# &*'
Re: Need help understanding some code ...
by Marshall (Canon) on May 16, 2010 at 08:19 UTC
    I also see some rather strange things in this code. Even the first line:  read (STDIN, $temp, ENV{'CONTENT_LENGTH'}); looks strange to me! A more normal thing would be:  my $actually_read = read (FILEHANDE, $temp, $max_buf_size);

    If $actually_read == $max_buf_size that could mean that there is more left to be read. Also anything that doesn't have actual data in, is "unknown" (not necessarily what Perl calls undef) because there could be some previous value in $temp? So if you are going to process input data like this, attention needs to be paid to what is actually in the $buffer (number of bytes), not the max size that you intend to allow for this read.

    I also don't see anything that would lead me to believe that you are reading anything but ASCII or unicode representable text in which case more common Perl string reads would be more appropriate. If these strings end in NUL, you can define $/=0; (input record separator instead of default of \n). Also, like others, I don't see how this code works for any more than one pair.

    If you could put a short line into a test file, text.txt and then run this small script, we would see what is really in there:

    #!/usr/bin/perl -w use strict; my $buf; open (IN, "test.txt") || die "can't open test.txt"; binmode (IN); my $actual = read (IN, $buf, 2048); foreach my $i (0..$actual-1) { printf "%02X ", ord(substr($buf,$i,1)); }
Re: Need help understanding some code ...
by Krambambuli (Curate) on May 16, 2010 at 09:42 UTC
    Looking at something like
    #!/usr/bin/perl use strict; use warnings; my $temp = 'key1=two_tabs%09%09+other_two_tabs%09%09&key2=three_nulls_ +%00%00%00'; my @pairs = split(/&/, $temp); foreach my $item(@pairs) { my ($key, $value) = split(/=/, $item, 2); $value =~ tr/+/ /; $value =~ s/%(..)/pack("c", hex($1))/ge; $value =~ s/\t/ /g; $value =~ s/\0//g; print "Key: '$key', value: '$value'\n"; }
    you'll get
    
    Key: 'key1', value: 'two_tabs   other_two_tabs  '
    Key: 'key2', value: 'two_nulls_'
    
    
    where the _encoded_ TABs and NULLs are transformed to spaces and respectively eliminated.

    But as others mentioned already, you shouldn't do this as above if trying to clean up the inputted data but use a module that handles things like this in much more detail.


    Krambambuli
    ---