Re: Re: PayPal Advice Sought

I have a question about that regex. A look at sub unescape in CGI reveals a regex that's nearly identical to the one in question. The first difference is trivial: CGI uses a {2} quantifier instead of writing the hex-digit class twice. What I'm curious about is the second one: how significant is the use of a signed pack (c) in the CGI regex, in contrast to the unsigned pack (C) in the other one?

$value    =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; # cargo

$todecode =~ s/%([0-9a-fA-F]{2})/pack("c",hex($1))/ge;          # CGI

For reference's sake here's sub unescape from CGI.pm version 2.46:

# unescape URL-encoded data
sub unescape {
    shift() if ref($_[0]);
    my $todecode = shift;
    return undef unless defined($todecode);
    $todecode =~ tr/+/ /;       # pluses become spaces
    $todecode =~ s/%([0-9a-fA-F]{2})/pack("c",hex($1))/ge;
    return $todecode;
}

thanks - epoptai

--
Check out my Perlmonks Related Scripts like framechat, reputer, and xNN.

pack 'c',... -vs.- pack 'C',... was: Re: Re: Re: PayPal Advice Sought by ariels (Curate) on Jul 08, 2001 at 01:07 UTC
epoptai's right: there's never any difference in the byte produced by `pack 'c',$number` and `pack 'C',$number`. There is a difference when unpacking, of course.

What the `'c'` and `'C'` templates do when packing is translate an integer to the corresponding character value. Your character values are most likely single bytes, and each byte corresponds to a residue class of integers modulo 256. The two most popular ways to assign representative integers to the 256 byte values are 0..255 ("unsigned char") and -128..127 ("signed char"). But any integer is congruent (modulo 256) to exactly one value in whichever of the 2 ranges you pick, so any integer has a unique translation to a byte.

The reverse direction is not single-valued: a byte stands for many integers, so `unpack` has to choose a specific range from which to pick the integer representing the byte value, and here the two letter codes make a difference. For instance, `unpack 'C',(pack 'C',-1)` is 255, while `unpack 'c',(pack 'C',-1)` is -1. The same thing occurs for the other signed/unsigned letter pairs for integer conversions in pack/unpack.
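To make the modular argument concrete, here is a minimal sketch that checks every value two hex digits can encode. It assumes a reasonably modern perl, where an out-of-range value for `'c'` raises a warning in the `pack` category; that warning is silenced here because the wrap-around is exactly what is being demonstrated. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Compare the byte produced by the signed ('c') and unsigned ('C')
# templates for every value that two hex digits can encode.
my $mismatches = 0;
{
    no warnings 'pack';    # values above 127 are out of range for 'c'
    for my $n (0 .. 255) {
        $mismatches++ if pack('c', $n) ne pack('C', $n);
    }
}
print "mismatches: $mismatches\n";

# Unpacking is where the letter matters: the same byte is read back
# as a different integer depending on the range the template implies.
my $byte = pack('C', 255);
my ($signed)   = unpack('c', $byte);
my ($unsigned) = unpack('C', $byte);
print "signed: $signed, unsigned: $unsigned\n";
```

Packing never disagrees, so the choice of letter in the unescape regex does not change the decoded string; only code that unpacks the bytes again would see a difference.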
Re: Re: Re: PayPal Advice Sought by MeowChow (Vicar) on Jul 08, 2001 at 01:11 UTC
My version (2.752) of CGI.pm's unescape uses chr instead of pack, by the way:

`$todecode =~ s/%([0-9a-fA-F]{2})/chr hex($1)/ge;`

I wonder if it's faster...

MeowChow s aamecha.s a..a\u$&owag.print
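One way to answer the speed question is to measure it. The following is a rough sketch using the core Benchmark module; the sample string and the helper names `decode_pack` and `decode_chr` are made up for illustration, and the numbers will vary with the perl version and the input, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: a URL-encoded string repeated to give
# the substitution a reasonable amount of work.
my $encoded = 'name%3DJohn%20Doe%26note%3D100%25%7Efree' x 50;

# Decode with pack, as in the older unescape.
sub decode_pack {
    my $s = shift;
    $s =~ s/%([0-9a-fA-F]{2})/pack("C", hex($1))/ge;
    return $s;
}

# Decode with chr, as in the newer unescape.
sub decode_chr {
    my $s = shift;
    $s =~ s/%([0-9a-fA-F]{2})/chr hex($1)/ge;
    return $s;
}

# Sanity check first: both variants should decode to the same string.
my $same = decode_pack($encoded) eq decode_chr($encoded);
print "same output: ", ($same ? "yes" : "no"), "\n";

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    pack => sub { decode_pack($encoded) },
    chr  => sub { decode_chr($encoded) },
});
```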
Re: Re: Re: Re: PayPal Advice Sought by John M. Dlugosz (Monsignor) on Jul 08, 2001 at 02:20 UTC
I found differences with UTF-8 strings, so maybe it was changed because using pack broke when characters were longer than one byte. That shouldn't affect your specific example, because you know the input is two hex digits, but maybe the author got rid of pack altogether throughout the code.

- John
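As an illustration of one way the two can diverge (not necessarily the difference John ran into, and the behaviour of older perls may differ), here is a small sketch for a modern perl. It feeds both functions a code point above 255, which two hex digits can never produce; the code point chosen is arbitrary.

```perl
use strict;
use warnings;

my $code_point = 0x263A;    # arbitrary value that does not fit in one byte

# chr keeps the full code point as a single character.
my $via_chr = chr($code_point);

# pack 'C' only holds one octet, so the value wraps to its low 8 bits;
# the resulting 'pack' warning is silenced for the demonstration.
my $via_pack = do { no warnings 'pack'; pack('C', $code_point) };

printf "chr:  length %d, ord 0x%X\n", length($via_chr),  ord($via_chr);
printf "pack: length %d, ord 0x%X\n", length($via_pack), ord($via_pack);
```

For values in 0..255, which is all the unescape regex can ever pass, the two calls return the same character.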

