http://www.perlmonks.org?node_id=34555

gregorovius has asked for the wisdom of the Perl Monks concerning the following question:

I'm using PerlScript and ASP on Windows NT and IIS (sigh!). I want to use the ASP $Application object to store frozen (serialized) objects so that they are available anywhere within my app. The problem is that the <emph>dear</emph> IIS object won't store characters greater than 127, and the freeze function in Storable always returns characters greater than 127. I've worked around this by uuencoding the frozen objects before saving them into the $Application object and uudecoding them after reading them back, but this takes very long since my objects are large. In a benchmark I ran, uudecoding an object (a three-level hash with 17,000 items) takes 0.6 seconds, whereas thawing it takes only 0.05.

If I can get rid of the uudecode time this technique would buy me a speedup of 20x, since the alternative (regenerating objects each time from a database) takes <emph>very</emph> long.

Should I hack into Storable and make it freeze to 7 bit characters? (please see testcase at bottom)

I am experiencing these woes in part because I was expecting PerlScript on IIS to cache pre-compiled scripts and module data in memory, like mod_perl does, but that is not the case. (I've also tried ActiveState's PerlEx, but hit a wall trying to make it stop generating headers automatically (so cookies can be sent to clients), and support from ActiveState for fixing this was close to nil.) Thanks in advance!

<%@ Language=PerlScript %>
<%
use Storable qw(freeze thaw);
use Convert::UU qw(uuencode uudecode);

my %h = (llave1=>'valor1', llave2=>{llave3=>'valor3'});
my $h_froz = freeze \%h;

# Store the frozen hash in $Application, then read it back.
$main::Application->Contents->SetProperty('Item', 'prueba_app', $h_froz);
my $h_recup = $main::Application->Contents('prueba_app');

# Compare the original and the recovered strings byte by byte.
$result = 'RESULTS:';
foreach (0 .. length($h_froz) - 1) {
    my $ch1 = substr $h_froz, $_, 1;
    my $ch2 = substr $h_recup, $_, 1;
    if (ord($ch1) == ord($ch2)) {
        $result .= "$_ SAME " . ord $ch1;
    } else {
        $result .= "$_ DIFFERENT " . ord($ch1) . '-' . ord($ch2);
    }
    $result .= '<BR>';
}
%>
<html> <body> result: <%= $result %> </body> </html>

Replies are listed 'Best First'.
(tye)Re: ASP and Storable woes
by tye (Sage) on Sep 29, 2000 at 08:19 UTC

    uuencoding seems like a bit of overkill in order to drop the 8th bit. Here is a much simpler method (well, it is conceptually simpler, but because uuencoding is built into Perl, my method takes more code to write in Perl) that might be much faster. It most likely (depending on the concentration of "8-bit characters") will write more bytes, but that might not be much of a problem in comparison to the CPU usage of uuencoding. Then again, encoding perl.exe my way takes a ton fewer bytes than uuencoding does.

    The concept is to pick two 7-bit characters, $quote7 and $quote8. $quote7 is used to "quote" these two characters. $quote8 is used to "quote" any "8-bit characters". So you replace any 8-bit character with $quote8 followed by that character with the 8th bit stripped. You precede any occurrences of $quote7 or $quote8 with $quote7. I hope it is obvious how you reverse the process.

    Here are the conversion routines wrapped in a test program (and it actually works):

    #!/usr/bin/perl -w
    use strict;

    {
        my( $quote7, $quote8, %quote, %unquote );
        BEGIN {
            $quote7= pack "C", 0x7e;   # Any 7-bit char.
            $quote8= pack "C", 0x7f;   # Any _other_ 7-bit char.
            @quote{ $quote7, $quote8 }= ( $quote7.$quote7, $quote7.$quote8 );
            @quote{ map { pack "C", $_ } 0x80..0xff }=
                map { $quote8 . pack "C", $_ } 0..0x7f;
            %unquote= reverse %quote;
        }
        sub strip8 {
            my( $bin )= @_;
            $bin =~ s#([$quote7$quote8\x80-\xff])#$quote{$1}#go;
            return $bin;
        }
        sub restore8 {
            my( $str )= @_;
            $str =~ s#([$quote7$quote8].)#$unquote{$1}#gos;
            return $str;
        }
    }

    die "Usage: $0 file\n" if 1 != @ARGV;
    open IN, "<$ARGV[0]" or die "Can't read $ARGV[0]: $!\n";
    binmode IN;
    undef $/;
    my $in= <IN>;
    my $out= strip8( $in );
    my $end= restore8( $out );
    die "Well, that didn't work!" if $in ne $end;

    I used the hashes in a quest for execution speed, but since I didn't run benchmarks I can't guarantee that other methods aren't tons faster.
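Since no benchmark was run above, here is a minimal sketch of how one could compare decoding speed of the quoting scheme against uudecoding, using the core Benchmark module. The test data (64 KB of pseudo-random bytes) is an assumption made for illustration and will not resemble a real Storable image, and the routines are restated with literal character classes so that the block stands alone. No timings are claimed here; results depend on the data and the machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Escape characters, chosen as in the post above.
my $quote7 = chr 0x7e;
my $quote8 = chr 0x7f;

# Lookup table: each byte that needs escaping maps to its two-byte form.
my %quote = (
    $quote7 => $quote7 . $quote7,
    $quote8 => $quote7 . $quote8,
);
$quote{ chr($_) } = $quote8 . chr( $_ - 0x80 ) for 0x80 .. 0xff;
my %unquote = reverse %quote;

sub strip8 {
    my ($bin) = @_;
    $bin =~ s/([\x7e-\xff])/$quote{$1}/g;
    return $bin;
}

sub restore8 {
    my ($str) = @_;
    $str =~ s/([\x7e\x7f].)/$unquote{$1}/gs;
    return $str;
}

# Hypothetical test data: 64 KB of pseudo-random bytes.
my $data   = join '', map { chr int rand 256 } 1 .. 65536;
my $quoted = strip8($data);
my $uu     = pack "u", $data;    # Perl's built-in uuencoding

# Run each decoder for at least one CPU second and print a comparison table.
cmpthese( -1, {
    quote_decode => sub { my $x = restore8($quoted) },
    uu_decode    => sub { my $x = unpack "u", $uu },
} );
```

Swapping `$data` for a real frozen structure would give numbers that are more relevant to the original question.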

            - tye (but my friends call me "Tye")
(dchetlin: Data::Dumper) Re: ASP and Storable woes
by dchetlin (Friar) on Sep 29, 2000 at 06:49 UTC
    Before you go to the trouble of hacking Storable, why not give Data::Dumper a try? It certainly won't be as fast as raw Storable, but it almost as certainly will be faster than Storable -> uuencode. As a bonus, it will probably be more compact than Storable (as long as you set $Data::Dumper::Indent to 0). And it shouldn't give you any high-bit characters.

    -dlc

      Thanks, dchetlin!

      The mechanics for doing it with Data::Dumper are the following:

      use Data::Dumper;

      my %h = (key1=>'value1', key2=>{key3=>'value3'});
      my $stringy = Dumper \%h;
      {
          no strict 'vars';   # Dumper's output assigns to the global $VAR1
          %h = %{eval($stringy)};
          die "$@" if $@;
      }

      That eval takes 0.25 seconds on my 17,000-element hash. This is much better than the 0.47s for uudecoding and thawing the same hash, but still five times more than the 0.05s of thawing that I would get if that peculiar $Application object worked right.
      I just read the Data::Dumper docs. Indent has no effect on the correctness of the output, only on how readable it is to the human eye.

      Purity and Terse are more related to allowing nested serialized data to be reconstructed. See the section "Configuration Variables or Methods" for the gory details.
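As a rough illustration of the configuration variables mentioned here, the sketch below dumps the same small hash with the default settings and with `$Data::Dumper::Indent = 0` plus `$Data::Dumper::Terse = 1`, then rebuilds the structure with `eval`. Wrapping the dump in parentheses before the `eval` is a precaution added for this example, since the Data::Dumper docs warn that Terse output is not always parseable by `eval`. Purity is left at its default because this data has no self-references.

```perl
use strict;
use warnings;
use Data::Dumper;

my %h = ( key1 => 'value1', key2 => { key3 => 'value3' } );

my ( $default, $compact );
{
    # Sort keys so that the output is stable between runs.
    local $Data::Dumper::Sortkeys = 1;
    $default = Dumper( \%h );

    # No indentation or newlines, and no leading "$VAR1 = ".
    local $Data::Dumper::Indent = 0;
    local $Data::Dumper::Terse  = 1;
    $compact = Dumper( \%h );
}

# Parentheses force the leading brace to be parsed as an anonymous hash.
my $copy = eval "($compact)";
die "eval failed: $@" if $@;

printf "default: %d bytes, compact: %d bytes\n",
    length $default, length $compact;
print "$compact\n";
```

Because Terse drops the `$VAR1` assignment, no `no strict 'vars'` block is needed around the `eval` in this variant.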

        Er ... what I said was:

        • As a bonus, it will probably be more compact than Storable (as long as you set $Data::Dumper::Indent to 0).

        I didn't mention correctness, I mentioned compactness -- with $Indent set to 0, you get an extremely compressed string that will more often than not be smaller than Storable's binary encoding of the same data.

        The drawback is of course that Storable will always be faster.

        -dlc

Re: ASP and Storable woes
by gregorovius (Friar) on Sep 29, 2000 at 12:25 UTC
    Darn! It turns out that I was naive and jumped to the first conclusion at hand. The $Application object does accept 8-bit chars; character 0x00 was the one to blame! After running tye's code and looking carefully at the output, I found that $Application returns strings truncated at the first occurrence of character 0x00. This char seems to be the string terminator in whatever language the object was coded in (a vague memory of having learned this at college comes to mind).
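To make the diagnosis concrete, this small sketch checks a frozen structure for NUL bytes and simulates what a store that stops at the first NUL would keep. The truncating behavior is modeled with a plain `substr`; it is an assumption based on the observation above, not a call into the real $Application object.

```perl
use strict;
use warnings;
use Storable qw(freeze);

my %h = ( key1 => 'value1', key2 => { key3 => 'value3' } );
my $frozen = freeze( \%h );

# Count the NUL bytes and locate the first one.
my $nul_count = ( $frozen =~ tr/\0// );
my $first_nul = index $frozen, "\0";

# Simulate a store that treats its value as a NUL-terminated string.
my $stored = $first_nul >= 0 ? substr( $frozen, 0, $first_nul ) : $frozen;

printf "frozen: %d bytes, %d NULs, first NUL at offset %d, %d bytes survive\n",
    length $frozen, $nul_count, $first_nul, length $stored;
```

Storable writes lengths as fixed-width integers, so small counts leave zero bytes in the image, which is why even a tiny hash like this one is affected.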

    After running a modified version of tye's code (below) I was able to push the thawing time for my 17K-element hash down to 0.15s. This plus the time it takes to move memory from $Application gives me a decent 0.26s total, which is 10x better than going to the database and good enough for our expected site traffic.

    One big and respectful bow towards tye!

    {
        my( $quote7, $quote8, $zero, $one, %quote, %unquote );
        BEGIN {
            $quote7= pack "C", 0x7e;   # Any 7-bit char.
            $quote8= pack "C", 0x7f;   # Any _other_ 7-bit char.
            $zero  = pack "C", 0x00;
            $one   = pack "C", 0x01;
            @quote{ $quote7, $quote8, $zero }=
                ( $quote7.$quote7, $quote7.$quote8, $quote7.$one );
            %unquote= reverse %quote;
        }
        sub strip8 {
            my( $bin )= @_;
            $bin =~ s#([$quote7$quote8$zero])#$quote{$1}#go;
            return $bin;
        }
        sub restore8 {
            my( $str )= @_;
            $str =~ s#([$quote7$quote8].)#$unquote{$1}#gos;
            return $str;
        }
    }
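For completeness, here is a sketch of the whole round trip: freeze, escape NULs, pass through a NUL-truncating store, unescape, thaw. It is a simplified variant rather than the code above: since only NUL has to be avoided, it uses a single escape character. The names `escape_nul`, `unescape_nul`, `store_value` and `fetch_value` are hypothetical, and `%fake_store` only imitates the truncation reported for $Application.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# One escape character is enough when NUL is the only forbidden byte.
my $esc     = "\x7e";
my %quote   = ( $esc => "$esc$esc", "\0" => "$esc\x01" );
my %unquote = reverse %quote;

sub escape_nul {
    my ($bin) = @_;
    $bin =~ s/([\x7e\0])/$quote{$1}/g;
    return $bin;
}

sub unescape_nul {
    my ($str) = @_;
    $str =~ s/(\x7e.)/$unquote{$1}/gs;
    return $str;
}

# Stand-in for $Application: keeps only the bytes before the first NUL.
my %fake_store;
sub store_value {
    my ( $key, $value ) = @_;
    my ($head) = $value =~ /\A([^\0]*)/;
    $fake_store{$key} = $head;
}
sub fetch_value { return $fake_store{ $_[0] } }

# The "tilde" entry exercises both the escape character and an embedded NUL.
my %h = ( key1 => 'value1', key2 => { key3 => 'value3' }, tilde => "a~b\0c" );
my $frozen = freeze( \%h );

store_value( 'prueba_app', escape_nul($frozen) );
my $copy = thaw( unescape_nul( fetch_value('prueba_app') ) );

print "round trip ok\n" if $copy->{key2}{key3} eq 'value3';
```

Storing `$frozen` directly in `%fake_store` would keep only the bytes before the first NUL, which is the failure described above.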