PerlMonks
IO::Uncompress::Gunzip to scalar takes hours (on windows)

by cmv (Chaplain)
on May 21, 2013 at 15:40 UTC ( #1034568=perlquestion )
cmv has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks-

I'm seeing a strange issue when using IO::Uncompress::Gunzip on Windows that I can't explain. The same code runs fine under Unix. I'd like to understand why.

The code below takes about 20 seconds to unzip a 65MB gzipped file on my Windows XP box.

If I comment out the first gunzip line, and uncomment the second gunzip line, it will take hours on the windows box (6 hours last time I bothered to try). Any thoughts?

Thanks

-Craig

#!/opt/exp/bin/perl5.8
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $ifile = shift || die "Missing input file name\n";
open(IFILE, "<$ifile") || die "Can't open $ifile";
binmode(IFILE);
my $inputstr = join('', <IFILE>);

print STDERR scalar(localtime), " - STARTING UNZIP!!!\n";

# This is fast on all platforms
gunzip \$inputstr => "clearfile"
    or die "gzip failed: $GunzipError\n";

my $cleartxt;
# This takes forever on windows, no problem on Unix
#gunzip \$inputstr => \$cleartxt
#    or die "gzip failed: $GunzipError\n";

print STDERR scalar(localtime), " - UNZIPPED COMPLETE!!!\n";
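(A side note, unrelated to the slowdown itself: slurping a binary file with join('', <IFILE>) first splits the data into "lines" and then reassembles them. A sketch of a one-shot slurp using only core Perl; the file name and contents here are made up for illustration:)

```perl
use strict;
use warnings;

# Create a small sample file so the sketch is self-contained
# ('sample.bin' is a hypothetical name, not from the thread).
my $ifile = 'sample.bin';
open(my $out, '>', $ifile) or die "Can't create $ifile: $!";
binmode($out);
print $out "\x00\x01\x02" x 1000;
close($out);

# Slurp the whole file in one read by localizing the input record
# separator, instead of line-splitting with join('', <FH>).
my $inputstr = do {
    open(my $fh, '<', $ifile) or die "Can't open $ifile: $!";
    binmode($fh);
    local $/;    # undef => read to end-of-file in a single <> call
    <$fh>;
};

print length($inputstr), "\n";    # 3000
```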
UPDATE1:

Here is some profile data I got from running both cases on a smaller gzipped input file (4.8M)...

Profile Data for FAST Method

Total Elapsed Time = 3.667486 Seconds
  User+System Time = 1.200486 Seconds
Exclusive Times
%Time ExclSec  CumulS #Calls sec/call Csec/c Name
 37.2   0.447   0.447   7281   0.0001 0.0001 Compress::Raw::Zlib::inflateStream::inflate
 19.8   0.238   1.296      1   0.2377 1.2963 IO::Uncompress::Base::_rd2
 14.9   0.179   0.200   7293   0.0000 0.0000 IO::Uncompress::Base::smartRead
 8.66   0.104   0.979   7281   0.0000 0.0001 IO::Uncompress::Base::_raw_read
 6.25   0.075   1.059   7283   0.0000 0.0001 IO::Uncompress::Base::read
 5.66   0.068   0.068   7281   0.0000 0.0000 IO::Uncompress::Base::pushBack
 4.66   0.056   0.056  14575   0.0000 0.0000 IO::Uncompress::Base::saveStatus
 3.25   0.039   0.039  14562   0.0000 0.0000 U64::add
 3.08   0.037   0.496   7281   0.0000 0.0001 IO::Uncompress::Adapter::Inflate::uncompr
 1.92   0.023   0.224   7281   0.0000 0.0000 IO::Uncompress::Base::readBlock
 1.33   0.016   0.016      1   0.0160 0.0160 IO::bootstrap
 1.33   0.016   0.076      4   0.0040 0.0191 main::BEGIN
 1.33   0.016   0.060      9   0.0018 0.0067 IO::Uncompress::Gunzip::BEGIN
 1.25   0.015   0.015     10   0.0015 0.0015 strict::unimport
 1.25   0.015   0.030     37   0.0004 0.0008 IO::Compress::Base::Common::BEGIN

Profile Data for SLOW Method

Total Elapsed Time = 154.4478 Seconds
  User+System Time = 154.2188 Seconds
Exclusive Times
%Time ExclSec  CumulS #Calls sec/call Csec/c Name
 98.6   152.1 152.140   7286   0.0209 0.0209 Compress::Raw::Zlib::inflateStream::inflate
 0.34   0.519   0.573   7298   0.0001 0.0001 IO::Uncompress::Base::smartRead
 0.28   0.438  154.04   7286   0.0001 0.0211 IO::Uncompress::Base::_raw_read
 0.18   0.283 152.499   7286   0.0000 0.0209 IO::Uncompress::Adapter::Inflate::uncompr
 0.18   0.275  154.31   7288   0.0000 0.0212 IO::Uncompress::Base::read
 0.10   0.158   0.158  14585   0.0000 0.0000 IO::Uncompress::Base::saveStatus
 0.10   0.151   0.151   7286   0.0000 0.0000 IO::Uncompress::Base::pushBack
 0.09   0.134   0.134   7288   0.0000 0.0000 IO::Uncompress::Base::smartEof
 0.08   0.128   0.128  14572   0.0000 0.0000 U64::add
 0.04   0.060   0.060  21864   0.0000 0.0000 Compress::Raw::Zlib::__ANON__
 0.02   0.025  154.34      1   0.0248 154.34 IO::Uncompress::Base::_rd2
 0.01   0.016   0.016      4   0.0040 0.0040 Compress::Raw::Zlib::constant
 0.01   0.016   0.016      8   0.0020 0.0020 Exporter::export
 0.01   0.016   0.016      7   0.0023 0.0023 IO::File::BEGIN
 0.01   0.016   0.016     26   0.0006 0.0006 Compress::Raw::Zlib::BEGIN

Note that both runs make roughly the same number of inflate calls (~7,286), but each call is about 200x slower in the slow case (0.0209 vs 0.0001 sec/call), so the time is going inside inflate itself, not in extra calls.

UPDATE2:

I believe Corion and BrowserUk are correct about this issue.

Watching page faults with Process Explorer, here is what I see:

~26,000,000 Pagefaults - Slow Method
~6,000 Pagefaults - Fast Method

UPDATE3:

I believe this talks about the root cause:
https://groups.google.com/forum/?fromgroups#!topic/perl.perl5.porters/44PTHwefYUk
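Pulling the thread's suggestions together, here is a sketch of the two workarounds discussed: Corion's buffer presizing, and decompressing to a file (the fast path) and slurping it back. The buffer size and the 'clearfile' name are illustrative, and whether presizing actually avoids the reallocation pattern on an affected VC6-built Perl is untested here:

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build a small in-memory gzip stream so the sketch is self-contained.
my $plain = "hello world\n" x 1000;
my $compressed;
gzip \$plain => \$compressed
    or die "gzip failed: $GzipError\n";

# Workaround 1: presize the output scalar so the decompressor does not
# repeatedly reallocate it (the behaviour blamed on the VC6 allocator),
# then empty it; assigning '' keeps the underlying allocation.
my $cleartxt = ' ' x (length($plain) + 1024);   # guessed upper bound
$cleartxt = '';
gunzip \$compressed => \$cleartxt
    or die "gunzip failed: $GunzipError\n";

# Workaround 2: decompress to a file (fast on the affected builds),
# then slurp the file back if the data is needed in memory.
gunzip \$compressed => 'clearfile'
    or die "gunzip failed: $GunzipError\n";

print length($cleartxt), "\n";    # 12000
```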

Re: IO::Uncompress::Gunzip to scalar takes hours (on windows)
by ig (Vicar) on May 21, 2013 at 19:37 UTC

    Running Strawberry Perl v5.16.2 on Windows 7 64bit with 4GB RAM, unzipping a 290MB file took less than 15 seconds.

    While your system is running the slow gunzip, is your CPU busy? Is your disk (swap file) busy?

      The CPU is busy, but disk activity and swap usage are low.

        I would try upgrading perl and/or relevant modules, then the OS to solve or at least isolate the problem.

Re: IO::Uncompress::Gunzip to scalar takes hours (on windows)
by Corion (Pope) on May 21, 2013 at 19:44 UTC

    If this is ActiveState Perl built with VC6, I remember some bad (re)allocation behaviour of the runtime memory allocator. Maybe the decompressor reallocates the buffer quite often and triggers the bad behaviour.

    A potential workaround could be to presize the buffer:

    $cleartxt = " " x 100_000;  # or whatever size you expect
    $cleartxt = '';
    gunzip ...

    Super Search for a post by BrowserUk somewhere in such a thread could unearth more information.

      I tried this quickly last night, but it didn't seem to make any difference. I'm hoping to profile it to see what is really going on. If I can get to that soon, I'll post the results.

      This still may be the issue, but maybe I just didn't preallocate enough.

      Thanks ++Corion

Re: IO::Uncompress::Gunzip to scalar takes hours (on windows) which VERSION
by Anonymous Monk on May 21, 2013 at 21:32 UTC

    which VERSION? Upgrade?

Re: IO::Uncompress::Gunzip to scalar takes hours (on windows)
by BrowserUk (Pope) on May 22, 2013 at 05:06 UTC

    How big is the expanded file on disk?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The expanded file comes to about 355M

        So, that shouldn't be taxing your memory too much.

        I couldn't reproduce your problem. I ran your program verbatim on my Windows system running Vista64, (AS)5.10.1 and IO::Compress 2.060, on files of 65/210MB and 80/600MB, and they both took roughly 5 seconds to disk and 3 seconds in memory.

        Does the same problem occur with all files when decompressed to memory, or is it confined to one particular file?

        One possibility (mentioned by Corion) is indicated by excessive page faults for the process(*). If the process shows a page fault delta greater than double digits per second, you have probably encountered the malloc problem. But if that were the case, I would have expected to be able to reproduce it here on my standard AS install.

        (*You'll need Process Explorer or work out how to use perfmon.exe to find this information.)



Node Type: perlquestion [id://1034568]
Front-paged by Corion