PerlMonks  

Downloading large Binary files from https protocol using WWW::Mechanize in Windows OS

by sam_bakki (Pilgrim)
on Apr 19, 2012 at 11:34 UTC (#965924, perlmeditation)

Hi

I was trying to download large files (~20 MB or more; zip, pdf) from a website (CollabNet TeamForge) that uses the https protocol. I wrote a Perl script using WWW::Mechanize and tried to save the file with the 'save_content' method, as below.

    # Initialise browser object
    $browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1);
    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias('Linux Mozilla');
    ....
    $browser->get($collabnetArtifactURL);
    # Search for and follow the file / doc link in collabnet
    $tmpURL = $browser->find_link(
        tag       => 'a',
        url_regex => qr/\/sf\/(docman|frs)\/do\/(downloadDocument|downloadFile)\/.*\/$artID/,
    )->url_abs();
    $tmpURL .= '/' . $collabnetFileVersion if ($collabnetFileVersion ne '');
    print "\n INFO: URL: $tmpURL";
    #print "\n ", Dumper($response) , "\n";
    $response = $browser->get($tmpURL);
    if ($browser->success()) {
        if ($collabnetFileName eq '') {
            $collabnetFileName = $response->filename() || $artID;
        }
        $browser->save_content($collabnetFileName);
        if (-s $collabnetFileName) {
            print "\n INFO: $collabnetFileName is downloaded";
        }
        else {
            print "\n ERROR: $collabnetFileName is NOT downloaded";
        }
    }
    ....

This is only a code snippet, not a complete running script.

This code works fine on a Linux machine with Perl 5.8.8 but fails to download files properly on Windows. I tried decode_content and realised that only the first chunk of data arrived, after which LWP immediately recorded an X-Died header on the response. In other words, with ActivePerl 5.12.x on Windows, downloading large binary files over https did not work for me.

The underlying problem was that LWP could not decompress the gzip-encoded data, because it died before receiving all the chunks of the large file. I searched a lot on Google but did not find a working solution, probably because this appears to be a Windows-only problem.
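A truncated download like this can be spotted on the response object itself. The following is a minimal sketch, not part of the original script: `download_problems` is a hypothetical helper name, and it assumes the response was produced by LWP, which adds the `X-Died` and `Client-Aborted` headers when it stops reading a body early. It also compares `Content-Length` (when the server sent one) with the number of raw bytes received.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: returns a list of problems that LWP recorded on the
# response, or an empty list if nothing suspicious was found.
sub download_problems {
    my ($response) = @_;
    my @problems;

    # Added by LWP when something died while the body was being read.
    if (defined(my $died = $response->header('X-Died'))) {
        push @problems, "X-Died: $died";
    }

    # Added by LWP when it stopped reading the body early.
    if (defined(my $aborted = $response->header('Client-Aborted'))) {
        push @problems, "Client-Aborted: $aborted";
    }

    # Content-Length describes the raw (still encoded) body, so compare it
    # with the raw content rather than with decoded_content.
    my $expected = $response->header('Content-Length');
    my $got      = length ${ $response->content_ref };
    if (defined $expected && $expected =~ /^\d+$/ && $got != $expected) {
        push @problems, "expected $expected bytes, got $got";
    }

    return @problems;
}
```

In a script like the one above, this could be called as `download_problems($response)` right after `$browser->get($tmpURL)`, before the file is saved.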

Finally I got it working by using Crypt::SSLeay. By default on Windows, LWP uses IO::Socket::SSL to handle https, which was not working for me. So I had to force LWP to use Net::SSL (provided by Crypt::SSLeay) with the following code:

    use strict;
    use warnings;
    use Crypt::SSLeay;
    use Net::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    ....
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    ...
    # Same code as above

That is it. Everything works fine after I added the above code.
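To confirm which socket class was actually picked up, something like the following sketch may help. It is an illustration under assumptions, not part of the original script: `selected_ssl_class` is a hypothetical helper, and it reads the package variable `$Net::HTTPS::SSL_SOCKET_CLASS`, which is an implementation detail of Net::HTTPS rather than documented API. The environment variable is set in a `BEGIN` block so that it is in place before Net::HTTPS is loaded, since that module reads it at load time.

```perl
use strict;
use warnings;

# Set the variable at compile time so it is visible before any module
# that loads Net::HTTPS is pulled in.
BEGIN { $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = 'Net::SSL' }

# Hypothetical helper: load Net::HTTPS and return the socket class it chose,
# or undef plus the error text if loading failed (for example because
# Crypt::SSLeay / Net::SSL is not installed).
sub selected_ssl_class {
    my $ok = eval { require Net::HTTPS; 1 };
    return (undef, $@) unless $ok;
    no warnings 'once';
    # Assumption: Net::HTTPS keeps its choice in this package variable.
    return ($Net::HTTPS::SSL_SOCKET_CLASS, undef);
}

my ($class, $error) = selected_ssl_class();
print defined $class
    ? "https socket class: $class\n"
    : "could not load Net::HTTPS with Net::SSL: $error";
```

If this reports Net::SSL, https requests made afterwards by LWP or WWW::Mechanize in the same process should go through Crypt::SSLeay.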

Regards,
Bakki
Perl technology demo project - http://code.google.com/p/saaral-soft-search-spider/

Re: Downloading large Binary files from https protocol using WWW::Mechanize in Windows OS
by Anonymous Monk on Apr 19, 2012 at 12:42 UTC

      Hi There

      I think it was not an issue with the certificate check; it is an issue with SSL handling. So installing Mozilla::CA did not help me. Only the above solution helped.



      Thanks & Regards,
      Bakkiaraj M
      My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.

        Sorry, that just doesn't make sense. I use windows OS and I download large https files at will. I'm not special.

Node Type: perlmeditation [id://965924]
Approved by Corion
Front-paged by Arunbear