Dear Monks
I have a Perl script that downloads data from an HTTPS site. With Crypt::SSLeay it works fine: I can download the full data (a CSV file) from the server.
I thought I would give IO::Socket::SSL, the SSL backend LWP uses by default, a try.
My script uses WWW::Mechanize, and with IO::Socket::SSL it fails at the $mech->response()->decoded_content() step. Debugging further, I found that it could not decompress the gzip-compressed data sent by the server.
That surprised me, so I disabled compression with $mech->add_header('Accept-Encoding' => '');
Now data does arrive from the server, but it is incomplete: I see only the first few bytes. Examining the HTTP::Response headers, I find
'client-transfer-encoding' => [
'chunked'
]
It looks like the server is sending chunked data. My guess is that LWP with IO::Socket::SSL cannot handle the "chunked" transfer encoding, and that this is why the gzip decoding fails.
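To make the "chunked" hypothesis concrete, here is a minimal sketch of what decoding a chunked body involves. It is only an illustration of the wire format: decode_chunked is a hypothetical helper name, it works on a body already held in memory, and it is not how LWP does the job internally (as far as I know, LWP leaves that to Net::HTTP while reading from the socket).

```perl
use strict;
use warnings;

# Hypothetical helper: decode an HTTP/1.1 "chunked" body held in memory.
# Returns the decoded payload, or undef if the input ends before the
# terminating zero-length chunk (i.e. the transfer was cut short).
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    my $pos  = 0;

    while (1) {
        # Each chunk starts with its size in hex, optional extensions, then CRLF.
        pos($raw) = $pos;
        $raw =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r\n/gc or return;
        my $size = hex $1;
        $pos = pos($raw);

        # A zero-length chunk marks the end of the body (trailers are ignored).
        return $body if $size == 0;

        # The chunk data must be fully present and followed by CRLF.
        return if length($raw) < $pos + $size + 2;
        return if substr($raw, $pos + $size, 2) ne "\r\n";
        $body .= substr($raw, $pos, $size);
        $pos  += $size + 2;
    }
}

my $wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
print decode_chunked($wire), "\n";    # prints "Wikipedia"
```

A body that stops after the first chunk makes this sketch return undef, which is roughly the situation a gzip decoder would then be handed: a compressed stream with its tail missing.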
When I force LWP to use Crypt::SSLeay, as below,
use Crypt::SSLeay;
use Net::SSL;
use WWW::Mechanize;
....
$ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
$mech = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });
...
I get the full data from the server. I still see the "chunked" header, but it is handled properly by Net::SSL / Crypt::SSLeay.
Q: Has anyone else faced this issue? Can Perl's LWP handle "chunked" transfer over SSL? Thanks for your time.
Update: I have added two test scripts to demonstrate the problem.
One uses Net::SSL and downloads the data properly from the server.
The other uses IO::Socket::SSL, downloads only the first chunk (I think) from the server, and quits.
To show the difference between the downloads, I print the MD5 sum and file size of each.
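The size and MD5 comparison can also be done without slurping the whole file into memory. The sketch below assumes only core modules; file_fingerprint is a hypothetical helper name and is not part of the two test scripts.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return (size_in_bytes, md5_hex) for a file,
# reading it in binary mode so Windows newline translation cannot
# change the digest.
sub file_fingerprint {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    my $size = -s $path;
    return ($size, $md5);
}
```

Calling it on the file saved by each script, for example my ($size, $md5) = file_fingerprint('qtff-2001.pdf');, gives the two values to compare.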
My environment
OS: Windows 7, x86_64
Perl: ActivePerl, perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x86-multi-thread-64int
Note: I saw the same behavior with ActivePerl 5.10, 5.14, 5.16 and 5.18
Script 1 - Using Net::SSL and Crypt::SSLeay - Working
#WORKING HTTPS DOWNLOAD Using Net::SSL in Windows + Active Perl
use strict;
use warnings;
use Crypt::SSLeay;
use Net::SSL;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Message;
use Digest::MD5;
use File::Slurp;
use Data::Dumper;
#Globals
$|=1;
#Force LWP to use Net::SSL instead of IO::Socket::SSL
$ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
delete $ENV{https_proxy} if exists $ENV{https_proxy};
delete $ENV{http_proxy} if exists $ENV{http_proxy};
#Variables
my $browser = "";
my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
my $pageContent = '';
my $fileName = '';
my $md5Obj = Digest::MD5->new();
print "\n USING Net::SSL";
#Init Mechanize
$browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });
# Add cookie jar
$browser->cookie_jar(HTTP::Cookies->new());
$browser->agent_alias( 'Linux Mozilla');
$browser->add_header('Accept-Encoding' => scalar HTTP::Message::decodable());
$browser->timeout(120);
#Get URL
$browser->get($url);
if ($browser->success())
{
    print "\n INFO: Got URL: $url";
    $fileName = $browser->response()->filename();
    print "\n INFO: Save in File: $fileName";
    $browser->save_content($fileName);
    #Calculate MD5 sum
    $pageContent = read_file( $fileName, binmode => ':raw' );
    print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
    $md5Obj->add($pageContent);
    print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
    undef $md5Obj;
}
else
{
    print "\n ERROR: Can't get URL $url ",$browser->status();
}
print "\n\n INFO: ********************* DUMP ********************";
print "\n",Dumper(\$browser);
print "\n INFO: ********************* DUMP ********************";
exit 0;
Output1:
USING Net::SSL
INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
INFO: Save in File: qtff-2001.pdf
INFO: qtff-2001.pdf Size: 5465.48046875 KB
INFO: qtff-2001.pdf MD5 Sum: d1aee95cc06d529e67b707257a5cf3eb
Script 2 - Using IO::Socket::SSL - Not Working. Only part of the PDF file is downloaded
#NOT WORKING HTTPS DOWNLOAD Using IO::Socket::SSL in Windows + Active Perl
use strict;
use warnings;
use IO::Socket::SSL;
use WWW::Mechanize;
use HTTP::Cookies;
use HTTP::Message;
use Digest::MD5;
use File::Slurp;
use Data::Dumper;
#Globals
$|=1;
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
#Variables
my $browser = "";
my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
my $pageContent = '';
my $fileName = '';
my $md5Obj = Digest::MD5->new();
print "\n USING IO::Socket::SSL";
#Init Mechanize
$browser = WWW::Mechanize->new(autocheck => 1, noproxy => 1, ssl_opts => { 'verify_hostname' => 0 });
# Add cookie jar
$browser->cookie_jar(HTTP::Cookies->new());
$browser->agent_alias( 'Linux Mozilla');
$browser->add_header('Accept-Encoding' => scalar HTTP::Message::decodable());
$browser->timeout(120);
#Get URL
$browser->get($url);
if ($browser->success())
{
    print "\n INFO: Got URL: $url";
    $fileName = $browser->response()->filename();
    print "\n INFO: Save in File: $fileName";
    $browser->save_content($fileName);
    #Calculate MD5 sum
    $pageContent = read_file( $fileName, binmode => ':raw' );
    print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
    $md5Obj->add($pageContent);
    print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
    undef $md5Obj;
}
else
{
    print "\n ERROR: Can't get URL $url ",$browser->status();
}
print "\n\n INFO: ********************* DUMP ********************";
print "\n",Dumper(\$browser);
print "\n INFO: ********************* DUMP ********************";
exit 0;
Output2:
USING IO::Socket::SSL
INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
INFO: Save in File: qtff-2001.pdf
INFO: qtff-2001.pdf Size: 6.66796875 KB
INFO: qtff-2001.pdf MD5 Sum: 4049c364f7332790c3abe548d6a4297c
I did not paste the Dumper output because it is huge and did not copy properly into the browser because of the binary content.
Please help me understand why the scripts behave differently. I was thinking it is a chunking issue ...
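Since the full Dumper output is impractical to post, a shorter alternative is to print only the response headers that usually matter for a truncated transfer. The sketch below is an assumption about what is worth looking at, not a confirmed diagnosis: transfer_diagnostics is a hypothetical helper, and the header names (Client-SSL-Socket-Class, Client-Transfer-Encoding, Client-Aborted, X-Died) are the ones LWP adds to a response as far as I know. It expects an object with header and content methods, such as an HTTP::Response.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize the headers that tend to reveal a
# truncated or aborted transfer. $response is expected to behave like
# an HTTP::Response (it must provide header() and content()).
sub transfer_diagnostics {
    my ($response) = @_;
    my @report;
    for my $name (qw(Client-SSL-Socket-Class Client-Transfer-Encoding
                     Content-Encoding Content-Length Client-Aborted X-Died)) {
        my $value = $response->header($name);
        push @report, sprintf('%-26s %s', "$name:",
                              defined $value ? $value : '(not set)');
    }
    # Compare this with Content-Length (when the server sent one).
    push @report, sprintf('%-26s %d', 'Bytes received:',
                          length($response->content));
    return join "\n", @report;
}
```

In either test script this could be called as print transfer_diagnostics($browser->response()), "\n"; right after the get. If X-Died or Client-Aborted is set in the IO::Socket::SSL run, the read was interrupted on the client side rather than ended by the server.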