PerlMonks  

IO::Socket::SSL is not downloading full data from HTTPS URL in Windows ActiveState Perl.

by sam_bakki (Pilgrim)
on Jan 21, 2015 at 11:10 UTC ( [id://1114014]=perlquestion )

sam_bakki has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

While downloading data from an HTTPS URL, I see different results with Net::SSL and IO::Socket::SSL. Basically, IO::Socket::SSL is not downloading the full data.

To show what's really happening, I have two scripts below:

One uses Net::SSL and downloads the data properly from the server.
The other uses IO::Socket::SSL and downloads only the first chunk (I think) from the server and quits.

To show the differences between the downloads, I have shown the MD5 sum and file size for each.
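For anyone who wants to compare their own downloads the same way, here is a minimal sketch that prints the size and MD5 sum of a file using only core modules. The helper name `file_fingerprint` and the script name in the usage comment are hypothetical, not part of the scripts in this post.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return (size_in_bytes, md5_hex) for a file, reading it in binary mode.
sub file_fingerprint {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # matters on Windows, where text mode would alter the bytes
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return (-s $path, $md5);
}

# Usage (hypothetical script name): perl fingerprint.pl qtff-2001.pdf
for my $path (@ARGV) {
    my ($size, $md5) = file_fingerprint($path);
    printf "%s: %d bytes, MD5 %s\n", $path, $size, $md5;
}
```

Running it against the file saved by each script should make a truncated download obvious from the size alone.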

My environment
OS: Windows 7, x86_64
Perl: Active Perl, perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x86-multi-thread-64int

Note: I saw the same behavior in Active Perl 5.10, 5.14, 5.16 and 5.18

Script 1 - Using Net::SSL and Crypt::SSLeay - Working

    #WORKING HTTPS DOWNLOAD Using Net::SSL in Windows + Active Perl
    use strict;
    use warnings;
    use Crypt::SSLeay;
    use Net::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    use HTTP::Message;
    use Digest::MD5;
    use File::Slurp;
    use Data::Dumper;
    use Devel::ModuleDumper;

    #Globals
    $|=1;
    #Force LWP to use Net::SSL instead of IO::Socket::SSL
    $ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = "Net::SSL";
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
    delete $ENV{https_proxy} if exists $ENV{https_proxy};
    delete $ENV{http_proxy} if exists $ENV{http_proxy};

    #Variables
    my $browser = "";
    my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
    my $pageContent = '';
    my $fileName = '';
    my $md5Obj = Digest::MD5->new();

    print "\n USING Net::SSL";

    #Init Mechanize
    $browser = WWW::Mechanize->new(autocheck =>1, noproxy=>1, ssl_opts => { 'verify_hostname' => 0 });

    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias( 'Linux Mozilla');
    $browser->add_header('Accept-Encoding'=>scalar HTTP::Message::decodable());
    $browser->timeout(120);

    #Get URL
    $browser->get($url);
    if ($browser->success())
    {
        print "\n INFO: Got URL: $url";
        $fileName = $browser->response()->filename();
        print "\n INFO: Save in File: $fileName";
        $browser->save_content($fileName);

        #Calculate MD5 sum
        $pageContent = read_file( $fileName, binmode => ':raw' );
        print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
        $md5Obj->add($pageContent);
        print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
        undef $md5Obj;
    }
    else
    {
        print "\n ERROR: Can't get URL $url ",$browser->status();
    }

    print "\n\n INFO: ********************* DUMP ********************";
    print "\n",Dumper(\$browser);
    print "\n INFO: ********************* DUMP ********************";
    exit 0;

Output1:


  USING Net::SSL
 INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
 INFO: Save in File: qtff-2001.pdf
 INFO: qtff-2001.pdf Size: 5465.48046875 KB
 INFO: qtff-2001.pdf MD5 Sum: d1aee95cc06d529e67b707257a5cf3eb

Loaded Modules
-------------------
Carp	1.3301
Compress::Raw::Bzip2	2.068
Compress::Raw::Zlib	2.068
Compress::Zlib	2.068
Crypt::SSLeay	0.72
Crypt::SSLeay::CTX	none
Crypt::SSLeay::MainContext	none
Crypt::SSLeay::X509	none
Data::Dumper	2.154
Digest::base	1.16
Digest::MD5	2.53
Encode	2.67
Encode::Alias	2.18
Encode::Config	2.05
Encode::Encoding	2.07
Errno	1.2003
Exporter	5.70
Exporter::Heavy	5.70
Fcntl	1.11
File::Glob	1.23
File::GlobMapper	1.000
File::Slurp	9999.19
HTML::Entities	3.69
HTML::Form	6.03
HTML::Parser	3.71
HTML::PullParser	3.57
HTML::Tagset	3.20
HTML::TokeParser	3.69
HTTP::Config	6.00
HTTP::Cookies	6.01
HTTP::Cookies::Netscape	6.00
HTTP::Date	6.02
HTTP::Headers	6.05
HTTP::Headers::Util	6.03
HTTP::Message	6.06
HTTP::Request	6.00
HTTP::Request::Common	6.04
HTTP::Response	6.04
HTTP::Status	6.03
IO	1.31
IO::Compress::Adapter::Deflate	2.068
IO::Compress::Base	2.068
IO::Compress::Base::Common	2.068
IO::Compress::Gzip	2.068
IO::Compress::Gzip::Constants	2.068
IO::Compress::RawDeflate	2.068
IO::Compress::Zlib::Constants	2.068
IO::Compress::Zlib::Extra	2.068
IO::File	1.16
IO::Handle	1.35
IO::Seekable	1.1
IO::Socket	1.37
IO::Socket::INET	1.35
IO::Socket::IP	0.35
IO::Socket::UNIX	1.26
IO::Uncompress::Adapter::Bunzip2	2.068
IO::Uncompress::Adapter::Inflate	2.068
IO::Uncompress::Base	2.068
IO::Uncompress::Bunzip2	2.068
IO::Uncompress::Gunzip	2.068
IO::Uncompress::Inflate	2.068
IO::Uncompress::RawInflate	2.068
List::Util	1.41
LWP	6.08
LWP::MemberMixin	none
LWP::Protocol	6.06
LWP::Protocol::http	none
LWP::Protocol::https	6.06
LWP::UserAgent	6.06
MIME::Base64	3.14
Net::HTTP	6.07
Net::HTTP::Methods	6.07
Net::HTTPS	6.04
Net::SSL	2.86
POSIX	1.38_03
Scalar::Util	1.41
SelectSaver	1.02
Socket	2.016
Storable	2.51
Symbol	1.07
Tie::Hash	1.05
Time::Local	1.2300
URI	1.65
URI::Escape	3.31
URI::http	none
URI::https	none
URI::_generic	none
URI::_query	none
URI::_server	none
WWW::Mechanize	1.73

Script 2 - Using IO::Socket::SSL - Not Working. Only part of the PDF file is downloaded

    #NOT WORKING HTTPS DOWNLOAD Using IO::Socket::SSL in Windows + Active Perl
    use strict;
    use warnings;
    use IO::Socket::SSL;
    use WWW::Mechanize;
    use HTTP::Cookies;
    use HTTP::Message;
    use Digest::MD5;
    use File::Slurp;
    use Data::Dumper;
    use Devel::ModuleDumper;

    #Globals
    $|=1;
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

    #Variables
    my $browser = "";
    my $url = 'https://developer.apple.com/standards/qtff-2001.pdf';
    my $pageContent = '';
    my $fileName = '';
    my $md5Obj = Digest::MD5->new();

    print "\n USING IO::Socket::SSL";

    #Init Mechanize
    $browser = WWW::Mechanize->new(autocheck =>1, noproxy=>1, ssl_opts => { 'verify_hostname' => 0 });

    # Add cookie jar
    $browser->cookie_jar(HTTP::Cookies->new());
    $browser->agent_alias( 'Linux Mozilla');
    $browser->add_header('Accept-Encoding'=>scalar HTTP::Message::decodable());
    $browser->timeout(120);

    #Get URL
    $browser->get($url);
    if ($browser->success())
    {
        print "\n INFO: Got URL: $url";
        $fileName = $browser->response()->filename();
        print "\n INFO: Save in File: $fileName";
        $browser->save_content($fileName);

        #Calculate MD5 sum
        $pageContent = read_file( $fileName, binmode => ':raw' );
        print "\n INFO: $fileName Size: ", length($pageContent)/1024," KB";
        $md5Obj->add($pageContent);
        print "\n INFO: $fileName MD5 Sum: ", $md5Obj->hexdigest();
        undef $md5Obj;
    }
    else
    {
        print "\n ERROR: Can't get URL $url ",$browser->status();
    }

    print "\n\n INFO: ********************* DUMP ********************";
    print "\n",Dumper(\$browser);
    print "\n INFO: ********************* DUMP ********************";
    exit 0;

Output2:


  USING IO::Socket::SSL
 INFO: Got URL: https://developer.apple.com/standards/qtff-2001.pdf
 INFO: Save in File: qtff-2001.pdf
 INFO: qtff-2001.pdf Size: 6.66796875 KB
 INFO: qtff-2001.pdf MD5 Sum: 4049c364f7332790c3abe548d6a4297c

Loaded Modules
----------------
ActivePerl::Config      none
Carp    1.3301
Compress::Raw::Bzip2    2.068
Compress::Raw::Zlib     2.068
Compress::Zlib  2.068
Cwd     3.48
Data::Dumper    2.154
Digest::base    1.16
Digest::MD5     2.53
Encode  2.67
Encode::Alias   2.18
Encode::Byte    2.04
Encode::Config  2.05
Encode::Encoding        2.07
Encode::Locale  1.03
Errno   1.2003
Exporter        5.70
Exporter::Heavy 5.70
Fcntl   1.11
File::Basename  2.85
File::Glob      1.23
File::GlobMapper        1.000
File::Slurp     9999.19
File::Spec      3.48
File::Spec::Unix        3.48
File::Spec::Win32       3.48
HTML::Entities  3.69
HTML::Form      6.03
HTML::Parser    3.71
HTML::PullParser        3.57
HTML::Tagset    3.20
HTML::TokeParser        3.69
HTTP::Config    6.00
HTTP::Cookies   6.01
HTTP::Cookies::Netscape 6.00
HTTP::Date      6.02
HTTP::Headers   6.05
HTTP::Headers::Util     6.03
HTTP::Message   6.06
HTTP::Request   6.00
HTTP::Request::Common   6.04
HTTP::Response  6.04
HTTP::Status    6.03
IO      1.31
IO::Compress::Adapter::Deflate  2.068
IO::Compress::Base      2.068
IO::Compress::Base::Common      2.068
IO::Compress::Gzip      2.068
IO::Compress::Gzip::Constants   2.068
IO::Compress::RawDeflate        2.068
IO::Compress::Zlib::Constants   2.068
IO::Compress::Zlib::Extra       2.068
IO::File        1.16
IO::Handle      1.35
IO::Seekable    1.1
IO::Socket      1.37
IO::Socket::INET        1.35
IO::Socket::IP  0.35
IO::Socket::SSL 2.010
IO::Socket::SSL::PublicSuffix   none
IO::Socket::UNIX        1.26
IO::Uncompress::Adapter::Bunzip2        2.068
IO::Uncompress::Adapter::Inflate        2.068
IO::Uncompress::Base    2.068
IO::Uncompress::Bunzip2 2.068
IO::Uncompress::Gunzip  2.068
IO::Uncompress::Inflate 2.068
IO::Uncompress::RawInflate      2.068
List::Util      1.41
LWP     6.08
LWP::MemberMixin        none
LWP::Protocol   6.06
LWP::Protocol::http     none
LWP::Protocol::https    6.06
LWP::UserAgent  6.06
Mozilla::CA     20141217
Net::HTTP       6.07
Net::HTTP::Methods      6.07
Net::HTTPS      6.04
Net::SSLeay     1.66
POSIX   1.38_03
Scalar::Util    1.41
SelectSaver     1.02
Socket  2.016
Socket6 0.25
Storable        2.51
Symbol  1.07
Tie::Hash       1.05
Time::Local     1.2300
URI     1.65
URI::Escape     3.31
URI::http       none
URI::https      none
URI::_generic   none
URI::_idna      none
URI::_punycode  1.65
URI::_query     none
URI::_server    none
Win32::API      0.79
Win32::API::Struct      0.65
Win32::API::Type        0.69
Win32::Console  0.10
WWW::Mechanize  1.73

I did not paste the Dumper output because it's huge and, because of the binary contents, does not copy properly into the browser.

Q: Why is IO::Socket::SSL not downloading the full data? What more do I need to do in Script 2?

Update: Added Module versions

Update1: I have tested Script 2 on Linux Fedora 21, x64, Perl 5.18, and it works fine :). So this looks like a problem only with Windows + ActiveState Perl :(

Thanks & Regards,
Bakkiaraj M
My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.


Replies are listed 'Best First'.
Re: IO::Socket::SSL is not downloading full data from HTTPS URL in Windows ActiveState Perl.
by Anonymous Monk on Jan 21, 2015 at 11:53 UTC

        As you have suggested, I have added module versions using Devel::ModuleDumper

        Hmm, interesting :)

        First, I would try lwp-mirror and/or lwp-download and see if it works with IO::Socket::SSL

        Second, as you pretty much have the latest versions of everything, I would report the problem to the IO::Socket::SSL maintainer; he should be able to suggest more diagnostic steps

        unless you'd like to try an older version, like the one I have, which seems to work for me :D
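To compare installed versions quickly between a working and a non-working setup, something like the following sketch may help. The helper `module_version` is a hypothetical name, and the module list is only a guess at what is relevant to this thread; adjust it as needed.

```perl
use strict;
use warnings;

# Return the installed version of a module, 'none' if it loads but defines
# no $VERSION, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'none';
}

# Modules involved in an HTTPS download through LWP (assumed list).
for my $module (qw(IO::Socket::SSL Net::SSLeay Net::HTTP LWP LWP::Protocol::https)) {
    my $version = module_version($module);
    printf "%-22s %s\n", $module, defined $version ? $version : 'not installed';
}
```

Running this on both machines and diffing the output is a lighter-weight alternative to a full module dump.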

Re: IO::Socket::SSL is not downloading full data from HTTPS URL in Windows ActiveState Perl.
by noxxi (Pilgrim) on Jan 22, 2015 at 21:15 UTC

    This might be related to timeout handling. Historically neither Crypt::SSLeay nor IO::Socket::SSL had support for non-blocking sockets on Windows, so timeout handling never worked. With 2.006 non-blocking support was added to IO::Socket::SSL. Unfortunately this needs to have proper support in LWP because in Windows you need to check for an error of EWOULDBLOCK and not EAGAIN (on UNIX they are both the same), see pull request #11.

    I would ask you to try the following things:

    • Try the patch given in pull request #11.
    • Or try with IO::Socket::SSL 2.005 or lower
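The errno point above can be illustrated with a small sketch. This is not the code from pull request #11; it only shows the kind of check a non-blocking read loop needs, and `is_retryable` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN EWOULDBLOCK EINTR);

# True if a failed read on a non-blocking socket should simply be retried.
# On UNIX EAGAIN and EWOULDBLOCK are usually the same number; on Windows they
# can differ, so both have to be checked explicitly.
sub is_retryable {
    my ($errno) = @_;
    return $errno == EAGAIN || $errno == EWOULDBLOCK || $errno == EINTR;
}

printf "EAGAIN=%d EWOULDBLOCK=%d (%s)\n",
    EAGAIN, EWOULDBLOCK,
    EAGAIN == EWOULDBLOCK ? "same value here" : "different values here";
```

In a read loop the check would typically be applied to `$!` (numerically, as `$!+0`) right after `sysread` returns undef. A loop that tests only EAGAIN could treat a harmless "try again" on Windows as a fatal read error.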

      Hi noxxi

      I had already applied the patch from https://github.com/libwww-perl/net-http/pull/11. Before that, Script 2 did not even run; it failed with a timeout. With the patch applied, you can see that Script 2 completes ($browser->status is success), but the data is not downloaded completely.

      I see the same behavior in Windows + Active Perl 5.14, 5.16, 5.18 and 5.20. I have been suffering from this problem for the past 3 years :(. I would like to find the root cause so I can use the default IO::Socket::SSL.
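A response that reports success can still be incomplete, so it may be worth checking what LWP recorded about the transfer. The sketch below assumes an HTTP::Response object such as the one returned by `$browser->response()`; `transfer_problems` is a hypothetical helper name. It relies on LWP adding `X-Died` and `Client-Aborted` headers when body collection dies, and compares the raw body length with `Content-Length` when the server sent one.

```perl
use strict;
use warnings;
use HTTP::Response;

# Return a list of human-readable problems; an empty list means nothing
# suspicious was found (it does not prove the download is complete).
sub transfer_problems {
    my ($response) = @_;
    my @problems;

    # Client-side headers that LWP adds when reading the body failed midway.
    my $died = $response->header('X-Died');
    push @problems, "X-Died: $died" if defined $died;
    my $aborted = $response->header('Client-Aborted');
    push @problems, "Client-Aborted: $aborted" if defined $aborted;

    # Compare the undecoded body with the announced length, if any.
    my $expected = $response->header('Content-Length');
    if (defined $expected) {
        my $got = length ${ $response->content_ref };
        push @problems, "got $got of $expected bytes" if $got != $expected;
    }
    return @problems;
}
```

In Script 2 this could be called as `print "\n WARN: $_" for transfer_problems($browser->response());` right after the `get`. If the server uses chunked transfer encoding there is no `Content-Length` to compare against, and only the two client-side headers are checked.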

      Thanks & Regards,
      Bakkiaraj M

        I see the same behavior in Windows + Active Perl 5.14, 5.16, 5.18 and 5.20. I have been suffering from this problem for the past 3 years :(. I would like to find the root cause so I can use the default IO::Socket::SSL.

        Unfortunately I cannot reproduce the problem, neither with Strawberry Perl nor with the latest ActiveState Perl. Did you also apply https://github.com/libwww-perl/libwww-perl/pull/66, which fixes the non-blocking handling on Windows for the libwww part?

Re: IO::Socket::SSL is not downloading full data from HTTPS URL in Windows ActiveState Perl.
by sundialsvc4 (Abbot) on Jan 22, 2015 at 01:32 UTC

    I am interested in this topic, but I have two questions:

    (1) Bakkiaraj, if one module did allow the entire file to be downloaded successfully, “do you [still] have a problem?”   Or, is your question now one of curiosity?

    (2) As to the Anonymous response (and the referenced, unfortunately also Anonymous, “other thread”), I do not see any sort of answer here.   For instance, “all right, if I should be looking at one type of version-number instead of another type of number, what number should I be looking for, and how does this have any bearing as to why one module might be working and another is not?”

    I know that you can’t look at the server-logs of developer.apple.com (can you?), but I definitely would suggest looking in the system event-viewer (probably in Administrative Tools) to see if Windows logged anything of interest on your end.   I’ll bet that it probably did, and, if it did, please add some of that information as a comment to your post.

      (2) As to the Anonymous response (and the referenced, unfortunately also Anonymous, “other thread”), I do not see any sort of answer here. For instance, “all right, if I should be looking at one type of version-number instead of another type of number, what number should I be looking for, and how does this have any bearing as to why one module might be working and another is not?”

      Of course , this just means you sundialsvc4 don't understand basic debugging -- comparing version numbers of working modules and non-working ones, is like the first step to figuring out where a problem might exist, or how to side step it completely

      I know that you can’t look at the server-logs of developer.apple.com (can you?), but I definitely would suggest looking in the system event-viewer (probably in Administrative Tools) to see if Windows logged anything of interest on your end. I’ll bet that it probably did, and, if it did, please add some of that information as a comment to your post.

      This just means you sundialsvc4 have never used event-viewer

      Hi sundialsvc4

      (1) Bakkiaraj, if one module did allow the entire file to be downloaded successfully, “do you still have a problem?” Or, is your question now one of curiosity?

      I can use Crypt::SSLeay but I want to use default IO::Socket::SSL, I do not want to use extra modules.

      Thanks & Regards,
      Bakkiaraj M
