http://www.perlmonks.org?node_id=885780

LambethBoy has asked for the wisdom of the Perl Monks concerning the following question:

I would like to process a web page containing JavaScript that sits on a SharePoint site. I can access the site fine using:
my $w = WWW::Scripter->new( keep_alive => 1 );
$w->use_plugin('JavaScript');
$w->credentials( "www.company.com:80", undef, 'DOMAIN\\user', 'password' );
$w->get($url);
print $w->content(), "\n";
However, it appears that the "keep_alive" argument isn't being passed through properly: although the web page itself gets returned, any external JavaScript it references is not fetched. I get this message (repeated for each referenced script):
The keep_alive option must be enabled for NTLM authentication to work.
NTLM authentication aborted.
couldn't get script http://www.company.com/_layouts/1033/init.js?rev=qX%2BG3yl4pldKy9KbPLXf9w%3D%3D: 401 Unauthorized at /usr/local/lib/perl5/vendor_perl/5.10.1/HTML/DOM.pm line 494
Has anyone got any idea how I can work around this?
Cheers,
LB.

Re: Using WWW::Scripter with NTLM authentication
by rowdog (Curate) on Feb 02, 2011 at 17:47 UTC
    my $w = WWW::Scripter->new( keep_alive => 1);

    WWW::Scripter is a subclass of WWW::Mechanize, which is a subclass of LWP::UserAgent, which is where you'll find the explanation of the keep_alive option. Basically, it's the number of objects to be stored in the LWP::ConnCache. You need a lot more than 1.
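    For example (just a sketch, and the number is arbitrary), something like this should leave room in the cache for the extra script requests:

        # keep_alive is handed to LWP::ConnCache as its capacity,
        # so allow several persistent connections rather than just one
        my $w = WWW::Scripter->new( keep_alive => 10 );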

      Sorry, my example wasn't very helpful in that respect. Yes, you're right that it controls the cache capacity. However, even when I set it to large values (100, 1000, 100000, etc.) it makes no difference.

      After some digging around in the code for LWP::UserAgent (5.835), I found this:

      sub clone {
          my $self = shift;
          my $copy = bless { %$self }, ref $self;  # copy most fields
          delete $copy->{handlers};
          delete $copy->{conn_cache};
          # copy any plain arrays and hashes; known not to need recursive copy
          for my $k (qw(proxy no_proxy requests_redirectable)) {
              next unless $copy->{$k};
              if (ref($copy->{$k}) eq "ARRAY") {
                  $copy->{$k} = [ @{$copy->{$k}} ];
              }
              elsif (ref($copy->{$k}) eq "HASH") {
                  $copy->{$k} = { %{$copy->{$k}} };
              }
          }
          if ($self->{def_headers}) {
              $copy->{def_headers} = $self->{def_headers}->clone;
          }
          # re-enable standard handlers
          $copy->parse_head($self->parse_head);
          # no easy way to clone the cookie jar; so let's just remove it for now
          $copy->cookie_jar(undef);
          $copy;
      }

      So it looks like the conn_cache data is not inherited when the UserAgent is cloned.

      If I comment this line out (delete $copy->{conn_cache};), my problems go away. Could a more enlightened member tell me whether or not this looks like a bug, or am I just barking mad and fiddling with something with terrible unintended consequences?
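      For the record, rather than editing the installed module, I suppose one could get much the same effect with a run-time override (untested sketch, using only the names shown in the clone() code above):

          # wrap LWP::UserAgent::clone so the copy keeps the original's conn_cache
          use LWP::UserAgent;
          {
              no warnings 'redefine';
              my $orig_clone = \&LWP::UserAgent::clone;
              *LWP::UserAgent::clone = sub {
                  my $self = shift;
                  my $copy = $orig_clone->($self, @_);
                  $copy->conn_cache( $self->conn_cache ) if $self->conn_cache;
                  return $copy;
              };
          }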

      Cheers
      LB

        I don't see where or why you're cloning the ua, but it looks pretty easy to get the ConnCache from the original.

        my $ua2 = $ua->clone;
        $ua2->conn_cache($ua->conn_cache);
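        If you'd rather be explicit about the cache object, the same idea (sketch only, capacity picked arbitrarily) is to build the LWP::ConnCache yourself and hand it to both agents:

            use LWP::ConnCache;

            my $cache = LWP::ConnCache->new( total_capacity => 10 );
            $ua->conn_cache($cache);      # original agent uses this cache
            my $ua2 = $ua->clone;
            $ua2->conn_cache($cache);     # the clone shares the same cache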