PerlMonks  

problem with WWW::Mech

by morgon (Curate)
on Feb 11, 2017 at 20:37 UTC (#1181752, perlquestion)
morgon has asked for the wisdom of the Perl Monks concerning the following question:

Hi

all of a sudden I have a strange problem with WWW::Mechanize that I've boiled down to this:

use WWW::Mechanize ();

my $mech = WWW::Mechanize->new;
$mech->agent_alias( 'Linux Mozilla' );

my $url = "http://www.economist.com/news/world-week/21716670-politics-week/print";
$mech->get($url, ":content_file" => "output.html");

What happens is that the output file contains only part of the expected content (a fragment of an html-document that looks as if it is truncated somewhere).

wget has no problem downloading the URL correctly.

What could be the issue here?

Many thanks!

Replies are listed 'Best First'.
Re: problem with WWW::Mech
by LanX (Bishop) on Feb 11, 2017 at 21:02 UTC
    Are you aware that The Economist wants to be paid after the 3rd article?

    Otherwise only the first 2 or 3 paragraphs are shown.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Whatever.

      I am interested in understanding why it works without problems with wget (as many times as you want) but does not work with WWW::Mech.

        The server runs heuristics° to identify clients and count the number of already seen articles. (IP, user agent,...)

        And something in the Mech request looks more suspicious to the server than what wget sends.

        Without the /print version you should see something like

        "You have reached your article limit"

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

        update @2017-02-12 09:56 GMT

        °) Webserver heuristic user session identification
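
To compare what Mech sends with what wget sends, one option is to capture the outgoing request with an LWP `request_send` handler. This is only a sketch: the handler below returns a hand-made stand-in response so that nothing goes over the network, and `$captured` is a name made up for this example. Returning nothing from the handler instead would let the request go to the real server.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use HTTP::Response ();

my $mech = WWW::Mechanize->new;
$mech->agent_alias('Linux Mozilla');

my $url = "http://www.economist.com/news/world-week/21716670-politics-week/print";

# Capture the request exactly as it is about to be sent.
my $captured;
$mech->add_handler(
    request_send => sub {
        my ($request, $ua, $handler) = @_;
        $captured = $request->as_string;

        # Stand-in response (an assumption for this sketch): returning a
        # response object here stops LWP from contacting the server.
        # Use a bare "return;" to send the request for real.
        return HTTP::Response->new(
            200, 'OK',
            [ 'Content-Type' => 'text/html' ],
            '<html><head><title>stub</title></head><body></body></html>',
        );
    }
);

$mech->get($url);

# Request line plus all headers (User-Agent, Accept-Encoding, ...).
print $captured;
```

The printed headers can then be set side by side with the ones from `wget --debug` to see where the two clients differ.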

Re: problem with WWW::Mech
by morgon (Curate) on Feb 11, 2017 at 21:21 UTC
    When I dump the response-object that gets returned from the call to "get" I can see this line:
    'x-died' => 'Illegal field name \'X-Meta-Article:publisher\' at /home/mh/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/x86_64-linux/HTML/HeadParser.pm line 207.',
    I think the x-died header is inserted when a die occurs somewhere, which would maybe explain why I don't see the full content.

    Is there a way to hack around this?
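
A small check for this situation, as a sketch: LWP records an exception raised while the body is being read in the `X-Died` response header and sets `Client-Aborted` to `die`. The helper name `aborted_reason` is made up for this example, and the response is built by hand so the snippet runs offline; in the real script `$res` would be the return value of `$mech->get(...)`.

```perl
use strict;
use warnings;
use HTTP::Response ();

# Hypothetical helper: returns a description if LWP aborted the
# transfer because something died, otherwise undef.
sub aborted_reason {
    my ($res) = @_;
    my $died = $res->header('X-Died');
    return undef unless defined $died;
    my $aborted = $res->header('Client-Aborted') // 'unknown';
    return "transfer aborted ($aborted): $died";
}

# Stand-in response carrying the kind of headers seen in the dump above.
my $res = HTTP::Response->new(200, 'OK');
$res->header('X-Died'         => "Illegal field name 'X-Meta-Article:publisher'");
$res->header('Client-Aborted' => 'die');

my $reason = aborted_reason($res);
warn "$reason\n" if defined $reason;
```

Note that the status code can still be 200 in such a case, so a plain `is_success` check would not reveal the truncation.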

      Ok.

      I've updated HTML::HeadParser and all is fine again.
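
For anyone who cannot upgrade right away, a possible stopgap (not tried in this thread, so treat it as an assumption) is to switch off LWP's head parsing, which is the step that runs HTML::HeadParser over the `<head>` section. `parse_head` is inherited from LWP::UserAgent. The sketch also prints the installed HTML::HeadParser version; the module ships with the HTML-Parser distribution.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use HTML::HeadParser ();

# Which HTML::HeadParser is installed?
print "HTML::HeadParser version: ", HTML::HeadParser->VERSION, "\n";

my $mech = WWW::Mechanize->new;
$mech->agent_alias('Linux Mozilla');

# Stop LWP from feeding the response through HTML::HeadParser.
# Side effect: <meta http-equiv> and <base> values from the document
# head no longer show up as response headers.
$mech->parse_head(0);

# The original download would then follow unchanged, e.g.:
# $mech->get($url, ":content_file" => "output.html");
```

Upgrading, as done above, is the cleaner fix; turning off head parsing only sidesteps the failing code path.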

Node Type: perlquestion [id://1181752]
Front-paged by Corion