problem with WWW::Mech

by morgon (Curate)
on Feb 11, 2017 at 20:37 UTC
morgon has asked for the wisdom of the Perl Monks concerning the following question:


All of a sudden I have a strange problem with WWW::Mechanize that I've boiled down to this:

use WWW::Mechanize ();
my $mech = WWW::Mechanize->new;
$mech->agent_alias( 'Linux Mozilla' );
my $url = " +week/print";
$mech->get($url, ":content_file" => "output.html");

What happens is that the output file contains only part of the expected content (a fragment of an HTML document that looks as if it was truncated somewhere).

wget has no problem downloading the URL correctly.

What could be the issue here?

Many thanks!

Re: problem with WWW::Mech
by LanX (Bishop) on Feb 11, 2017 at 21:02 UTC
    Are you aware that The Economist wants to be paid after the 3rd article?

    Otherwise only the first 2 or 3 paragraphs are shown.

      I am interested in understanding why it works without problems with wget (as many times as you want) but does not work with WWW::Mech.

        The server runs heuristics° to identify clients and count the number of already seen articles. (IP, user agent,...)

        And something in the mech request looks too strange compared to wget.

        Without the /print version you should see something like

        "You have reached your article limit"

        update @2017-02-12 09:56 GMT

        °) Webserver heuristic user session identification

Re: problem with WWW::Mech
by morgon (Curate) on Feb 11, 2017 at 21:21 UTC
    When I dump the response-object that gets returned from the call to "get" I can see this line:
    'x-died' => 'Illegal field name \'X-Meta-Article:publisher\' at /home/mh/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/x86_64-linux/HTML/ line 207.',
    I think the x-died header is inserted when a die occurs somewhere, which would maybe explain why I don't see the full content.
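
A minimal sketch of checking for this without dumping the whole object, assuming LWP's documented behaviour of setting an X-Died header when a handler dies while the response is being processed. The helper name died_reason is made up for illustration, and the response is built by hand so the snippet runs without network access; in real use you would pass in whatever $mech->get(...) returns.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: returns the X-Died message if something died
# while the response was being processed, otherwise undef.
sub died_reason {
    my ($response) = @_;
    return scalar $response->header('X-Died');
}

# Hand-built response standing in for the object returned by $mech->get(...).
my $response = HTTP::Response->new(200, 'OK');
$response->header('X-Died' => q{Illegal field name 'X-Meta-Article:publisher'});

if (defined(my $reason = died_reason($response))) {
    warn "response may be truncated: $reason\n";
}
```

Note that the status code can still be 200 in this situation, so checking $mech->success alone would probably not reveal the problem.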

    Is there a way to hack around this?


      I've updated HTML::HeadParser and all is fine again.
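
For anyone who cannot upgrade right away, here is a minimal sketch of two things to try. Printing the installed HTML::HeadParser version shows whether an update is pending. The parse_head(0) call is an assumed workaround that was not mentioned in the thread: it is a method WWW::Mechanize inherits from LWP::UserAgent, and it turns off the step that feeds the document head through HTML::HeadParser. That should sidestep the die, but it is untested against the URL from the original post.

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use HTML::HeadParser ();

# Which HTML::HeadParser is installed? (It ships with the HTML-Parser distribution.)
print "HTML::HeadParser version: ", HTML::HeadParser->VERSION, "\n";

my $mech = WWW::Mechanize->new;
$mech->agent_alias('Linux Mozilla');

# Assumed workaround: stop LWP from parsing <head> with HTML::HeadParser,
# so <meta> tags are no longer turned into response headers.
$mech->parse_head(0);

# Then fetch as before, with $url set as in the original post:
# $mech->get($url, ':content_file' => 'output.html');
```

The trade-off is that headers derived from the head section (for example those from meta http-equiv tags) will no longer show up on the response object.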

