
problem with WWW::Mech

by morgon (Curate)
on Feb 11, 2017 at 20:37 UTC ( #1181752=perlquestion )
morgon has asked for the wisdom of the Perl Monks concerning the following question:


All of a sudden I have a strange problem with WWW::Mechanize, which I've boiled down to this:

    use WWW::Mechanize ();
    my $mech = WWW::Mechanize->new;
    $mech->agent_alias( 'Linux Mozilla' );
    my $url = " +week/print";
    $mech->get($url, ":content_file" => "output.html");

The output file contains only part of the expected content: a fragment of an HTML document that looks as if it was truncated somewhere.

wget downloads the URL correctly without any problems.

What could be the issue here?

Many thanks!

Replies are listed 'Best First'.
Re: problem with WWW::Mech
by LanX (Bishop) on Feb 11, 2017 at 21:02 UTC
    Are you aware that The Economist wants to be paid after the 3rd article?

    Otherwise only the first 2 or 3 paragraphs are shown.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!


      I am interested in understanding why it works without problems with wget (as many times as you want) but does not work with WWW::Mech.

        The server runs heuristics° (IP, user agent, ...) to identify clients and count the articles they have already seen.

        And something in the Mech request probably looks too strange to it compared to wget.

        Without the /print version you should see something like

        "You have reached your article limit"

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

        update @2017-02-12 09:56 GMT

        °) Webserver heuristic user session identification

Re: problem with WWW::Mech
by morgon (Curate) on Feb 11, 2017 at 21:21 UTC
    When I dump the response object returned from the call to "get", I can see this line:
    'x-died' => 'Illegal field name \'X-Meta-Article:publisher\' at /home/mh/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/x86_64-linux/HTML/ line 207.',
    I think the x-died header is inserted when a die occurs somewhere while the response is being processed, which might explain why I don't see the full content.
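
A minimal sketch of how such a check could be automated instead of dumping the whole response object. The helper name `died_message` is hypothetical, and the response here is built locally as a stand-in for what `$mech->get(...)` would return; only `HTTP::Response->new` and the `header` accessor are real library calls.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the X-Died message recorded on a response,
# or undef if the header is absent.
sub died_message {
    my ($response) = @_;
    return scalar $response->header('X-Died');
}

# Stand-in response built by hand (assumption: a real one would come
# from $mech->get or any other LWP request).
my $res = HTTP::Response->new(200, 'OK');
$res->header('X-Died' => q{Illegal field name 'X-Meta-Article:publisher'});

if (defined(my $msg = died_message($res))) {
    # A 200 status alone does not prove the body is complete.
    warn "Response processing died: $msg\n";
}
```

Since the status can still be 200 when this header is present, checking it alongside `is_success` may catch truncated downloads that would otherwise go unnoticed.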

    Is there a way to hack around this?


      I've updated HTML::HeadParser and all is fine again.
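
Where upgrading is not an option, one possible workaround (an assumption on my part, not something tried in this thread) is to stop LWP from running HTML::HeadParser at all. WWW::Mechanize inherits `parse_head` from LWP::UserAgent, so a sketch could look like this; no request is made here, it only shows the setting and the installed parser version:

```perl
use strict;
use warnings;
use WWW::Mechanize ();
use HTML::HeadParser ();

my $mech = WWW::Mechanize->new;
$mech->agent_alias('Linux Mozilla');

# parse_head comes from LWP::UserAgent. A false value tells LWP not to
# feed the <head> section of HTML responses through HTML::HeadParser.
$mech->parse_head(0);

# Report which HTML::HeadParser is installed, to compare before and
# after an upgrade.
printf "HTML::HeadParser version: %s\n", HTML::HeadParser->VERSION;
printf "parse_head enabled: %s\n", $mech->parse_head ? 'yes' : 'no';
```

The trade-off is that header values LWP would normally derive from `<meta>` and `<base>` tags are no longer added to the response, so updating the module, as was done here, is the cleaner fix.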
