PerlMonks  

Can't get Mechanize to download file

by CJ7 (Novice)
on Mar 11, 2016 at 11:00 UTC (#1157412=perlquestion)

CJ7 has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code:
use WWW::Mechanize;

$url = "http://daccess-ods.un.org/access.nsf/GetOpen&DS=A/HRC/WGAD/2015/28&Lang=E";
$mech = WWW::Mechanize->new();
$mech->get($url);
$content = $mech->content();
while ($content =~ m/<META HTTP-EQUIV="refresh" CONTENT="(\d+); URL=(.+?)">/) {
    $refresh = $1;
    $link = $2;
    sleep $refresh;
    $mech->get($link);
    $content = $mech->content();
}
$mech->save_content("output.txt");
When I put the URL assigned to $url in a browser the end result is the downloading of a PDF file, but when I run the above code I end up with HTML telling me I've accessed the site through unauthorised means. I think maybe Mechanize is not able to handle cookies properly. How can I get this to work?

Replies are listed 'Best First'.
Re: Can't get Mechanize to download file
by Anonymous Monk on Mar 11, 2016 at 11:43 UTC

    I think maybe Mechanize is not able to handle cookies properly.

    Nope :)

    How can I get this to work?

    See WWW::Mechanize::FAQ

    Does it work for you from the browser? You can use Wireshark (http://WireShark.org) or LiveHTTPHeaders to see exactly what the browser sends and receives, then get Mechanize to do the same. HTTP, that's the way it goes.
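To compare what Mechanize sends with what the browser capture shows, one option is to have Mechanize print its own traffic. The sketch below relies on `add_handler`, which WWW::Mechanize inherits from LWP::UserAgent; the helper name `tracing_mech` is made up for this example, and the URL is taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: a Mechanize object that prints every request it
# sends and every response it receives.
sub tracing_mech {
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # add_handler comes from LWP::UserAgent. dump() in void context prints
    # the headers and the first part of the body to STDOUT.
    $mech->add_handler( request_send  => sub { shift->dump; return } );
    $mech->add_handler( response_done => sub { shift->dump; return } );

    return $mech;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = tracing_mech();
    $mech->get( $ARGV[0] );
}
```

Run it as `perl trace.pl <url>` and set the output next to the browser's request and response headers; differences in User-Agent, Cookie, or Referer headers are the usual suspects.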

      If it handles cookies correctly as you say then why doesn't it work?

      As I said in my question, it does work in my browser.

      When I run Wireshark I can see it getting a cookie but I don't know how to emulate this in Mechanize.

        If it handles cookies correctly as you say then why doesn't it work?

        Because of all the reasons explained in the FAQ, but basically:

        because websites still discriminate based on browser identification headers

        because cookies are often set through JavaScript or web bugs, and Mech doesn't run JavaScript or load (GET) images; you have to help it do those things
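The two points above could be addressed roughly as follows. This is only a sketch: whether this particular site checks the User-Agent or sets its cookie through an image is an assumption, not something established in this thread, and the helper names `browserish_mech` and `get_with_images` are invented for the example. It does nothing about cookies set by JavaScript.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: Mechanize with a browser-like User-Agent string
# and an in-memory cookie jar.
sub browserish_mech {
    my $mech = WWW::Mechanize->new( autocheck => 0, cookie_jar => {} );
    $mech->agent_alias('Windows Mozilla');    # an alias shipped with WWW::Mechanize
    return $mech;
}

# Hypothetical helper: GET a page, then GET each image it references,
# so cookies set by image responses ("web bugs") end up in the jar.
sub get_with_images {
    my ( $mech, $url ) = @_;

    $mech->get($url);
    return unless $mech->success && $mech->is_html;

    # Collect the URLs first, because each get() replaces the current page.
    my @image_urls = map { $_->url_abs } grep { defined $_->url } $mech->images;
    for my $img (@image_urls) {
        $mech->get($img);
        $mech->back;    # return to the HTML page
    }
    return scalar @image_urls;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = browserish_mech();
    get_with_images( $mech, $ARGV[0] );
    print $mech->cookie_jar->as_string;
}
```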

        When I run Wireshark I can see it getting a cookie but I don't know how to emulate this in Mechanize.

        Have you checked to see if mechanize is already getting this cookie?

        Have you checked to see if mechanize is getting that same url?
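Both questions can be checked from inside the script. The sketch below prints every URL Mechanize visited through HTTP redirects, the final status line, and the contents of the cookie jar; `describe_last_fetch` is a hypothetical helper name, and the jar is created explicitly as an `HTTP::Cookies` object so that `as_string` is known to be available.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;

# Hypothetical helper: summarise the most recent request made by $mech.
sub describe_last_fetch {
    my ($mech) = @_;
    my $res = $mech->response or return "no request made yet\n";

    # redirects() lists the intermediate responses; the final one is $res.
    my @chain  = map { $_->request->uri } $res->redirects, $res;
    my $report = join '', map { "visited: $_\n" } @chain;
    $report .= 'status:  ' . $res->status_line . "\n";
    $report .= "cookies:\n" . $mech->cookie_jar->as_string;
    return $report;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new(
        autocheck  => 0,
        cookie_jar => HTTP::Cookies->new,
    );
    $mech->get( $ARGV[0] );
    print describe_last_fetch($mech);
}
```

If the cookie seen in Wireshark is missing from this report, or the visited URLs differ from the ones the browser requested, that difference is the thing to reproduce in Mechanize.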
