WWW::Mechanize::GZip and post() Issue

by rotneyjacob (Initiate)
on Aug 07, 2016 at 17:01 UTC

rotneyjacob has asked for the wisdom of the Perl Monks concerning the following question:

I want to log in to a website and download a report, so I tried using WWW::Mechanize::GZip and post(). I can log in successfully, but I am unable to go to the report page and hit the Generate button on that page. The page seems to be using JSON, and I receive this message:

<html>
<head>
<title>Error</title>
<link type="text/css" rel="stylesheet" href="//cakecdn.com/portals/150205/resources/css/global.css" />
</head>
<body>
<div id="loading" style="left: 40%;">
<div class="loading-indicator" style="overflow: hidden;">
<img src="//cakecdn.com/portals/150205/images/24.gif" alt="" />
<span id="loading-msg"><span style="color: #9F0000; font-size: 14px;">Your system is temporarily unavailable.<br />Please go back and try again..</span></span>
</div>
</div>
</body>
</html>

But when I do the same steps manually in the browser, the report downloads. Can someone help me with this?

Replies are listed 'Best First'.
Re: WWW::Mechanize::GZip and post() Issue
by Corion (Patriarch) on Aug 07, 2016 at 17:18 UTC

    Does the site work when you use the "normal" WWW::Mechanize instead?

    If not, the difference is most likely somewhere between the headers that your web browser sends and the headers that your script sends. Compare the two using LWP::Debug and the Mozilla HTTP Live Headers extension for example.

      When I try it manually, the website works fine. I am new to Perl. Can you please be more clear about this: "Compare the two using LWP::Debug and the Mozilla HTTP Live Headers extension for example."

        Let me repeat myself:

        Does the site work when you use the "normal" WWW::Mechanize instead?

        By this, I mean to remove the mentions of WWW::Mechanize::GZip and replace them by WWW::Mechanize.

        WWW::Mechanize (and likely, WWW::Mechanize::GZip) are derived from LWP::UserAgent and thus the approach outlined in LWP::Debug will work for them to dump the outgoing HTTP requests and the responses you get.

        This is one side of the comparison. The other side, you can get for example by using the Mozilla HTTP Live Headers extension.

        Compare the output of your program (resp. the dumped headers) with the headers you capture when manually navigating the site. Change your script until there are no more differences.
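
For the script side of that comparison, here is a minimal sketch of the handler approach described in the LWP::Debug documentation. The helper name attach_traffic_dump is made up for illustration, and the URL in the comment is the placeholder domain from this thread. Headers that the protocol layer adds later (such as Host) may not show up in the request dump.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: log every request and response of a user agent
# to the given filehandle. Works for any LWP::UserAgent subclass.
sub attach_traffic_dump {
    my ( $ua, $fh ) = @_;

    # Fires just before a request is handed to the protocol layer.
    $ua->add_handler(
        request_send => sub {
            my ($request) = @_;
            print {$fh} $request->dump( maxlength => 0 ), "\n";
            return;    # an empty return lets the request proceed normally
        }
    );

    # Fires once a response has been received completely.
    $ua->add_handler(
        response_done => sub {
            my ($response) = @_;
            print {$fh} $response->dump( maxlength => 0 ), "\n";
            return;
        }
    );

    return $ua;
}

my $mech = WWW::Mechanize->new( autocheck => 0 );
attach_traffic_dump( $mech, \*STDERR );

# Every request made from here on is dumped, for example:
# $mech->get('http://domain.com/affiliates/');
```

Put the STDERR output next to the browser capture and look for headers that appear on only one side.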

Re: WWW::Mechanize::GZip and post() Issue
by Gangabass (Vicar) on Aug 08, 2016 at 05:14 UTC
    Most of the time it's a missing HTTP header: Content-Type, X-Requested-With or Referer. Just find out which headers were sent by your web browser and send the same headers in your code.

      As you said, I tried replacing WWW::Mechanize::GZip with WWW::Mechanize, but it gave me a result like this:

      `I%&/m{JJt`$ؐ@iG#)*eVe]f@흼{{;N'?\f +dlJɞ!?~|?"ǿ|<oo< + ?ۢ-Ӻw,ozQ›kN›棴> +j2oy~ޝfol9V‹n{go:ou=QVӯw]Ƥ]笸L +‹gU6+2#jcnZfMC_Ib9+Y[2Q:/f|)bq6 +‹EvAx/Ҭl?H&*[n/‹c}ZU(v9Lϫe?ȥwCI› +iѤPꢼN2+lRǓ}YYU:ɦol9K:.b9?‹.~q? ?-

      This is what I captured with Live HTTP Headers. Can you please edit my code accordingly?

      http://domain.com/affiliates/Extjs.ashx?s=subaffsummary

      POST /affiliates/Extjs.ashx?s=subaffsummary HTTP/1.1
      Host: domain.com
      User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
      Accept-Language: en-US,en;q=0.5
      Accept-Encoding: gzip, deflate
      X-Requested-With: XMLHttpRequest
      Content-Type: application/x-www-form-urlencoded; charset=UTF-8
      Referer: http://domain.com/affiliates/
      Content-Length: 217
      Cookie: .lat=1470690366019; .auth8=85946C2BF28A9B5023E9740FE20651D6CBFECE8C58B5FF6FDA19AA91CD5A4467FE99666B4CB729B552CA95F52A34748E7922F22CA125024B788B49BF3E1C3170ADFEBE7D7B3512504E93ABDC3ADEEFF8EB279A1EA1EF92D7F96C485D6C0037EAFB98E916F340A5ECF9DF1617B6B92DB109FC18B7A97D3143446D40E7443255C06C3A0CEFA0FE6448840B8E638C24512A1EAEE2CF43DD7106A05A75A1F6BA6A0A1526A9E5E7C097804AB5D9C381EAB78FFB188CB954C92E1AE16DE120C83EF4B0BC790AE0A86EC1F243C995A48DCA9901D41928C445E14EDBFEAFDCB7CFA1B4ADF665BFD89ECB97911F9E498193A750AF
      Connection: keep-alive

      groupBy=&groupDir=ASC&o=sub_id&d=DESC&report_view_id=145&report_id=95&report_views=Default&exclude_bot_traffic=0&date_range=today&start_date=8%2F09%2F2016&end_date=8%2F09%2F2016&n=30&enter_subaffiliate_id=&timezone=ET

      HTTP/1.1 200 OK
      Cache-Control: private
      Content-Type: application/json; charset=utf-8
      Server: Microsoft-IIS/7.5
      X-AspNet-Version: 4.0.30319
      x-powered-by: ASP.NET
      Date: Mon, 08 Aug 2016 21:06:29 GMT
      Content-Length: 333
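
A rough sketch of how the captured browser request could be rebuilt in Perl. The sub name build_report_request is made up, domain.com is the placeholder from the capture, and the form values are copied from the capture as-is (the dates will need updating). The sketch only builds and prints the request; sending it assumes a $mech object that has already logged in, so that its cookie jar supplies the Cookie header.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical helper: build (but do not send) a POST that mirrors
# the XMLHttpRequest the browser made for the report.
sub build_report_request {
    my ($base) = @_;    # e.g. 'http://domain.com' (placeholder)

    # An array ref keeps the form fields in the captured order.
    my @form = (
        groupBy               => '',
        groupDir              => 'ASC',
        o                     => 'sub_id',
        d                     => 'DESC',
        report_view_id        => 145,
        report_id             => 95,
        report_views          => 'Default',
        exclude_bot_traffic   => 0,
        date_range            => 'today',
        start_date            => '8/09/2016',
        end_date              => '8/09/2016',
        n                     => 30,
        enter_subaffiliate_id => '',
        timezone              => 'ET',
    );

    # Passing a form reference makes the body
    # application/x-www-form-urlencoded automatically.
    return POST(
        "$base/affiliates/Extjs.ashx?s=subaffsummary",
        \@form,
        'X-Requested-With' => 'XMLHttpRequest',
        'Referer'          => "$base/affiliates/",
    );
}

my $req = build_report_request('http://domain.com');
print $req->as_string;

# To send it with a logged-in WWW::Mechanize object (assumed to exist):
# my $res = $mech->request($req);
```

Whether X-Requested-With and Referer are the headers the server actually checks is a guess; the header comparison is what settles that.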

        What headers does your script send?

        Have you looked at the documentation, which I previously linked already? I'll link it again, for your convenience, here. Maybe this time you not only click it but copy the code shown there and put it to work in your program?

        Could the result you got back from the server you pasted in your first paragraph be the gzipped document you asked to download?

        That server response looks much better than the "system unavailable" you got before trying Corion's suggestion.
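
If the binary-looking output really is a gzip-compressed body, HTTP::Response can undo the compression via decoded_content. The sketch below fakes such a response offline so it runs without the site; the JSON payload is invented, and in the real script the response object would come from $mech->response instead.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);
use JSON::PP qw(decode_json);

# Invented payload standing in for the report data.
my $json = '{"success":true,"rows":[]}';

# Compress it the way a server would for "Accept-Encoding: gzip".
gzip( \$json => \my $gzipped ) or die "gzip failed: $GzipError";

# Simulated server response with a gzip-encoded JSON body.
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'application/json; charset=utf-8',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

# content() returns the compressed bytes, which look like garbage when printed.
my $raw = $res->content;

# decoded_content() honours Content-Encoding; charset => 'none' keeps
# the result as UTF-8 bytes, which is what decode_json expects.
my $bytes = $res->decoded_content( charset => 'none' );
my $data  = decode_json($bytes);

print "success flag: ", ( $data->{success} ? "true" : "false" ), "\n";
```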

        Also use this setup to rule out rejection caused by a bad user agent string or by cookies being turned off:

        1) Use an appropriate user agent string, e.g.
        my $mech = WWW::Mechanize->new( agent => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0' );
        2) Use a cookie jar to keep all the cookies for the entire session leading up to the file download:
        my $mech = WWW::Mechanize->new( cookie_jar => {} );
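
Note that two separate new() calls create two unrelated objects, so both options need to go into the same constructor call. A minimal sketch, reusing the user agent string from the captured browser request; the autocheck setting is an extra assumption, not something from the thread:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# One object carrying both settings: a browser-like agent string and an
# in-memory cookie jar that persists across all requests of the session.
my $mech = WWW::Mechanize->new(
    agent      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0',
    cookie_jar => {},    # empty hash ref = temporary in-memory jar
    autocheck  => 1,     # assumption: die on HTTP errors instead of failing silently
);

print "Agent: ", $mech->agent, "\n";
```

Log in and fetch the report with this same $mech so the session cookies from the login are sent along with the report request.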
