PerlMonks  

Can't get source page: 500 error

by sugu (Initiate)
on Jun 08, 2017 at 11:42 UTC

sugu has asked for the wisdom of the Perl Monks concerning the following question:

I'm using LWP::UserAgent and WWW::Mechanize to get the page source of websites, but for this website (https://camelcamelcamel.com/) the code returns a 500 error. I can't figure out where my mistake is; I don't know whether it is in the cookie handling or the user agent. Can someone help me with this using LWP::UserAgent itself? Thank you in advance.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;

my $url = "https://camelcamelcamel.com/";

my $ua = LWP::UserAgent->new();
$ua->agent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0");

# Cookies persist in "<script name>_cookie.txt". Once the jar is attached
# to the user agent, LWP stores and sends cookies automatically.
my $cookie = HTTP::Cookies->new(file => $0 . "_cookie.txt", autosave => 1);
$ua->cookie_jar($cookie);

# A GET request has no body, so it needs no Content-Type header.
my $req = HTTP::Request->new(GET => $url);
$req->header("Accept" => "text/html,application/xhtml+xml,application/xml;q=0.8,*/*;q=0.7");

my $res = $ua->request($req);
$cookie->save;

# status_line adds the reason phrase to the numeric code.
print "Code::", $res->status_line, "\n";

Re: Can't get source page: 500 error
by 1nickt (Canon) on Jun 08, 2017 at 12:53 UTC

    Hi, I don't know what your code is all about: you mention Mech, but the code does not use it. I also can't tell where your 500 error comes from: the site itself returns 405, so the 500 may be generated on your side by LWP rather than sent by the server.
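    A minimal sketch for telling the two cases apart: when LWP cannot complete a request itself (connection refused, TLS failure, timeout), it builds the response locally and marks it with a "Client-Warning: Internal response" header. The helper name describe_response is hypothetical, and the network request only runs when a URL is passed on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Describes a response and says whether LWP synthesized it locally.
# LWP tags responses it builds itself with "Client-Warning: Internal response".
sub describe_response {
    my ($res) = @_;
    my $warning = $res->header('Client-Warning') // '';
    my $origin  = $warning eq 'Internal response' ? 'generated by LWP' : 'sent by server';
    return sprintf '%s (%s)', $res->status_line, $origin;
}

# Only touch the network when a URL is given on the command line,
# e.g.  perl diagnose.pl https://camelcamelcamel.com/
if (@ARGV) {
    my $ua  = LWP::UserAgent->new( timeout => 20 );
    my $res = $ua->get( $ARGV[0] );
    print describe_response($res), "\n";
    # For LWP-generated errors the body carries LWP's explanation.
    print( ( $res->decoded_content // '' ), "\n" ) unless $res->is_success;
}
```

    Run it once against the failing URL: if the output says "generated by LWP", the problem is on the client side (HTTPS support, certificates, network), not in the cookies or the agent string.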

    In any case, that page requires JavaScript to be enabled, and WWW::Mechanize does not support JavaScript. Perhaps you can get around it by solving the CAPTCHA the page returns?
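    If a script has to live with this, it can at least recognise the bot-check page instead of treating it as the real content. A minimal heuristic sketch: looks_like_bot_check is a hypothetical helper, and the markers it checks (status 405 plus the "think you were a bot" and CAPTCHA wording) are assumptions taken from the page quoted below, not anything the site documents. It only detects the check; it does not get past it.

```perl
use strict;
use warnings;

# Hypothetical heuristic: does this response look like the site's
# anti-bot page rather than the real content? The markers below are
# assumptions based on the 405 page quoted in this thread.
sub looks_like_bot_check {
    my ( $status, $html ) = @_;
    return 0 unless defined $html;
    return ( $status == 405
          && $html =~ /think you were a bot/i
          && $html =~ /CAPTCHA/ ) ? 1 : 0;
}
```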

    Using something more modern and simpler to see what's going on:

    use strict;
    use warnings;
    use feature 'say';

    use Path::Tiny;
    use HTTP::Tiny;
    use HTTP::CookieJar;

    my $jar_file = Path::Tiny->tempfile;
    $jar_file->touch;
    my $jar = HTTP::CookieJar->new->load_cookies( $jar_file->lines );

    my $ua = HTTP::Tiny->new( cookie_jar => $jar );

    my $url = 'https://camelcamelcamel.com';
    my $res = $ua->get( $url );

    say $res->{'status'};
    say $res->{'content'};

    __END__
    Output:

    405
    [ snip ]
    <p>
    As you were browsing <strong>camelcamelcamel.com</strong> something about your browser made us think you were a bot. There are a few reasons this might happen:
    </p>
    <ul>
    <li>You're a power user moving through this website with super-human speed.</li>
    <li>You've disabled JavaScript in your web browser.</li>
    <li>A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this <a title='Third party browser plugins that block javascript' href='http://ds.tl/help-third-party-plugins' target='_blank'>support article</a>.</li>
    </ul>
    <p>After completing the CAPTCHA below, you will immediately regain access to camelcamelcamel.com.</p>
    [ snip ]
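    HTTP::Tiny (a core module since Perl 5.14) makes the client/server distinction explicit: any exception raised on the client side (bad URL, DNS, connect, TLS) comes back as status 599 with the error text in content, so it cannot be mistaken for the 405 the server sends. A minimal sketch; fetch_status is a hypothetical helper name and the 10-second timeout is an arbitrary choice.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetches $url and says whether the status came from the server or was
# produced by HTTP::Tiny itself (599 = client-side exception).
sub fetch_status {
    my ($url) = @_;
    my $res    = HTTP::Tiny->new( timeout => 10 )->get($url);
    my $origin = $res->{status} == 599 ? 'client-side error' : 'server response';
    return ( $res->{status}, $origin, $res->{reason}, $res->{content} );
}

# Only go to the network when a URL is given on the command line.
if (@ARGV) {
    my ( $status, $origin, $reason, $content ) = fetch_status( $ARGV[0] );
    print "$status $reason ($origin)\n";
    print "$content\n" if $status == 599;    # the exception text
}
```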

    Hope this helps!


    The way forward always starts with a minimal test.

      I'm saying that when I run this code it shows a 500 error. I didn't use Mechanize in this script, but even when I do use Mechanize I can't get the page source. Could someone try to get the page source for this site? I need to know how to get it.

        Er, did you actually read my reply?

        1. The page requires JavaScript.
        2. Neither LWP::UserAgent nor WWW::Mechanize supports JavaScript.
        3. Thus, your approach cannot work.
        If you want to try to accomplish your task in Perl, you could look at Selenium::Remote::Driver, but that would require installing a headless browser on your system. Or, you could choose to accomplish your task without Perl using PhantomJS. Either way, you have a lot of work and learning ahead of you.
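        A minimal sketch of the Selenium::Remote::Driver route, assuming a Selenium server is already listening on localhost:4444 and can drive Firefox (host, port and browser are assumptions; adjust them to your setup). fetch_rendered_source is a hypothetical helper name. Note that the site may still show its CAPTCHA to an automated browser.

```perl
use strict;
use warnings;

# Returns the page source after a real browser has run the page's JavaScript.
# Assumes a Selenium server on localhost:4444 that can start Firefox.
sub fetch_rendered_source {
    my ($url) = @_;
    require Selenium::Remote::Driver;    # CPAN module, loaded only when needed
    my $driver = Selenium::Remote::Driver->new(
        remote_server_addr => 'localhost',
        port               => 4444,
        browser_name       => 'firefox',
    );
    $driver->get($url);
    my $html = $driver->get_page_source;
    $driver->quit;                       # close the browser session
    return $html;
}

# e.g.  perl rendered.pl https://camelcamelcamel.com/
if (@ARGV) {
    print fetch_rendered_source( $ARGV[0] );
}
```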


Re: Can't get source page: 500 error
by mrguy123 (Hermit) on Jun 08, 2017 at 12:44 UTC
    I think your main problem is that you are trying to fetch an HTTPS page.
    If you change the URL to http://www.imdb.com (for example), you get code=200.
    Look here for more info: http://www.perlmonks.org/?node_id=888422.
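    A quick way to check whether HTTPS is the problem is to ask LWP whether it can handle the scheme at all. A minimal sketch using LWP::UserAgent's is_protocol_supported method; if https shows as NOT supported, LWP answers https requests with an error response it generates locally instead of contacting the server.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Prints whether this LWP installation can fetch each URL scheme.
my $ua = LWP::UserAgent->new;
for my $scheme (qw(http https)) {
    printf "%-5s %s\n", $scheme,
        $ua->is_protocol_supported($scheme) ? 'supported' : 'NOT supported';
}
```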

    Good luck!
    Mr Guy
Re: Can't get source page: 500 error
by amitsq (Beadle) on Jun 08, 2017 at 13:39 UTC
    My guess: try installing this module (LWP::Protocol::https) so that you can access secure web pages: http://search.cpan.org/~gaas/LWP-Protocol-https-6.04/lib/LWP/Protocol/https.pm Hope it helps.
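    A minimal sketch for checking what is already installed: it loads each module by name and prints its version. The module list is an assumption about a typical setup (LWP::Protocol::https, plus IO::Socket::SSL for TLS and Mozilla::CA for CA certificates), and module_version is a hypothetical helper.

```perl
use strict;
use warnings;

# Returns the module's version, 'unknown' if it has none,
# or undef if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(LWP::Protocol::https IO::Socket::SSL Mozilla::CA)) {
    my $version = module_version($module);
    printf "%-22s %s\n", $module, defined $version ? $version : 'not installed';
}
```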

Node Type: perlquestion [id://1192350]
Front-paged by Corion