PerlMonks  

Perl mechanize get Error!

by Anonymous Monk
on Dec 02, 2013 at 17:19 UTC ( #1065308=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

Below is my code. I don't know why it is not working.

use strict;
use WWW::Mechanize;

my $url  = "http://www.truro-penwith.ac.uk/";
my $mech = WWW::Mechanize->new();
print "\nURL: $url ...\n";

eval {
    $mech->agent_alias('Windows Mozilla');
    #$mech->add_header('User-Agent'=>'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
    #$mech->add_header('Accept'=>'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
    #$mech->add_header('Accept-Language'=>'en-US,en;q=0.5');
    #$mech->add_header('Accept-Encoding'=>'gzip, deflate');
    #$mech->add_header('Cookie'=>'bb2_screener_=1385998863+111.92.64.106; PHPSESSID=078fc31740655a3a3f5fb280dbdf335d');
    $mech->add_header('Connection'=>'keep-alive');
    $mech->get($url);
};

#$mech = $mech->content();
$mech = $mech->response->content();
print $mech;
exit;

Does anyone know what the reason could be?

The site seems to be detecting this as a script. I tried adding headers with add_header and default_header, but nothing works. The response is a 400 error, and sometimes a 403 error. I wonder why this happens even though I am sending the headers. Any ideas? I have none :(

Thanks in advance
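
One way to narrow down a 400 or 403 like the one described above is to stop get() from dying and print what the server returned next to the headers that were actually sent. The following is only a minimal sketch: the helper name describe_response and the command-line URL handling are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: summarise an HTTP::Response as its status line
# plus the request headers that were actually sent, when available.
sub describe_response {
    my ($res) = @_;
    my $summary = 'Status: ' . $res->status_line . "\n";
    my $req = $res->request;    # undef for a hand-built response
    $summary .= "Request headers sent:\n" . $req->headers->as_string if $req;
    return $summary;
}

# Only touch the network when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    # autocheck => 0 keeps get() from dying on 4xx/5xx replies,
    # so the failed response can still be inspected.
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->agent_alias('Windows Mozilla');
    $mech->get($url);
    print describe_response( $mech->response );
    print $mech->success ? "Request succeeded\n" : "Request failed\n";
}
```

Run it with the URL as the first argument. Comparing the printed request headers with what a browser sends may show which header, if any, the site objects to.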

Re: Perl mechanize get Error!
by Anonymous Monk on Dec 02, 2013 at 17:25 UTC
    They don't want you to scrape the university website. Solution: don't.
      what an idiot you are... :D???!!!
Re: Perl mechanize get Error!
by PerlSufi (Pilgrim) on Dec 02, 2013 at 20:32 UTC
    What is your goal with this script? I have written a brief tutorial on using mechanize that can be found here: WWW::Mechanize Basics
    If you need to do a lot of navigating on the site, I would recommend WWW::Mechanize::Firefox, since the site uses a lot of JavaScript. WWW::Mechanize and JavaScript don't get along too well. Also, try
    $mech->dump_text;
    I also recommend getting the Firebug extension for Firefox and manually inspecting the page for each thing you want to access. For example, the URL for 'Latest News' is http://www.truro-penwith.ac.uk/category/news/, which I found using Firebug.
    So to go there, just do
    $mech->get('http://www.truro-penwith.ac.uk/category/news/');
    UPDATE: Also, simply:
    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.truro-penwith.ac.uk/');
    $mech->dump_text;
    worked for me. You don't need to do anything with headers.
      Hi PerlSufi, you are great. OK, can you check this: https://thebigword-careers.irecruittotal.com/cac/SearchVacancy.aspx?EmploymentTypeID=0&Intranet=0 and give us a solution? Take it as a challenge. ;) Best, Anonymous Monk
        I'm not really sure what the 'challenge' is? Do you want to be able to submit that form?
        use strict;
        use warnings;
        use WWW::Mechanize;

        # takes the vacancy to search for as the first command-line argument
        my $mech = WWW::Mechanize->new();
        $mech->get("https://thebigword-careers.irecruittotal.com/cac/SearchVacancy.aspx?EmploymentTypeID=0&Intranet=0");
        my $vacancy = $ARGV[0];
        # single quotes keep the dollar signs in the field name literal
        $mech->field(
            'ctl00$mvMintPP$ctl00$ContentPlaceHolder_Main$mvMintPP$ctl00$txbJobRef',
            $vacancy
        );
        $mech->click_button(value => "Search Vacancies");
        $mech->dump_text;
        ..might work..
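
A note on the field name in the snippet above: ASP.NET control names such as this one contain literal dollar signs, and inside a double-quoted Perl string each `$name` is treated as a variable to interpolate, which fails to compile under `use strict` when no such variable exists. A minimal sketch of the two usual ways to keep the dollar signs literal follows; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# Single quotes: no interpolation, so the dollar signs stay literal.
my $field_name = 'ctl00$mvMintPP$ctl00$ContentPlaceHolder_Main$mvMintPP$ctl00$txbJobRef';

# Double quotes also work if every dollar sign is escaped.
my $escaped = "ctl00\$mvMintPP\$ctl00\$ContentPlaceHolder_Main\$mvMintPP\$ctl00\$txbJobRef";

# Count the literal dollar signs to confirm nothing was interpolated away.
my $dollar_count = () = $field_name =~ /\$/g;

print "Both spellings match\n" if $field_name eq $escaped;
print "Dollar signs in field name: $dollar_count\n";
```

Single quotes are usually the simpler choice here, since the name can be pasted straight from the page source without editing.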

Node Type: perlquestion [id://1065308]
Approved by Laurent_R