PerlMonks  

LWP posting

by PerlSufi (Pilgrim)
on May 14, 2013 at 18:25 UTC (#1033530=perlquestion)
PerlSufi has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have been at this for a little while and I am stuck on one part. I am trying to crawl a site to download a CSV file. Here is (part of) my script:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new(ssl_opts => { verify_hostname => 1 });
    $ua->agent('Mozilla/5.0');
    $ua->cookie_jar({});    # in-memory jar keeps session cookies between requests

    my $response  = $ua->get('firstpage.com');
    my $response2 = $ua->get('secondpage.com');
    # (etc)

    ### This section is where I am having trouble:
    my $post = $ua->post('https://www.foo.com/downloadAcsReportSearch.html', "Search");

The problem I am having right now is that the last post does not "click" the Search button. Firebug shows that the button has the value 'Search'. I think I need help understanding how to do this. I also have to click one last button on the page that follows, but I'll cross that bridge when I get there. Any insight is greatly appreciated. Thanks!
PS: I should add that I cannot use a module that relies on a browser or a JavaScript plugin.
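A sketch of what the last request could look like instead. LWP::UserAgent's post() expects the form data as an array or hash reference after the URL; a bare "Search" string in that position is treated as a request header rather than as form data, so no field is sent. A submit button only contributes a name=value pair when it has a name attribute, so check the button's name in Firebug. The field names reportType and submitButton below are hypothetical. The request is built with HTTP::Request::Common's POST (the helper post() uses internally) so it can be inspected before anything goes over the wire.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical field names: read the real ones from the form's HTML (Firebug, mech-dump).
my $req = POST(
    'https://www.foo.com/downloadAcsReportSearch.html',
    [
        reportType   => 'ACS',       # hypothetical search field
        submitButton => 'Search',    # hypothetical name of the "Search" button
    ],
);

# Nothing has been sent yet; this is what the server would receive.
print $req->method, "\n";                  # POST
print $req->header('Content-Type'), "\n";  # application/x-www-form-urlencoded
print $req->content, "\n";                 # reportType=ACS&submitButton=Search
```

To send it with the session from the script above, `my $post = $ua->request($req);` works, as does passing the same array reference directly: `$ua->post($url, [ ... ])`.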

Re: LWP posting
by aitap (Chaplain) on May 14, 2013 at 19:14 UTC
    Try WWW::Mechanize: it's easier to use when you need to emulate a browser and fill in forms on successive pages, keeping cookies and other session-related state between requests. It also comes with a handy utility called mech-dump, which lists a page's forms and fields so you can fill them in properly.
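    A sketch of the mech-dump idea in code: load a page with WWW::Mechanize and list every input of its first form, so the Search button's name and value become visible. To keep it runnable offline, it reads a hypothetical stand-in page (the HTML and field names are assumptions) from a temporary file; on the real site you would get() the search page URL instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in for the search page, saved locally so the sketch runs offline.
my $html = <<'HTML';
<html><body>
<form action="https://www.foo.com/downloadAcsReportSearch.html" method="post">
  <input type="text" name="reportType" value="">
  <input type="submit" name="searchBtn" value="Search">
</form>
</body></html>
HTML

my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $html;
close $fh;

my $mech = WWW::Mechanize->new();
$mech->get(URI::file->new_abs($path)->as_string);

# Roughly what `mech-dump --forms` reports: type, name and value of each input.
my ($form) = $mech->forms;
my @inputs;
for my $input ($form->inputs) {
    push @inputs, sprintf '%s %s=%s', $input->type, $input->name // '', $input->value // '';
    print "$inputs[-1]\n";
}
```

    Once the names are known, `$mech->click_button(value => 'Search')` submits the current form as if that button had been pressed.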
Re: LWP posting
by SnackySmorez (Initiate) on May 14, 2013 at 19:28 UTC

    Hi PerlSufi, try encoding the URL. I seem to remember having to encode URLs for some sites before submitting them. Also, as the other poster replied, try Mechanize; if you have a choice, use Mechanize.

      Changing the crawler to use WWW::Mechanize for my sites gave me this error:
      Error GETing https://prodp1.usps.com/adminweb/view.htm?requestPage=P1DASHBOARD: Can't connect to prodp1.usps.com:443 (certificate verify failed) at acs_get.pl line 44.

      I think I may need to use LWP instead. Maybe with HTML::Form?
      Actually, I should mention that I do use Mechanize to log in. I just haven't had much success navigating with it because of the JavaScript on the pages.
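      Since HTML::Form comes up above, a sketch of "clicking" the Search button by its value with plain LWP modules: parse the form, find the submit input whose value is Search, and let HTML::Form build the request, including that button's name=value pair. The HTML, base URL and field names are hypothetical stand-ins; in practice you would pass the HTTP::Response from `$ua->get(...)` straight to `HTML::Form->parse`.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the search page's HTML.
my $html = <<'HTML';
<form action="/downloadAcsReportSearch.html" method="post">
  <input type="text" name="reportType" value="">
  <input type="submit" name="searchBtn" value="Search">
  <input type="submit" name="resetBtn" value="Reset">
</form>
HTML

my ($form) = HTML::Form->parse($html, 'https://www.foo.com/');
$form->value(reportType => 'ACS');    # hypothetical field value

# HTML::Form's click() picks a button by name, so look up the submit
# input whose visible value is "Search" (what Firebug shows) instead.
my ($button) = grep { $_->type eq 'submit' && ($_->value // '') eq 'Search' } $form->inputs;
die "no Search button found\n" unless $button;

# Builds an HTTP::Request; only the clicked button is included, nothing is sent yet.
my $req = $button->click($form);
print $req->method, ' ', $req->uri, "\n";
print $req->content, "\n";
```

      Sending it with the same user agent that logged in keeps the session cookies: `my $res = $ua->request($req);`.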
      Hi, thanks for your insight. The request header does say the body is form-encoded (application/x-www-form-urlencoded), but how do I do that?

        You can try building your own URL string and encoding it with URL::Encode: http://search.cpan.org/~chansen/URL-Encode-0.01/lib/URL/Encode.pod. Also worth checking out is URI::Escape, which %-encodes and %-decodes unsafe characters: http://search.cpan.org/~gaas/URI-1.60/URI/Escape.pm
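        A minimal sketch of what "form-encoded" means for the request body: each name and value is %-escaped (here with URI::Escape's uri_escape) and the pairs are joined with &. The field names and values are hypothetical. Passing an array or hash reference to `$ua->post` already performs this encoding, so building the string by hand is mainly useful when you need full control over the body.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical fields whose values contain characters that must be escaped.
my @fields = (
    reportType => 'ACS Report',
    filter     => 'a&b=c',
);

# Escape every name and value, then join the pairs with "&".
my @pairs;
while (my ($name, $value) = splice @fields, 0, 2) {
    push @pairs, uri_escape($name) . '=' . uri_escape($value);
}
my $body = join '&', @pairs;
print "$body\n";    # reportType=ACS%20Report&filter=a%26b%3Dc
```

        The string can then be sent as-is: `$ua->post($url, Content_Type => 'application/x-www-form-urlencoded', Content => $body)`.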

Re: LWP posting
by PerlSufi (Pilgrim) on May 15, 2013 at 00:35 UTC
    I got through that part by adding:

        use IO::Socket::SSL qw();
        use WWW::Mechanize;

        # Turns off certificate and hostname verification entirely.
        my $mech = WWW::Mechanize->new(ssl_opts => {
            SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE,
            verify_hostname => 0,
        });

    I'm not there yet, but I'm further than I was. Thanks!
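    Setting SSL_VERIFY_NONE makes the error go away, but the script will then talk to any server that answers for prodp1.usps.com. A sketch of an alternative that keeps verification on, assuming the "certificate verify failed" error comes from LWP not finding a CA certificate that trusts the server: point SSL_ca_file at a PEM bundle containing that CA. The path below is hypothetical.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical path: a PEM bundle that includes the CA which issued the
# server's certificate (for example, exported from the browser).
my $ca_file = '/path/to/ca-bundle.pem';

# Certificate and hostname checks stay on; IO::Socket::SSL is only told
# which CA certificates to trust.
my $mech = WWW::Mechanize->new(
    ssl_opts => {
        verify_hostname => 1,
        SSL_ca_file     => $ca_file,
    },
);

printf "verify_hostname=%d SSL_ca_file=%s\n",
    $mech->ssl_opts('verify_hostname'), $mech->ssl_opts('SSL_ca_file');
```

    The constructor does not open the file, so a wrong path only shows up as a failed connection on the first HTTPS request.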

Node Type: perlquestion [id://1033530]
Approved by Corion