PerlMonks
HTTP::Headers error when submitting form via WWW::Mechanize

by jerrygarciuh (Curate)
on Jun 30, 2004 at 18:27 UTC ( [id://370861]=perlquestion )

jerrygarciuh has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

I am attempting to learn WWW::Mechanize in order to avoid paying for API access from a large financial institution. If my app can access their web interface securely then the API is unnecessary. The libwww-perl modules, HTTP::Headers, etc. are freshly installed from CPAN. My sample code:

#!/usr/local/bin/perl
use strict;
use Carp;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );
my $url  = 'https://secure.secret.com/login/login.cfm?someParam=foo';
$mech->get( $url );
print $mech->uri() . "\nStep 1 done. \n";
$mech->submit_form(
    form_number => 0,
    fields      => {
        login    => 'xxxxx',
        password => 'xxxx'
    }
);
print $mech->uri() . "\n\nDone.";

The small sample code above produces the following output on the command line:

https://secure.secret.com/login/login.cfm?someParam=foo
Step 1 done.
Can't locate object method "remove_content_headers" via package "HTTP::Headers" at (eval 14) line 1.

So the get() works on the https url but the object we grab with submit_form() throws an error. Does anyone understand why this should be the case? The source looks like this:

sub remove_content_headers {
    my $self = shift;
    unless (defined(wantarray)) {
        # fast branch that does not create return object
        delete @$self{grep $entity_header{$_} || /^content-/, keys %$self};
        return;
    }
    my $c = ref($self)->new;
    for my $f (grep $entity_header{$_} || /^content-/, keys %$self) {
        $c->{$f} = delete $self->{$f};
    }
    $c;
}
Thanks very much for any advice!!

jg
_____________________________________________________
"The man who grasps principles can successfully select his own methods.
The man who tries methods, ignoring principles, is sure to have trouble."
~ Ralph Waldo Emerson

Replies are listed 'Best First'.
Re: HTTP::Headers error when submitting form via WWW::Mechanize
by ratflyer (Acolyte) on Jun 30, 2004 at 21:51 UTC
    You may want to post to the libwww@perl.org mailing list; that's where people post questions about Mechanize. To subscribe to the list, send a message to: <libwww-subscribe@perl.org>
Re: HTTP::Headers error when submitting form via WWW::Mechanize
by paulbort (Hermit) on Jun 30, 2004 at 21:23 UTC
    Before you go too much further, you might want to check the terms of service with your financial institution. They may have a clause that prohibits such things. (The usual reason for such a clause is to ensure proper revenue for the API.)

    --
    Spring: Forces, Coiled Again!
      A clause that prohibits you from viewing a website using a browser, that's crazy talk ;)
Re: HTTP::Headers error when submitting form via WWW::Mechanize
by drewbie (Chaplain) on Jun 30, 2004 at 19:47 UTC
    You need to upgrade libwww-perl to the latest version (5.800). From the changelog:
    2004-04-07 Gisle Aas <gisle@ActiveState.com>
        Release 5.78
        ...
        Added clear() and remove_content_headers() methods to HTTP::Headers.
      Thanks for the reply, but as I mention in my post I have the latest versions; in fact, I included the source of the remove_content_headers() method from Headers.pm in my question.
      jg

      _____________________________________________________
      "The man who grasps principles can successfully select his own methods.
      The man who tries methods, ignoring principles, is sure to have trouble."
      ~ Ralph Waldo Emerson
Re: HTTP::Headers error when submitting form via WWW::Mechanize
by skyknight (Hermit) on Jun 30, 2004 at 23:09 UTC
    This node should probably be resectioned to "Illegal". It sounds like you are almost certainly trying to do an end run around paying for a for-fee service.
      If the institution in question provides a web application (and they do) for me to run reports as a person with a browser, why on earth would it be illegal for me to access the service and run the same reports via an automaton? I am time impaired, so I am making a 'screen reader' which not only reads the reports but also updates my database every quarter hour. As paulbort points out, there may be a clause denying me this access, but in my view I am merely automating the manner in which I am already allowed to avail myself of the reports.

      Peace,


      jg
      _____________________________________________________
      "The man who grasps principles can successfully select his own methods.
      The man who tries methods, ignoring principles, is sure to have trouble."
      ~ Ralph Waldo Emerson

        The medium via which information is presented is an integral component of any contract. For example, if a content provider provides you with content without a fee, then it probably has a revenue model that involves some secondary effect of you viewing the content, e.g. advertisement viewing. If you bypass their system and extract the information you want directly, in a way that violates your contract, then you are depriving them of their revenue. No matter how you slice it, taking something without authorization from someone is stealing. Furthermore, via your automation you may be harming them via resource over-taxing. There are very good reasons for prohibiting "robots" from accessing web sites, and your selfish disregard for them invites a Tragedy of the Commons.

        There is plenty of precedent for content/service providers going after people who violate the terms of their service agreement. Don't take my word for it though... Just ask some of the people who have been taken to court by eBay.

      Whoa, Nellie. It's not necessarily illegal. AFAIK, it would not be illegal to use WWW::Mechanize to pull a copy of the PM front page every five minutes. But if one has an agreement with a financial services institution, and that agreement prohibited any use of their web site other than access personally and immediately directed by the authorized user, this could be a violation of that agreement. They have a habit of writing terms into their agreements that are to their advantage, rather than their clients' advantage. Said institution could include in its terms such a prohibition, and the penalty the institution could levy for a violation, but unless there is a specific law being broken, we're talking about breach of contract, rather than criminal charges.

      I was suggesting caution, not a visit from the grim NodeReaper.

      --
      Spring: Forces, Coiled Again!
Re: HTTP::Headers error when submitting form via WWW::Mechanize
by neeraj (Scribe) on Jul 02, 2004 at 05:35 UTC
    Perhaps the page you are retrieving is compressed. Try out this code. I haven't checked it as I don't have internet access, but I hope it works.
    use strict;
    use warnings;
    use Compress::Zlib;
    use WWW::Mechanize;
    use WWW::Mechanize::FormFiller;
    use HTML::Form;

    # Initialize a new object
    my $mech = WWW::Mechanize->new();

    # Fetch the given URL
    my $url = "https://secure.secret.com/login/login.cfm?someParam=foo";
    $mech->get($url);

    # Throw error if page not found
    $mech->success or die "Can't fetch the requested page";

    # Uncompress the page
    my $dest = Compress::Zlib::memGunzip($mech->content);

    # Initialize a new object to retrieve the form
    my $f = WWW::Mechanize::FormFiller->new();
    my $form = HTML::Form->parse($dest, $mech->base);

    # Fill the form with login & password and submit
    $f->fillout(user => 'xxx', password => 'xxx');
    $f->fill_form($form);
    my $request = $form->click();

    # Fetch the next page
    $mech->request($request);

    # Uncompress the page
    $dest = Compress::Zlib::memGunzip($mech->content);

    # Retrieve the form & check for string
    $form = HTML::Form->parse($dest, $mech->base);
    print "$form\n";
    exit 0;
    I hope this works. Neeraj.
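    (A side note on the compression theory: a server shouldn't gzip the response unless the client advertises Accept-Encoding support, and newer HTTP::Message releases in libwww-perl ship a decoded_content method that undoes gzip/deflate itself, so the manual Compress::Zlib handling may be unnecessary. A minimal sketch assuming such a version is installed; the URL is the dummy one from the thread:)

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get('https://secure.secret.com/login/login.cfm?someParam=foo');

    # decoded_content re-inflates a gzip/deflate body and decodes the
    # charset; it returns the body unchanged when it isn't compressed.
    my $html = $mech->response->decoded_content;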
      Thx, neeraj. I am afraid it doesn't solve the problem. Appreciate the reply! --jg
      _____________________________________________________
      "The man who grasps principles can successfully select his own methods.
      The man who tries methods, ignoring principles, is sure to have trouble."
      ~ Ralph Waldo Emerson

Node Type: perlquestion [id://370861]
Approved by tinita
Front-paged by grinder