BinBerliner has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a client to grab data from . I have created a test account called "JohnSmith" with password "test". The first get_url() below works as expected, but the second one always "fails". What do I mean by fail? The HTML returned includes the text "Da Sie zu lange inaktiv waren, wurde Ihre Sitzung beendet. Bitte laden Sie erneut die Startseite." This means "Because you were inactive for too long, your session has been ended. Please load the start page again." This is my first attempt at writing a robot... Any ideas what I'm doing wrong?
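use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Headers;
use HTTP::Request;

# 'User-Agent' must be quoted: => does not auto-quote a word containing a hyphen
my $hdrs = HTTP::Headers->new('Accept' => 'text/plain', 'User-Agent' => 'IE/5.0');
my $ua   = LWP::UserAgent->new;

# The start of both URLs (host and beginning of the id parameter) was lost from the post
get_url('siUXoQo7uGcUZ6T6Hjs&page=loginp&nickname=JohnSmith&password=test&x=0&y=0');
get_url('t&nickname=JohnSmith&?id=1-d24-fL6sAsiUXoQo7uGcUZ6T6Hjs&x=13&y=9');

# No () prototype: get_url takes one argument, the URL to fetch
sub get_url {
    my $url2get = shift @_;
    my $req  = HTTP::Request->new(GET => $url2get, $hdrs);   # a plain URL string is accepted
    my $resp = $ua->request($req);
    if ($resp->is_success) { printf "Good %s", $resp->content; }
    else                   { printf "Bad %s",  $resp->message; }
}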

Re: Problem with LWP, frames, login, and parent.main.location.replace()
by merlyn (Sage) on Mar 02, 2001 at 22:41 UTC
    My guess is that there's some sort of state checker being used, such as a cookie or a hidden field, or if they're pretty dumb, the referer string.

    You'll probably need to note if any cookies are being sent back from the login request, or if there's a hidden field in the response form. Cookies can be managed nearly painlessly with HTTP::Cookies. Forms can be extracted from the response with HTML::Form.

    -- Randal L. Schwartz, Perl hacker
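    A minimal sketch of merlyn's suggestion, assuming libwww-perl and HTML::Form are installed: an LWP::UserAgent with an in-memory HTTP::Cookies jar sends back any cookies the login response sets, and HTML::Form pulls out hidden fields that may carry state. The helper names make_agent() and hidden_fields() are hypothetical, and the User-Agent string is only an example.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTML::Form;

# Build a user agent that keeps cookies in memory and automatically
# sends them back on later requests to the same site.
sub make_agent {
    return LWP::UserAgent->new(
        agent      => 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)',  # example only
        cookie_jar => HTTP::Cookies->new,    # in-memory jar, not saved to disk
    );
}

# Return name => value pairs for every hidden <input> in the forms found
# in $html; $base is used by HTML::Form to resolve relative form actions.
sub hidden_fields {
    my ($html, $base) = @_;
    my %hidden;
    for my $form (HTML::Form->parse($html, $base)) {
        for my $input ($form->inputs) {
            next unless $input->type eq 'hidden' && defined $input->name;
            $hidden{ $input->name } = $input->value;
        }
    }
    return %hidden;
}
```

    Usage, with a hypothetical $login_url: my $ua = make_agent(); my $resp = $ua->get($login_url); my %state = hidden_fields($resp->decoded_content, $resp->base); The hidden values would then be sent along with the next request, while the cookie jar takes care of cookies on its own.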

Solution for your problem with LWP, frames, login, and parent.main.location.replace()
by arhuman (Vicar) on Mar 02, 2001 at 23:32 UTC

    I've slightly modified (messed with) your code and it works now...
    Or at least it seems to work
    (my German isn't good enough for me to be 100% sure).
    I've printed the last page reached so you can check it.

    Take note of the 'id' parameter passed in the URL when you connect to the site.
    It's a session ID, and it changes with each connection.
    When you use an id that you got long before (together with the password associated with that session ID), the session is no longer valid, and you get the message saying 'you have been inactive for too long' (or something like that; my German isn't what it should be).

    BTW, your User-Agent string is not a realistic one
    (something like
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
    is more common),
    though in this case it makes no difference.

    I've left the debugging code (printing MATCH) in,
    to show you that the id value changes every time...

    The regexes are ugly (please, japhy, don't hit me too hard), but they were just written to test this. In real code you'll have to harden them...
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Headers;
    use HTTP::Request;

    my $hdrs = HTTP::Headers->new('Accept' => 'text/plain', 'User-Agent' => 'IE/5.0');
    my $ua   = LWP::UserAgent->new;
    my $page;

    # Start page (its URL was lost from the post); look for a meta refresh
    $page = get_url('');
    if ($page =~ /content="\d+;url=([^"]+)/) {
        print "\n\n*********MATCH($1)*******\n\n";
        $page = get_url($1);
    } else {
        print "redirect not found !";
    }

    # Follow the JavaScript redirect; its URL carries the fresh session id
    if ($page =~ /location\.href="([^"]+)"/) {
        print "\n\n*********MATCH2($1)*******\n\n";
        $page = get_url("$1&page=loginp&nickname=JohnSmith&password=test&x=0&y=0");
    } else {
        print "redirect not found !";
    }
    print $page;

    # Then follow the main.location.replace(...) frame redirect
    if ($page =~ /main\.location\.replace\("([^"]+)"/) {
        print "\n\n*********MATCH3($1)*******\n\n";
        $page = get_url("$1&page=loginp&nickname=JohnSmith&password=test&x=0&y=0");
    } else {
        print "redirect not found !";
    }
    print $page;

    # Return the page body on success, or the HTTP status message on failure
    sub get_url {
        my $url2get = shift @_;
        my $req  = HTTP::Request->new(GET => $url2get, $hdrs);
        my $resp = $ua->request($req);
        return $resp->is_success ? $resp->content : $resp->message;
    }
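    The regexes in the code above depend on the exact spacing and quoting this one site happens to use. One way to harden them, sketched under the assumption that only the three redirect styles used above occur (meta refresh, location.href and location.replace()), is to allow optional whitespace, either quote character and any letter case in the meta tag. find_redirect() is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Return the first client-side redirect target found in $html, or undef.
# Only the three redirect styles used in this thread are recognised.
sub find_redirect {
    my ($html) = @_;

    # <meta http-equiv="refresh" content="0; url=..."> : case-insensitive,
    # optional whitespace, optional quote around the content value
    return $1
        if $html =~ /content\s*=\s*["']?\s*\d+\s*;\s*url\s*=\s*([^"'>\s]+)/i;

    # location.href = "..." or location.href='...'
    return $1 if $html =~ /location\.href\s*=\s*["']([^"']+)["']/;

    # parent.main.location.replace("...") or location.replace('...')
    return $1 if $html =~ /location\.replace\(\s*["']([^"']+)["']\s*\)/;

    return;    # no redirect found (undef in scalar context)
}
```

    If the site writes & as &amp; inside those URLs, decode_entities() from HTML::Entities can turn them back into a plain & before the URL is requested. For markup that varies more than this, a real parser such as HTML::TokeParser is sturdier than any regex.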
      Okay. So now I extract the ID from a previous page containing JavaScript and paste it into the subsequent URLs that I want to get, and... it works! Looks like the ID does change for every session. Thanks for clearing this up, guys. I know nothing about server-side CGI development, but by fiddling on the client side I'm slowly but surely learning more about the sneaky things the server side can get up to... :-)
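    The approach described here (take the id from the page just fetched and put it into the following URLs, instead of reusing an id copied from an earlier visit) can be sketched in core Perl. It assumes the id travels as an ordinary id= query parameter, as in the URLs in the question; session_id_from(), uri_escape_value(), with_session() and the example.com URLs are hypothetical.

```perl
use strict;
use warnings;

# Extract the value of an 'id' query parameter from a URL or a chunk of
# HTML/JavaScript. Returns undef if there is none.
sub session_id_from {
    my ($text) = @_;
    return $text =~ /[?&]id=([^&"'\s]+)/ ? $1 : undef;
}

# Minimal percent-encoding for query values: keep RFC 3986 unreserved
# characters, encode everything else as %XX.
sub uri_escape_value {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build a request URL that carries the *current* session id followed by
# any extra name => value parameters, in the order given.
sub with_session {
    my ($base, $id, @pairs) = @_;
    my @query = ('id=' . uri_escape_value($id));
    while (my ($k, $v) = splice @pairs, 0, 2) {
        push @query, uri_escape_value($k) . '=' . uri_escape_value($v);
    }
    return $base . '?' . join '&', @query;
}
```

    In a robot, session_id_from() would be applied to each freshly fetched page (or redirect URL), and with_session() used to build the next request, so the id is never older than the current session.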
Re: Problem with LWP, frames, login, and parent.main.location.replace()
by Masem (Monsignor) on Mar 02, 2001 at 23:01 UTC
    As merlyn points out, it's probably a state checker. Looking at the long id parameter in both URLs, I would suspect that it's an obfuscated code that includes the time when the code was generated; the server can then check whether the code was generated recently enough to allow the request. This is basically what I'm building into a site with some dynamic and restricted content, to prevent bookmarking of or linking to such content, but also to protect users who might be on shared machines (so if a malicious user on that machine finds my site in the history logs, they won't be able to get into the previous user's session without logging in again).

    In the case of this first script, being logged in is probably not necessary, and thus the code is ignored.
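    Masem doesn't say how his codes are built. One common way to get the behaviour he describes is a signed timestamp: the token carries the time it was issued plus an HMAC over the user and that time, so the server can reject tokens that are too old or that were altered. A sketch using the core Digest::SHA module; the secret, the token layout and the function names are assumptions for illustration, not this site's actual scheme.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret; a real site would keep this out of the code.
my $SECRET = 'change-me';

# Issue a token "<issued-at>-<hmac>" binding the issue time to the user.
sub make_token {
    my ($user, $now) = @_;
    my $mac = hmac_sha256_hex("$user|$now", $SECRET);
    return "$now-$mac";
}

# Accept the token only if it is well formed, its MAC matches this user
# and issue time, and it is at most $max_age seconds old at time $now.
sub token_is_fresh {
    my ($token, $user, $now, $max_age) = @_;
    my ($issued, $mac) = $token =~ /^(\d+)-([0-9a-f]{64})$/ or return 0;
    return 0 unless hmac_sha256_hex("$user|$issued", $SECRET) eq $mac;
    my $age = $now - $issued;
    return ($age >= 0 && $age <= $max_age) ? 1 : 0;
}
```

    A production version would also compare the MACs in constant time; the plain eq here is only for readability.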

      I thought about that. The code does seem to change depending on which browser one uses, but apparently not with time or any special tricks like that. I checked this by simply visiting the URLs with my regular browser...