PerlMonks  

LWP UserAgent to script Mediawiki reads

by Sabalon (Initiate)
on Feb 02, 2012 at 03:26 UTC ( #951337=perlquestion )
Sabalon has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I had a full head of hair before starting this. Now I feel like Kojak!

I am trying to get at some data inside a MediaWiki uh..wiki. I want to use the render-for-print function of index.php to grab some info and display it elsewhere.

That is why I am not using the MediaWiki::API calls...no way to have it formatted for you, templates, etc...

The HTML that comes back from my LWP user agent keeps saying "TheWiki uses cookies to log in users. You have cookies disabled. Please enable them and try again." LIES! :)

    use HTTP::Request;
    use HTTP::Request::Common qw(GET POST);
    use HTTP::Cookies;
    use LWP::UserAgent;

    my $browser = new LWP::UserAgent;
    my $cookies = new HTTP::Cookies(
        file           => "/tmp/cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    );
    $browser->cookie_jar($cookies);

    $req = POST 'http://localhost/wiki/index.php?title=Special:Userlogin&action=submitlogin&wpName=readuser&wpPassword=letmein';
    $response = $browser->request($req);

The response is 200, but the data it returns when I dump it out is the same login page with the cookie message.

If I remove the cookie file before running this, it recreates it, and it does put a cookie in there for the session (however I had to add ignore_discard to get it to do that.)

I have tried several variations - GET vs POST, putting the post variables as content to the request. I tried the mwpush.pl shown here: http://en.wikipedia.org/wiki/User:KeithTyler/mwpush.pl but it fails in the same way (in fact most of my code is ripped from that example since mine wasn't working)

Re: LWP UserAgent to script Mediawiki reads
by Anonymous Monk on Feb 02, 2012 at 04:34 UTC

    So MediaWiki::API works for logging in? Read the source

      Shall look, however I believe the login methods are different between the API and the general site.
Re: LWP UserAgent to script Mediawiki reads
by Anonymous Monk on Feb 02, 2012 at 04:40 UTC
    If mwpush.pl works, you'll notice that the user/pass are posted in the request body, not put in the URL. So do what mwpush.pl does; it usually does it for a reason.
      I had it posting the data originally, like mwpush. However, not even mwpush works. I had switched just to try. Thanks
Re: LWP UserAgent to script Mediawiki reads
by Khen1950fx (Canon) on Feb 02, 2012 at 05:58 UTC
    How about something like this?
        #!/usr/bin/perl -l
        use strict;
        use warnings;
        $| = 1;

        require IO::Socket;

        my $D = shift || '';
        if ($D eq 'daemon') {
            require HTTP::Daemon;
            my $d = HTTP::Daemon->new(Timeout => 10);
            print "Pleased to meet you at: <URL:", $d->url, ">\n";
            open STDOUT, '>', '/dev/null';
            while (my $c = $d->accept) {
                my $r = $c->get_request;
                if ($r) {
                    my $p = ($r->uri->path_segments)[1];
                    my $func = lc("httpd_" . $r->method . "_$p");
                    if (defined &$func) {
                        &$func($c, $r);
                    }
                    else {
                        $c->send_error(404);
                    }
                }
                $c = undef;
            }
            print STDERR "HTTP Server terminated\n";
            exit;
        }

        require URI;
        my $base = URI->new('http://192.168.1.1');

        sub url {
            my $u = URI->new(@_);
            $u = $u->abs($_[1]) if @_ > 1;
            $u->as_string;
        }

        print "\tWill access HTTP server at $base";

        require LWP::UserAgent;
        require HTTP::Request;
        require HTTP::Cookies;

        my $ua = new LWP::UserAgent;
        $ua->agent("Mozilla/0.01 " . $ua->agent);
        my $cookies = new HTTP::Cookies(
            file           => '/tmp/cookies.txt',
            autosave       => 1,
            ignore_discard => 1,
        );
        $ua->cookie_jar($cookies);

        my $req = new HTTP::Request POST => url(
            'http://192.168.1.1/wiki/index.php?title=Special:Userlogin&action=submitlogin&wpName=readuser&wpPassword=letmein',
            $base
        );
        my $res = $ua->request($req);
        print $res->as_string;

      And what is that supposed to be?

Re: LWP UserAgent to script Mediawiki reads
by PikMaster (Initiate) on Oct 30, 2012 at 14:25 UTC
    Hi.

    You need to do a GET request first, to get the value of the input field wpLoginToken.

        my $req;
        my $response;

        $req = "$wikiurl?title=Special:UserLogin&returnto=Main+Page";
        print "req = GET $req\n";
        $response = $ua->request( GET $req, );

        # look for cookie-set request from server, ie.
        # mediawiki_isg_session=954d4797c18ca9054f14c2675af3255e; path=/; HttpOnly
        #
        foreach (keys %{$response->{'_headers'}}) {
            if ($_ =~ /^set-cookie$/i) {
                # server attempting to set a cookie
                my $c = $response->{'_headers'}->{$_};
                print "Server header set-cookie = $c\n";
            }
        }

        my @lines = split /\n/, $response->content();
        foreach (@lines) {
            if (/name="wpLoginToken" value="([0-9a-z]+)"/) {
                my $token = $1;
                $params{'wpLoginToken'} = $token;
                print "found token=$token, line=$_\n";
            }
        }
        $cookie_jar->extract_cookies( $response );
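The token lookup can also be pulled out into a small standalone function, which makes it easy to try against a saved copy of the login page. This is only a sketch: `extract_input_value` is a made-up name, the sample markup is invented, and it assumes the field is a plain `<input>` tag with double-quoted attributes. Unlike the regex above, it does not care whether `name` comes before or after `value`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the value of a
# named <input> field in an HTML page, or undef if the field is not found.
# Accepts the name and value attributes in either order.
sub extract_input_value {
    my ($html, $name) = @_;
    while ($html =~ /<input\b([^>]*)>/gi) {
        my $attrs = $1;
        next unless $attrs =~ /\bname\s*=\s*"\Q$name\E"/i;
        return $1 if $attrs =~ /\bvalue\s*=\s*"([^"]*)"/i;
    }
    return;
}

# Made-up sample markup, for illustration only.
my $sample = '<form><input type="hidden" name="wpLoginToken" value="0123abcd" /></form>';
my $token  = extract_input_value($sample, 'wpLoginToken');
print "token = $token\n";
```

With a real response you would pass `$response->content()` as the first argument.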

    Then you need to pass this token in your next request, the one that actually logs in by submitting the username and password.

        $req = $wikiurl."?title=Special:UserLogin&action=submitlogin&type=login&returnto=Main+Page";
        print "req = POST $req\n";
        $params{'wpLoginAttempt'} = "Log in";
        $response = $ua->request(
            POST $req,
            Content_Type => 'application/x-www-form-urlencoded',
            Content      => [ %params ]
        );

        $loggedIn = 0;
        foreach (keys %{$response->{'_headers'}}) {
            # print "ServerHeader: ".$_."\n";
            if ($_ =~ /^set-cookie$/i) {
                # server attempting to set a cookie
                my $a = $response->{'_headers'}->{$_};
                if ($a =~ /^ARRAY(.+)$/) {
                    foreach (@{$a}) {
                        print "Server header set-cookie: ======== $_\n";
                        if (/UserID=\d+\;/i) {
                            $loggedIn = 1;    # Success!
                            last;
                        }
                    }
                }
            }
        }
        print "Login result = $loggedIn\n";
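The success check can be written without reaching into the response object's internals. A rough sketch, assuming the same signal as above (a cookie whose name ends in `UserID` with a numeric value): `login_cookie_present` is a made-up name and the sample header values are invented. With a real `HTTP::Response`, calling `$response->header('Set-Cookie')` in list context returns the individual header values.

```perl
use strict;
use warnings;

# Hypothetical helper: given the Set-Cookie header values of the login
# response, report whether any of them sets a "...UserID=<digits>" cookie.
sub login_cookie_present {
    my @set_cookie = @_;
    for my $value (@set_cookie) {
        return 1 if $value =~ /UserID=\d+(?:;|$)/i;
    }
    return 0;
}

# With a real HTTP::Response object $response, the values would come from
#   my @set_cookie = $response->header('Set-Cookie');
# Made-up header values, for illustration only:
my @sample = (
    'mediawiki_isg_session=954d4797c18ca9054f14c2675af3255e; path=/; HttpOnly',
    'mediawiki_isgUserID=42; path=/; HttpOnly',
);
print "Login result = ", login_cookie_present(@sample), "\n";
```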

    Now you will need to get the editToken and keep passing it:

        my $LastEditToken = undef;
        my $body = $response->content();
        if ($body =~ /value="([0-9a-z\+\\]+)"\s+name="editToken"/) {
            $LastEditToken = $1;
            print "found token=$LastEditToken\n";
        }

    Whenever you want to do anything further with the wiki, keep passing the editToken, e.g. for importing XML dumps:

        sub importXML {
            if (not defined $LastEditToken) {
                my $response = $ua->request( GET "$wikiurl?title=Special:Import", );
                if ($response->content() =~ /value="([0-9a-z\+\\]+)"\s+name="editToken"/) {
                    $LastEditToken = $1;
                    print "==found token=$LastEditToken\n";
                }
            }

            my $url = "$wikiurl?title=Special:Import&action=submit";
            print "Sending request to $url,\n using token $LastEditToken\n";
            my $response = $ua->request(
                POST "$url",
                Content_Type => 'multipart/form-data',
                Content      => [
                    'action'        => 'submit',
                    'source'        => 'upload',
                    'editToken'     => $LastEditToken,
                    'MAX_FILE_SIZE' => $MaxXmlSize,
                    'xmlimport'     => [$filepath],
                ]
            );

            if ($response->is_success) {
                # the reply carries a fresh token for the next request
                if ($response->content() =~ /value="([0-9a-z\+\\]+)"\s+name="editToken"/) {
                    $LastEditToken = $1;
                    print "==found token=$LastEditToken\n";
                }
                return 1;
            }
            else {
                # show the server's reply to help diagnose the failure
                print $response->content();
                return 0;
            }
        }

      The login part may be a little bit unclear, so here is the summary of form parameters passed (used in $ua->request):

          my %params = ();
          $params{'wpName'}         = 'your_user_name';
          $params{'wpPassword'}     = 'your_password';
          $params{'wpLoginAttempt'} = "Log in";
          $params{'wpLoginToken'}   = $token;    # Login token obtained previously
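Putting the pieces together, the whole login might look roughly like the following. Treat it as a sketch under assumptions rather than a tested recipe: `build_login_params` and `wiki_login` are made-up names, `$wikiurl` is assumed to point at the wiki's index.php, and the form field names are the ones listed above, which may differ between MediaWiki versions. The important part is that one user agent with one cookie jar makes both requests, so the session cookie from the GET goes back with the POST.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble the four login form fields listed above.
sub build_login_params {
    my ($user, $password, $token) = @_;
    return (
        wpName         => $user,
        wpPassword     => $password,
        wpLoginAttempt => 'Log in',
        wpLoginToken   => $token,
    );
}

# Hypothetical wrapper around the two-step login. Returns the user agent
# (with its cookie jar) and the response to the login POST.
sub wiki_login {
    my ($wikiurl, $user, $password) = @_;

    # Loaded here so the helper above can be used without LWP installed.
    require LWP::UserAgent;
    require HTTP::Cookies;

    # One in-memory cookie jar shared by both requests.
    my $ua = LWP::UserAgent->new( cookie_jar => HTTP::Cookies->new );

    # Step 1: GET the login form to receive the session cookie and the
    # wpLoginToken hidden field.
    my $form = $ua->get("$wikiurl?title=Special:UserLogin");
    die "GET failed: " . $form->status_line . "\n" unless $form->is_success;

    my ($token) = $form->decoded_content =~ /name="wpLoginToken"\s+value="([^"]+)"/
        or die "wpLoginToken not found in login form\n";

    # Step 2: POST the credentials and the token in the request body.
    my $login = $ua->post(
        "$wikiurl?title=Special:UserLogin&action=submitlogin&type=login",
        [ build_login_params($user, $password, $token) ],
    );
    return ($ua, $login);
}
```

With the values from the original question, a call would be `my ($ua, $res) = wiki_login('http://localhost/wiki/index.php', 'readuser', 'letmein');`, after which `$ua` can be reused for further requests.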
