PerlMonks  

Re: Error accessing MediaWiki API

by bobf (Monsignor)
on Mar 28, 2009 at 20:08 UTC ( [id://753902] )


in reply to Error accessing MediaWiki API

OK, here are more pieces of the puzzle. The issue is not yet resolved, so I would still appreciate input on this.

I searched further through the MediaWiki docs and ultimately ended up on the MediaWiki IRC channel (http://www.mediawiki.org/wiki/MediaWiki_on_IRC). With the help of the people there, I found the following:

  • The query shown in the test code in the parent node (my $titles = $mw->api(...)) works. It was verified against two test sites:
    # $mw->{config}->{api_url} = 'http://test.wikipedia.org/w/api.php';
    # $mw->{config}->{api_url} = 'https://secure.wikimedia.org/wikipedia/test/w/api.php';
    Therefore, I don't think the issue is with either the module or the API call.
  • As mentioned in the parent node, the query URL works fine when accessed from a browser (specifically, Firefox 3.0.5).
  • I compared the headers from the browser (captured via Live HTTP Headers) with those in the $mw object (via Data::Dumper). Apart from extra detail in the Data::Dumper output about the SSL certificate and the like, they looked equivalent. The only difference that stood out was that the Perl code used the POST method while the browser used GET.
  • After examining the output of Dumper( $mw ), I noticed that while the HTTP::Response object contained only the 403 error shown in the parent node (and the stack trace contained no new information), the content of the returned page was not empty and may be significant:
    '_content' => '<?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
    <head>
    <title>Access forbidden!</title>
    <link rev="made" href="mailto:you@example.com" />
    <style type="text/css"><!--/*--><![CDATA[/*><!--*/
        body { color: #000000; background-color: #FFFFFF; }
        a:link { color: #0000CC; }
        p, address {margin-left: 3em;}
        span {font-size: smaller;}
    /*]]>*/--></style>
    </head>
    <body>
    <h1>Access forbidden!</h1>
    <p>
    You don\'t have permission to access the requested object.
    It is either read-protected or not readable by the server.
    </p>
    <p>
    If you think this is a server error, please contact
    the <a href="mailto:you@example.com">webmaster</a>.
    </p>
    <h2>Error 403</h2>
    <address>
      <a href="/">cabig-kc.nci.nih.gov</a><br />
      <span>Sat Mar 28 02:18:28 2009<br />
      Apache</span>
    </address>
    </body>
    </html>
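To compare the two sides header by header, it helps to capture the exact request LWP::UserAgent builds, without sending it anywhere. The sketch below is one way to do that, assuming a libwww-perl version that supports `add_handler`: a `request_send` handler records a copy of the prepared request and returns a dummy response, which short-circuits the network call. The endpoint is one of the test sites above; the query parameters are illustrative only and not necessarily the exact payload MediaWiki::API sends. The form is POSTed to mirror the observation that the Perl code used POST.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Placeholder endpoint (one of the test sites mentioned above).
my $api_url = 'http://test.wikipedia.org/w/api.php';

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');    # same agent string the original code used

# Capture the fully prepared request just before it would be sent.
my $seen;
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    $seen = $request->clone;
    # Returning a response object here skips the real network call.
    return HTTP::Response->new(200, 'OK', [ 'Content-Type' => 'text/plain' ], '');
});

# Illustrative query parameters, sent as POST form data.
$ua->post($api_url, { action => 'query', meta => 'siteinfo', format => 'xml' });

# Print the request line and headers to diff against Live HTTP Headers.
print $seen->method, ' ', $seen->uri, "\n";
print $seen->headers->as_string;
```

Putting this printout next to the Live HTTP Headers capture makes headers that the browser sends but LWP does not stand out directly, rather than relying on a Data::Dumper view of the whole object.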

My conclusion is that, despite setting the agent to 'Mozilla/5.0', the program still isn't acting enough like a browser. My naive assessment is that the server rejects the request because it looks too much like a bot; the functionality itself is available, since the same request from a browser works.

So my question becomes: How do I make the program look more like a browser? Did I miss something in the headers? I can post more information if requested, but I don't know what to look for.

My dear monks, what am I missing?

Thanks

Re^2: Error accessing MediaWiki API
by Anonymous Monk on Mar 29, 2009 at 07:25 UTC
    The headers may have been equivalent, but they weren't identical. The missing piece is the Accept header.

    RFC 2616 (section 14.1) says: "If no Accept header field is present, then it is assumed that the client accepts all media types."
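The rule quoted above can be made concrete with a small helper showing how a conforming server should treat the Accept header. `accepts_type` is hypothetical and not part of any module, and it is deliberately simplified: q-values and the precedence of more specific ranges are ignored. The key line is the first check, where a missing header means every media type is acceptable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any module): does an Accept header value
# allow the given media type? Simplified reading of RFC 2616 section 14.1;
# q-values and range specificity are ignored.
sub accepts_type {
    my ($accept, $type) = @_;
    # No Accept header at all: the client accepts every media type.
    return 1 unless defined $accept;

    my ($want_main, $want_sub) = split m{/}, lc $type, 2;
    for my $range (split /\s*,\s*/, $accept) {
        # Drop parameters such as ";q=0.9", then split "type/subtype".
        my ($media) = split /\s*;/, $range;
        next unless defined $media;
        $media =~ s/^\s+|\s+$//g;
        my ($main, $sub) = split m{/}, lc $media, 2;
        next unless defined $sub;
        return 1 if ($main eq '*' || $main eq $want_main)
                 && ($sub  eq '*' || $sub  eq $want_sub);
    }
    return 0;
}

for my $case ([undef, 'text/xml'], ['*/*', 'text/xml'], ['text/html', 'text/xml']) {
    my ($accept, $type) = @$case;
    printf "%-20s %-10s %s\n",
        defined $accept ? $accept : '(none)', $type,
        accepts_type($accept, $type) ? 'acceptable' : 'not acceptable';
}
```

Under this reading, a request without an Accept header and a request with `Accept: */*` are equivalent, which is why rejecting the former while serving the latter is not what the specification describes.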

    It will work if you add a header that says so explicitly:

    $mw->{ua}->default_header('Accept' => "*/*");
    This is clearly a bug in that web server: it doesn't implement HTTP/1.x as it claims.

      Bingo! That did the trick. Thank you very much for your insight. You made my day. :-)
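A self-contained way to confirm that the fix takes effect, without contacting the server, is to let LWP::UserAgent merge its default headers into a request via `prepare_request` and inspect the result. In the thread, `$mw->{ua}` is the user agent object MediaWiki::API uses, so the same `default_header` call applies there; the plain LWP::UserAgent and the URL below are stand-ins for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');

# The fix from the reply: state explicitly that any media type is acceptable.
# With MediaWiki::API the equivalent is $mw->{ua}->default_header('Accept' => '*/*');
$ua->default_header('Accept' => '*/*');

# prepare_request() copies the user agent's default headers into the request
# without sending it, so the effect can be checked offline.
my $req = HTTP::Request->new(
    GET => 'http://test.wikipedia.org/w/api.php?action=query&meta=siteinfo');
$ua->prepare_request($req);

print $req->as_string;
```

Setting the header once with `default_header` means every later request made through that user agent carries it, which is why the single line in the reply is enough for all subsequent `$mw->api(...)` calls.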
