note
VK
<p>>Thread drift is allowed. For good netiquette, also change the title in the reply form.<br>
OK then. First of all, I am not a staff developer of Wikipedia, just one of the volunteer editors. We needed a script for a group of users who want notifications about upcoming internal elections. It acts like a daemon: every 24 hours it checks a certain place and notifies them if there is something.<br>
tools.wmflabs.org gives you the language of your choice (Perl, PHP, Python, C#, you name it) in the latest stable version. I don't like Python, have no idea about C#, and remember something about Perl, so I went with Perl.</p>
<p>To be clear, the list=allusers query has nothing to do with the actual task; it only shows the exact data format to send and to expect. The full MediaWiki API help is here: https://ru.wikipedia.org/w/api.php?action=help&uselang=en</p>
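<p>As a rough offline sketch of that data shape: the user names below are made up by me and are not real API output, and the userid field is an assumption. Only the query / allusers / name nesting is what my script actually reads. The sample is round-tripped through UTF-8 JSON and then sliced, without touching the network:</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use JSON;

# Hand-written sample, NOT real API output. It only mimics the nesting
# query -> allusers -> [ { name => ... }, ... ]; the userid field is an assumption.
my $sample = {
    query => {
        allusers => [
            { userid => 1, name => 'Борис' },
            { userid => 2, name => 'Вера' },
            { userid => 3, name => 'Галина' },
        ],
    },
};

# Simulate the wire: encode_json produces UTF-8 bytes, decode_json reads them back
my $bytes = encode_json($sample);
my $data  = decode_json($bytes);

# Slice the first two entries, then keep the names starting with Cyrillic "в" (any case)
my @first_two = @{ $data->{query}{allusers} }[0 .. 1];
my @names     = map { $_->{name} } @first_two;
my @matches   = grep { /^в/i } @names;

binmode STDOUT, ':encoding(UTF-8)';
print join(', ', @names), "\n";
print "starts with В: @matches\n";
</pre>
<p>The decoded names are ordinary character strings, so slicing, eq and case-insensitive regex matching work on Cyrillic the same way they do on ASCII.</p>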
<p>Now... The script has to handle Unicode (UTF-8) literals in the code, so I needed use utf8; It also has to output them in HTML, so I needed binmode STDOUT, ':utf8';<br>
It also has to receive JSON, decode it, slice it, and do string comparison, replacement and so on, all with Cyrillic in the data. I dropped all the (en|de)coding calls that this thread called unnecessary and arrived at:</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode;
use LWP::UserAgent;
use HTTP::Request::Common;
use HTTP::Cookies;
use JSON;
my $browser = LWP::UserAgent->new;
# Wikimedia asks for a descriptive User-Agent, not the LWP default
# w:ru:User:Bot_of_the_Seven = https://ru.wikipedia.org/wiki/Участник:Bot_of_the_Seven
$browser->agent('w:ru:User:Bot_of_the_Seven (LWP like Gecko) We come in peace');
# I need cookie exchange enabled for auth
# here it doesn't matter, but to give the full LWP picture:
$browser->cookie_jar({});
# very few queries can be done by GET - most of MediaWiki requires POST
# so I POST everywhere rather than remember where GET is allowed:
my $response = $browser->request(POST 'https://ru.wikipedia.org/w/api.php',
{
'format' => 'json',
'formatversion' => 2,
'errorformat' => 'bc',
'action' => 'query',
'list' => 'allusers',
'auactiveusers' => 1,
'aulimit' => 10,
'aufrom' => 'Б'
}
);
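# stop early on an HTTP-level failure instead of feeding an error page to the JSON decoder
die 'API request failed: ' . $response->status_line unless $response->is_success;
# content() is the raw UTF-8 byte string, which is what decode_json expects
my $data = decode_json($response->content);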
my $test_scalar = $data->{query}->{allusers}[0]->{name};
my @test_array = @{$data->{query}->{allusers}}[0..2];
display_html($test_array[1]->{name});
sub display_html {
my @html = (
'<!DOCTYPE html>',
'<html>',
'<head>',
'<meta charset="UTF-8">',
'<title>Мой тест</title>',
'</head>',
'<body>',
shift // 'Статус — ОК', # defined-or: 0 and the empty string are kept, only undef falls back
'</body>',
'</html>'
);
# to avoid "wide character" warnings:
binmode STDOUT, ':utf8';
print "Content-Type: text/html; charset=utf-8\n\n";
print join("\n", @html);
}
</pre>
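<p>To spell out the byte/character boundary the script above relies on, here is a minimal sketch. The JSON literal is a hypothetical stand-in for an HTTP body, not real API output. It shows that decode_json wants UTF-8 bytes, that a decoder without the utf8 option wants characters, and that both give the same one-character name:</p>
<pre>
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use JSON;

# Stand-in for an HTTP body; the literal is hypothetical, not real API output
my $chars = '{"name":"Б"}';            # character string, thanks to use utf8
my $bytes = encode('UTF-8', $chars);   # the kind of byte string $response->content holds

# decode_json expects UTF-8 bytes and returns character strings
my $from_bytes = decode_json($bytes);

# Without ->utf8 the decoder expects an already decoded character string
my $from_chars = JSON->new->decode($chars);

my $name = $from_bytes->{name};
printf "characters: %d, UTF-8 bytes: %d\n", length($name), length(encode('UTF-8', $name));

# On output the layer turns characters back into bytes
binmode STDOUT, ':encoding(UTF-8)';
print "$name\n";
</pre>
<p>Mixing the two up, for example passing an already decoded string to decode_json, is the typical way to end up with mojibake or a "Wide character" error. In the sketch I use ':encoding(UTF-8)', which is the stricter, validating spelling of the ':utf8' layer; either one should silence the warning on output.</p>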
<p>Is there anything that might go badly wrong concerning Cyrillic in Unicode/UTF-8?</p>
11105382
11105439