Offline wikipedia using Perl
by grondilu (Pilgrim) on Mar 08, 2012 at 13:49 UTC
I'm finally happy with the code I wrote to browse wikipedia offline.
The trickiest part was keeping the database small. So I made a database with blocks of 256 articles: each block is frozen using Storable and then compressed with Bzip2. As a result, the created database is only about 15% larger than the original xml.bz2.
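To make the block idea concrete, here is a minimal sketch of freezing and compressing one block, and of finding which block an article lives in. It is not the code from the script: the function names (pack_block, unpack_block, locate), the hash layout (title => wikitext) and the choice of IO::Compress::Bzip2 as the Bzip2 binding are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

use constant BLOCK_SIZE => 256;    # articles per block, as described above

# Freeze a block (assumed here to be a hash of title => wikitext)
# with Storable, then compress the frozen bytes with bzip2.
sub pack_block {
    my ($articles) = @_;
    my $frozen = nfreeze($articles);
    my $packed;
    bzip2(\$frozen => \$packed)
        or die "bzip2 failed: $Bzip2Error";
    return $packed;
}

# Reverse of pack_block: decompress, then thaw back into a hash ref.
sub unpack_block {
    my ($packed) = @_;
    my $frozen;
    bunzip2(\$packed => \$frozen)
        or die "bunzip2 failed: $Bunzip2Error";
    return thaw($frozen);
}

# Hypothetical helper: map an article's sequence number to
# (block number, position inside that block).
sub locate {
    my ($n) = @_;
    return (int($n / BLOCK_SIZE), $n % BLOCK_SIZE);
}
```

Compressing 256 articles at a time gives bzip2 enough context to get close to the ratio of the original dump, while reading one article only costs one block's worth of decompression.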
I also use XML::Parser to parse wikipedia's database dump.
Here is the most difficult part: converting the XML database (see http://download.wikimedia.org) into a usable one:
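The conversion loop can be sketched roughly as follows. This is not the script from the post, only a minimal illustration of streaming the dump with XML::Parser and handing over groups of 256 pages. The function name each_block and the callback interface are hypothetical, and the sketch assumes one revision per page, with the wikitext in the text element and the page name in the title element.

```perl
use strict;
use warnings;
use XML::Parser;

use constant BLOCK_SIZE => 256;    # pages per block

# Walk a MediaWiki XML dump and call $on_block->(\@pages) for every
# BLOCK_SIZE pages, plus once more for a final partial block.
# $source is anything XML::Parser's parse() accepts (string or filehandle).
sub each_block {
    my ($source, $on_block) = @_;
    my (@block, %page, $field);

    my $parser = XML::Parser->new(Handlers => {
        Start => sub {
            my (undef, $elem) = @_;
            if ($elem eq 'page') {
                %page = (title => '', text => '');
            }
            elsif ($elem eq 'title' || $elem eq 'text') {
                $field = $elem;    # start collecting character data
            }
        },
        Char => sub {
            my (undef, $chars) = @_;
            # Character data may arrive in several chunks, so append.
            $page{$field} .= $chars if defined $field;
        },
        End => sub {
            my (undef, $elem) = @_;
            undef $field if defined $field && $elem eq $field;
            return unless $elem eq 'page';
            push @block, { %page };
            if (@block == BLOCK_SIZE) {
                $on_block->([@block]);
                @block = ();
            }
        },
    });

    $parser->parse($source);
    $on_block->([@block]) if @block;    # flush the last, partial block
}
```

Because the handlers fire as the parser reads, the whole dump never has to sit in memory. For a real dump, $source would presumably be a filehandle reading the decompressed stream, for example a pipe opened on bzcat, and the callback would freeze, compress and store each block.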
I think it works pretty well, even if the rendering of the Text::Mediawiki module is a bit ugly for some pages. For instance, I still need to take care of the references. Still, it does the job, and it's much faster than online browsing.
I posted everything (including the CGI script) on my wikipedia userpage, as it also concerns wikipedia users: http://fr.wikipedia.org/wiki/Utilisateur:Grondilu/Offline_Wikipedia_Perl
EDIT. I also set up a github repo: https://github.com/grondilu/offline-wikipedia-perl