(jeffa) Re: Public export of Perl Monks database
by jeffa (Bishop) on Feb 21, 2003 at 17:22 UTC
You don't have to have direct access to the database to make
useful things such as statistics, new searches, bible, etc.
All you need is a script to fetch nodes (i recommend
fetching XML versions).
use strict;
use warnings;
use XML::Simple;
use LWP::Simple;

our $URL  = 'http://www.perlmonks.org/index.pl';
our $PATH = '/path/to/perlmonks/nodes';

for (0 .. 666666) {
    my $node = get "$URL?node_id=$_&displaytype=xml";
    next unless defined $node;  # fetch failed
    my $xml = XMLin($node);
    next if $xml->{title} =~ /Permission\s+Denied/i;
    next if $xml->{title} =~ /Not\s+found/i;
    open my $fh, '>', "$PATH/$_.xml" or die "can't write: $!";
    print $fh $node;
    close $fh;
    sleep 5; # play nice ;)
}
Very simple, could use some more work, but this will get the
job done. Just be sure to run it during the weekend or
other 'less busy' times. ;) I also have some code over at
Node XML to HTML that transforms the XML into HTML ... it's not
perfect either, but it's a start.
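The transformation doesn't have to be fancy. Here's a minimal sketch of the same idea, assuming (as a guess, not the actual node schema) that the fetched XML carries the title in a title element and the body in a doctext element:

```perl
use strict;
use warnings;
use XML::Simple;

# Turn one fetched node's XML into a bare-bones HTML page.
# The field names (title, doctext) are assumptions here --
# adjust them to whatever the real node XML contains.
sub node_to_html {
    my ($xml_string) = @_;
    my $node  = XMLin($xml_string);
    my $title = $node->{title}   || '(untitled)';
    my $body  = $node->{doctext} || '';
    return "<html><head><title>$title</title></head>\n"
         . "<body><h1>$title</h1>\n$body\n</body></html>\n";
}

print node_to_html(
    '<node><title>Hi</title><doctext>hello, world</doctext></node>'
);
```

You could point this at the files saved by the script above and write a matching .html next to each .xml.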
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
"What I am asking is if this is allowed"
Well ... it's not not allowed.
"...this would generate quite some load on the
server..."
Damn spiffy it will. See up there in my post where i said
"run it during the weekend or other 'less busy' times"?
However, because the code fetches only the XML version of
each node, it's not as much of a load as you might think:
the server doesn't have to generate nodelets and such.
"I believe that when it is done my way..."
And that's why i posted. You might be waiting a loooong
time for your idea to be implemented here, unless you want
to become a god and do it yourself. :)
For the record, i would love to have access to the database.
From time to time i like to do a little history/research
and that would make my life much easier. Until then, i just
run a script similar to the one i cranked out above when
there are very few users on the site.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: Public export of Perl Monks database
by VSarkiss (Monsignor) on Feb 21, 2003 at 16:04 UTC
I'm not sure exactly what you're proposing. Are you suggesting an export of the entire database, including code, home nodes, passwords, etc.? I don't think that would be a good idea.
The Everything engine can import and export what are called "nodeballs". If you have a certain set of node_id's you want, gods have expressed willingness (in the context of pmdev) to create nodeballs of them. I can't speak for them, but they may be willing to do the same for you if you ask nicely. Of course, it would have to be a reasonable-sized set, with all the usual caveats about security, available time and resources, and so on.
Or am I misinterpreting your question entirely?
I did say some restricted export - so no, I don't mean to publish passwords etc. I was thinking of something like a weekly automatic dump into a publicly available directory.
Re: Public export of Perl Monks database
by pfaut (Priest) on Feb 21, 2003 at 18:53 UTC
I'm not sure exactly what information you want to get out of the system, but quite a bit is available through the XML generators. I'm currently using these to create my own newest nodes interface (login version, no login version). Part of this project is to keep a local cache of node header information in a PostgreSQL database. You should be able to get at most of the information you want this way. Just don't beat on the server by asking for all 237,000 nodes at once, and try to grab information during off-peak hours.
---
print map { my ($m)=1<<hex($_)&11?' ':'';
$m.=substr('AHJPacehklnorstu',hex($_),1) }
split //,'2fde0abe76c36c914586c';
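For what it's worth, a header cache like the one pfaut describes could be as small as one table and one insert statement. This is only a sketch: the connection details, table name, and column names are my assumptions, not pfaut's actual setup.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection and schema -- a sketch of the idea,
# not pfaut's real configuration.
my $dbh = DBI->connect('dbi:Pg:dbname=pmcache', 'monk', 'secret',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(<<'SQL');
CREATE TABLE node_headers (
    node_id integer PRIMARY KEY,
    title   text NOT NULL,
    author  text,
    created timestamp
)
SQL

# Cache one node's header information.
my $sth = $dbh->prepare(
    'INSERT INTO node_headers (node_id, title, author, created)
     VALUES (?, ?, ?, ?)'
);
$sth->execute(12345, 'Some node title', 'somemonk', '2003-02-21 18:53:00');
$dbh->disconnect;
```

A fetcher like jeffa's could call something along these lines for each node it pulls, instead of (or as well as) writing the raw XML to disk.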
Re: Public export of Perl Monks database
by zby (Vicar) on Feb 21, 2003 at 17:07 UTC
Re: Public export of Perl Monks database
by valdez (Monsignor) on Feb 21, 2003 at 17:28 UTC
Nice idea, zby++. Does anyone know the rough size of such a backup?
Ciao, Valerio
update: using data provided by jeffa, I made the following guess: given that tilly's nodes average ~1035 bytes each, ~238000 nodes would come to ~235 MB (uncompressed).
Re: Public export of Perl Monks database
by blm (Hermit) on Feb 22, 2003 at 02:33 UTC
How big would the information be?
Consider that as of 2003-01-28 16:14:30 there were 21341 registered users, of which 6232 have actually created write-ups. From this page we can calculate the total number of write-ups as 190817.
Now tilly has left (last login Mar 31, 2002 at 09:27 GMT-10), and jeffa has already noted that downloading tilly's nodes took up 3 MB on his hard drive. That was for 2986 posts, so the average node size was about 1053 bytes.
Assuming this average post size is representative of the entire PerlMonks database, one would estimate the size of the write-up portion of the database at about 190817 x 1053 = 200930301 bytes, or about 191 megabytes.
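That back-of-the-envelope arithmetic can be replayed in a couple of lines, using the same inputs (tilly's 3 MB over 2986 posts, 190817 write-ups total):

```perl
use strict;
use warnings;

# Reproduce the estimate above.
my $avg_bytes = int( 3 * 1024 * 1024 / 2986 );   # tilly: 3 MB over 2986 posts
my $total     = 190_817 * $avg_bytes;            # all write-ups
printf "avg %d bytes/node, total about %d MB\n",
       $avg_bytes, $total / (1024 * 1024);
# prints: avg 1053 bytes/node, total about 191 MB
```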
Before anyone flames me, know this: I know I have made some big assumptions. It should be noted that tilly had a lot to offer, so his write-ups are probably larger than a lot of others'.
Most of my data came from the perlmonks stats site. A big thanks to jcwren! The total size of tilly's write-ups came from (jeffa) 2Re: Public export of Perl Monks database.
Could anyone imagine a PerlMonks Compendium that could be sold to raise money to fund the PerlMonks web site? Would people be interested in that?
UPDATE: In the time it took to write this, several others have already posted this information.
Re: Public export of Perl Monks database
by castaway (Parson) on Feb 23, 2003 at 08:28 UTC
It's an interesting idea..
I'm just wondering how much actual work it would take to make it useful for anyone who's looking for answers to a certain problem.. i.e. someone will have to do quite a bit of sorting and categorising to make it suitable for any sort of publication. Most of the people that keep up Perl Monks (and hooray for them) seem to have enough to do already :)
(There's a whole lot of redundant stuff, posts that say the same thing, posts that are inaccurate because of misunderstanding the question, etc. And who's to judge what's 'good' and 'useful' and what's not?)
Having said that, maybe the exported bundle of actual node data would be useful to someone.. Apart from setting up a mirror, I can't think of a real use at the moment. If anything, it'd be nice to put on a CD to make sure it doesn't get lost.. Though I hope that PM does backups anyway...
C.
Perhaps you don't see a use - but I do. I won't post the ideas here - they are still very vague, and I would like to test them before that, but I am sure others will find other ideas. The important thing is to open the database so that everyone can test their own ideas. And I believe there are many ways to distill interesting information from it.