PerlMonks  

db content dump

by daxim (Chaplain)
on Aug 27, 2013 at 15:01 UTC ( #1051149=monkdiscuss )

tl;dr pmdev, where can one download Perlmonks, at least the content part of it? This is a prerequisite for volunteers to write the next generation Web interface.

Background: Corion told me in the chatterbox a while ago that the hurdle to publishing the source is that there are sec vulns lurking. He proposed that it needs some auditing first. I have the feeling that's never going to happen.

I realise that just forgoing gobs of code is extremely foolish, yet a clean slate is also a chance to execute on new features which, as tye said, earlier attempts did not follow through on - no wonder, the hurdles to entry are just too high as is.

Even if my idea never bears fruit, a downloadable copy is still valuable for institutions like Archive Team or as a safeguard against Pair, PM's generous host, turning neglectful or evil.

Re: db content dump (tickers)
by tye (Cardinal) on Aug 27, 2013 at 15:27 UTC

    There used to be site documentation on "What tickers are available at PerlMonks?". I haven't been able to find it now. It documented the interfaces that can be used to download all of the public content of PerlMonks. I did find a translation of that documentation at Que geradores de XML estão atualmente disponíveis no PerlMonks?. Perhaps SiteDocClan can reconstruct it if necessary from that.

    - tye        

        Yes. Thanks.

        Sadly, the fact that that node is of type "superdoc" rather than "sitefaqlet" makes it much harder to find than it really should be (a surprisingly large number of accidents of implementation combine to contribute to that problem). I suspected that was part of my difficulty and so hoped to find a link to it from one of the sitefaqlets, but didn't find such either. Though, I now see that such a link is trying to hide in a tiny font at the top of the translation that I linked to.

        A new sitefaqlet on "downloading" that includes such a link might be a good idea. I'd also like to see PerlMonks Syndication expanded a bit (including that specific link) and just more "see also" cross-linking of sitefaqlets in general.

        It'd also be nice for super search to know how to search sitedoclets (something quite different from a sitefaqlet). But I doubt any member of pmdev will get around to that any time soon, for various reasonable reasons.

        - tye        

      At 1 request/s and ~1 million nodes, crawling everything would take about 12 days and would also generate a lot of traffic. Can you run this on the server and make a static dump/snapshot?
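      For reference, the 12-day figure can be checked with a little arithmetic. This is only a rough sketch: the node count and request rate are the round numbers from the post above, and the function name `crawl_days` is made up for illustration. It ignores retries, server response time, and any pauses.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(node_count: int, requests_per_second: float) -> float:
    """Return the days needed to fetch node_count pages at a fixed rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    # One request per node; total seconds divided by seconds in a day.
    return node_count / requests_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed inputs: ~1 million nodes, 1 request per second.
    print(f"{crawl_days(1_000_000, 1.0):.1f} days")
```

      So a sequential crawl at that rate runs a bit under 12 days, and halving the rate to be gentler on the server doubles it.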

        The web site is not static. One static dump just begets another static dump.

        Do you have a successful replacement web site all ready?

        I'm pretty sure that there is more than 12 days of work yet ahead of you and that the vast majority of that work does not require ~1 million nodes of sample data.

        Having tried to produce dumps for export before and having seen others try, it is not simple, is not fast, and is prone to important mistakes. That it may appear very simple from where you are sitting does not actually have the power to change any of that.

        - tye        

Re: db content dump
by sundialsvc4 (Monsignor) on Aug 27, 2013 at 15:48 UTC

    I suggest that we try as much as possible to work by consensus ... if such a thing can be had ... because, even though there are a lot of “visual clods” in it (including silly stuff, like the fact that the pictures in the top-of-screen rotation have no anti-aliasing, thus “jaggies”), and some things that I wish would be changed (like Anonymous Monk and the content-evaluation system), there are many things about the site that do still work admirably well. I would like for there to continue to be one “go-to site about Perl,” and for this site to continue to be “it.” Although the site is admittedly ugly, it is functional and it contains many years’ worth of a cornucopia of information. That’s what’s important – vital – about PerlMonks, not so much its irritating warts and ugly looks. A “fork” never quite turns out right despite best intentions.

      I suggest that ...

      I suggest you stop calling the site ugly. Not because it isn't ugly (it isn't), but because it's rude and enough is enough.

        True enough. Point well taken, and no slight was ever intended. On the one hand, the site shows its age ... by the mere fact that it hasn’t had a face-lift “in a coon’s age.” But, on the other hand, it gets the job done as it always has. At the end of the day, “results” are what we come here for. Sure, I would love to see “pretty.” But I do not come here for pretty things. I would like to see (changes that I perceive to be) improvements. But I fear changes that, though well-intentioned by someone else in the name of being “improvements,” compromise what I come here for. Daresay I am not alone. We all “know the drill,” and we have all pulled paying clients out of these same quagmires.

      like Anonymous Monk and the content-evaluation system...

      From what I have seen, there is no consensus on changing these things, and if the votes on the nodes that suggest it are any indication, I would suspect that the Monastery is at best divided on the topic, if not overwhelmingly against changing it. However, it may be worth doing another vote on the topic, as it is not likely to be settled in the forums.

      Sparky
      FMTEYEWTK

Node Type: monkdiscuss [id://1051149]
Approved by Corion