PerlMonks  

In my browser, I get a search feature by holding down <ctrl> and pressing F at the same time :)

How does it work? It is a series of scripts that have to be called in the right order. It's a bit messy, because it was very much an exploratory process.

The first pass is a script that just keeps walking down the chain of snippet pages until it can't find any more links. On later runs, it walks down the pages only until it encounters a link that it has already seen. It sleeps 15 seconds between fetching each page (Perl Monks HTML hackers would do well to take notice of that last point). Similarly, the process is run from cron at 4:15 UTC, as I figure that's a pretty quiet time for yoda (the machine perlmonks.org is running on).
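To make the stopping rule concrete, here is a minimal sketch of that first pass, written in Python rather than the original Perl. The `fetch_page` callable and its return shape are hypothetical stand-ins for the real fetching and parsing, which the post does not show. Only the 15-second pause and the "stop at the first link already seen" rule come from the description above.

```python
import time
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical page shape: (snippet links found on the page, URL of the next page or None).
Page = Tuple[Iterable[str], Optional[str]]


def walk_chain(
    start_url: str,
    fetch_page: Callable[[str], Page],
    seen: Set[str],
    delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Follow the chain of pages from start_url and return links not in `seen`.

    The walk ends when a page has no next link (first run) or as soon as a
    link that is already in `seen` turns up (later runs).
    """
    new_links: List[str] = []
    url: Optional[str] = start_url
    while url is not None:
        links, next_url = fetch_page(url)
        for link in links:
            if link in seen:
                return new_links  # caught up with what an earlier run stored
            seen.add(link)
            new_links.append(link)
        url = next_url
        if url is not None:
            sleep(delay)  # pause between fetches to go easy on the server
    return new_links
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without actually waiting. A real run would leave the default in place and persist `seen` between cron invocations.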

A second script then kicks in, which cleans up some yucky inconsistencies, like reaped nodes with no titles, and reformats the data to make it easy to process afterwards. This was a bugger to get right. I first tried to do it all in the first script, but it turned out to be simpler to let the fetching script do as little as possible, just fetch and dump, and let another script do the cleaning. It's awkward to carry state around between HTML::Parser callbacks.
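The fetch-then-clean split can be sketched as a separate normalising step over the raw dump. The dump format here (one tab-separated line per node: id, title, author) and the `(untitled)` placeholder are assumptions made for illustration. The post does not say what the real intermediate file looks like.

```python
from typing import Dict, Iterable, List


def clean_records(raw_lines: Iterable[str], placeholder: str = "(untitled)") -> List[Dict[str, str]]:
    """Turn raw dump lines into uniform records.

    Assumed input: 'id<TAB>title<TAB>author' per line. Blank lines are
    skipped, short lines are padded, and empty titles (for example reaped
    nodes) get a placeholder so later passes never see a missing field.
    """
    cleaned: List[Dict[str, str]] = []
    for line in raw_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines in the dump
        fields = [field.strip() for field in line.split("\t")]
        fields += [""] * (3 - len(fields))  # pad lines with missing columns
        node_id, title, author = fields[:3]
        cleaned.append({"id": node_id, "title": title or placeholder, "author": author})
    return cleaned
```

Keeping this step apart from the fetcher means the fetcher only has to dump what it sees, and the cleaning rules can be rerun over the same dump while they are being worked out.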

A third script then takes the cleansed file and loads it into different hashes, so that it can print them out in a different sorted order for each of the HTML views. For instance, it's at this stage that I calculate the number of nodes a person has written and how many nodes were written by monks whose nicks start with 't', which makes it easy to emit the correct rowspan attributes to get everything to line up.
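The rowspan bookkeeping described here amounts to counting before printing. A minimal sketch, assuming each record is a dict with `author` and `title` keys (a hypothetical shape, not the real file format):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (rowspan for the letter cell or None, rowspan for the author cell or None, author, title)
Row = Tuple[Optional[int], Optional[int], str, str]


def rowspans(records: Iterable[Dict[str, str]]) -> List[Row]:
    """Order records by author and title, attaching rowspans to the first row of each group."""
    by_author: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        by_author[rec["author"]].append(rec["title"])

    # Nodes per initial letter: the rowspan of the cell that spans a whole letter group.
    by_letter: Dict[str, int] = defaultdict(int)
    for author, titles in by_author.items():
        by_letter[author[:1].lower()] += len(titles)

    rows: List[Row] = []
    prev_letter = None
    for author in sorted(by_author, key=str.lower):
        letter = author[:1].lower()
        titles = sorted(by_author[author], key=str.lower)
        for i, title in enumerate(titles):
            letter_span = by_letter[letter] if letter != prev_letter else None
            author_span = len(titles) if i == 0 else None
            rows.append((letter_span, author_span, author, title))
            prev_letter = letter
    return rows
```

A span of `None` means the cell is already covered by a rowspan emitted on an earlier row, so the HTML pass would skip that cell.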

A fourth script then walks through all the files generated by the third pass and encodes them as HTML. No, I didn't use any HTML-generating modules. Naughty me, I did it all by hand, serves me right for not paying attention to what modules jcwren has installed on the server. This script creates the files in a directory named /pmsinew under my document root, and then when it has finished, it renames /pmsi to /pmsiold and /pmsinew to /pmsi, and then proceeds to remove the /pmsiold directory. Which means it should be pretty hard to come across a half-constructed index.
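The build-then-swap step might look roughly like this. The `publish` function and its `build` callback are hypothetical names for illustration. The directory names follow the post.

```python
import os
import shutil
from typing import Callable


def publish(doc_root: str, build: Callable[[str], None], live: str = "pmsi") -> None:
    """Build the index in <live>new, then swap it into place with two renames."""
    new_dir = os.path.join(doc_root, live + "new")
    old_dir = os.path.join(doc_root, live + "old")
    live_dir = os.path.join(doc_root, live)

    # Clear leftovers from an interrupted earlier run.
    shutil.rmtree(new_dir, ignore_errors=True)
    shutil.rmtree(old_dir, ignore_errors=True)

    os.makedirs(new_dir)
    build(new_dir)  # write every generated file into the staging directory

    # The live directory is missing only for the instant between these two renames.
    if os.path.isdir(live_dir):
        os.rename(live_dir, old_dir)
    os.rename(new_dir, live_dir)

    shutil.rmtree(old_dir, ignore_errors=True)
```

Readers only ever see a complete old index or a complete new one, because all the slow work happens in the staging directory before either rename.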

It would be an understatement to say that the code lacks elegance in certain places. I was more interested in hacking up something quickly than showing The Right Way to do things. I do promise to rewrite the scripts and maybe even drop in a comment here and there.

When I do, I'll post the link from the /pmsi/ homepage.

--
g r i n d e r

In reply to Re:x2 PMSI - Perl Monks Snippets Index by grinder
in thread PMSI - Perl Monks Snippets Index by grinder
