In my browser, I get a search feature by holding down <ctrl> and pressing F at the same time :)

How does it work? It is a series of scripts that have to be called in the right order. It's a bit messy, because it was very much an exploratory process.

The first pass is a script that just keeps walking down the chain of snippet pages until it can't find any more links. When it is run again, it walks down the pages only until it encounters a link that it has already seen. It sleeps 15 seconds between fetching each page (Perl Monk HTML hackers would do well to take notice of that last point). Similarly, the process is cronned at 4:15 UTC, as I figure that's a pretty quiet time for yoda (the machine it is running on).
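To make the walk-until-seen idea concrete, here is a minimal sketch. It is written in Python for illustration and is not the original Perl. The `fetch` callable, the `seen` set and the function name are all hypothetical, and the sketch assumes the snippet pages list the newest entries first, so that hitting a known link means everything after it is already indexed.

```python
import time

def crawl(start, fetch, seen, delay=15, sleep=time.sleep):
    """Walk the chain of index pages starting at `start`.

    fetch(url) is a hypothetical callable returning (links, next_url),
    where next_url is None on the last page of the chain.
    `seen` is the set of links collected by earlier runs; it is updated
    in place. Returns the links not seen before, in page order.
    """
    new_links = []
    url = start
    while url is not None:
        links, next_url = fetch(url)
        for link in links:
            if link in seen:
                # Known territory: an earlier run already has the rest.
                return new_links
            seen.add(link)
            new_links.append(link)
        url = next_url
        if url is not None:
            sleep(delay)  # pause between page fetches to spare the server
    return new_links
```

On the first run `seen` is empty, so the loop only stops when a page has no next link. On later runs it stops at the first link it recognises, which keeps the nightly job down to a page or two.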

A second script then kicks in, which cleans up some yucky inconsistencies, like reaped nodes with no titles, and reformats the data to make it easy to process afterwards. This was a bugger to get right. I first tried to do it all in the first script, but it turned out to be simpler to let the fetching script do as little as possible, just fetch and dump, and let another script do the cleaning. It's awkward to carry state around between HTML::Parser callbacks.
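A rough sketch of what such a cleaning pass might look like, in Python rather than the original Perl. The dump format (one tab-separated `id`, `title`, `author` record per line) and the placeholder title are assumptions made for the example, not the actual format used.

```python
def clean(raw_lines, placeholder="(untitled)"):
    """Yield (node_id, title, author) tuples from raw dump lines.

    Assumed input: one record per line, 'id<TAB>title<TAB>author'.
    Blank lines are skipped; a record with an empty title (a reaped
    node, say) gets `placeholder` so later passes can treat all
    records alike.
    """
    for line in raw_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        node_id, title, author = line.split("\t", 2)
        yield int(node_id), title.strip() or placeholder, author.strip()
```

Because the fetcher only dumps and this pass only normalises, each script can be rerun on its own when something goes wrong.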

A third script then takes the cleansed file and loads it into different hashes, so that it can be printed out in different sorted orders for the various HTML views. For instance, it's at this stage that I calculate the number of nodes a person has written, and how many nodes were written by monks whose nicks start with 't', which makes it easy to emit the correct rowspan attributes to get everything to line up.
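The counting trick can be sketched like this (Python for illustration, not the original Perl; the `(title, author)` record shape and the function name are assumptions). Count the nodes per author and per initial letter first, then attach each count to the first row of its group, which is exactly the value a rowspan attribute needs.

```python
from collections import Counter

def table_rows(records):
    """records: iterable of (title, author) pairs with non-empty authors.

    Returns rows of (letter_span, author_span, author, title), sorted by
    author then title. A span is set only on the first row of its group
    and is None on the rows that the spanning cell already covers.
    """
    records = sorted(records, key=lambda r: (r[1].lower(), r[1], r[0].lower()))
    per_author = Counter(author for _title, author in records)
    per_letter = Counter(author[0].lower() for _title, author in records)

    rows = []
    prev_letter = prev_author = None
    for title, author in records:
        letter = author[0].lower()
        letter_span = per_letter[letter] if letter != prev_letter else None
        author_span = per_author[author] if author != prev_author else None
        rows.append((letter_span, author_span, author, title))
        prev_letter, prev_author = letter, author
    return rows
```

Doing the counts up front means the pass that writes the HTML never has to look ahead to find out how tall a cell should be.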

A fourth script then walks through all the files generated by the third pass and encodes them as HTML. No, I didn't use any HTML-generating modules. Naughty me, I did it all by hand, serves me right for not paying attention to what modules jcwren has installed on the server. This script creates the files in a directory named /pmsinew under my document root, and then when it has finished, it renames /pmsi to /pmsiold and /pmsinew to /pmsi, and then proceeds to unlink the /pmsiold directory. Which means it should be pretty hard to come across a half-constructed index.
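The build-then-swap step might look roughly like this in Python (a sketch, not the original Perl; `publish` and the `build` callback are hypothetical names, and only the pmsi, pmsinew and pmsiold directory names come from the description above).

```python
import os
import shutil

def publish(docroot, build, name="pmsi"):
    """Build a fresh index beside the live one, then swap it into place.

    build(directory) is a hypothetical callback that writes every HTML
    file into `directory`. Visitors keep seeing the old index until the
    two renames happen.
    """
    live = os.path.join(docroot, name)
    new = live + "new"
    old = live + "old"

    # Clear leftovers from a run that died halfway through.
    shutil.rmtree(new, ignore_errors=True)
    shutil.rmtree(old, ignore_errors=True)

    os.makedirs(new)
    build(new)                    # the slow part happens out of sight

    if os.path.isdir(live):
        os.rename(live, old)      # pmsi    -> pmsiold
    os.rename(new, live)          # pmsinew -> pmsi
    shutil.rmtree(old, ignore_errors=True)
```

There is still a brief moment between the two renames when the live directory does not exist, so the swap is quick rather than strictly atomic, but a half-written index is never served.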

It would be an understatement to say that the code lacks elegance in certain places. I was more interested in hacking up something quickly than showing The Right Way to do things. I do promise to rewrite the scripts and maybe even drop in a comment here and there.

When I do, I'll post the link from the /pmsi/ homepage.

g r i n d e r

In reply to Re:x2 PMSI - Perl Monks Snippets Index by grinder
in thread PMSI - Perl Monks Snippets Index by grinder
