http://www.perlmonks.org?node_id=785731


in reply to Re: Status of Recent User Information Leak
in thread Status of Recent User Information Leak

If so, where? If not, why?

At present not even members of pmdev can get a download of the site and source code to install on their own machines to create a test environment.

It does indeed seem odd that a site that is part of the open-source community does not itself provide open access to its own site's source code. Much of the problem lies in the way the site is structured. The code used by the site is split between a database and source code files. Bundling up the source code files into a tarball probably wouldn't be so difficult. Extracting the relevant information from the database is another story.

The code stored in the database is spread across a plethora of database tables. We can't just dump the tables because some of these tables contain many different types of records: some store site content, some store code to layout and "skin" the site, and some contain integrity constraints and processing code for the various pages you see day to day at Perl Monks. For example, almost everything stored in the database has an entry in the "node" table.

To tease apart the material relating to site structure, process, and look and feel from the actual site content may require a certain amount of hand tagging. Finally a script would have to be written to gather together all the relevant records from each database table. Writing a script wouldn't be such a big deal if it weren't for the fact that the site documentation is spread out all over the place. Simply knowing what is used where is a challenge. It would be all to easy to leave a crucial bit out without good documentation. We are working on trying to improve the documentation situation, but it will take time.

Even after that script was written we would have another problem to deal with before we could publish the site's nodes: the database portion would (should?) be reviewed for any material that might be a security issue. These nodes would need to be refactored into separate components, one that could be published and one that should not be. There are somewhere between 600 and 1000 units to review.

I am not saying any of this as an excuse, merely as an explanation. I think the process of creating such an export would be very good practice for us. It would encourage good documentation and confirm the truth of what we had documented so far. It would give us a focused reason to keep publishable and site specific security code separate. That separation would make the site more secure. It would also make it easier to integrate new pmdevs. It would enable much faster implementation of bug fixes and site enhancement suggestions since PM devs could experiment and test their work more freely.

Best, beth