I stumbled across a site called “
FLOSS Manuals” recently, and thought that it would be a great place to create some new
Plucker documents for our users, and distribute them. I’ve create
hundreds of other Plucker documents for users in years past, so this was a natural progression of that.
When I quickly Googled around, I found someone was already doing exactly that, albeit in a one-off shell script.
I decided to take his work and build upon it, making it self-healing, and created what I call the “Plucker FLOSS Manuals Autobuilder v1.0″ :)
Optimizations, comments and suggestions welcome and appreciated...
#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use LWP::UserAgent;
use HTML::SimpleLinkExtor;
my $flossurl = 'http://en.flossmanuals.net';
my $ua = 'Plucker FLOSS Manuals Autobuilder v1.0 [desrod@gnu-
+designs.com]';
my $top_extor = HTML::SimpleLinkExtor->new();
# fetch the top-level page and extract the child pages
$top_extor->parse_url($flossurl, $ua);
my @links = grep(m:^/:, $top_extor->a);
pop @links; # get rid of '/news' item from @links; fragile
# Get the print-only page of each child page
get_printpages($flossurl . $_) for @links;
######################################################################
+#######
#
# Get the pages themselves, and return their content to the caller
#
######################################################################
+#######
sub get_content {
my $url = shift;
my $ua = 'Mozilla/5.0 (en-US; rv:1.4b) Gecko/20030514
+';
my $browser = LWP::UserAgent->new();
$browser->agent($ua);
my $response = $browser->get($url);
my $decoded = $response->decoded_content;
# This was necessary, because of a bug in ::SimpleLinkExtor,
# otherwise this code would be 10 lines shorter. Sigh.
if ($response->is_success) {
return $decoded;
}
}
######################################################################
+#######
#
# Fetch the print links from the child pages snarfed from the top-leve
+l page
#
######################################################################
+#######
sub get_printpages {
my $page = shift;
my $sub_extor = HTML::SimpleLinkExtor->new();
$sub_extor->parse(get_content($page));
# Single out only the /print links on each sub-page
my @printlinks = grep(m:^/.*/print$:, $sub_extor->a);
my $url = $flossurl . $printlinks[0];
(my $title = $printlinks[0]) =~ s,\/(\w+)\/print,$1,;
# Build it with Plucker
print "Building $title from $url\n";
plucker_build($url, $title);
}
######################################################################
+#######
#
# Build the content with Plucker, using a "safe" system() call in list
+-mode
#
######################################################################
+#######
sub plucker_build {
my ($url, $title) = @_;
my $workpath = "/tmp/";
my $pl_url = $url;
my $pl_bpp = "8";
my $pl_compression = "zlib";
my $pl_title = $title;
my $pl_copyprevention = "0";
my $pl_no_url_info = "0";
my $pdb = $title;
my $systemcmd = "/usr/bin/plucker-build";
my @systemargs = (
'-p', $workpath,
'-P', $workpath,
'-H', $pl_url,
$pl_bpp ? "--bpp=$pl_bpp" : (),
($pl_compression ? "--${pl_compression}-compre
+ssion" : ''),
'-N', $pl_title,
$pl_copyprevention ? $pl_copyprevention : (),
$pl_no_url_info ? $pl_no_url_info : (),
'-V1',
"--staybelow=$flossurl/floss/pub/$title/",
'--stayonhost',
'-f', "$pdb");
system($systemcmd, @systemargs);
}
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.