PerlMonks  

Converting FLOSS manuals to Plucker format

by hacker (Priest)
on Feb 10, 2009 at 22:04 UTC (#742905=perlquestion)
hacker has asked for the wisdom of the Perl Monks concerning the following question:

I stumbled across a site called “FLOSS Manuals” recently, and thought that it would be a great place to create some new Plucker documents for our users, and distribute them. I’ve created hundreds of other Plucker documents for users in years past, so this was a natural progression of that.

When I quickly Googled around, I found someone was already doing exactly that, albeit in a one-off shell script.

I decided to take his work and build upon it, making it self-healing, and created what I call the “Plucker FLOSS Manuals Autobuilder v1.0” :)

Optimizations, comments and suggestions welcome and appreciated...

#!/usr/bin/perl -w

use strict;
use warnings;
use diagnostics;
use LWP::UserAgent;
use HTML::SimpleLinkExtor;

my $flossurl  = 'http://en.flossmanuals.net';
my $ua        = 'Plucker FLOSS Manuals Autobuilder v1.0 [desrod@gnu-designs.com]';
my $top_extor = HTML::SimpleLinkExtor->new();

# fetch the top-level page and extract the child pages
$top_extor->parse_url($flossurl, $ua);
my @links = grep(m:^/:, $top_extor->a);
pop @links;     # get rid of '/news' item from @links; fragile

# Get the print-only page of each child page
get_printpages($flossurl . $_) for @links;

#############################################################################
#
# Get the pages themselves, and return their content to the caller
#
#############################################################################
sub get_content {
    my $url      = shift;
    my $ua       = 'Mozilla/5.0 (en-US; rv:1.4b) Gecko/20030514';
    my $browser  = LWP::UserAgent->new();
    $browser->agent($ua);
    my $response = $browser->get($url);
    my $decoded  = $response->decoded_content;

    # This was necessary, because of a bug in ::SimpleLinkExtor,
    # otherwise this code would be 10 lines shorter. Sigh.
    if ($response->is_success) {
        return $decoded;
    }
}

#############################################################################
#
# Fetch the print links from the child pages snarfed from the top-level page
#
#############################################################################
sub get_printpages {
    my $page      = shift;
    my $sub_extor = HTML::SimpleLinkExtor->new();
    $sub_extor->parse(get_content($page));

    # Single out only the /print links on each sub-page
    my @printlinks = grep(m:^/.*/print$:, $sub_extor->a);

    my $url = $flossurl . $printlinks[0];
    (my $title = $printlinks[0]) =~ s,\/(\w+)\/print,$1,;

    # Build it with Plucker
    print "Building $title from $url\n";
    plucker_build($url, $title);
}

#############################################################################
#
# Build the content with Plucker, using a "safe" system() call in list-mode
#
#############################################################################
sub plucker_build {
    my ($url, $title) = @_;

    my $workpath          = "/tmp/";
    my $pl_url            = $url;
    my $pl_bpp            = "8";
    my $pl_compression    = "zlib";
    my $pl_title          = $title;
    my $pl_copyprevention = "0";
    my $pl_no_url_info    = "0";
    my $pdb               = $title;

    my $systemcmd  = "/usr/bin/plucker-build";
    my @systemargs = (
        '-p', $workpath,
        '-P', $workpath,
        '-H', $pl_url,
        $pl_bpp ? "--bpp=$pl_bpp" : (),
        ($pl_compression ? "--${pl_compression}-compression" : ''),
        '-N', $pl_title,
        $pl_copyprevention ? $pl_copyprevention : (),
        $pl_no_url_info ? $pl_no_url_info : (),
        '-V1',
        "--staybelow=$flossurl/floss/pub/$title/",
        '--stayonhost',
        '-f', "$pdb");

    system($systemcmd, @systemargs);
}
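
One possible hardening, offered only as a sketch: get_printpages assumes that every child page has at least one /print link, so $printlinks[0] is undefined whenever the filter matches nothing. The snippet below isolates the link filter and the title substitution in a helper that returns an empty list in that case. The name pick_print_link and the sample hrefs are hypothetical and not part of the script above, and the real list of links would still come from HTML::SimpleLinkExtor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): given the hrefs found
# on a manual's page, return the first "/<name>/print" link and the title
# derived from it, or an empty list when no such link exists.
sub pick_print_link {
    my @hrefs = @_;

    # Same filter as the script: site-relative links ending in /print
    my @printlinks = grep { m{^/.*/print$} } @hrefs;
    return unless @printlinks;

    my $link = $printlinks[0];

    # Same substitution as the script: "/bash/print" becomes "bash"
    (my $title = $link) =~ s{/(\w+)/print}{$1};

    return ($link, $title);
}

# Sample hrefs are made up for illustration.
my @sample = ('/bash', '/bash/print', 'http://example.org/', '/bash/edit');

# A list assignment in boolean context is false when the helper
# returns an empty list, so the build step can be skipped cleanly.
if (my ($link, $title) = pick_print_link(@sample)) {
    print "Would build $title from http://en.flossmanuals.net$link\n";
}
else {
    print "No print link found; skipping\n";
}
```

With a helper along these lines, get_printpages could return early instead of passing an undefined URL and title on to plucker_build.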
