Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Personal Nodelet Extractor

by jeffa (Chancellor)
on Jun 05, 2001 at 23:44 UTC ( #85973=sourcecode: print w/ replies, xml ) Need Help??

Category: PerlMonks.org Related Scripts
Author/Contact Info /msg jeffa
Description: Is your Personal Nodelet getting too big? Mine sure is was. So, I wrote this little bot to parse my Personal Nodelet and output the nodes in the form of: id [delimiter] name. Once the data is in an easy (and fast) form to parse, you can use another script to edit it, munge it some more - or you can modify this script to reflect your needs.

Also, check out Personal Nodelet Restorer.

Comments and improvements always welcome. :)

Fixes:
thanks to crazyinsomniac PodMaster for pointing out my lack of Win32 support in the synopsis

UPDATE: - after gleaming good security info from jcwren, i changed this script to accept an interactive password similiar to how mysql asks for passwords: you can specify the password on the command line with the -p option (dangerous for mulit-user boxes), or you can just specify -p by itself and be prompted for the password. I use Term::ReadKey to keep the password 'invisible'. Because of the new nature, be sure and specify -p last, after -u and -s if used.

UPDATE (Mar 24,2002): - somebody moved 'lastnode' before 'node' so regex has been updated

#!/usr/bin/perl -w
use strict;

use LWP;
use Term::ReadKey;
use Getopt::Std;
use HTML::TokeParser;
use HTTP::Request::Common;

$| = 1;

use vars qw(%opts);
getopts('p:u:s:h',\%opts);
my ($user,$passwd,$help) = parse_args(\%opts);
USAGE() and exit unless $user and $passwd and not $help;

use constant URL => 'http://www.perlmonks.org/';
my $SEP = $opts{s} || "\t";

my $ua = LWP::UserAgent->new;
$ua->agent('personal_nodlet_extractor/1.0 (' . $ua->agent .')');

# log in and access your User Setting page in raw format
my $request = POST(URL,
   Content => [
      op          => 'login',
      user        => $user,
      passwd      => $passwd,
      node_id     => 1072,
      displaytype => 'raw',
   ] 
);

my $response = $ua->request($request);
die $response->message unless $response->is_success;

# pass the HTML content to TokeParser
my $content = $response->content;
my $parser  = HTML::TokeParser->new(\$content);

# 'fast forward' until we find "<b>Personal Nodelet</b>"
while ($parser->get_tag('b')) {
   last if 'Personal Nodelet' eq $parser->get_text;
}

# these are the [links|droids] we are looking for
while (my $tag = $parser->get_tag('a')) {
   if ($tag->[1]->{href} =~ /[^t]node_id=(\d+)/) {
      print $1 . $SEP . $parser->get_text . "\n";
   }
}

sub parse_args {
   my %opt = %{+shift};
   if (exists $opt{'p'} and not defined $opt{'p'} and defined $opt{'u'
+}) {
      print "Enter password: ";
      ReadMode 'noecho';
      chomp($opt{'p'} = ReadLine 0);
      ReadMode 'normal';
   }  
   return @opt{qw(u p h)};
}

sub USAGE { print "USAGE: $0 -u user [-s separator] -p password\n" }

=pod

=head1 NAME

extract_personal_nodelet.pl - LWP script

=head1 DESCRIPTION

This is a simple script that uses LWP and HTML::TokeParser to
access your User Settings page and extracts your Personal
Nodelet links.

=head1 SYNOPSIS

  for *nix:
    ./extract_personal_nodelet.pl -u uname -s : -p 
  for win32:
    perl extract_personal_nodelet.pl -u uname -s : -p

Invokes the script for the specified username and use a 
colon as the record separator. The script will prompt
you for your password if you do not specify it. The
contents would look something like this:

 24270:Permutations and combinations
 25730:Life in the land of OOP, and I'm confused.
 17890:Shift, Pop, Unshift and Push with Impunity!
 32005:Apache::MP3
 34786:Why I like functional programming

The default record seperator is a tab. I recommend you use it
since just about any character is fair game for a node title.

=cut

Comment on Personal Nodelet Extractor
Download Code
Re: Personal Nodelet Extractor
by mirod (Canon) on Aug 03, 2001 at 18:58 UTC

    The default record seperator is a tab. I recommend you use it since just about any character is fair game for a node title.

    Maybe you could add an other option, -r for example that would replace $SEP:

    # option definition becomes(sorry to break the push ;--( getopts('p:u:s:r:h',\%opts); # replacement string initialization my $REP= $opts{r} || " "; # loop that processes the links while (my $tag = $parser->get_tag('a')) { if ($tag->[1]->{href} =~ /node_id=(\d+)/) { my $text= $parser->get_text; $text=~ {$SEP}{$REP}go; print $1 . $SEP . $text . "\n"; } }

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://85973]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (14)
As of 2014-07-31 18:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (251 votes), past polls