Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Personal Nodelet Extractor

by jeffa (Chancellor)
on Jun 05, 2001 at 23:44 UTC ( #85973=sourcecode: print w/ replies, xml ) Need Help??

Category: PerlMonks.org Related Scripts
Author/Contact Info /msg jeffa
Description: Is your Personal Nodelet getting too big? Mine sure is was. So, I wrote this little bot to parse my Personal Nodelet and output the nodes in the form of: id [delimiter] name. Once the data is in an easy (and fast) form to parse, you can use another script to edit it, munge it some more - or you can modify this script to reflect your needs.

Also, check out Personal Nodelet Restorer.

Comments and improvements always welcome. :)

Fixes:
thanks to crazyinsomniac PodMaster for pointing out my lack of Win32 support in the synopsis

UPDATE: - after gleaming good security info from jcwren, i changed this script to accept an interactive password similiar to how mysql asks for passwords: you can specify the password on the command line with the -p option (dangerous for mulit-user boxes), or you can just specify -p by itself and be prompted for the password. I use Term::ReadKey to keep the password 'invisible'. Because of the new nature, be sure and specify -p last, after -u and -s if used.

UPDATE (Mar 24,2002): - somebody moved 'lastnode' before 'node' so regex has been updated

#!/usr/bin/perl -w
use strict;

use LWP;
use Term::ReadKey;
use Getopt::Std;
use HTML::TokeParser;
use HTTP::Request::Common;

$| = 1;

use vars qw(%opts);
getopts('p:u:s:h',\%opts);
my ($user,$passwd,$help) = parse_args(\%opts);
USAGE() and exit unless $user and $passwd and not $help;

use constant URL => 'http://www.perlmonks.org/';
my $SEP = $opts{s} || "\t";

my $ua = LWP::UserAgent->new;
$ua->agent('personal_nodlet_extractor/1.0 (' . $ua->agent .')');

# log in and access your User Setting page in raw format
my $request = POST(URL,
   Content => [
      op          => 'login',
      user        => $user,
      passwd      => $passwd,
      node_id     => 1072,
      displaytype => 'raw',
   ] 
);

my $response = $ua->request($request);
die $response->message unless $response->is_success;

# pass the HTML content to TokeParser
my $content = $response->content;
my $parser  = HTML::TokeParser->new(\$content);

# 'fast forward' until we find "<b>Personal Nodelet</b>"
while ($parser->get_tag('b')) {
   last if 'Personal Nodelet' eq $parser->get_text;
}

# these are the [links|droids] we are looking for
while (my $tag = $parser->get_tag('a')) {
   if ($tag->[1]->{href} =~ /[^t]node_id=(\d+)/) {
      print $1 . $SEP . $parser->get_text . "\n";
   }
}

sub parse_args {
   my %opt = %{+shift};
   if (exists $opt{'p'} and not defined $opt{'p'} and defined $opt{'u'
+}) {
      print "Enter password: ";
      ReadMode 'noecho';
      chomp($opt{'p'} = ReadLine 0);
      ReadMode 'normal';
   }  
   return @opt{qw(u p h)};
}

sub USAGE { print "USAGE: $0 -u user [-s separator] -p password\n" }

=pod

=head1 NAME

extract_personal_nodelet.pl - LWP script

=head1 DESCRIPTION

This is a simple script that uses LWP and HTML::TokeParser to
access your User Settings page and extracts your Personal
Nodelet links.

=head1 SYNOPSIS

  for *nix:
    ./extract_personal_nodelet.pl -u uname -s : -p 
  for win32:
    perl extract_personal_nodelet.pl -u uname -s : -p

Invokes the script for the specified username and use a 
colon as the record separator. The script will prompt
you for your password if you do not specify it. The
contents would look something like this:

 24270:Permutations and combinations
 25730:Life in the land of OOP, and I'm confused.
 17890:Shift, Pop, Unshift and Push with Impunity!
 32005:Apache::MP3
 34786:Why I like functional programming

The default record seperator is a tab. I recommend you use it
since just about any character is fair game for a node title.

=cut

Comment on Personal Nodelet Extractor
Download Code
Re: Personal Nodelet Extractor
by mirod (Canon) on Aug 03, 2001 at 18:58 UTC

    The default record seperator is a tab. I recommend you use it since just about any character is fair game for a node title.

    Maybe you could add an other option, -r for example that would replace $SEP:

    # option definition becomes(sorry to break the push ;--( getopts('p:u:s:r:h',\%opts); # replacement string initialization my $REP= $opts{r} || " "; # loop that processes the links while (my $tag = $parser->get_tag('a')) { if ($tag->[1]->{href} =~ /node_id=(\d+)/) { my $text= $parser->get_text; $text=~ {$SEP}{$REP}go; print $1 . $SEP . $text . "\n"; } }

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://85973]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2014-07-13 17:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (250 votes), past polls