Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Bulk Node Title Editor

by jcwren (Prior)
on Dec 01, 2001 at 04:59 UTC ( #128806=sourcecode: print w/ replies, xml ) Need Help??

Category: Perlmonks.org Related Scripts
Author/Contact Info Chris "jcwren" Wren
jcwren@jcwren.com
Description: Allows bulk editing of user's node titles. Useful when going back and 'kudratizing' titles (see Suggestion for post-naming convention).
#!/usr/local/bin/perl -w

#
#              Bulk Node Title Editor
#
#  This utility was borne out of hearing several people in the chatter
+box
#  mentioning they were going through all their nodes and 'kudratizing
+'
#  them (see [id://21814]).  After remembering how painful it was to t
+ouch
#  up all my titles when I did that, I figured "heck, why not write a 
#  utility that can whore some more XP... I mean, help people with a d
+ull
#  repetitive task?".  So here ya go...
#
#  Now, that being said, this utility is incredibly dangerous.  You ca
+n do
#  something really stupid, like go through and replace all your node 
+titles
#  with... NOTHING!  Yes, that's right, the power of the regexp could 
+let you
#  completely destroy your home node titles.  To fix that dangerous se
+lf-
#  loading, self-cocking, and damn near self-firing gun, a backup file
+ is
#  generated from any runs.  A backup file can't be overwritten by a s
+ub-
#  sequent run.  And there is a magical restore function that allows y
+ou
#  to restore your hard-typed titles after they've been... inexplicabl
+y
#  altered.  Yea, that's the ticket!
#
#  Some serious POD reading, consideration, and understanding of a bas
+ic
#  regexp is in order before you use this utility.  The default mode i
+s
#  to perform every step except actually updating the node unless the 
+-Z
#  switch is specified, so you can play around and watch an inexpertly
#  written regexp wreck a title before you actually commit it back to 
+PM.
#
#  !!! SEE THE SECURITY SECTION OF THE POD BEFORE RUNNING THIS PROGRAM
+ !!!
#

#
#  $Id: bnteditor.pl,v 1.0.0.1 2001/11/30 23:46:30 jcw Exp $
#  $Revision: 1.0.0.1 $
#  $Author: jcw $
#  $Date: 2001/11/30 23:46:30 $
#  $Log: bnteditor.pl,v $
#  Revision 1.0.0.1  2001/11/30 23:46:30  jcw
#  initial import into CVS
#
#  

use strict;
use LWP::UserAgent;
use URI::Escape;
use HTTP::Cookies;
use HTML::Form;
use Getopt::Std;
use XML::Twig;
use HTML::Entities;

#
#  Change this to suit your taste. 
#
my %config = (username        => undef,      # 'myusername',
              password        => undef,      # 'mypassword',
              quiet           => 0,
              reallydoit      => 0,
              restore         => 0,
              regex           => undef,
              filename        => 'savemybutt.txt',
              newest          => 0,
              baseurl         => 'http://www.perlmonks.org',
              pmsite          => 'http://www.perlmonks.org/index.pl?',
              allnodes        => 'node=user%20nodes%20info%20xml%20gen
+erator',
              shortnode       => 'XP%20xml%20ticker',
             );

sub UNSAFE_CHARS {"^A-Za-z0-9\-_.!~*'()"}

#
#  Ye olde main
#
{
   my %args = ();

   getopts ('u:p:r:f:qZRPnh?', \%args);

   if ($args {'?'} || $args {h})
   {
      usage ();
      exit;
   }

   if ($args {P})
   {
      local $| = 1;
      print "Password: ";
      $args {p} = <STDIN>;
      chomp ($args {p});
   }

   $config {username}      = $args {u} || $config {username}      || d
+ie "No username.  Program terminated.\n";
   $config {password}      = $args {p} || $config {password}      || d
+ie "No password.  Program terminated.\n";
   $config {regex}         = $args {r} || $config {regex}         || d
+ie "No regex.  Program terminated.\n" if (!exists $args {R});
   $config {filename}      = $args {f} || $config {filename}      || d
+ie "No filename.  Program terminated.\n";
   $config {restore}       = $args {R} || $config {restore};
   $config {reallydoit}    = $args {Z} || $config {reallydoit};
   $config {quiet}         = $args {q} || $config {quiet};
   $config {newest}        = $args {n} || $config {newest};
   $config {baseurl}                                              || d
+ie "\$config {baseurl} not defined.  Program terminated.\n";
   $config {pmsite}                                               || d
+ie "\$config {pmsite} not defined.  Program terminated.\n";
   $config {allnodes}                                             || d
+ie "\$config {allnodes} not defined.  Program terminated.\n";

   die "-r and -R mutually exclusive\n" if ($config {regex} && $config
+ {restore});

   if (!$config {restore})
   {
      die "regex needs to be formed as 's/match/replace/{options}'" if
+ ($config {regex} !~ m|^s/|);
      eval '$a = ""; $a =~ ' . $config {regex};
      die "badly formed regex: $@" if $@;

      die "Backup file \"", $config {filename}, "\" exists, pick anoth
+er name\n" if (-e $config {filename});
   }
   else
   {
      die "Backup file \"", $config {filename}, "\" does not exist: $!
+\n" if (! -e $config {filename});
   }

   doUpdates ();
}

#
#
#
sub doUpdates
{
   my $useragent = new LWP::UserAgent;

   print scalar localtime, ": logging in\n" unless ($config {quiet});

   $useragent->agent ("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4
+.0)");
   $useragent->cookie_jar (HTTP::Cookies->new (ignore_discard => 1));

   doPerlmonksLogin ($useragent, $config {username}, $config {password
+});

   print scalar localtime, ": login completed\n" unless ($config {quie
+t});

   if ($config {restore}) {
      doRestoreTitles ($useragent);
   } else {
      doUpdateTitles ($useragent, getArticleList ($useragent));
   }
}

#
#
#
sub doRestoreTitles
{
   my ($useragent) = @_;

   open BACKUPFILE, $config {filename} or die "Can't open backup file 
+", $config {filename}, ": $!\n";
   chomp (my @nodelist = <BACKUPFILE>);
   close BACKUPFILE;

   die "No nodes in backup list" if (!scalar @nodelist);

   print scalar localtime, ": ", scalar @nodelist, " nodes to update\n
+" unless ($config {quiet});

   my %nodehash = map { split /,/, $_, 2 } @nodelist;
   
   foreach my $nodeid (sort {$config {newest} ? $b <=> $a : $a <=> $b}
+ keys %nodehash)
   {
      print scalar localtime, ": getting node $nodeid\n" unless ($conf
+ig {quiet});
      my $url = $config {pmsite} . "node_id=$nodeid";
      my $page = getProtectedPage ($useragent, $url) or die "Get on $u
+rl failed.";

      my @forms = HTML::Form->parse ($page, $config {baseurl});
      my $found = 0;

      foreach my $form (@forms)
      {
         if ($form->find_input ("note_title"))
         {
            $found = 1;
            print scalar localtime, ": $nodeid=", $form->value ("note_
+title"), "\n" unless ($config {quiet});
            print scalar localtime, ": new title: ", $nodehash {$nodei
+d}, "\n" unless ($config {quiet});
            $form->value ("note_title", $nodehash {$nodeid});

            if ($config {reallydoit})
            {
               my $res = $useragent->request ($form->click ('sexisgood
+'));
               $res->is_error && die "Can't post changes";
               print scalar localtime, ": updated $nodeid\n" unless ($
+config {quiet});
            }
            else
            {
               print scalar localtime, ": updated $nodeid (faked)\n" u
+nless ($config {quiet});
            }

            last;
         }
      }

      print scalar localtime, ": node $nodeid is not editable (root no
+de)\n" if (!$found && !$config {quiet});
   }
}

#
#
#
sub doUpdateTitles
{
   my ($useragent, $nodehash) = @_;

   foreach my $nodeid (sort {$config {newest} ? $b <=> $a : $a <=> $b}
+ keys %$nodehash)
   {
      print scalar localtime, ": getting node $nodeid\n" unless ($conf
+ig {quiet});
      my $url = $config {pmsite} . "node_id=$nodeid";
      my $page = getProtectedPage ($useragent, $url) or die "Get on $u
+rl failed.";

      my @forms = HTML::Form->parse ($page, $config {baseurl});
      my $found = 0;

      foreach my $form (@forms)
      {
         if ($form->find_input ("note_title"))
         {
            $found = 1;
            my $title = $form->value ("note_title");
            print scalar localtime, ": $nodeid=$title\n" unless ($conf
+ig {quiet});

            open BACKUPFILE, ">>" . $config {filename} or die "Can't w
+rite backup file: $!\n";
            print BACKUPFILE sprintf ("%d,%s\n", $nodeid, $title);
            close BACKUPFILE;

            eval '$title =~ ' . $config {regex};
            die "regexp error: $@" if $@;

            print scalar localtime, ": new title: $title\n" unless ($c
+onfig {quiet});
            $form->value ("note_title", $title);

            if ($config {reallydoit})
            {
               my $res = $useragent->request ($form->click ('sexisgood
+'));
               $res->is_error && die "Can't post changes";
               print scalar localtime, ": updated $nodeid\n" unless ($
+config {quiet});
            }
            else
            {
               print scalar localtime, ": updated $nodeid (faked)\n" u
+nless ($config {quiet});
            }

            last;
         }
      }

      print scalar localtime, ": node $nodeid is not editable (root no
+de)\n" if (!$found && !$config {quiet});
   }
}

#
#
#
sub getArticleList
{
   @_ > 0 or die "Incorrect number of parameters";

   my ($useragent) = @_;
   my %nodehash = ();
   my $url = $config {pmsite} . $config {allnodes};

   my $page = getProtectedPage ($useragent, $url) or die "Get on $url 
+failed.";

   my $twig= new XML::Twig (TwigRoots =>
            { NODE => sub { my ($t, $node) = @_;
                            my $nodeid = $node->att ('id');
                            !exists ($nodehash {$nodeid}) or die "Node
+ $nodeid is duplicated!";
                            my $title = decode_entities ($node->text (
+));
                            $title =~ s/&apos;/'/g; # Why is this miss
+ed?
                            $nodehash {$nodeid} = {'nodeid' => $nodeid
+,
                                                   'title'  => $title,
                                                   'rep'    => $node->
+att ('reputation'),
                                                   'last'   => $node->
+att ('reputation'),
                                                   'date'   => $node->
+att ('createtime')
                                                  };
                            $t->purge;
                          }
            });

   $twig->parse ($page);

   return (\%nodehash);
}

#
#
#
sub getProtectedPage
{
   @_ > 1 or die "Incorrect number of parameters";

   my ($useragent, $url) = @_;

   my $req = new HTTP::Request ('GET' => $url, HTTP::Headers->new ('Co
+ntent-Type' => 'application/x-www-form-urlencoded'));

   $useragent->cookie_jar->add_cookie_header ($req);

   my $res = $useragent->request ($req);

   return undef if ($res->is_error);

   return $res->content ();
}

#
#  We use 'XP%20xml%20ticker' because we know that will be a really sh
+ort page returned.
#  We don't care about the contents, but no point in loading the entir
+e text of the
#  front page.
#
sub doPerlmonksLogin 
{
   @_ >= 3 || die "At least 4 arguments required";

   my ($useragent, $username, $password) = @_;
   my %pairs = ();

   my $req = new HTTP::Request ('POST' => $config {pmsite}, HTTP::Head
+ers->new ('Content-Type' => 'application/x-www-form-urlencoded'));

   @pairs {qw(user passwd op node)} = ($username, $password, 'login', 
+$config {shortnode});

   $req->content (join '&', map {$_ . "=" . uri_escape ($pairs {$_}, U
+NSAFE_CHARS)} keys %pairs);

   my $res = $useragent->request ($req);

   if ($res->is_success)
   {
      $useragent->cookie_jar->extract_cookies ($res);

      return $res if ($useragent->cookie_jar->as_string () =~ m/userpa
+ss/i);

      die "Eeek!  Log in failed.  Bad username or password?\n";
   }

   die "Eeek! Request to ", $config {pmsite}, " failed!\n";
}

#
#
#
sub usage
{
   print <<ENDOFHELP;

usage: bnteditor.pl [-h | -?] [-u username] [-p password] [-P] [-r reg
+exp] 
                    [-f filename] [-R] [-q] [-Z] [-n]

Bulk Node Title Editor

   -h             this help list
   -?             this help list
   -u username    user name on Perlmonks.org
   -p password    password for username
   -P             prompt for password interactively
   -r regexp      regexp to apply to each node title
   -f filename    name of backup file for backup/restore
   -R             restore damaged node titles from backup file
   -q             run quietly
   -Z             really do it, instead of 'safe mode'
   -n             operate on newer nodes first

   -r and -R are mutually exclusive.

   To be safe, regexp should be quoted in the shell to prevent the she
+ll
   from interpolating any recognized characters.

ENDOFHELP
}

__END__

=head1 NAME

bnteditor - Bulk Node Title Editor

=head1 SYNOPSIS

usage: bnteditor.pl [B<-h> | B<-?>] [B<-u> username] [B<-p> password] 
+[B<-P>]
[B<-r> regexp] [B<-f> filename] [B<-R>] [B<-q>] [B<-Z>]

Bulk Node Title Editor

   -h             this help list
   -?             this help list
   -u username    user name on Perlmonks.org
   -p password    password for username
   -P             prompt for password interactively
   -r regexp      regexp to apply to each node title
   -f filename    name of backup file for backup/restore
   -R             restore damaged node titles from backup file
   -q             run quietly
   -Z             really do it, instead of 'safe mode'
   -n             operate on newer nodes first

   -r and -R are mutually exclusive.

   To be safe, regexp should be quoted in the shell to prevent the she
+ll
   from interpolating any recognized characters.

=head1 DESCRIPTION

bnteditor is a utility for applying a regular expression to each and
every non-root home node title.  Many people have started 'kudratizing
+'
thier node titles (see [id://21814]).  This is a slow and laborious
process of going to each node, changing it, submitting it, then going
back to the users list of nodes to see the next to edit.  When a few
dozen nodes, this can easily waste over an hour.

This utility automates that process.  The users list of nodes is fetch
+ed,
each node is stepped through, and assuming it is not a root node (root
nodes are not editable, except under special circumstances), applies t
+he
user supplied regular expression to the title, then submits the node b
+ack.

Obviously, there is quite a potential for damage here.  To help reduce
+ the
possibility of getting ones nodes into totally befunged state, as each
+ node
is edited, the original title is stored into a backup file.  Only node
+s
that are changed are stored in the file, so if you have 2100 nodes, an
+d
100 are edited successfully, but the 101st reveals a problem with the 
regular expression, only the first 101 will be restored.

In addition, rather than defauting to behavior that automatically upda
+tes
the nodes, the B<-Z> switch must be passed.  Otherwise, all actions ar
+e
taken except for actually posting the node back to perlmonks.org.  Sin
+ce
the default is for B<-q> is off, a complete record of actions will be 
displayed.  

=head1 EXAMPLES

The following example prepends the string '(myusername) ' to the front
+ of
every node title.  B<-Z> indicates it should really be done, instead o
+f a
trial run.

    bnteditor -u myusername -p password -r 's/^/(myusername) /' -Z

This example might be if you (somehow) changed your username from 'iam
+fred'
to 'iamdaphne', regardless of the case of 'iamfred':

    bnteditor -u iamdaphne -p sc00by -r 's/iamfred/iamdaphne/i' -Z

This example would simply generate a backup file for all current node 
+titles,
and save them to the file 'nodetitles':

    bnteditor -u iamdaphne -p sc00by -r 's///' -f nodetitles

=head1 OPTIONS

=over 4

=item B<-h> or B<-?> help

Prints out a brief help message.

=item B<-u> username

The username of the perlmonks.org account that the image is to be uplo
+aded
to.

=item B<-p> password

The password for the username.  Be sure to READ THE SECURITY SECTION, 
+below.
There is important information about keeping your password private.

=item B<-P>

Enter password interactively.  The password is prompted for as soon as
+ the
script starts.  While impractical if the script is started from a cron
+job,
it is the most secure method.  See B<SECURITY>.

=item B<-r> regexp

This is a normal Perl regular expression.  Most likely you'll want to 
+quote
this in the shell so that parenthesis, dollar signs, and other charact
+ers are
not attempted to be interpolated by the shell.  The form is 
's/match/replace/options'.  The expression is applied to the node titl
+e via
an eval().  Note that a basic check of expression validity is checked 
+for 
before starting to process nodes.  Basically, the check is

    eval '$a = ""; $a =~ ' . $regex;

followed by a check of B<$@>.  This option is mutually exclusive with 
+the
the B<-r> option, and specifying both terminates the script with an er
+ror.

=item B<-f> filename 

This option specifies the file name that should be used as either the 
+backup
or restore file, depending on whether nodes are being updated or resto
+red. 
When used as a backup file, the script will terminate if the file alre
+ady
exists.  There is no way to override this, and it's for your own prote
+ction.
When restoring, the presence of the file is checked for, and must cont
+ain
at least one line before the restore operation is commenced.  The defa
+ult
for this is ''savemybutt.txt''

=item B<-R>

Replace backed up node titles.  Rather than apply a regular expression
+ to 
each node title, use the node titles saved in the backup file.  Unless
+ a
B<-f> option is specified, the default ''savemybutt.txt'' file will be
used as the source for the backup.  This option is mutually exclusive 
+with 
the B<-r> option, and specifying both terminates the script with an er
+ror.

=item B<-q>

Run script quietly.  Normally, the script is rather verbose about what
+ it's
doing.  Since the danger of wreaking havoc on node titles is pretty hi
+gh,
this is a desirable default.  However, if for some bizarre reason one 
+feels
compelled to have the script quietly do it's thing, this option causes
+ all
progress messages to be suppressed.

=item B<-Z>

Actually make changes.  Without this option, the script will perform a
+ll
steps in the update or restore, with the exception of actually submitt
+ing
the changes back to perlmonks.org.  This option is required to actuall
+y
make things happen.  It's recommended that this option not be specifie
+d
until you are completely sure that the regular expression is doing wha
+t is
intended, and not replacing all node titles with 'Petrified Natalie Po
+rtman
and hot grits!'.  Use with care.  Lots of care.  Extreme care.  So muc
+h
care that you wonder if you shouldn't have done all of this by hand an
+yway,
what with all these warnings and all.

=item B<-n>

This option causes newer nodes to be processed first.  Normally, nodes
are processed in older to newer older, as defined by the node ID.  Spe
+cifying
B<-n> causes the processing order to be newer to older order.

=head1 SECURITY

Remember that if you provide usernames and passwords on the command li
+ne, 
someone else can see those with ''ps'' on most systems.  Since your 
perlmonks.org account should be more sacred than your winning lottery
numbers, that rare signed Larry Bird baseball card, or the URL for tha
+t
free Angelina Jolie pr0n site.  There are two ways you can protect you
+r 
password.  You can either edit the script and set it as a default, the
+n 
make sure you make the script owner readable only (chmod 700), or you 
+can 
use the B<-P> option, and require the script to interactively prompt y
+ou 
for your password each time it's run.  

Obviously, on a single user system, this isn't much concern.  However,
+ if
you're running from an account on perlmonk.org, where there are over 1
+00 
users at last count, your want to be careful about compromising inform
+ation. 

=head1 IMPROVEMENTS

A worthwhile improvment might be being able to specify a numeric range
+ or
list of nodes to change, rather than the whole list.  Producing a log 
+file
might be useful, as would being able to generate a quicker backup file
+ 
from the loaded XML, rather than stepping through each node.

=head1 WARRANTY

You B<must> be high.

=head1 AUTHOR

J. C. Wren E<lt>jcwren@jcwren.comE<gt>

=head1 COPYRIGHT

Copyright (c) 2001 J.C.Wren. All rights reserved.

This is free software.  You can redistribute it and/or modify it 
under the same terms as Perl itself.

=head1 HISTORY

    $Log: bnteditor.pl,v $
    Revision 1.0.0.1  2001/11/30 23:46:30  jcw
    initial import into CVS


=cut

Comment on Bulk Node Title Editor
Download Code

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://128806]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (11)
As of 2014-11-26 01:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (160 votes), past polls