http://www.perlmonks.org?node_id=238029
Category: email programs
Author/Contact Info jaldhar
Description:

If you subscribe to several high-volume mailing lists like debian-user or kde-devel and fail to read them for a few days, the messages start piling up. Sometimes you can get back up to speed by spending some time diligently reading through the backlog (or at least holding down the 'D' key in pine ;-) but sometimes you just want to dump the old stuff and start over.

mailexpire is a simple script that will delete any message in a folder from more than a certain number of days ago. The default is 7 days but you can change it.

#!/usr/bin/perl
use HTTP::Date;
use Getopt::Long;
use Mail::Box::Manager;
use Pod::Usage;
use strict;
use warnings;

use constant ONE_DAY => 86400;
my $days = 7;
my $filename = '';
my $help = '';
my $verbose = 1;

Getopt::Long::Configure('gnu_getopt');
GetOptions (
  'days|d=i' => \$days,
  'folder|f=s' => \$filename,
  'help|h|?' => \$help,
  'verbose!' => \$verbose,
) or pod2usage(-exitstatus => 2, -verbose => 1);
pod2usage(-exitstatus => 0, -verbose => 2) if $help;
die "You must specify a folder\n" unless $filename;
my $expire = time - ($days * ONE_DAY);
my $mgr    = Mail::Box::Manager->new;
my $folder = $mgr->open
   ( $filename,
     access => 'rw',
     extract => 'LAZY', # To save memory & time, don't parse body unle
+ss needed.
     log => 'ERRORS',   # Don't log or trace warnings.
     trace => 'ERRORS',
   );
die "Cannot open $filename: $!\n" unless defined $folder;
my @messages = $folder->messages;
my $count = 0;
foreach my $message (@messages)
{
  ++$count;
  print 'Processing message ', $count, "\n"
    if $verbose && ($count % 100 == 0);
  my $date = str2time($message->date);
  $message->delete unless $date > $expire;
}
$folder->close (write => 'MODIFIED');

0;

__END__

=head1 NAME

mailexpire - Delete old mail from folders.

=head1 SYNOPSIS

mailexpire [options]

=head1 DESCRIPTION

B<mailexpire> deletes mail older than a certain number of days from a 
+folder.

=head1 OPTIONS

=over 8

=item B<--days|-d> I<number_of_days>

Any mail older than this number of days will be deleted from the folde
+r.  If
this option is not specified, the default number of days is 7.

=item B<--folder|-f> I</path/to/folder>

The location of the folder to operate on.  This option is mandatory.  
+The
folder can be of any type supported by L<Mail::Box>

=item B<--help|-h|-?>

Display the documentation.

=item B<--noverbose>

Surpresses the progress information.  This is useful if e.g. you are r
+unning
the script from a cron job.

=back

=head1 BUGS

The script is exceedingly slow on large folders which take up more
physical memory than you've got.  A folders' memory consumption seems 
+to
be about 5 times the size of the file on disk so it is easy to be forc
+ed into
swap and thrash if your folder is big.  The answer is probably not use
+ a pure
perl module like L<Mail::Box>.

=head1  SEE ALSO

L<Mail::Box-Overview>, L<perl>

=head1 AUTHOR

Jaldhar H. Vyas E<lt>jaldhar@braincells.comE<gt>

=head1 LICENSE

This code is free software under the Crowley Public License ("Do what
thou wilt shall be the whole of the license")

=head1 VERSION

1.1 -- Feb 24, 2003

=cut
Replies are listed 'Best First'.
Re: mailexpire
by Nomad (Pilgrim) on Feb 28, 2003 at 09:07 UTC

    This is very similar to a script that I use, except mine is supposed to be run by cron and therefore loops through several mailboxes. I've got some basic exception handling in it, because of this if one of the mailboxes is corrupted.

    The only comments I have is that you use the 'Date' header to expire. I decided not to do this and use the 'Date Received' header that exim writes to mails or the last Received header, just in case an email languishes somewhere on the net before being delivered. This happens rarely, but often enought to be an issue. The other comment is that you use HTTP::Date. This is OK as long as the dates come in that format. I looked around and see that MTAs write dates in all sorts of ways, hence I chose to use Date::Manip, because it understands most, if not all formats.