
Trimming a mailbox

by oko1 (Deacon)
on Feb 13, 2012 at 05:04 UTC ( #953401=perlquestion )
oko1 has asked for the wisdom of the Perl Monks concerning the following question:

I just wrote a script that does what I want - i.e., trimming a "catch-all" mailbox to just the last N days' worth of emails. However, I'm a little uncomfortable with the basic design of the program - it seems like there should be a smarter way to do this. Any suggestions for improvement would be appreciated.

(Incidentally, the reason that I'm using 'date --date=""' is that a) it's very smart about figuring out the variety of dates one runs into in email headers, and b) the various modules I've tried are either not smart enough or slower than 'date'.)

#!/usr/bin/perl
use warnings;
use strict;

die "Usage: ", $0 =~ /([^\/]+)$/, " <mbox> [days]\n"
    unless -f $ARGV[0] && @ARGV >= 1 && @ARGV <= 2;

my $days = $ARGV[1] || 3;
pop if $ARGV[1];
my $cutoff = time - $days * 60 * 60 * 24;

my ($found, $content);
while (<>){
    print && next if $found;
    if (/^From /../^$/){
        $content .= $_;
        if (/^Date: ([-+:,)(\w ]+)$/){
            my $mdate = `date --date="$1" "+%s"`;
            if ($mdate >= $cutoff){
                $found++;
                print $content;
            }
        }
    }
    else {
        $content = "";
    }
}

Update: Thanks to the advice from thargas, the date is now validated. Despite the jocular tone in my response to his post, it was a serious issue - a 'Date: $(rm -rf ~/*)' would indeed have done some serious damage if it somehow got through the mail filters.
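
The risk described in the update comes from a shell parsing the header text. Here is a minimal sketch of one way to sidestep that, assuming GNU date is on the PATH: Perl's list-form pipe open runs date directly, with no shell in between. The helper name epoch_via_date is hypothetical and not part of the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper: ask date(1) for the epoch without involving a shell.
# Assumption: a 'date' that understands --date= (GNU coreutils) is on the PATH.
sub epoch_via_date {
    my ($date_text) = @_;

    # List-form pipe open: Perl executes 'date' with exactly these arguments,
    # so quotes, semicolons or $(...) in $date_text are passed along as data.
    open(my $fh, '-|', 'date', "--date=$date_text", '+%s')
        or return;                        # date(1) could not be started

    my $out = <$fh>;
    close($fh) or return;                 # non-zero exit: date rejected the string

    return unless defined $out && $out =~ /^(-?\d+)$/;
    return $1;
}
```

In the script above, the backtick line could then read `my $mdate = epoch_via_date($1);` followed by a `defined` check. The regex validation can stay in place as a second layer.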

I hate storms, but calms undermine my spirits.
 -- Bernard Moitessier, "The Long Way"

Re: Trimming a mailbox
by thargas (Deacon) on Feb 13, 2012 at 12:01 UTC

    Do you realize that this is subject to an injection attack? If I send you a message with a "Date" header looking like:

    Date: "; echo hacked::0:1:Haxor:/:/bin/sh >>/etc/passwd;

    I've added a new root user called hacked with no password to your machine. I won't claim this would get me access to the machine (it wouldn't even tell me which machine it has hacked), but it ought to make you consider doing this some other way.
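
To see why the header matters, it helps to look at the text that backticks would hand to the shell. The sketch below only builds and prints that string; it never executes it. The permissive `(.*)` capture is an assumption standing in for a script with no validation, and both helper names are made up for illustration. The second helper applies the character class from the script at the top of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the command text for a header with NO validation.
# Assumption: (.*) stands in for whatever an unvalidated script would capture.
sub shell_text_for {
    my ($header) = @_;
    my ($raw) = $header =~ /^Date: (.*)$/ or return;
    return qq{date --date="$raw" "+%s"};
}

# Hypothetical helper: the same capture, restricted to the character class
# used in the posted script. Returns undef when the header does not match.
sub validated_date {
    my ($header) = @_;
    return $header =~ /^Date: ([-+:,)(\w ]+)$/ ? $1 : undef;
}

my $evil = 'Date: "; echo hacked::0:1:Haxor:/:/bin/sh >>/etc/passwd;';

print shell_text_for($evil), "\n";
# prints: date --date=""; echo hacked::0:1:Haxor:/:/bin/sh >>/etc/passwd;" "+%s"

print defined(validated_date($evil)) ? "accepted\n" : "rejected\n";
# prints: rejected
```

The quote at the start of the header value closes the --date argument, so the text after it is no longer inside the quoted string. The character class has no quote, semicolon, slash or angle bracket in it, which is why the same header fails validation.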

      You're right!... barring a few insignificant factors, that is. Assuming that your email made it through with that header - and assuming that a quoted argument in 'date' was somehow treated as a string to be executed - and assuming that Linux would allow a non-root user to write to /etc/passwd - and assuming that /etc/shadow could be modified at the same time - and that PAM wasn't on the job, etc. ... you would be right. But those factors do, indeed, apply.

      It would, however, make sense to validate that string. Thanks for that hint. :)

      I hate storms, but calms undermine my spirits.
       -- Bernard Moitessier, "The Long Way"
Re: Trimming a mailbox
by chrestomanci (Priest) on Feb 13, 2012 at 13:25 UTC

    As thargas dramatically pointed out, parsing the date externally is risky, and should be unnecessary. Are you sure it is faster? How did you benchmark or profile it?

    Instead, there are a number of modules on CPAN for the purpose. I have had most success with HTTP::Date, and there is also Date::Parse.
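
A minimal sketch of the in-process route, using `str2time` from Date::Parse. Date::Parse is a CPAN module and not part of core Perl, so the sketch loads it at run time and returns undef when it is missing. The helper names are hypothetical, and whether this copes with every date format found in a real mailbox is exactly the open question in this thread.

```perl
use strict;
use warnings;

# Assumption: Date::Parse (CPAN, TimeDate distribution) may not be installed,
# so it is loaded at run time instead of with a compile-time 'use'.
my $HAVE_DATE_PARSE = eval { require Date::Parse; 1 };

# Hypothetical helper: epoch seconds for a Date: header value, or undef.
sub epoch_in_process {
    my ($date_text) = @_;
    return unless $HAVE_DATE_PARSE;
    my $epoch = Date::Parse::str2time($date_text);   # undef if not understood
    return $epoch;
}

# Hypothetical helper: true when the message is at least as new as $cutoff.
sub is_recent {
    my ($date_text, $cutoff) = @_;
    my $epoch = epoch_in_process($date_text);
    return defined $epoch && $epoch >= $cutoff;
}
```

No external process is started and no header text reaches a shell, so the injection concern does not arise here.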

    Another approach would be to switch your mailbox to maildir format. Each email is then in its own file, and the date on each file will match the received date, so a simple find invocation will remove old files.
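
The reply has the find(1) command in mind. To stay in Perl, the sketch below does the equivalent with the core File::Find module. It assumes, as the reply does, that each file's modification time matches the received date. The helper name and the directory in the usage note are hypothetical.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: regular files under $dir whose modification time is
# more than $days days in the past.
sub old_messages {
    my ($dir, $days) = @_;
    my $cutoff = time - $days * 24 * 60 * 60;
    my @old;

    find({
        no_chdir => 1,                    # $_ holds the full path of each entry
        wanted   => sub {
            return unless -f $_;          # skip directories and the like
            my $mtime = (stat _)[9];      # reuse the stat done by -f
            push @old, $_ if $mtime < $cutoff;
        },
    }, $dir);

    my @sorted = sort @old;
    return @sorted;
}
```

Something like `unlink old_messages("$ENV{HOME}/Maildir/cur", 3);` would then delete the matches. Printing the list first is a cheap safety check.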

      Actually, it's not risky at all - as long as it's validated. I am indeed sure that it's faster, since I benchmarked it by trying it with several modules against a 20MB mailbox. Most of them couldn't handle the variety of date formats, as I'd already mentioned. The two that could were much slower than using 'date'.

      As to the advice to change to maildir... thanks, but I absolutely despise maildir for a variety of reasons. I was actually looking for advice on the program design/methodology (although bug-killing is certainly welcome); if you have any comments about that, they'd be most welcome.

      I hate storms, but calms undermine my spirits.
       -- Bernard Moitessier, "The Long Way"
