PerlMonks  

PineTOC

by Tally (Novice)
on Aug 01, 2000 at 00:05 UTC (#25328=sourcecode)
Category: Text Processing
Author/Contact Info: Tim Lewis (LewisT@UAH.EDU)
Description: PINE is a common text-based email client on many UNIX systems. PINE stores email folders as large text files, which makes it very handy to archive your old email... except that there's no table of contents at the beginning of the file to tell you which messages are stored there. This script solves that problem by parsing the PINE email store and building a separate table of contents from the headers of each email. The resulting TOC lists the message number, subject, sender and date in formatted columns. I usually concatenate the TOC and the email storage file, then save the resulting file in my email archives.
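The concatenation step mentioned above can also be scripted in Perl (in a shell, cat does the same job). A minimal sketch, assuming the TOC and the mail store are ordinary files; build_archive and the example file names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: copy the TOC file, then the mail store, into one archive file.
sub build_archive {
    my ($toc_file, $mail_file, $archive_file) = @_;
    open my $out, '>', $archive_file
        or die "Could not open '$archive_file' for writing: $!\n";
    for my $in_file ($toc_file, $mail_file) {
        open my $in, '<', $in_file
            or die "Could not open '$in_file': $!\n";
        print {$out} $_ while <$in>;    # copy line by line, unchanged
        close $in;
    }
    close $out or die "Could not finish writing '$archive_file': $!\n";
    return;
}

# Usage (file names are examples only):
# build_archive('saved-messages.toc', 'saved-messages', 'saved-messages.archive');
```

Putting the TOC first means the archive opens with the summary, and the messages follow byte for byte as PINE stored them.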

Note: This script works very well with version 3.96 of PINE, which I use, but there are other versions that I have not tested it on.

PLEASE comment on this code. I'm a fairly new Perl programmer and would appreciate feedback on how to improve my programming.
#!/usr/bin/perl

use warnings;
use strict;

if (@ARGV != 2) {
    die "Usage: pinetoc inputfile outputfile\n";
}
open (INFILE,  '<', $ARGV[0]) or die "Could not open input file '$ARGV[0]': $!\n";
open (OUTFILE, '>', $ARGV[1]) or die "Could not open output file '$ARGV[1]': $!\n";

##### Variables #####
my $From = "";          # Used to store the from address
my $Subject = "";       # Used to store the subject
my $Date = "";          # Used to store the message date
my $LetterNum = 0;      # Counts the number of emails
my $HeaderFlag = -1;    # Flag is < 0 when we're searching for a new email
                        # Flag is >= 0 and < 7 while we're getting header info
                        # (From adds 1, Subject adds 2, Date adds 4)
                        # Flag is >= 7 when we've found all the header info

##### Main Loop #####
while (<INFILE>){

    # Look for a new message (all messages have a header line beginning "X-UIDL: ")
    if (/^X-UIDL: \w{32}/) {

        if ($HeaderFlag > 0) {
            # We haven't got all the header info yet... but we'll write anyway
            WriteTOCline($LetterNum, $From, $Subject, $Date, $HeaderFlag);
        }

        $LetterNum++;

        # Clear the message data variables
        $HeaderFlag = 0;
        $From = "";
        $Subject = "";
        $Date = "";
    }
    if ($HeaderFlag < 0) {
        # Do nothing -- already found the header info, so we're searching for a new letter
    }
    elsif ($HeaderFlag < 7) {
        if (/^From:/) {
            # Remove the "From: " label, double quotes, any <address> or [address],
            # and the newline, to isolate the sender's name
            s/(From: |"|(\[|<)[^\]>]*(\]|>)|\n)//g;
            s/^\s+|\s+$//g;                 # remove leading or trailing whitespace
            $From = $_;
            $HeaderFlag += 1;
        }
        elsif (/^Subject:/) {
            s/^Subject:|\n//g;              # remove stuff to isolate the subject
            s/^\s+|\s+$//g;                 # remove leading or trailing whitespace
            $Subject = $_;
            if ($Subject eq "") {
                $Subject = "(Blank subject)";
            }
            $HeaderFlag += 2;
        }
        elsif (/^Date:/) {
            # Keep only the "Day, DD Mon YYYY" part; fall back to "" if it doesn't match
            ($Date) = /^Date: (\w+, \w+ \w+ \w+)/;
            $Date = "" unless defined $Date;
            $HeaderFlag += 4;
        }
    }
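    else {
        # We've got all the header info
        WriteTOCline($LetterNum, $From, $Subject, $Date, $HeaderFlag);
        $HeaderFlag = -1;
    }
}

# The last message's TOC line may not have been written yet (for example,
# if it was missing a header), so write it now
if ($HeaderFlag > 0) {
    WriteTOCline($LetterNum, $From, $Subject, $Date, $HeaderFlag);
}

close INFILE;
close OUTFILE or die "Could not close output file: $!\n";
exit 0;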

##### Subroutine for writing the TOC #####
sub WriteTOCline {
    my ($LetterNum, $From, $Subject, $Date, $HeaderFlag) = @_;
    my %FieldBit = (From => 1, Subject => 2, Date => 4);

    if ($HeaderFlag > 7) {
        print "Error: Too much header info in letter $LetterNum titled '$Subject'\n";
    }
    elsif ($HeaderFlag < 7) {
        # Report every field whose bit is still clear (more than one may be missing)
        for my $Field (grep { !($HeaderFlag & $FieldBit{$_}) } qw(From Subject Date)) {
            print "Error: Missing '$Field' field in message $LetterNum\n";
        }
    }

    # Write to output file (all cases)
    printf OUTFILE "%-4d  %-30.30s  %-20.20s  %-16.16s\n", $LetterNum,
        $Subject, $From, $Date or die "Could not write to output file!\n";
}
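The From-line substitution is the trickiest line in the script. Without a quantifier, [^\]>] matches exactly one character, so the bracketed-address alternative would only remove one-character addresses such as <x>; with [^\]>]* it removes a whole <address> or [address]. A small check of both variants on an invented header line; clean_from and clean_from_single_char are hypothetical helpers wrapping the script's two substitutions:

```perl
use strict;
use warnings;

# Hypothetical helper: the From-line cleanup with "*", so the bracketed
# part matches a whole <address> or [address].
sub clean_from {
    my ($line) = @_;
    $line =~ s/(From: |"|(\[|<)[^\]>]*(\]|>)|\n)//g;   # drop label, quotes, address, newline
    $line =~ s/^\s+|\s+$//g;                          # trim surrounding whitespace
    return $line;
}

# Same cleanup without the "*": [^\]>] matches exactly one character,
# so a normal address such as <LewisT@UAH.EDU> is left in place.
sub clean_from_single_char {
    my ($line) = @_;
    $line =~ s/(From: |"|(\[|<)[^\]>](\]|>)|\n)//g;
    $line =~ s/^\s+|\s+$//g;
    return $line;
}

my $header = qq{From: "Tim Lewis" <LewisT\@UAH.EDU>\n};
print clean_from($header), "\n";               # prints: Tim Lewis
print clean_from_single_char($header), "\n";   # prints: Tim Lewis <LewisT@UAH.EDU>
```

Note that a sender written as "LewisT@UAH.EDU (Tim Lewis)" keeps the parenthesised comment either way, since neither variant strips parentheses.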
(jjhorner)PineTOC
by jjhorner (Hermit) on Aug 01, 2000 at 04:35 UTC

    Pretty good code, from just a quick peek, but even though you are declaring your variables, you aren't checking up on yourself with the warnings and strict pragmas.

    Please use them. Even experienced Perl programmers use them.
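What the two pragmas catch can be seen in a few lines. A minimal sketch with invented variable names: "use strict" turns an undeclared (for example, misspelled) variable into a compile-time error, and "use warnings" reports the use of an undefined value at run time. The string eval and the __WARN__ handler are there only so both messages can be captured and printed:

```perl
use strict;
use warnings;

# 1. strict: the misspelled, undeclared $Subjct stops compilation of this code.
#    A string eval is used so the error lands in $@ instead of killing the script.
my $compiled = eval q{
    use strict;
    my $Subject = "Hello";
    $Subjct = "typo";    # undeclared variable
    1;
};
print "use strict says: $@" unless $compiled;

# 2. warnings: joining an undefined value into a string produces a warning.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };
my $Date;                        # never assigned, so undef
my $line = "Date: " . $Date;     # triggers "Use of uninitialized value ..."
print "use warnings says: $caught[0]";
```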

    "-w" (or "use warnings") and "use strict" are your friends.

    cut-n-paste the following code and run it as your punishment.

    #!/usr/bin/perl -w
    use strict;
    my $i;
    for($i = 0; $i < 100; $i++) {
        print "I will use strict and warnings.\n";
    };
    J. J. Horner
    Linux, Perl, Apache, Stronghold, Unix
    jhorner@knoxlug.org http://www.knoxlug.org/
    
      Thanks for your input!
      I updated my code, and ran my penance program like a good monk. =)
RE: PineTOC
by splinky (Hermit) on Aug 01, 2000 at 08:28 UTC
    First off, not a bad bit of code. I notice that you're checking the returns from your opens, which is a very good thing.

    I can't help but wonder why, in WriteTOCline, you take the two-step approach of sprintf followed by print instead of just using printf.
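The two approaches produce the same text; printf simply combines the formatting and the printing. A minimal sketch comparing them, writing into in-memory strings instead of a file (the format string is the one WriteTOCline uses; the row values are invented):

```perl
use strict;
use warnings;

# Format string from WriteTOCline; the row values are made-up examples
my $fmt = "%-4d  %-30.30s  %-20.20s  %-16.16s\n";
my @row = (1, "Meeting notes", "Tim Lewis", "Tue, 1 Aug 2000");

# Two steps: format into a string with sprintf, then print the string
my $two_step = '';
open my $out1, '>', \$two_step or die "Cannot open in-memory handle: $!";
my $line = sprintf $fmt, @row;
print {$out1} $line;
close $out1;

# One step: printf takes the same format and list directly
my $one_step = '';
open my $out2, '>', \$one_step or die "Cannot open in-memory handle: $!";
printf {$out2} $fmt, @row;
close $out2;

print $one_step eq $two_step ? "identical\n" : "different\n";   # prints: identical
```

The ".30" precision in "%-30.30s" both pads and truncates, so every TOC line has the same width however long the subject is.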

    And now, a few more Perlish ways to do a few things:

    You can shorten if ($_ =~ /^X-UIDL: \w{32}/) { to if (/^X-UIDL: \w{32}/) {. The $_ is implied on matches unless another variable is explicitly used.
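For example, inside a while (<INFILE>) loop each line lands in $_, so a bare /.../ tests that line; to test any other variable, bind it explicitly with =~. A small sketch with invented header lines:

```perl
use strict;
use warnings;

my @lines = ("X-UIDL: " . ("0" x 32) . "\n", "Subject: hello\n");
my $count = 0;

for (@lines) {                        # foreach aliases each element to $_
    $count++ if /^X-UIDL: \w{32}/;    # same test as  $_ =~ /^X-UIDL: \w{32}/
}

my $header = "Subject: hello";
my $is_subject = $header =~ /^Subject:/ ? 1 : 0;   # any other variable needs =~

print "UIDL lines: $count, subject match: $is_subject\n";   # prints: UIDL lines: 1, subject match: 1
```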

    All instances of $Variable = $Variable + n can be shortened to $Variable += n with no loss of readability to anyone who knows Perl (or C, for that matter).
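PineTOC's header bookkeeping is built on this shorthand: From, Subject and Date each add a different power of two (1, 2, 4) to $HeaderFlag, so a complete header totals 7, and XOR with 7 sets exactly the bits of the missing headers. A minimal sketch of that arithmetic; missing_fields is a hypothetical helper and, like the script's scheme, it assumes each header is counted at most once:

```perl
use strict;
use warnings;

my %bit = (From => 1, Subject => 2, Date => 4);

# Simulate a message whose header has From and Date lines but no Subject
my $HeaderFlag = 0;
$HeaderFlag += $bit{From};    # same as $HeaderFlag = $HeaderFlag + 1
$HeaderFlag += $bit{Date};    # now 5

# Hypothetical helper: list the headers whose bits are missing from the flag
sub missing_fields {
    my ($flag) = @_;
    my $missing = $flag ^ 7;    # bits that are still 0 become 1
    return grep { $missing & $bit{$_} } qw(From Subject Date);
}

print join(', ', missing_fields($HeaderFlag)), "\n";   # prints: Subject
```

If a header appears twice, += adds its bit twice and the sum no longer decodes cleanly; |= would make repeated headers harmless.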

    Probably the biggest change you could make, and one which would be very educational for you, would be to read RFC 822, which defines the format of email messages, and use that knowledge to set $/ to something useful so that you could slurp up whole messages at a time instead of reading them one line at a time.
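One way to act on that advice, sketched under assumptions: RFC 822 ends a message's header block at the first empty line, and Perl's paragraph mode ($/ set to the empty string) returns text up to a run of blank lines as one record, so each header block arrives whole. The sketch assumes, as the original script does, that every header block contains an X-UIDL line; header_blocks and parse_header are hypothetical names:

```perl
use strict;
use warnings;

# Read a mailbox in paragraph mode and keep only the header blocks.
# With $/ = "", each record is a run of non-blank lines plus its
# terminating blank line.
sub header_blocks {
    my ($fh) = @_;
    my @blocks;
    local $/ = "";    # paragraph mode; old $/ is restored on return
    while (my $para = <$fh>) {
        # Assumption: only header blocks contain a line starting "X-UIDL: "
        push @blocks, $para if $para =~ /^X-UIDL: \w{32}/m;
    }
    return @blocks;
}

# Turn one header block into a hash reference of field name => value.
sub parse_header {
    my ($block) = @_;
    $block =~ s/\n[ \t]+/ /g;    # unfold continuation lines (RFC 822 folding)
    my %field;
    for my $line (split /\n/, $block) {
        # The mbox "From sender date" envelope line has no colon right after
        # "From", so it does not match and is skipped.
        next unless $line =~ /^([\w-]+):\s*(.*)$/;
        $field{$1} = $2 unless exists $field{$1};    # keep the first occurrence
    }
    return \%field;
}

# Usage:
# open my $in, '<', 'saved-messages' or die "Could not open mail store: $!\n";
# for my $block (header_blocks($in)) {
#     my $h = parse_header($block);
#     print "$h->{Subject}\n" if defined $h->{Subject};
# }
```

Body paragraphs are still read, just skipped; a body that quotes an X-UIDL line at the start of a line would be mistaken for a header block.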

    Finally, I'll rain on your parade a bit by telling you that you're reinventing the wheel. If you want the semi-official Perl package for handling email, have a look at Graham Barr's MailTools bundle.

    *Woof*

      Thanks for your input. I updated my code based on your comments.

      I originally used the "sprintf" followed by "print" because I didn't know the "printf" command would take formatting. Thanks for pointing this out!

      I've read parts of RFC 822 in the past, but I'm not sure how relevant it is to this situation. PINE stores its messages with all the RFC headers, true, but is the PINE message store fully 822-compliant? Maybe it is, but I assume PINE changes the formatting of the messages and headers slightly when it stores them. Certainly, the messages in the store don't end with a single period on a line by itself (the end-of-message marker in SMTP). Still, I'm sure you're right that there is a more efficient way to "slurp up whole messages".
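On the question of where one message ends: if the store uses the traditional UNIX mbox layout (an assumption here, not checked against PINE 3.96), messages are not terminated by a lone period; instead each one begins with an envelope line starting "From " (no colon). A minimal sketch of reading whole messages on that assumption, also assuming that body lines beginning "From " have been escaped as ">From "; read_messages is a hypothetical name:

```perl
use strict;
use warnings;

# Split an mbox-style mail store into whole messages.
# Assumption: every message starts with an envelope line "From sender date"
# and no body line starts with "From " (mailers escape those as ">From ").
sub read_messages {
    my ($fh) = @_;
    my @messages;
    my $count = 0;
    local $/ = "\nFrom ";    # a record ends just before the next envelope line
    while (my $msg = <$fh>) {
        # chomp removes the current $/, i.e. the trailing "\nFrom ";
        # put back the newline that belonged to the message's last line
        $msg .= "\n" if chomp $msg;
        # every record after the first lost its leading "From " to the separator
        $msg = "From $msg" if $count++;
        push @messages, $msg;
    }
    return @messages;
}
```

Joining the returned messages reproduces the original file, so each message can be handed to header-parsing code as a single string.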

      Thanks for the reference to MailTools. I'll take a look.

      Tally