I have been using Netscape or Mozilla as my mail client since 1996, and I have built up a message archive with thousands of messages.
The way I have managed this archive, up to now, is to create folders within the Mozilla mail client. I then use the "Search Mail/News Messages" function in the mail client to find specific messages. This is not scaling well. Searches take a long time because each folder is stored as two text files:
- one file is the messages themselves concatenated together,
- the second is a file of metadata corresponding to the messages in the other file.
I want to design an application that ingests my Mozilla mailbox, separates the messages into rows in a database, and provides much more robust and scalable search capabilities. I am considering using MySQL as the database, with Apache and mod_perl as the front end, all running on my local machine, a Linux laptop.
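To make the ingest step concrete, here is a rough sketch of what I have in mind. It is only an illustration: it is written in Python with the standard mailbox and sqlite3 modules because that is quick to try out, whereas the real application would use Perl and MySQL. The table layout and the function names (create_schema, header, plain_body, ingest_mbox) are made up for this example, and it assumes each Mozilla folder file can be read as a plain mbox file.

```python
import mailbox
import sqlite3


def create_schema(conn):
    # Hypothetical table layout: one row per message.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id         INTEGER PRIMARY KEY,
               folder     TEXT,
               message_id TEXT,
               sender     TEXT,
               subject    TEXT,
               date_hdr   TEXT,
               body       TEXT
           )"""
    )


def header(msg, name):
    # Headers can come back as None or as Header objects, so force a string.
    value = msg[name]
    return "" if value is None else str(value)


def plain_body(msg):
    # Return the first text/plain part as text, or '' if there is none.
    # This sketch does not try to tell inline text from text attachments.
    for part in msg.walk():
        if part.is_multipart() or part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the message; fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def ingest_mbox(path, folder, conn):
    # Split one mbox file into rows; returns the number of messages stored.
    create_schema(conn)
    count = 0
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            conn.execute(
                "INSERT INTO messages"
                " (folder, message_id, sender, subject, date_hdr, body)"
                " VALUES (?, ?, ?, ?, ?, ?)",
                (
                    folder,
                    header(msg, "Message-ID"),
                    header(msg, "From"),
                    header(msg, "Subject"),
                    header(msg, "Date"),
                    plain_body(msg),
                ),
            )
            count += 1
    finally:
        box.close()
    conn.commit()
    return count
```

Calling ingest_mbox once per folder file would fill the table. The inserts use placeholders rather than string interpolation so that odd characters in subjects or bodies cannot break the SQL.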
I am not asking for help identifying the Perl modules to parse mail out of a Mozilla mailbox. I think that was covered in a previous question I posted, Netscape/Mozilla Mailbox Processing. But I do wonder if my fellow monks would mind commenting on:
- the general merits of the design idea that I've sketched out
- any "gotchas" they see in attempting to store email in a MySQL database, or rendering the body of the message in a dynamically generated web page
- practical ways to deal with any attachments included with the mails:
  - copy to a place in the file system, store a reference to the location in the database
  - embed the attachments as BLOBs in the database
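For the first attachment option (files on disk, a reference in the database), this is roughly what I picture. Again, it is a Python and SQLite sketch rather than the Perl and MySQL I would actually use, and the attachments table, the save_attachments function, and the idea of naming each file after a SHA-256 hash of its contents are all assumptions of mine for illustration.

```python
import hashlib
import os


def save_attachments(msg, message_row_id, attach_dir, conn):
    # msg is a parsed email message; message_row_id is the hypothetical
    # primary key of the row already stored for that message.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS attachments (
               id           INTEGER PRIMARY KEY,
               message_id   INTEGER,
               filename     TEXT,
               content_type TEXT,
               size         INTEGER,
               path         TEXT
           )"""
    )
    os.makedirs(attach_dir, exist_ok=True)
    saved = []
    for part in msg.walk():
        filename = part.get_filename()
        # Treat any non-multipart part that carries a filename as an attachment.
        if part.is_multipart() or filename is None:
            continue
        data = part.get_payload(decode=True)
        if data is None:
            continue
        # Name the file after its content hash, not the sender-supplied name,
        # so a hostile filename cannot escape attach_dir and identical
        # attachments are stored only once.
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(attach_dir, digest)
        if not os.path.exists(path):
            with open(path, "wb") as fh:
                fh.write(data)
        conn.execute(
            "INSERT INTO attachments"
            " (message_id, filename, content_type, size, path)"
            " VALUES (?, ?, ?, ?, ?)",
            (message_row_id, filename, part.get_content_type(), len(data), path),
        )
        saved.append(path)
    conn.commit()
    return saved
```

The original filename is kept only as a column for display. The BLOB option would replace the path column with the data itself, which keeps everything in one place at the cost of a much larger database.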
Finally, if anyone knows of an Open Source program that provides 80 percent of this functionality, let me know. So far, I've identified SQmaiL (Python) and Gmail (C). Neither is written in Perl, and neither seems to be a particularly active project.
Thanks,
Dave Aiello
Chatham Township Data Corporation