Converting plain text to HTML and back again
by Nomis52 (Friar) on Jul 08, 2003 at 04:32 UTC
Nomis52 has asked for the wisdom of the Perl Monks concerning the following question:
Ok here goes:
I'm in the process of writing a forum type web application. I accept messages from forms, save them in a database and then display them as web pages and/or email them to users.
The problem is I have two types of users. The first kind type their message in the text field and expect to see it displayed exactly as they typed it. The second kind want to pretty up their messages and so use HTML. Currently the HTML tags are limited to a small subset including b, i, p, br, a, ul and li.
At the moment, the text is passed through HTML::Scrubber to limit the HTML tags and attributes (if any) and then stored in the db. When displayed on a web page, the text is run through a simple regex which adds <p> and <br> tags in place of \n. The emailed messages are sent out as plain text, with no additional filtering.
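For reference, the display step is roughly like the sketch below. This is not the exact code from my app: newlines_to_html is a name made up for this post, and treating a blank line as a paragraph break is an assumption about how the regex should behave. It uses core Perl only and assumes the text has already been scrubbed.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this post): wrap blank-line-separated
# blocks in <p> and turn the remaining single newlines into <br>.
# Assumes the text was already scrubbed; no escaping is done here.
sub newlines_to_html {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;                 # normalise line endings
    $text =~ s/\A\n+|\n+\z//g;             # trim leading/trailing blank lines
    my @paras = split /\n{2,}/, $text;     # blank line(s) separate paragraphs
    s/\n/<br>\n/g for @paras;              # single newline becomes a line break
    return join "\n", map { "<p>$_</p>" } @paras;
}

print newlines_to_html("Hello\nworld\n\nSecond paragraph\n"), "\n";
```

Note that a message which already contains its own <p> tags gets wrapped a second time by this kind of regex, which is one reason the approach is awkward for the HTML users.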
There are at least two problems with this approach, however:
So my thought on this was to start storing the data as HTML. I was thinking of accepting messages in both HTML and plain text format (adding a checkbox below the form, or maybe searching for HTML tags and deciding). The plain text messages would be passed through HTML::FromText, and then both would be scrubbed as before.
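The "searching for HTML tags and deciding" option could look something like this. It is only a sketch: looks_like_html is a hypothetical name, and the tag list is just the subset mentioned above.

```perl
use strict;
use warnings;

# Hypothetical heuristic: treat a message as HTML if it contains an opening
# or closing tag from the allowed subset (b, i, p, br, a, ul, li).
sub looks_like_html {
    my ($text) = @_;
    return $text =~ m{<\s*/?\s*(?:b|i|p|br|a|ul|li)\b[^>]*>}i ? 1 : 0;
}

print looks_like_html("<b>bold</b> text") ? "html\n" : "plain\n";
print looks_like_html("if a < b then stop") ? "html\n" : "plain\n";
```

A guess like this can misfire on plain text that happens to contain something tag-shaped, such as "x <b and y> z", so the checkbox is probably the more reliable of the two options.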
On the output side, when displayed as webpages, the data can be taken straight from the db without any processing, while for emailing I was looking at using HTML::FormatText to convert back into plain text.
I've started to code up some examples to test this out and it _almost_ works. The issue is that I'd like to have the output text match as closely as possible to the input text, else I will get complaints :) There are a number of small problems like how HTML::FromText changes
* 1 * 2
to
<UL><LI><P>1</P> <P>2</P> </UL>
which HTML::FormatText renders as:
* 1 * 2
To fix this I've started making small modifications to both HTML::FromText and HTML::FormatText. So one of my questions is: should I submit these as patches to the authors, or should I just fork and change them to MyApp::HTML::xxxx?
And finally, while typing this I've thought of maybe adding an attribute in the db to indicate whether or not the text is in HTML form. This would get rid of the converting back and forth, since plain text messages would never need to be turned into HTML and back again for email. Thinking about it now, this might be the best way to do it.
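A rough sketch of what I mean by the flag, using core Perl only. The is_html field, render_for_web and render_for_email are all names invented for this post, and escaping the plain text at display time is an assumption on my part. The HTML-to-text converter is passed in as a code ref, so something like HTML::FormatText could be plugged in there; the tag stripper in the usage lines is only a crude stand-in, not a real converter.

```perl
use strict;
use warnings;

# Hypothetical message record: the body is stored as submitted (HTML bodies
# already scrubbed), plus an is_html flag saying how to interpret it.
my %ESC = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub render_for_web {
    my ($msg) = @_;
    return $msg->{body} if $msg->{is_html};              # scrubbed HTML: use as-is
    (my $html = $msg->{body}) =~ s/([&<>"])/$ESC{$1}/g;  # escape plain text
    $html =~ s/\n/<br>\n/g;                              # keep the author's line breaks
    return $html;
}

sub render_for_email {
    my ($msg, $html_to_text) = @_;
    return $msg->{body} unless $msg->{is_html};          # plain text goes out untouched
    return $html_to_text->($msg->{body});                # only HTML messages are converted
}

# Usage with a crude stand-in converter that just strips tags.
my $strip = sub { (my $t = shift) =~ s/<[^>]+>//g; return $t };
my $plain = { body => "* 1\n* 2",        is_html => 0 };
my $rich  = { body => "<b>hi</b> there", is_html => 1 };

print render_for_email($plain, $strip), "\n";   # the list comes back exactly as typed
print render_for_web($rich), "\n";
```

The point is that the plain text users' messages never go through a round trip, so their email output cannot differ from what they typed.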
Am I going about this the right way? Someone must have done something similar to this before, and I'm interested in your comments.