PerlMonks  

Ok here goes:

I'm in the process of writing a forum-type web application. I accept messages from forms, save them in a database, and then display them as web pages and/or email them to users.

The problem is that I have two types of users. The first type their messages into the text field and expect to see them displayed exactly as typed. The second want to pretty up their messages and so use HTML. Currently the HTML tags are limited to a small subset including b, i, p, br, a, ul and li.

At the moment, the text is passed through HTML::Scrubber to limit the HTML tags and attributes (if any) and then stored in the db. When displayed on a web page, the text is run through a simple regex which adds <p> and <br> tags in place of \n. The emailed messages are sent out as plain text, with no additional filtering.

There are at least two problems with this approach, however:

  • Users who supply HTML tags find that the regex conflicts with their own tags, adding extra <p> and <br> tags everywhere.
  • Users getting the messages via email get a bunch of HTML tags in the messages if the original poster used HTML.
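A minimal sketch of the first problem, assuming a display-time regex roughly like the one described. The sub name nl_to_html and the exact substitutions are assumptions for illustration, not the real code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the display-time regex: blank lines separate
# paragraphs, and the remaining newlines become <br>.
sub nl_to_html {
    my ($text) = @_;
    my @paras = split /\n{2,}/, $text;
    for my $para (@paras) {
        $para =~ s/\n/<br>\n/g;
        $para = "<p>$para</p>";
    }
    return join "\n", @paras;
}

# Plain text input is wrapped once, as intended.
print nl_to_html("line one\nline two"), "\n";

# Input that already carries <p> tags gets wrapped a second time.
print nl_to_html("<p>already marked up</p>"), "\n";
```

The second call shows the conflict: the regex has no way of knowing the poster already supplied the markup.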

So my thought was to start storing the data as HTML. I was thinking of accepting messages in both HTML and plain text format (adding a checkbox below the form, or maybe searching for HTML tags and deciding). The plain text messages would be passed through HTML::FromText, and then both would be scrubbed as before.
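The "searching for HTML tags and deciding" option could look something like the sketch below. It is only a heuristic, and the sub name looks_like_html is made up; the tag list is the allowed subset mentioned earlier:

```perl
use strict;
use warnings;

# The allowed subset of tags from earlier in the post.
my @ALLOWED = qw(b i p br a ul li);
my $TAG_RE  = do {
    my $names = join '|', @ALLOWED;
    qr{</?(?:$names)\b[^<>]*>}i;
};

# Hypothetical heuristic: treat a message as HTML if it contains at
# least one opening or closing tag from the allowed subset.
sub looks_like_html {
    my ($text) = @_;
    return $text =~ $TAG_RE ? 1 : 0;
}

print looks_like_html("<b>hi</b> there"), "\n";    # tag from the subset
print looks_like_html("1 < 2 and 3 > 2"), "\n";    # bare angle brackets only
```

A guess like this will misfire when a plain text poster types a literal tag, for example when asking about HTML itself, so the checkbox is probably the safer of the two options.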

On the output side, when displayed as webpages, the data can be taken straight from the db without any processing, while for emailing I was looking at using HTML::FormatText to convert back into plain text.

I've started to code up some examples to test this out and it _almost_ works. The issue is that I'd like to have the output text match as closely as possible to the input text, else I will get complaints :) There are a number of small problems like how HTML::FromText changes

* 1
* 2
to
<UL><LI><P>1</P>
<P>2</P>
</UL>
which HTML::FormatText renders as:
  *
 
    1
 
  *
 
    2
To fix these I've started making small modifications to both HTML::FromText and HTML::FormatText. So one of my questions is: should I submit these as patches to the authors, or should I just fork and change them to MyApp::HTML::xxxx?
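For reference, the round trip being aimed at for bullets can be sketched without either module. This is not the patch itself: the sub names are hypothetical, only flat "* item" lists are handled, and HTML escaping is left out:

```perl
use strict;
use warnings;

# Turn runs of "* item" lines into a list with no <p> inside each <li>.
sub bullets_to_html {
    my ($text) = @_;
    my ( @out, $in_list );
    for my $line ( split /\n/, $text, -1 ) {
        if ( $line =~ /^\* (.*)/ ) {
            push @out, '<ul>' unless $in_list;
            $in_list = 1;
            push @out, "<li>$1</li>";
        }
        else {
            push @out, '</ul>' if $in_list;
            $in_list = 0;
            push @out, $line;
        }
    }
    push @out, '</ul>' if $in_list;
    return join "\n", @out;
}

# Reverse step: drop the <ul> wrapper lines and turn each <li> back
# into a "* item" line.
sub html_to_bullets {
    my ($html) = @_;
    $html =~ s{<ul>\n?}{}g;
    $html =~ s{\n?</ul>}{}g;
    $html =~ s{<li>(.*?)</li>}{* $1}g;
    return $html;
}

my $input = "* 1\n* 2";
print html_to_bullets( bullets_to_html($input) ) eq $input
    ? "round trip ok\n"
    : "round trip changed the text\n";
```

The point is only that keeping the list markup tight (no paragraph inside each item) is what lets the text side come back unchanged.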

And finally, while typing this, I've thought of adding an attribute in the db to indicate whether or not the text is in HTML form. This would get rid of the converting back and forth. Thinking about it now, it might be the best way to do it.
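A rough sketch of how the flag could drive output, assuming a record with body and is_html fields. The field names, the sub names, the <pre> wrapper for plain text, and the $html_to_text callback (a placeholder for something like HTML::FormatText) are all assumptions:

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Web output: HTML bodies pass through (they were scrubbed on input);
# plain text bodies are escaped and wrapped so whitespace is preserved.
sub body_for_web {
    my ($msg) = @_;
    return $msg->{body} if $msg->{is_html};
    return '<pre>' . escape_html( $msg->{body} ) . '</pre>';
}

# Email output: plain text bodies go out untouched, so they match the
# input exactly. HTML bodies still need an HTML-to-text step, which the
# caller supplies as a code reference.
sub body_for_email {
    my ( $msg, $html_to_text ) = @_;
    return $msg->{body} unless $msg->{is_html};
    return $html_to_text->( $msg->{body} );
}

# Hypothetical record as it might come back from the db.
my %msg = ( body => "Hello <world>\n* 1\n* 2", is_html => 0 );
print body_for_web( \%msg ),   "\n";
print body_for_email( \%msg ), "\n";
```

With this layout, a plain text message is never converted at all on the email side, and only HTML messages depend on the quality of the HTML-to-text conversion.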

Am I going about this the right way? Someone must have done something similar to this before, and I'm interested in your comments.


In reply to Converting plain text to HTML and back again by Nomis52
