HTML::FromText is a clever and handy module, but it lacks flexibility.


Author: Gareth D. Rees


HTML::FromText takes ASCII text and marks it up as web-friendly HTML based on a few fairly well-defined conventions, including the *bold* and _underline_ indicators you'll recognize from Usenet and email. It'll also escape HTML metacharacters, preserve whitespace where it makes sense, and even recognize and mark up tables. This is excellent news for those of us who absolutely despise writing raw HTML but are too macho to use WYSIWYG editors.

Using HTML::FromText couldn't be simpler: there's only one function you care about, and it has a nice clean interface:

my $html_code = text2html($text, %args);

For a full description of the available options, I'll refer you to the excellent HTML::FromText documentation, but here are some examples of flags you can set (or clear) in %args:

  • paras: treat text as paragraph-oriented
  • blockcode: mark up indented paragraphs as code
  • bullets: mark up bulleted paragraphs as an unordered list
  • tables: guess/recognize tables in the text and mark them up in proper HTML

Better yet, HTML::FromText produces remarkably clean, if not exactly elegant, HTML.

What could be better? Well...

HTML::FromText isn't particularly flexible. Want to mark up _foo_ as <i>foo</i>? Too bad: you're locked into underline tags. Jealous of Perl Monks' [link] convention? Sorry, you can't easily build one on top of HTML::FromText, because it only handles its two built-in pairs of delimiters (*...* and _..._). Want to treat some indented paragraphs as code and others as quotes in the same document? No can do: it's a global option.

Since there are only a few different behaviours (mark up *text*, mark up _text_, mark up indented paragraphs, mark up lists, mark up tables), I'd have liked to see a callback-oriented interface, with the existing behaviours as defaults. That way, I could replace the underscore callback with my own (to do italicised text), rewrite the headings and titles callbacks to use somewhat less exuberant header tags, and hack together a smarter callback for indented paragraphs that tries to guess whether the text in question is code or a quote, and do the right thing.
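To make the idea concrete, here's a minimal sketch of what such an interface might look like. It's plain Perl and does not use HTML::FromText at all: `markup_text` and the callback names `bold` and `underline` are hypothetical, and only the *bold* and _underline_ rules are covered. The caller overrides one callback at a time and keeps the defaults for the rest.

```perl
use strict;
use warnings;

# Hypothetical default callbacks mirroring the *bold* and _underline_
# conventions. These names are illustrative, not HTML::FromText's API.
my %default_callbacks = (
    bold      => sub { "<b>$_[0]</b>" },
    underline => sub { "<u>$_[0]</u>" },
);

# Escape the HTML metacharacters &, < and > (& first, to avoid
# double-escaping the entities produced for < and >).
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical callback-oriented converter: caller-supplied callbacks
# replace the defaults by name. Text is escaped first, so callbacks
# receive already-escaped text.
sub markup_text {
    my ($text, %override) = @_;
    my %cb   = (%default_callbacks, %override);
    my $html = escape_html($text);
    $html =~ s/\*([^*\n]+)\*/$cb{bold}->($1)/ge;
    $html =~ s/_([^_\n]+)_/$cb{underline}->($1)/ge;
    return $html;
}

# Defaults: prints "Some <b>bold</b> and <u>foo</u> text"
print markup_text("Some *bold* and _foo_ text"), "\n";

# Swap only the underscore callback to italicise instead:
# prints "Some <b>bold</b> and <i>foo</i> text"
print markup_text("Some *bold* and _foo_ text",
                  underline => sub { "<i>$_[0]</i>" }), "\n";
```

The same pattern would extend to a [link] rule or a smarter indented-paragraph handler: each behaviour becomes a named code reference with a sensible default. Note that this naive regex would also fire inside identifiers like foo_bar_baz; a real implementation would need to be pickier.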

This wouldn't be such a big deal if you could drop into raw HTML where you needed it, but you can't: it's either convert all HTML metacharacters to their corresponding entities, or none. This is an obnoxious omission. I like being able to keep <s and >s in my text without escaping them by hand, but if HTML::FromText isn't going to italicize text for me, I'd like to be able to do it myself with <i> tags.
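One possible middle ground, sketched below under my own assumptions (this is not something HTML::FromText offers): escape every metacharacter, then restore a small whitelist of bare inline tags. The function name `escape_except_whitelist` and the tag list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical whitelist of inline tags allowed through unescaped.
my %allowed_tag = map { $_ => 1 } qw(i em b strong);

# Escape &, < and > everywhere, then restore whitelisted bare tags such
# as <i> and </i>. Tags with attributes are deliberately left escaped.
sub escape_except_whitelist {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s{&lt;(/?)(\w+)&gt;}
              { $allowed_tag{lc $2} ? "<$1$2>" : "&lt;$1$2&gt;" }ge;
    return $text;
}

# prints "if a &lt; b then <i>maybe</i> &lt;script&gt;"
print escape_except_whitelist("if a < b then <i>maybe</i> <script>"), "\n";
```

Escaping everything first and un-escaping afterwards keeps the rule simple: a stray < can never slip through, and only tags you've explicitly allowed come back.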

And would it really be so hard to recognize a line of more than, say, ten hyphens as a <hr>? That would really come in handy.
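It really wouldn't be hard. Here's a sketch of the rule on plain text, assuming a line that contains nothing but more than ten hyphens (plus optional surrounding whitespace). The function name `hyphens_to_hr` and the threshold are my own choices, and how you'd wire it into HTML::FromText's processing (which would otherwise escape the <hr> or wrap the line in a paragraph) is left open.

```perl
use strict;
use warnings;

# Hypothetical rule: a line consisting only of more than ten hyphens
# (optionally surrounded by spaces or tabs) becomes <hr>.
# /m makes ^ and $ match at line boundaries; /g handles every such line.
sub hyphens_to_hr {
    my ($text) = @_;
    $text =~ s/^[ \t]*-{11,}[ \t]*$/<hr>/mg;
    return $text;
}

my $text = "Part one\n" . ("-" x 20) . "\nPart two\n";
print hyphens_to_hr($text);   # the hyphen line becomes <hr>
```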

I've spent quite a bit of text ranting about its shortcomings, but I really like HTML::FromText. It's a godsend to those of us who hate HTML, but love our text editors.

Pros:
  • Does an excellent job of translating text to HTML
  • Table recognition is intuitive and seamless
  • Good documentation
  • Clean interface

Cons:
  • Doesn't support some fairly obvious markups
  • Inflexible


Excellent for translating simple documents to HTML. Don't use it to write your website, since it only does bare-bones markup. On second thought: do use it to write your website, since you'll end up with far less bloat. :-)

Update: Added author information. Thanks Juerd!

In reply to HTML::FromText by FoxtrotUniform
