Item Description: Mark up text as HTML

Review Synopsis: HTML::FromText is a clever and handy module, but it lacks flexibility.


Module author: Gareth D. Rees


HTML::FromText takes plain ASCII text and marks it up as web-friendly HTML based on a few fairly well-defined conventions, including the familiar *bold* and _underline_ indicators you'll recognize from Usenet and email. It'll also escape HTML metacharacters, preserve whitespace where that makes sense, and even recognize and mark up tables. This is excellent news for those of us who absolutely despise writing raw HTML but are too macho to use WYSIWYG editors.

Using HTML::FromText couldn't be simpler: there's only one function you care about, and it has a nice clean interface:

my $html_code = text2html($text, %args);

For a full description of the available options, I'll refer you to the excellent HTML::FromText documentation; in short, %args is a set of flags you can set (or clear) to switch individual conversions on or off.

Better yet, HTML::FromText produces remarkably clean, if not exactly elegant, HTML.

What could be better? Well...

HTML::FromText isn't particularly flexible. Want to mark up _foo_ as <i>foo</i>? Too bad: you're locked into underline tags. Jealous of PerlMonks' [link] convention? Sorry, you can't easily build one on top of HTML::FromText, since it only understands its two built-in delimiters, *...* and _..._, each of which uses the same character to open and close. Want to treat some indented paragraphs as code and others as quotes in the same document? No can do: it's a global option.

Since there are only a few distinct behaviours (mark up *text*, mark up _text_, mark up indented paragraphs, mark up lists, mark up tables), I'd have liked to see a callback-oriented interface, with the existing behaviours as defaults. That way, I could replace the underscore callback with my own (to produce italicised text), rewrite the heading and title callbacks to use somewhat less exuberant header tags, and hack together a smarter callback for indented paragraphs that tries to guess whether the text in question is code or a quote and does the right thing.
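To make that wish concrete, here is a minimal sketch of what such a callback table might look like for the two inline conventions. None of this is HTML::FromText's API: `%default_callbacks`, `escape_html` and `mark_up_inline` are hypothetical names, and paragraphs, lists and tables are left out entirely.

```perl
use strict;
use warnings;

# HYPOTHETICAL: HTML::FromText has no callback interface. This table maps
# each inline delimiter to a callback that receives the enclosed (already
# escaped) text and returns the HTML to emit in its place.
my %default_callbacks = (
    '*' => sub { "<b>$_[0]</b>" },    # *text* -> bold
    '_' => sub { "<u>$_[0]</u>" },    # _text_ -> underline
);

# Escape HTML metacharacters before any tags are inserted.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or we'd re-escape our own entities
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# mark_up_inline($text, %overrides): escape the text, then apply each
# delimiter's callback; any entry in %overrides replaces the default.
sub mark_up_inline {
    my ($text, %overrides) = @_;
    my %cb = (%default_callbacks, %overrides);
    $text = escape_html($text);
    for my $delim (sort keys %cb) {
        my $d = quotemeta $delim;
        # The delimiters must not touch a word character on the outside,
        # so snake_case_names are left alone.
        $text =~ s/(?<!\w)$d([^$d\n]+)$d(?!\w)/$cb{$delim}->($1)/ge;
    }
    return $text;
}

# Swap the underline behaviour for italics.
print mark_up_inline('a *bold* and _slanted_ <tag>',
                     '_' => sub { "<i>$_[0]</i>" }), "\n";
# prints: a <b>bold</b> and <i>slanted</i> &lt;tag&gt;
```

The point of the design is that overriding one behaviour is a single hash entry; the same table could grow entries for headings or indented paragraphs without touching the others.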

This wouldn't be so big a deal if you could pass raw HTML through untouched, but you can't: either all HTML metacharacters are converted to their corresponding entities, or none are. This is an obnoxious omission. I like being able to keep <s and >s in my text without escaping them by hand, but if HTML::FromText isn't going to italicize text for me, I'd like to be able to write the <i> tags myself.
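One possible escape hatch, sketched under an invented convention: text wrapped in `{{` and `}}` is copied through as raw HTML, and everything else is escaped. Both the `{{...}}` syntax and the `escape_except_raw` function are hypothetical; HTML::FromText offers nothing of the kind.

```perl
use strict;
use warnings;

# HYPOTHETICAL convention, not part of HTML::FromText: anything between
# {{ and }} is copied through as raw HTML; everything else is escaped.
sub escape_except_raw {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    # A capturing group in split's pattern keeps the captured raw chunks,
    # so pieces alternate: even indices are plain text, odd ones raw HTML.
    my @parts = split /\{\{(.*?)\}\}/s, $text;
    my $out = '';
    for my $i (0 .. $#parts) {
        if ($i % 2) {
            $out .= $parts[$i];                  # raw HTML, left untouched
        } else {
            (my $plain = $parts[$i]) =~ s/([&<>])/$entity{$1}/g;
            $out .= $plain;
        }
    }
    return $out;
}

print escape_except_raw('x < y, but {{<i>this</i>}} stays italic'), "\n";
# prints: x &lt; y, but <i>this</i> stays italic
```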

And would it really be so hard to recognize a line of more than, say, ten hyphens as a <hr>? That would really come in handy.
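The rule itself is close to a one-liner. Here is a minimal standalone sketch; `hyphens_to_hr` is a hypothetical name, and it treats ten or more hyphens as a rule, a threshold as arbitrary as the "say, ten" above.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper (HTML::FromText has no such option): turn any line
# made up solely of ten or more hyphens, optionally padded with spaces or
# tabs, into <hr>. Other lines pass through unchanged.
sub hyphens_to_hr {
    my ($text) = @_;
    # /m makes ^ and $ match at every line boundary; [ \t] rather than \s
    # so the match can never run across a newline.
    $text =~ s/^[ \t]*-{10,}[ \t]*$/<hr>/mg;
    return $text;
}

print hyphens_to_hr("Part one\n" . ('-' x 12) . "\nPart two\n");
# prints "Part one", "<hr>" and "Part two" on three lines
```

Bolting this onto HTML::FromText isn't quite as simple: run it before conversion with metacharacter escaping turned on, and the <hr> itself would come out as &lt;hr&gt;.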

I've spent quite a bit of text ranting about its shortcomings, but I really like HTML::FromText. It's a godsend to those of us who hate HTML, but love our text editors.




Excellent for translating simple documents to HTML. Don't use it to write your website, since it only does bare-bones markup. On second thought: do use it to write your website, since you'll end up with far less bloat. :-)

Update: Added author information. Thanks Juerd!

Re: HTML::FromText
by boo_radley (Parson) on May 23, 2002 at 13:49 UTC
    Thanks for a good review, this looks like a nifty and useful module.

    I'm curious if you've discussed some of your dislikes with the module author? Most module authors I've talked to have welcomed patches and updates. It seems like the hyphens-to-hr (or equal sign) conversion would be a simple enough patch, as well as one to handle /italicized/ text. And perhaps using another character sequence to signal "do not escape html entities within"? Although it seems like the more you add to 'plain-text' processing, the more it becomes its own type of markup :)
        I'm curious if you've discussed some of your dislikes with the module author? Most module authors I've talked to have welcomed patches and updates.

      I'm waiting until I have a couple of patches coded, tested, and diff(1)ed before getting in touch with the author. I figure that I'm more likely to get my way if I've already done the work. :-)

      The hell with paco, vote for Erudil!

Thank you for posting this!
by Marza (Vicar) on May 23, 2002 at 23:01 UTC

    I was going bonkers trying to format a large notification email that had links, paragraphs, etc.

    Writing everything to a file worked, but when I loaded it into the email, the alignment was lost.

    I played around with the module, and your write-up helped me understand it.

    Now the email looks like I want it!

    Thanks again!

Re: HTML::FromText
by FoxtrotUniform (Prior) on Sep 09, 2002 at 17:19 UTC

    I have a patch for HTML::FromText that solves most of the problems I griped about in this review. It's on its way to the author just as soon as I get a bit of code review.

    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!