Item Description: Mark up text as HTML
Review Synopsis: HTML::FromText is a clever and handy module, but it lacks flexibility.
HTML::FromText takes ASCII text and marks it up into web-friendly HTML based on a few fairly well-defined conventions, including the familiar *bold* and _underline_ indicators you'll recognize from Usenet and email. It'll also escape HTML metacharacters, preserve whitespace where that makes sense, and even recognize and mark up tables. This is excellent news for those of us who absolutely despise writing raw HTML, but are too macho to use WYSIWYG editors.
Using HTML::FromText couldn't be simpler: there's only one function you care about, and it has a nice clean interface:
my $html_code = text2html($text, %args);
For a full description of the available options, I'll refer you to the excellent HTML::FromText documentation, but here are some examples of flags you can set (or clear) in %args:
- paras: treat text as paragraph-oriented
- blockcode: mark up indented paragraphs as code
- bullets: mark up bulleted paragraphs as an unordered list
- tables: guess/recognize tables in the text and mark them up in proper HTML
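To make the flags above concrete, here's a minimal sketch of a call that switches all four of them on. The sample text is made up, and since HTML::FromText comes from CPAN rather than core Perl, I load it at run time so the script still runs (and just complains) if it isn't installed.

```perl
use strict;
use warnings;

# Made-up sample text: a plain paragraph followed by two bulleted paragraphs.
my $text = <<'END';
This is a *short* paragraph of plain text.

- first bulleted paragraph

- second bulleted paragraph
END

# Only the four flags listed above; everything else keeps its default.
my %args = (
    paras     => 1,    # treat text as paragraph-oriented
    blockcode => 1,    # mark up indented paragraphs as code
    bullets   => 1,    # mark up bulleted paragraphs as an unordered list
    tables    => 1,    # guess at tables and mark them up
);

# HTML::FromText is a CPAN module, so load it at run time and fail softly.
if (eval { require HTML::FromText; 1 }) {
    print HTML::FromText::text2html($text, %args);
}
else {
    warn "HTML::FromText is not installed; nothing converted\n";
}
```

I haven't pasted the output, since the exact HTML depends on the module version you have installed.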
Better yet, HTML::FromText produces remarkably clean, if not exactly elegant, HTML.
What could be better? Well....
HTML::FromText isn't particularly flexible. Want to mark up _foo_ as <i>foo</i>? Too bad, you're locked into underline tags. Jealous of Perl Monks' [link] convention? Sorry, you can't easily build one on top of HTML::FromText -- it'll only do two balanced delimiters. Want to treat some indented paragraphs as code and others as quotes in the same document? No can do, it's a global option.
Since there are only a few different behaviours (mark up *text*, mark up _text_, mark up indented paragraphs, mark up lists, mark up tables), I'd have liked to see a callback-oriented interface, with the existing behaviours as defaults. That way, I could replace the underscore callback with my own (to do italicised text), rewrite the headings and titles callbacks to use somewhat less exuberant header tags, and hack together a smarter callback for indented paragraphs that tries to guess whether the text in question is code or a quote, and do the right thing.
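Here's roughly what I have in mind, as a pure-Perl sketch. To be clear, none of this is HTML::FromText's API: %default_callback and markup_inline are names I made up, the patterns are deliberately naive, and it only handles the two inline markers (no metacharacter escaping, lists or tables).

```perl
use strict;
use warnings;

# Hypothetical defaults: one callback per kind of markup, mirroring the
# existing behaviour (*text* is bold, _text_ is underlined).
my %default_callback = (
    bold      => sub { "<b>$_[0]</b>" },
    underline => sub { "<u>$_[0]</u>" },
);

# Hypothetical helper: mark up *text* and _text_ spans, letting the caller
# replace any default callback by name.
sub markup_inline {
    my ($text, %override) = @_;
    my %cb = (%default_callback, %override);

    # A marker only counts when it isn't glued to a word character,
    # so snake_case_names and 2*3*4 are left alone.
    $text =~ s/(?<!\w)\*([^*\s][^*]*)\*(?!\w)/$cb{bold}->($1)/ge;
    $text =~ s/(?<!\w)_([^_\s][^_]*)_(?!\w)/$cb{underline}->($1)/ge;
    return $text;
}

# Default behaviour:
print markup_inline("a *bold* claim, _stressed_\n");
# prints: a <b>bold</b> claim, <u>stressed</u>

# Same text, with the underscore callback swapped for italics:
print markup_inline("a *bold* claim, _stressed_\n",
    underline => sub { "<i>$_[0]</i>" });
# prints: a <b>bold</b> claim, <i>stressed</i>
```

The point is the dispatch table: the existing behaviours stay as defaults, and anyone who wants something different passes in a sub instead of patching the module.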
This wouldn't be so big a deal if you could let selected raw HTML through unescaped, but you can't: either all HTML metacharacters are converted to their corresponding entities, or none are. This is an obnoxious omission: I like to be able to keep <s and >s in my text without having to escape them by hand, but if HTML::FromText isn't going to do it for me, I'd like to be able to italicize text, too.
And would it really be so hard to recognize a line of more than, say, ten hyphens as a <hr>? That would really come in handy.
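For what it's worth, the pattern itself is a one-liner. This is only a sketch on plain text: rules_to_hr is a made-up helper, not anything HTML::FromText provides, and if you ran it before text2html with metacharacter escaping on, the <hr> would of course get escaped along with everything else.

```perl
use strict;
use warnings;

# Hypothetical helper: turn any line consisting solely of more than ten
# hyphens (optionally surrounded by spaces or tabs) into an <hr> tag.
sub rules_to_hr {
    my ($text) = @_;
    $text =~ s/^[ \t]*-{11,}[ \t]*$/<hr>/mg;
    return $text;
}

print rules_to_hr("above\n------------\nbelow\n");
```

Lines with ten hyphens or fewer, or with other text mixed in, are left as they are.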
I've spent quite a bit of text ranting about its shortcomings, but I really like HTML::FromText. It's a godsend to those of us who hate HTML, but love our text editors.
Pros:
- Does an excellent job of translating text to HTML
- Table recognition is intuitive and seamless
- Good documentation
- Clean interface
Cons:
- Doesn't support some fairly obvious markups
Excellent for translating simple documents to HTML. Don't use it to write your website, since it only does bare-bones markup. On second thought: do use it to write your website, since you'll end up with far less bloat. :-)
Update: Added author information. Thanks Juerd!
by boo_radley (Parson) on May 23, 2002 at 13:49 UTC
Thank you for posting this!