PerlMonks  

HTML::FromText

by FoxtrotUniform (Prior)
on May 23, 2002 at 03:38 UTC (node #168646)

Item Description: Mark up text as HTML

Review Synopsis: HTML::FromText is a clever and handy module, but it lacks flexibility.

Summary:

HTML::FromText is a clever and handy module, but it lacks flexibility.

Author:

Gareth D. Rees.

Description:

HTML::FromText takes ASCII text and marks it up into web-friendly HTML based on a few fairly well-defined conventions, including the familiar *bold* and _underline_ indicators you'll recognize from Usenet and email. It'll also escape HTML metacharacters, preserve whitespace where it matters, and even recognize and mark up tables. This is excellent news for those of us who absolutely despise writing raw HTML, but are too macho to use WYSIWYG editors.

Using HTML::FromText couldn't be simpler: there's only one function you care about, and it has a nice clean interface:

my $html_code = text2html($text, %args);

For a full description of the available options, I'll refer you to the excellent HTML::FromText documentation, but here are some examples of flags you can set (or clear) in %args:

  • paras: treat text as paragraph-oriented
  • blockcode: mark up indented paragraphs as code
  • bullets: mark up bulleted paragraphs as an unordered list
  • tables: guess/recognize tables in the text and mark them up in proper HTML

Better yet, HTML::FromText produces remarkably clean, if not exactly elegant, HTML.

What could be better? Well....

HTML::FromText isn't particularly flexible. Want to mark up _foo_ as <i>foo</i>? Too bad, you're locked into underline tags. Jealous of Perl Monks' [link] convention? Sorry, you can't easily build one on top of HTML::FromText -- it'll only do two balanced delimiters. Want to treat some indented paragraphs as code and others as quotes in the same document? No can do, it's a global option.

Since there are only a few different behaviours (mark up *text*, mark up _text_, mark up indented paragraphs, mark up lists, mark up tables), I'd have liked to see a callback-oriented interface, with the existing behaviours as defaults. That way, I could replace the underscore callback with my own (to do italicised text), rewrite the headings and titles callbacks to use somewhat less exuberant header tags, and hack together a smarter callback for indented paragraphs that tries to guess whether the text in question is code or a quote, and does the right thing.
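The callback idea might look something like the following minimal sketch. It is not part of HTML::FromText: `markup_inline`, `%DEFAULT_CALLBACKS` and the `bold`/`underline` keys are hypothetical names, and only the two inline markups are handled. Defaults are merged with whatever the caller passes, so overriding one behaviour leaves the others alone.

```perl
use strict;
use warnings;

# Hypothetical default callbacks mimicking HTML::FromText's behaviour:
# *text* becomes bold, _text_ becomes underlined.
my %DEFAULT_CALLBACKS = (
    bold      => sub { "<b>$_[0]</b>" },
    underline => sub { "<u>$_[0]</u>" },
);

# Escape HTML metacharacters, then hand each delimited span to a callback.
# Caller-supplied callbacks override the defaults key by key.
sub markup_inline {
    my ($text, %callbacks) = @_;
    my %cb = (%DEFAULT_CALLBACKS, %callbacks);

    # Escape & first so the entities produced below aren't double-escaped.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Each callback receives the (already escaped) text between delimiters.
    $text =~ s/\*([^*\n]+)\*/$cb{bold}->($1)/ge;
    $text =~ s/_([^_\n]+)_/$cb{underline}->($1)/ge;
    return $text;
}

# Replace the underscore callback so _foo_ becomes italic instead.
print markup_inline('*bold* and _foo_ <3',
    underline => sub { "<i>$_[0]</i>" }), "\n";
# prints: <b>bold</b> and <i>foo</i> &lt;3
```

A real implementation would also have to keep one pass from re-scanning another callback's output (here, a bold callback that emitted an underscore would be picked up by the underline pass); the sketch ignores that.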

This wouldn't be such a big deal if you could pass raw HTML through selectively, but you can't: either all HTML metacharacters are converted to their corresponding entities, or none are. This is an obnoxious omission. I like being able to keep <s and >s in my text without escaping them by hand, but if HTML::FromText isn't going to italicize text for me, I'd like to be able to write the italics tags myself.
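One way to get both behaviours is a pass-through convention: escape everything except explicitly marked spans. Here is a minimal sketch, assuming a hypothetical `{{ ... }}` delimiter for raw HTML; neither the delimiter nor `escape_except_raw` exists in HTML::FromText.

```perl
use strict;
use warnings;

# Hypothetical convention: anything between {{ and }} is raw HTML and is
# passed through untouched; all other text has its metacharacters escaped.
sub escape_except_raw {
    my ($text) = @_;
    my $out = '';
    # split with a capturing group keeps the captured raw spans in the
    # result list: even indices are plain text, odd indices are raw HTML.
    my @parts = split /\{\{(.*?)\}\}/s, $text;
    for my $i (0 .. $#parts) {
        my $part = $parts[$i];
        if ($i % 2) {
            $out .= $part;    # raw HTML, verbatim
        }
        else {
            $part =~ s/&/&amp;/g;
            $part =~ s/</&lt;/g;
            $part =~ s/>/&gt;/g;
            $out .= $part;
        }
    }
    return $out;
}

print escape_except_raw('a < b, but {{<i>this</i>}} is italic'), "\n";
# prints: a &lt; b, but <i>this</i> is italic
```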

And would it really be so hard to recognize a line of more than, say, ten hyphens as an <hr>? That would really come in handy.
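As a rough illustration of how small that feature is, here's a sketch of the hyphen rule as a standalone pass. The function name is hypothetical, and the threshold (more than ten) just follows the suggestion above. It should run on text whose metacharacters have already been escaped; otherwise an escaping pass would turn the tag into `&lt;hr&gt;`.

```perl
use strict;
use warnings;

# Replace any line consisting solely of more than ten hyphens (optionally
# padded with spaces or tabs) with an <hr> tag. /m makes ^ and $ match at
# every line boundary rather than only at the ends of the string.
sub hyphens_to_hr {
    my ($text) = @_;
    $text =~ s/^[ \t]*-{11,}[ \t]*$/<hr>/mg;
    return $text;
}

print hyphens_to_hr("Part one\n" . ('-' x 12) . "\nPart two\n");
# prints "Part one", "<hr>", "Part two" on separate lines
```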

I've spent quite a bit of text ranting about its shortcomings, but I really like HTML::FromText. It's a godsend to those of us who hate HTML, but love our text editors.

Ups:

  • Does an excellent job of translating text to HTML
  • Table recognition is intuitive and seamless
  • Good documentation
  • Clean interface

Downs:

  • Doesn't support some fairly obvious markups
  • Inflexible

Conclusion:

Excellent for translating simple documents to HTML. Don't use it to write your website, since it only does bare-bones markup. On second thought: do use it to write your website, since you'll end up with far less bloat. :-)

Update: Added author information. Thanks Juerd!

Re: HTML::FromText
by boo_radley (Parson) on May 23, 2002 at 13:49 UTC
    Thanks for a good review, this looks like a nifty and useful module.

    I'm curious if you've discussed some of your dislikes with the module author? Most module authors I've talked to have welcomed patches and updates. It seems like the hyphens-to-hr (or equal sign) conversion would be a simple enough patch, as well as one to handle /italicized/ text. And perhaps using another character sequence to signal "do not escape html entities within"? Although it seems like the more you add to 'plain-text' processing, the more it becomes its own type of markup :)
        I'm curious if you've discussed some of your dislikes with the module author? Most module authors I've talked to have welcomed patches and updates.

      I'm waiting until I have a couple of patches coded, tested, and diff(1)ed before getting in touch with the author. I figure that I'm more likely to get my way if I've already done the work. :-)

      --
      The hell with paco, vote for Erudil!
      :wq

Thank you for posting this!
by Marza (Vicar) on May 23, 2002 at 23:01 UTC

    I was going bonkers trying to format a large notification email that had links, paragraphs, etc.

    Writing everything to a file worked, but when it was loaded into the email, the alignment was lost.

    Played around with the module, and your review helped me understand it.

    Now the email looks like I want it!

    Thanks again!

Re: HTML::FromText
by FoxtrotUniform (Prior) on Sep 09, 2002 at 17:19 UTC

    I have a patch for HTML::FromText that solves most of the problems I griped about in this review. It's on its way to the author just as soon as I get a bit of code review.

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!
