Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??


HTML::FromText is a clever and handy module, but it lacks flexibility.


Gareth D. Rees.


HTML::FromText takes ascii text and marks it up into web-friendly HTML based on a few fairly well-defined conventions, including the familiar *bold* and _underline_ indicators you'll recognize from Usenet and email. It'll also escape HTML metacharacters, preserve whitespace when sensical, and even recognize and mark up tables. This is excellent news for those of us who absolutely despise writing raw HTML, but are too macho to use WYSIWYG editors.

Using HTML::FromText couldn't be simpler: there's only one function you care about, and it has a nice clean interface:

my $html_code = text2html($text, %args);

For a full description of the available options, I'll refer you to the excellent HTML::FromText documentation, but here are some examples of flags you can set (or clear) in %args:

  • paras: treat text as paragraph-oriented
  • blockcode: mark up indented paragraphs as code
  • bullets: mark up bulleted paragraphs as an unordered list
  • tables: guess/recognize tables in the text and mark them up in proper HTML

Better yet, HTML::FromText produces remarkably clean, if not exactly elegant, HTML.

What could be better? Well....

HTML::FromText isn't particularly flexible. Want to mark up _foo_ as <i>foo</i>? Too bad, you're locked into underline tags. Jealous of Perl Monks' [link] convention? Sorry, you can't easily build one on top of HTML::FromText -- it'll only do two balanced delimiters. Want to treat some indented paragraphs as code and others as quotes in the same document? No can do, it's a global option.

Since there are only a few different behaviours (mark up *text*, mark up _text_, mark up indented paragraphs, mark up lists, mark up tables), I'd have liked to see a callback-oriented interface, with the existing behaviours as defaults. That way, I could replace the underscore callback with my own (to do italicised text), rewrite the headings and titles callbacks to use somewhat less exuberant header tags, and hack together a smarter callback for indented paragraphs that tries to guess whether the text in question is code or a quote, and do the right thing.

This wouldn't be so big a deal if you could escape raw HTML, but you can't. It's either convert all HTML metacharacters to their corresponding entities, or none. This is an obnoxious omission: I like to be able to keep <s and >s in my text without having to escape them by hand, but if HTML::FromText isn't going to do it for me, I'd like to be able to italicize text, too.

And would it really be so hard to recognize a line of more than, say, ten hyphens as a <hr>? That would really come in handy.

I've spent quite a bit of text ranting about its shortcomings, but I really like HTML::FromText. It's a godsend to those of us who hate HTML, but love our text editors.


  • Does an excellent job of translating text to HTML
  • Table recognition is intuitive and seamless
  • Good documentation
  • Clean interface


  • Doesn't support some fairly obvious markups
  • Inflexible


Excellent for translating simple documents to HTML. Don't use it to write your website, since it only does bare-bones markup. On second thought: do use it to write your website, since you'll end up with far less bloat.:-)

Update: Added author information. Thanks Juerd!

In reply to HTML::FromText by FoxtrotUniform

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others pondering the Monastery: (4)
    As of 2020-02-19 03:57 GMT
    Find Nodes?
      Voting Booth?
      What numbers are you going to focus on primarily in 2020?

      Results (80 votes). Check out past polls.