HTML Templating as Tree Rewriting: Part I: "If Statements"

Until Sean Burke's articles, it never really occurred to me that HTML could be represented and understood as a tree. For example, given this HTML:


   
<html>
<head>

<title>Doc 1</title>

</head>

  <body>
     Stuff 
     <hr> 
     2000-08-17
  </body>

</html>

the following tree results:

              html
            /      \
        head        body
         |        /   |   \
      title  "Stuff"  hr  "2000-08-17"
        |
     "Doc 1"

This slide provides another example of representing an HTML document as a tree.
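
To make the tree view concrete, here is a minimal sketch that parses the HTML above with HTML::TreeBuilder and prints one indented line per node. The helper name `outline` is made up for this illustration; everything else is the stock HTML::Tree API. Exact whitespace inside the text nodes depends on the parser's space-compacting settings, so no output is shown.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'HTML';
<html>
<head><title>Doc 1</title></head>
<body>Stuff <hr> 2000-08-17</body>
</html>
HTML

# outline() is a hypothetical helper: it returns the subtree rooted at
# $node as text, one line per node, indented two spaces per level.
sub outline {
    my ($node, $depth) = @_;
    my $pad = '  ' x $depth;
    # Text nodes are plain strings; element nodes are HTML::Element objects.
    return $pad . '"' . $node . '"' . "\n" unless ref $node;
    return join '', $pad . $node->tag . "\n",
        map { outline($_, $depth + 1) } $node->content_list;
}

my $tree = HTML::TreeBuilder->new_from_content($html);
print outline($tree, 0);
$tree->delete;    # free the tree when done
```

Each level of indentation corresponds to one edge in the drawing above.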

The popular Perl HTML templating systems do not treat HTML manipulation as tree manipulation. At least not directly, because it may be the case that all programs and data structures can be represented as a tree (correct me on this). The popular Perl systems treat HTML as a character string and provide simple pseudo-operators to manipulate the display logic of this string.

While this is intuitive for programmers and designers alike, it is instructive to look at radically different approaches. In this article, I move through a number of common pseudo-operators and HTML manipulations and show how each of these can be interpreted as a tree rewriting operation. Because Template (the Template Toolkit) is so well-documented and provides a representative feature set, it is easy to use for this purpose.

if

The IF directive of the pseudo-language decides whether a node of the tree will remain or not:

[% IF age < 10 %]
   Hello, does your mother know you're
   using her AOL account?
[% ELSIF age < 18 %]
   Sorry, you're not old enough to enter
   (and too dumb to lie about your age)
[% ELSE %]
   Welcome
[% END %]
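
For reference, a template like this can be driven from Perl with the Template module. The sketch below is illustrative only: the branch bodies are shortened to one-word labels, and `branch_for` is a made-up helper name that returns which branch survives for a given age.

```perl
use strict;
use warnings;
use Template;

# Same IF/ELSIF/ELSE shape as above, with short labels as branch bodies.
my $template = <<'TT';
[% IF age < 10 %]under10[% ELSIF age < 18 %]under18[% ELSE %]welcome[% END %]
TT

my $tt = Template->new;

# branch_for() is a hypothetical helper: render the template for one age
# and return the surviving label with surrounding whitespace removed.
sub branch_for {
    my ($age) = @_;
    my $out = '';
    $tt->process(\$template, { age => $age }, \$out)
        or die $tt->error;
    $out =~ s/\s+//g;
    return $out;
}

print "$_: ", branch_for($_), "\n" for 7, 15, 30;
```

Only one of the three candidate texts ever reaches the output, which is the behavior the tree rewrite has to reproduce.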

In the template HTML, we start with three candidate nodes:


ROOT
  child1:  Hello, does your mother know you're ...
  child2:  Sorry, you're not old enough to enter ...
  child3:  Welcome

And based on the conditional, we delete or preserve the child nodes. I have looked at a number of practical solutions for implementing this tree op in Perl, and after looking at

XML::Smart (promising, innovative, but in active development and young)
XML::XSH (powerful and professional, but does not build completely on Cygwin)
XML::LibXML (very nice, but I have a gap: I can search using XML::XPath, but don't see how to integrate search results with tree processing via LibXML)

I decided to use old faithful, HTML::Tree, to provide examples (and build the next generation of HTML::Seamstress).
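
On the XML::LibXML gap mentioned in the list: LibXML nodes have their own `findnodes` method, and the nodes it returns are live nodes of the same document, so they can be removed with `unbindNode`. The following is only a sketch under that assumption; `prune_spans` is a made-up helper name, and the `recover` option is there because the input is an HTML fragment rather than a full document.

```perl
use strict;
use warnings;
use XML::LibXML;

# prune_spans() is a hypothetical helper: parse an HTML string, remove
# every <span> whose id is listed in @ids, and return the HTML that is left.
sub prune_spans {
    my ($html, @ids) = @_;
    my $doc = XML::LibXML->load_html(string => $html, recover => 1);
    # Build an XPath such as //span[@id="a" or @id="b"].
    my $cond = join ' or ', map { qq{\@id="$_"} } @ids;
    # findnodes() returns nodes that belong to $doc itself.
    $_->unbindNode for $doc->findnodes("//span[$cond]");
    return $doc->toStringHTML;
}

my $html = <<'HTML';
<span id="age_handler">
  <span id="under10">Hello</span>
  <span id="under18">Sorry</span>
  <span id="welcome">Welcome</span>
</span>
HTML

print prune_spans($html, qw(under10 under18));
```

The rest of this article stays with HTML::Tree.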

So, here is how we handle this task using HTML tree rewrites. First we mark up the HTML so we can find the nodes:

<span id="age_handler">
  <span id="under10">
       Hello, does your mother know you're 
       using her AOL account?
  </span>
  <span id="under18">
       Sorry, you're not old enough to enter 
       (and too dumb to lie about your age)
  </span>
  <span id="welcome">
       Welcome
  </span>
</span>

And now we process it using HTML::Tree,

use strict;
use warnings;
use HTML::TreeBuilder;

my ($filename, $age) = @ARGV;    # e.g. age.html 15

my $tree = HTML::TreeBuilder->new();
$tree->parse_file($filename);
age_handler($tree, $age);        # plain sub call: $tree has no age_handler method
print $tree->as_HTML;
$tree->delete;

# Keep the one child span that matches $age and detach the other two.
sub age_handler {
    my ($tree, $age) = @_;
    my $SPAN = $tree->look_down('id', 'age_handler');
    if ($age < 10) {
        $SPAN->look_down('id', $_)->detach for qw(under18 welcome);
    } elsif ($age < 18) {
        $SPAN->look_down('id', $_)->detach for qw(under10 welcome);
    } else {
        $SPAN->look_down('id', $_)->detach for qw(under10 under18);
    }
}

Hmm, I'm worn out. Let's make this the first installment in the ongoing saga entitled: how to do HTML templating via tree rewrites: the HTML::Seamstress approach.

Just one more comment: all of that

look_down->detach() for
    ($this, $that)

should definitely be abstracted into some HTML::Stitchery such as:

$SPAN->KILL_CHILDREN (@children); # fodder for carnivore :)
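
One possible shape for that abstraction, as a rough sketch: a plain subroutine rather than a method, since no HTML::Stitchery module exists yet. The name `kill_children` and its behavior (silently skipping ids that are not found) are assumptions made for this example, not part of HTML::Tree.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# kill_children() is a hypothetical helper: remove every descendant of
# $parent whose id attribute is listed in @ids.
sub kill_children {
    my ($parent, @ids) = @_;
    for my $id (@ids) {
        my $node = $parent->look_down(id => $id) or next;   # skip unknown ids
        $node->delete;    # detach the node and destroy its subtree
    }
    return $parent;
}

my $markup = <<'HTML';
<span id="age_handler">
  <span id="under10">Hello</span>
  <span id="under18">Sorry</span>
  <span id="welcome">Welcome</span>
</span>
HTML

my $tree = HTML::TreeBuilder->new_from_content($markup);
my $SPAN = $tree->look_down(id => 'age_handler');
kill_children($SPAN, qw(under10 under18));
print $SPAN->as_HTML, "\n";
$tree->delete;
```

With a helper like this, each branch of age_handler shrinks to a single call.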

Resources

There are other systems on CPAN which are tree-oriented. My system, HTML::Seamstress, grew out of Paul Lucas' HTML_Tree by way of Evoscript, all of which was inspired by the Java XMLC framework. XMLC compiles a webpage into a Java tree with API hooks for the various tags in the HTML. After you do tree rewriting on the little XML objects in the Java tree, the build method builds the HTML page.

Petal is Perl's implementation of ZOPE's TAL. This framework does quite a bit; too much for me to want to figure out. And at times I felt like I was using Text::MagicTemplate because I had to know quite a bit about what to do on the HTML side to get my Perl data to enter the XML properly. All I want to do on the HTML side is put little id attributes in the HTML, find 'em and rewrite 'em.

Xelig is also inspired by XMLC, but it is quite different from Seamstress. It is interesting but not so well-documented at the moment.


Replies are listed 'Best First'.
Re: HTML Templating as Tree Rewriting: Part I: "If Statements" by pg (Canon) on Oct 28, 2003 at 03:30 UTC
Actually, by looking at a couple of scripts that use the CGI module, you should quickly realize that an HTML doc is a tree structure. Anything that can be visualized as small boxes in a box, and smaller boxes in small boxes... has a tree structure.
Re: Re: HTML Templating as Tree Rewriting: Part I: "If Statements" by theorbtwo (Prior) on Oct 28, 2003 at 13:03 UTC
...if and only if the edges of those boxes never cross, that is. Only correct HTML can be visualized as a tree. (HTML::TreeBuilder corrects this type of error as it goes. Presumably most HTML modules do.)
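
A quick way to see that correction, sketched with a deliberately mis-nested fragment (the `</b>` arrives before the `</i>`). The helper name `repaired` is invented for this example. The exact serialization can vary between HTML::Tree versions, so no output is shown, but whatever comes out is a proper tree with balanced tags.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# repaired() is a hypothetical helper: parse possibly mis-nested HTML and
# return it as serialized from the resulting tree.
sub repaired {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $out  = $tree->as_HTML;
    $tree->delete;
    return $out;
}

# The "boxes" cross here: <i> opens inside <b> but closes outside it.
print repaired('<p><b>bold <i>both</b> italic?</i></p>'), "\n";
```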
Re: Re: Re: HTML Templating as Tree Rewriting: Part I: "If Statements" by bart (Canon) on Oct 28, 2003 at 22:47 UTC
That's one of the (very few) pros of building HTML with `CGI.pm`'s tag functions: it ensures your tags are nested properly. Just like they always should be.
Re: HTML Templating as Tree Rewriting: Part I: "If Statements" by dakkar (Hermit) on Oct 28, 2003 at 19:06 UTC
SGML (of which HTML is an application) has always been a way to serialize trees. XML made it clearer (by simplifying the syntax), but it's nothing new.

Most templating systems do treat HTML as text, but not all of them. I use AxKit, a web framework based on XML and XSLT. XSLT is a language specifically designed for tree manipulations. Not exactly every tree manipulation: it is more oriented towards down-translations (from structure to presentation), but it is Turing-complete, and I've used it successfully on a number of occasions. It is not compact, it is not "quick & dirty", but I find it to be the easiest way (for me) to manipulate XML data.

You can use XML::LibXML to parse HTML documents, and XML::LibXSLT to transform them. And (referring to a previous post) some Web graphics/designers are starting to learn XSLT, since they can use it independently from the language used by the developer.
Re: HTML Templating as Tree Rewriting: Part I: "If Statements" by cbraga (Pilgrim) on Oct 28, 2003 at 17:46 UTC
I've seen Sean Burke's articles and up until now this seems to me to increase the work/time necessary to produce the templates, site and code in relation to Template::Toolkit, for instance. Not to mention the snowball's chance in hell of a webdesigner actually learning to use that system. Please tell me I'm wrong, and show how easy and simple to use that method can be.
Re: Re: HTML Templating as Tree Rewriting: Part I: "If Statements" by princepawn (Parson) on Oct 28, 2003 at 18:58 UTC
"I've seen Sean Burke's articles and up until now this seems to me to increase the work/time necessary to produce the templates, site and code in relation to Template::Toolkit, for instance."

I guess for me what I like are standard technologies that I already know. I know Perl (to some extent). I know HTML. I don't savor learning the intricacies, exceptions, and shortcomings of pseudo-languages. For example, did you know that Template Toolkit's `==` operator does a string comparison, not a numeric one? Regarding template production, my original post showed how little work needs to be done to the HTML: you just pop little `id` attributes in wherever you want to do some sort of tree rewrite. Then you just code up a little TreeBuilder/Seamstress to do the tree rewrite and thou art finished.

"Not to mention the snowball's chance in hell of a webdesigner actually learning to use that system."

Dynamic HTML is the programmer's responsibility. Static HTML is the webdesigner's responsibility. He would have nothing to learn vis-a-vis Seamstress.

"Please tell me I'm wrong, and show how easy and simple to use that method can be."

Well, the simplicity is in the fact that you have 2 well-known rock-solid technologies (Perl and HTML) and nothing else. I will post today a Hello world program.