This is a long read, and may or may not be interesting. But I figured I'd write it up anyway since someone else may find it interesting or informative.

4 1/2 years ago I bought the domain for my wife as a birthday present. I then proceeded to let it rot for another 6 months before I finally did anything with it.

I had "homepages" set up back when I was in college, because that was the hip thing to do back then. Alas, I was also a putz in college and my programming exposure was Pascal with a little C++. I had minimal graphics skills, nominal JavaScript, and a passable amount of HTML. In short, my pages sucked. There was a fair bit of content (a lot of it cobbled together off of the rest of the internet), but managing it was a bitch.

This time, this time would be different. I was a professional perl programmer, I'd built all sorts of big, interactive websites and apps, I knew databases. I could build an awesome, dynamic website with easy to manage and update content. It'd rock. Chicks would flock to me if the domain didn't make it blatantly obvious I was married.

Naturally, there are some technologies needed for an interactive website. I already had the nascent components of Basset formed, so of course I was going to use that. That was the M part of my MVC website, now I just needed the other two. For the controller layer, I just used simple CGI scripts, with the acknowledgement that I'd eventually move to something better (I finally did last year). For the view layer, I needed a templating system. I originally looked at and used HTML::Template, but it didn't work quite the way I wanted. I tried to go with HTML::Mason, but it looked like a big pain in the ass to install and run in a non-mod_perl environment. Besides, I'd gotten burned a few times by the volatile API back then, so I was wary.

Instead, in a fit of hubris and laziness, I decided to write my own system, which is what turned into Basset::Template. Right now it's fast and robust and extremely usable. This is that story, as best as I can remember it. I consider it a success story, but I wanted to re-tell it to give the budding template engineers out there an idea of what they're getting into.

This project and its development history

The running joke in web circles is that sooner or later everybody writes their own templating system. The progression is normally along these lines:

  1. I need to embed variables. I'll use a special syntax like %%VARNAME%%. That's all I need to do.
  2. Oops, I sometimes need to vary the display content. It sucks to do this in my CGI script, so I'll add a conditional construct.
  3. Oops, building tables sucks, since I need to do it all in my CGI. I'll add in some simple looping constructs.
  4. Oops. My lexically scoped loop index variable clobbers the global. err, I'll add in a new syntax for it "%%%VARNAME%%%" for variables in a different scope.
  5. Oops. I need to embed a template. I'll add in a new INCLUDE directive.
  6. Oops. The preprocessor clobbers the scope of the embedded template variables so I'll add in a new directive ">>%VARNAME%<<" for variables specifically inside embedded templates.
  7. Repeat ad nauseam.

If you're lucky, you'll end up with something like Template::Toolkit at the end. If you're not, you'll end up with one of the myriad homebrew solutions littering so many websites. The ones where the dev staff grits their teeth when they talk about it and swear that they'll get rid of it someday, if they can, but rarely do because the cost is so high.

Instead of that route to madness, I took a different approach and simply opted to embed perl into my templates. I really liked Mason's syntax, but had already opted not to go with it. I'd looked at HTML::Template and Text::Template, but they rubbed me the wrong way. Mason made sense, so I wanted something like that.

I initially waffled about it, since I didn't really want to go to the hassle of building a templating system and building a parser and all that other crap that goes into it. So I stuck with HTML::Template for a few months before I had my great flash of inspiration.

A template is just a program, but you quote the code, not the output.

For example,

Template:

    <table>
    %% foreach my $row (@rows) {
        <tr>
    %%     foreach my $cell (@$row) {
            <td><% $cell %></td>
    %%     }
        </tr>
    %% }
    </table>

CGI:

    print "<table>\n";
    foreach my $row (@rows) {
        print "\t\t<tr>\n";
        foreach my $cell (@$row) {
            print "\t\t\t<td>$cell</td>";
        }
        print "\t\t</tr>\n";
    }
    print "</table>\n";

So all you needed to do was reverse the quotes. Easy!

Well, not easy. There were a few additional caveats I had off the bat. Such as, some snippets of code needed to output a value, some just needed to execute code (such as a loop directive).

Okay, that was easy. I added two sets of syntax: <% $value %> (the standard ASP or Mason or etc. variable embed tags) and % at the start of a line to indicate something that needed to be executed but had no output. That's similar to Mason's % at the start of a line, except mine allowed leading whitespace and Mason's didn't (at least not at the time, I hadn't kept up on it). So all I needed to do was add an additional pre-processor to turn the <% %> tags into % lines that print to STDOUT.

That was keen, but it immediately introduced another issue - the template always output to STDOUT, so I couldn't capture the output and spit it out somewhere else. Yes, I could've redirected STDOUT to something and captured it, but that would've been a nuisance. I also wasn't sure if it would be error prone, maybe with threads or other output or something. I also sure didn't want to require someone else to do the capture and redirect, so I needed a different approach.

I opted for a different filehandle, which I just called OUT. I'd then capture and redirect its output as desired.
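The post doesn't record how OUT was actually captured, so here is only a rough sketch of one way to do it: an in-memory filehandle (Perl 5.8 and later), which collects output in a scalar without touching STDOUT. The helper name capture_output is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Basset::Template): run a code ref that
# prints to a handle and return everything it printed as one string.
sub capture_output {
    my ($code) = @_;
    my $buffer = '';
    # Opening a handle on a scalar reference creates an in-memory file.
    open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";
    $code->($out);
    close($out);
    return $buffer;
}

# Usage: the template code prints to the handle it is given, not to STDOUT.
my $html = capture_output(sub {
    my ($out) = @_;
    print {$out} "<p>", "hello", "</p>\n";
});
print $html;
```

The caller decides what to do with the returned string, which is the whole point of not printing straight to STDOUT.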

Next issue I had was how to get variables into it. My templates needed data to display, after all. But...what's "into" the template? The template doesn't exist as executable code, so where should it live?

I didn't want to pull it into the calling package's namespace, because I wasn't sure if the variables would stomp on each other. In fact, that quickly expanded to not wanting to pull it into any calling package, for fear of clobbering something.

So I built up a complicated internal method to generate what I assumed to be a reasonably safe namespace for the template. It's multiple namespaces deep in the symbol table and uses the name of the file to construct it, in the hope of having something unique that other templates couldn't clobber and that the user was unlikely to have already used himself.

And while I was at it, I used a similar approach to generate an arbitrary scalar value to toss the template data into. Now I didn't need to worry about the filehandle at all.

It was then just a simple matter of writing an importer to import passed variables into the new namespace of the template. All said, I think I reached this point in development after around a week or so. It was pretty powerful and fast enough, and only took a week. That's a quick ROI, and no doubt one of the reasons that homegrown systems are so popular. Why spend a week learning something when you can spend a week writing your own?
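For the curious, the namespace-plus-importer idea can be sketched roughly like this. This is not the Basset::Template code; the package prefix My::Template::Sandbox and both sub names are invented, and the only things taken from the story are that the namespace is built from the file name and that variables are passed in by reference under a sigil-less key.

```perl
use strict;
use warnings;

# Hypothetical: derive a package name from the template's file path.
# The 'My::Template::Sandbox' prefix is made up for this sketch.
sub namespace_for {
    my ($path) = @_;
    (my $name = $path) =~ s/\W/_/g;    # keep only identifier-safe characters
    return "My::Template::Sandbox::$name";
}

# Install each passed reference into the template's package, so that
# { rows => \@rows } shows up inside that package as @rows.
sub import_vars {
    my ($package, $vars) = @_;
    no strict 'refs';
    for my $name (keys %$vars) {
        # Assigning a reference to a glob aliases only the matching slot
        # (scalar, array, hash or code), chosen by the reference type.
        *{"${package}::$name"} = $vars->{$name};
    }
    return;
}

my $pkg = namespace_for('/templates/index.tpl');
import_vars($pkg, { title => \'Home', rows => [ [1, 2], [3, 4] ] });
```

Note that this naive name mangling can collide ('/a/b.tpl' and '/a_b.tpl' map to the same package), so a real version would need something more careful.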

Next, just reverse the quotes. Go through and change all <% %> quoting to %% quoting, then flipflop all of those so the code was unquoted and the output was quoted. It's fairly easy. That then gives you a big string with a program in it, living in a particular namespace. Just eval it and look at the magic scalar for the output and you're done.
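To make "reverse the quotes" concrete, here's a stripped-down sketch of the idea. It is not how Basset::Template is implemented: it collapses the two passes into one, skips the namespace and the importer, and uses a lexical called $OUT as the magic scalar. The names compile_template and render are invented for the example, and it only understands %% lines and <% %> tags.

```perl
use strict;
use warnings;

# Turn template text into Perl source. Lines starting with %% are code and
# are copied through; everything else is text, which becomes a quoted
# append to $OUT. <% expr %> inside text becomes an appended expression.
sub compile_template {
    my ($template) = @_;
    my $perl = "my \$OUT = '';\n";
    for my $line (split /^/, $template) {
        if ($line =~ /^\s*%%(.*)$/) {
            $perl .= "$1\n";
            next;
        }
        # split with a capturing group returns text, expr, text, expr, ...
        my @parts = split /<%\s*(.*?)\s*%>/, $line;
        for my $i (0 .. $#parts) {
            next if $parts[$i] eq '';
            if ($i % 2) {
                $perl .= "\$OUT .= ($parts[$i]);\n";
            } else {
                (my $text = $parts[$i]) =~ s/([\\'])/\\$1/g;
                $perl .= "\$OUT .= '$text';\n";
            }
        }
    }
    return $perl . "\$OUT;\n";
}

# Compile, eval, and hand back whatever ended up in $OUT.
sub render {
    my ($template, $rows) = @_;
    my @rows = @$rows;    # visible to the eval'd code as a lexical
    my $out = eval compile_template($template);
    die "Template failed: $@" if $@;
    return $out;
}

my $template = <<'TPL';
<table>
%% foreach my $row (@rows) {
<tr>
%% foreach my $cell (@$row) {
<td><% $cell %></td>
%% }
</tr>
%% }
</table>
TPL

print render($template, [ [1, 2], [3, 4] ]);
```

Printing the result of compile_template instead of eval'ing it is a handy way to see exactly what program the template turned into.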

Except it wasn't. There are all sorts of oddball little edge cases that pop up.

  1. What if you have "<% $variable %>"? You need to add a trailing semicolon when you print it.
  2. What if you have "% $x++"? You need to add a trailing semicolon.
  3. What if you want to do extra processing in a <% %> tag? Say something like: <% $text =~ /^(.{0,$length}\w)\b/s; $1 || $text %> (yes, the example may be a bit contrived, but I had it for a while. I wanted to truncate some things but didn't have a good place to put the code)

Yeah, yeah, I could've just made demands on the code to require you to add semi-colons or only have the value you're going to output in the code, but I didn't like that. So I added in code to add semicolons and allow additional text before my return value.

The initial pass broke horribly because I forgot that there were cases where you didn't want to add a semicolon (such as %% foreach my $val (@array) {), so I wrote up a more advanced parser to deal with those cases.
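The semicolon rule boils down to something like the following. This is a guess at the shape of it, not the real parser, and the sub name terminate_statement is invented. It leaves a line alone if it already ends in a semicolon or at the edge of a block, and appends a semicolon otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: append a semicolon to a line of template code unless
# it already ends a statement or opens/closes a block.
sub terminate_statement {
    my ($code) = @_;
    $code =~ s/\s+$//;                    # drop trailing whitespace
    return $code if $code eq '';
    return $code if $code =~ /[;{}]$/;    # already terminated, or a block edge
    return "$code;";
}

print terminate_statement(' $x++'), "\n";
print terminate_statement(' foreach my $val (@array) {'), "\n";
```

Even this has holes: a line like my $h = {} ends in a brace but still needs its semicolon, which is exactly the kind of edge case that keeps a "simple" parser growing.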

I then ran it as is for a few months.

The first stumbling block I ran into was when I wanted to use a sprintf in my code. Whoops. The pre-processor was looking for any % anywhere on the line as the start of a code block, so it was goofing up royally. I hastily changed my default code delimiter to be '%%', knowing that it would break on that string (such as a literal % in a sprintf statement), but it was less likely to occur, so an acceptable risk. A few months later I further tweaked it so that %%s weren't an issue at all.

The next problem was that I ran into the (simplified) case <% ";" %> Well, my simple little preprocessor would assume that you wanted to execute the code "; and then output the code ". Obviously, Perl disagreed, so I needed to beef up my parser. Incidentally, that version of the parser is the one that's on the current release, but it only handles the cases of ";" and ';' and ignores q{;}, qq{;}, qw{;} and the like. It's fixed internally, but more on that later.

So it was running fine, but was pretty slow. I needed to speed it up. The big bottleneck was that pre-processing step to flip the quotes. But, I realized, the pre-processing should always be the same (unless the template has changed), so I can just cache it to disk. I implemented a caching scheme to store it on disk and then, as the first step of pre-processing, look to see if a valid one already exists. If it does, great - use it. If not, then pre-process and re-cache. Come to think of it, this may have been when I implemented the deep nested packages approach, so I could have something easily written to disk that would always come out of the cache the same way, but this was years ago, so I'm foggy on the details.
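A caching scheme along those lines might look like the sketch below. The details are assumptions: the post doesn't say how validity is checked or how cache files are named, so this version compares modification times and names the cache file after an MD5 of the template path. cached_compile and read_file are invented names.

```perl
use strict;
use warnings;
use File::Spec;
use Digest::MD5 qw(md5_hex);

# Hypothetical: return the pre-processed form of a template, using the
# on-disk copy when it is at least as new as the template file.
sub cached_compile {
    my ($template_path, $cache_dir, $compile) = @_;
    my $cache_path = File::Spec->catfile($cache_dir, md5_hex($template_path) . '.cache');

    my $template_mtime = (stat($template_path))[9];
    die "Cannot stat $template_path: $!" unless defined $template_mtime;
    my $cache_mtime = (stat($cache_path))[9];    # undef if no cache yet

    if (defined $cache_mtime && $cache_mtime >= $template_mtime) {
        return read_file($cache_path);           # valid cache: use it
    }

    # Missing or stale: pre-process again and re-cache.
    my $compiled = $compile->(read_file($template_path));
    open(my $fh, '>', $cache_path) or die "Cannot write $cache_path: $!";
    print {$fh} $compiled;
    close($fh) or die "Cannot close $cache_path: $!";
    return $compiled;
}

sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot read $path: $!";
    local $/;    # slurp mode
    my $content = <$fh>;
    close($fh);
    return $content;
}
```

The third argument is whatever code ref does the quote flipping. Modification times only have one-second resolution here, so an edit made in the same second the cache was written would be missed.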

I now was caching stuff and running along very fast. Awesome.

But it really sucked to have large code blocks in my templates. Since I'd chosen to use %% to start code and "\n" to end code, I had to prepend large codeblocks with strings of %%s. This sucked. So I added an additional set of tags - <code> and </code> to delimit large blocks of code.

Next I wanted to embed templates. By this point, I had large blocks of common code duplicated all over my website, and I wanted to get rid of that. So I needed to embed them.

I opted for <& &> for my embed syntax and quickly determined that I wanted two different approaches - sometimes it'd be keen to hand in additional variables to the subtemplate, and sometimes it'd be keen to just have it inside the calling template's environment and inherit everything currently in scope. I'm sure I had a really really good reason for it at the time, but I can't quite remember it now.

<& /path/to/other/template.tpl { 'arg1' => \'val1', 'arg2' => \'val2' } &> or <& /path/to/other/template.tpl &>

So that was created with different rules to allow for the two different approaches. It also unearthed a nuisance - passing in variables always had to be done by reference. It had always been there, but hidden down a few layers so I rarely encountered it. The second issue was the key value was a simple string without a sigil. So I couldn't pass $msg and @msg into the same namespace, it wouldn't allow it, I'd have to alias one.

This led to me allowing simple scalar values to not be passed by reference and allowing an optional sigil on the variable key.

It also introduced the problem of needing to know exactly where these templates live. Previously, with just a single template, only its cgi needed to know where it was, and it was usually in the same directory. But now? Templates could talk to templates in other directories? How does it find them? Relative paths? Relative to what? Where does this template exist, anyway?

This led to the addition of a template root to allow for absolute paths that weren't server dependent (paths from the server's root would've been horrible, since I needed it to work in different directory locations on my machine vs. my host). That way, I could always use absolute paths and have them work, but not worry about portability.
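The template root amounts to a small path-mapping step, roughly like this sketch. The sub name resolve_template is invented, and the refusal to follow '..' is my own addition for the example, not something the story mentions.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical: map a template-absolute path such as '/shared/header.tpl'
# onto the filesystem under a configurable root directory.
sub resolve_template {
    my ($root, $template_path) = @_;
    my @parts = grep { length } split m{/}, $template_path;
    # Assumption for this sketch: never let a path climb out of the root.
    die "Path escapes the template root: $template_path\n"
        if grep { $_ eq '..' } @parts;
    return File::Spec->catfile($root, @parts);
}

# The same template path works on both machines; only the root differs.
print resolve_template('/home/jim/site/templates', '/shared/header.tpl'), "\n";
print resolve_template('/var/www/templates', '/shared/header.tpl'), "\n";
```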

As my templates were getting more complex, around this time I also added in template comments <# comment #> to allow for better documentation. They're better than HTML comments because the processor would strip them out before display to the user.

As a further attempt to optimize, I added in an optional flag to compress whitespace, since it's not necessary in HTML anyway.

Since this was adding in a lot of additional stuff to the template, I added in a simple debugging syntax as well, above and beyond doing %% print STDERR...;, especially since it theoretically decoupled the debugger from STDERR w/o resorting to redirection of the handle. Note that I've never implemented anything like that in practice, it always just goes to STDERR.

It then ran and worked for a while.

I introduced another optimization and allowed embedded templates to have their pre-processed value stored in the pre-processed parent - <&+ /path/to/template.tpl +&>. This skipped the step of firing up another template object, looking for a pre-cached copy, and returning and executing it. It just all ran inline in the parent as if it was always there. But, it lost the ability to look for changes in the subtemplate. So if you changed the subtemplate, the supertemplate wouldn't automatically reflect it. You needed to blow away the cache or flag the supertemplate as changed. Naturally, this is an optimization to be used sparingly - either when speed really really really counts or when the embedded template rarely (if ever) changes.

But there was a lot of stuff that I ended up duplicating a lot. Such as HTML escapes or URL escapes. This was done by exposing the template object in the template to call methods on it. <% $self->html_escape($value) %>. But I didn't like having to put it in all the time, so I finally diverged from my "pure perl" approach by adding in pipes.

The pipe syntax is easy - just end off a variable with a pipe and a pre-defined token (extensible via subclasses) to internally translate into a method call. Instead, I'd now just type <% $value | h %>. Much simpler, but I never really liked the break from being true perl. I justified it to myself figuring that the | h was part of the template quoting not the perl, but never really convinced myself.
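The translation itself can be sketched in a few lines. This is an illustration, not the module's code: expand_pipe is an invented name, and while the h token and html_escape come from the story above, the u token and url_escape are assumed for the example.

```perl
use strict;
use warnings;

# Hypothetical: rewrite "EXPR | token" into a method call on the template
# object. Unknown tokens and anything without a pipe pass through untouched.
sub expand_pipe {
    my ($expr) = @_;
    # 'h' => html_escape is from the post; 'u' => url_escape is assumed.
    my %pipes = (h => 'html_escape', u => 'url_escape');
    # A single | (not ||) followed by one trailing word is treated as a pipe.
    if ($expr =~ /^(.*\S)\s*(?<!\|)\|\s*(\w+)\s*$/s) {
        my ($value, $token) = ($1, $2);
        return sprintf('$self->%s(%s)', $pipes{$token}, $value)
            if exists $pipes{$token};
    }
    return $expr;
}

print expand_pipe('$value | h'), "\n";
print expand_pipe('$a || $b'), "\n";
```

A real bitwise-or against a bareword named like a pipe token would be misread by this sketch, which is part of why the pipe never felt like true perl.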

Recently (within the last week - not yet released), I started needing a few additional things. First of all, I finally fixed the bug with the <% q{;} %> (ed- fixed typo) syntax. I spent about a day mulling over it and trying to think about how to fix it with a fancy parser or something, but wasn't convinced I could catch all cases. Instead, I opted to wrap up everything in those tags inside an anonymous subroutine, execute it, and display the results. This naturally takes advantage of perl's functions not requiring the "return" keyword.

update - I took advantage of Jenda's awesome suggestion down below and updated the internal build to use do {} blocks instead of anonymous subs. It loses the ability to type <% return $val %>, but I can live with it, it's a heck of an improvement otherwise.
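The do {} trick is small enough to show in full, at least as a sketch. The sub names and the $OUT scalar are invented for the example and the real internals may differ. The point is that the tag's contents are never parsed: perl evaluates the block and hands back the value of its last statement.

```perl
use strict;
use warnings;

# Hypothetical: turn the contents of a <% ... %> tag into Perl that appends
# the value of its last statement to $OUT, with no parsing of the contents.
sub wrap_output_tag {
    my ($code) = @_;
    return "\$OUT .= do { $code };\n";
}

# Evaluate one tag with $text in scope and return what it produced.
sub eval_output_tag {
    my ($code, $text) = @_;
    my $OUT = '';
    eval wrap_output_tag($code);
    die "Tag failed: $@" if $@;
    return $OUT;
}

print eval_output_tag(q{ ";" }, ''), "\n";
print eval_output_tag(q{ $text =~ /^(\w+)/; $1 || $text }, 'The quick brown fox'), "\n";
```

Semicolons inside strings or q{} constructs stop mattering because nothing is split on them any more. The cost is the one mentioned above: a return inside the block would leave the enclosing eval rather than yield the tag's value.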

I also had another problem - and that was that I had no way to generate snippets of text and pass them around in the template. I'd need to do that in the CGI layer with an extra method then hand in the pre-processed data to the template. But this didn't scale well (for example, loops in the template). Besides, I now had my output coming from two different places.

So this called for another divergence from perl syntax as I added in an embed redirect - <& /path/to/template.tpl >> $variable &> to instead stuff the output into $variable, which could then be handed around as desired.

This brings us up to current, where my "simple little template package" stands at 1,345 lines (counting documentation) and has been actively worked on for 4 years.

Things to take away from this project.

  • I actively work on and develop this thing because it's a hobby for me. I enjoy working on it, I have a fair bit of infrastructure invested in it. But, there has been some percentage of my time for the past 4 years that I've spent developing it. Anyone else that looks at it would need to be trained on how it works. Any bugs that pop up in it or new features I need are my responsibility to take care of.

    If you go off and build your own templating system, you could very well end up in the same situation. And it may not be cost effective for you. Joel Spolsky wrote in one of his columns that by all means you should build your own copy of a piece of software if it will give you a competitive advantage. If it doesn't give you an advantage, you shouldn't bother - use something off the shelf and spend your time on something that will give you a competitive advantage.

    Do I think this gave me a competitive advantage? Tough to say, since I have no competitors in the "Jim's homepage" arena that I'm aware of. But, it does work exactly the way I want and does exactly what I want. In the cases where it doesn't do what I want, I can usually change and update it. So maybe it's kept me happier and more interested in working on my webpage than I would've been otherwise.
  • If you're going to build your own templating system, treat it like production code, since it is. Further, treat it as if it's code that's going to be released and work on it the way you'd expect other open source things to be worked on. Don't treat it as your internal toy that can be modified and hacked as desired just to get one thing in - you've got to really invest time and effort into designing, re-factoring, and developing to make sure it's consistent and easy to work with. If you don't, it will quickly buckle under its own weight.
  • Even if you can justify building your own system by the productivity argument up above, remember that you are spending additional time and resources on a brand new internal project that won't directly make you money. Unless you get rid of it, the development will never go away. It may tone down or go on hiatus, but eventually it'll need work done on it. This will extend deadlines and keep you away from software that makes you money. It's a great hobby for my free time. I wouldn't have done this for work. Note that I would be comfortable using Basset::Template for work now, but I wouldn't have started developing it for a job.
  • If the guy that wrote your templating system leaves, you're in trouble. I don't care how easy it is (please, make it easy!), there will be more of a learning curve for new people than with a standard template. Plus, if the original developers leave, you may learn that it's difficult to support and extend the code, which increases your cost. Note that I didn't need to worry about this anyway, since I'm a one man shop on this "project".
  • I do consider this module a success. It's fast, easy to implement, easy to extend and enhance, and powerful in what it does. If I had to do it over again, I would've done it the same way. Your mileage may vary.
  • I'm not trying to advertise my own template. Use whatever you want. Other stuff on CPAN rocks. Use what you're comfortable with. Remember, there's always a cost involved in switching templating engines, so jump technologies sparingly.
  • I'm not trying to discourage anyone from writing their own template, just trying to paint a picture of what you may be getting into if you try. I got lucky and my simple little template system caught my interest and has kept me working on it over the years, and I've been willing to devote the time to working on it. It could have easily gone the other way. Just be aware of what you're getting into.

In reply to The history of a templating engine by jimt