http://www.perlmonks.org?node_id=932967


in reply to Re^2: aXML vs TT2
in thread aXML vs TT2

It's not the worst solution, but it's inherently fragile.

It requires that I know every element name that can be found in an XML document. Some formats are extensible (e.g. XHTML), so there is no such list for them. And even if we assume that list can be found, most people won't bother trying to create it.

It requires that I cross-check that list against the list of keywords in a hash. How can that be done reliably if the list is as dynamic as you say?

This will lead to errors that can be very subtle. And given how templates are typically used, those errors will be seen by end users rather than by the developer.
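To make the cross-check concrete, here is a minimal Python sketch (not aXML code). The plugin names in `PLUGINS` are hypothetical, since the real keyword hash is not shown in this thread, and the tag scan is a rough regex rather than a real XML parse. It reports every element name in a page that is also a plugin keyword.

```python
import re

# Hypothetical plugin keywords; the real aXML keyword hash is not shown here.
PLUGINS = {"inc", "db_select", "query", "mask", "d", "title", "table"}

def colliding_elements(document, plugins):
    """Return element names used in the document that are also plugin names."""
    # Collect the names of all opening tags (a rough scan, not a full XML parse).
    names = set(re.findall(r"<([A-Za-z_][\w.-]*)", document))
    return sorted(names & plugins)

# Plain XHTML-like content with no template instructions in it at all.
page = (
    "<html><head><title>Users</title></head>"
    "<body><table><tr><td>1</td></tr></table></body></html>"
)
print(colliding_elements(page, PLUGINS))
```

The check only works for the documents and plugin names you feed it. Add a plugin or include a new document and it has to be rerun, which is the fragility described above.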

The thing is, XML has already solved that problem. The mechanism is called "namespaces".

<root xmlns:a="http://axml.com/1.0/">
  <a:inc title="User Info">header</a:inc>
  <table border="0" width="100%">
    <tr>
      <th>User ID</th>
      <th>Name</th>
      <th>Email</th>
    </tr>
    <a:db_select>
      <a:query>
        SELECT * FROM user ORDER BY id
      </a:query>
      <a:mask>
        <tr>
          <td><a:d>id</a:d></td>
          <td>[hlink action="user_profile" user_id="<a:d>id</a:d>"]<a:d>name</a:d>[/hlink]</td>
          <td><a:d>email</a:d></td>
        </tr>
      </a:mask>
    </a:db_select>
  </table>
  <a:inc>footer</a:inc>
</root>
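As an illustration of what namespaces buy you, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The trimmed-down document and the helper name `template_elements` are made up for the example. A namespace-aware parser reports each element with its namespace URI, so a plain `<inc>` in the content is never mistaken for the template's `<a:inc>`.

```python
import xml.etree.ElementTree as ET

AXML_NS = "http://axml.com/1.0/"  # namespace URI taken from the example above

# Trimmed-down document: one template instruction, some markup, and a plain
# <inc> element that merely happens to share a name with the instruction.
doc = """<root xmlns:a="http://axml.com/1.0/">
  <a:inc title="User Info">header</a:inc>
  <table><tr><th>User ID</th></tr></table>
  <inc>not a template instruction</inc>
</root>"""

def template_elements(xml_text, ns=AXML_NS):
    """Return the local names of elements that belong to the template namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + ns + "}"  # ElementTree spells qualified names as {uri}local
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]

print(template_elements(doc))
```

Only the namespaced `inc` is reported. The unprefixed `<inc>`, `<table>`, and the rest are left alone without anyone maintaining a list of element names.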

PS — I personally prefer <inc name="header"/> over <inc>header</inc>.

Re^4: aXML vs TT2
by Logicus (Initiate) on Oct 21, 2011 at 19:30 UTC

    ...It requires that I know every element name that can be found in an XML document...

    OK then, we add a new sub called something like <no_parse> which takes a path to the XML to be included in the output. The file is loaded, held in memory, and inserted after the parser has exited. That way it doesn't matter if the included file contains tags which match the plugins, because the parser never sees it. The original version has something similar called <ignore>, but I haven't got round to creating that functionality in the new one yet.

    Oh, and regarding shortened tags like <inc name="header"/>: I like that, but supporting those tags as well as the standard ones would require extra compute time under the current methodology. It's perfectly possible to do, but I'm not sure the gain would be worth the overhead.

    That's not to say that the current methodology is the be-all and end-all. Corion once suggested writing a compiler for aXML, which would solve that problem and further improve overall performance. However, such a solution is currently over my head, and given how blazing fast the current version is, I just can't feel the desire to try and implement it (again).

      Ok then, we add a new sub called something like <no_parse>

      That wouldn't work.

      Oh and regarding shortened tags like <inc name="header"/>

      It wasn't shortened. It's actually three characters longer. It just seems clearer to me for inc.

      however supporting those sort of tags as well as the standard ones

      I wasn't suggesting that you handle both <inc>header</inc> and <inc name="header"/>, just one of them. Supporting both would be confusing.

        >That wouldn't work

        But it does work. I've used this technique before and will be implementing it in the new parser when I get round to it.

        Perhaps you had some other case in mind which is different to what I'm thinking?

        About the tags: you're right, supporting both forms would be confusing (and costly in processing), so I'm only going to support the regular tag types in my version. If anyone using it wants to change it to do both, that's entirely up to them.

      So what does your system do right now when it sees a tag like <tag />? Does it get ignored?

        The parser basically works as a two-phase process. First it scans using a very fast non-backtracking regex to see if it can find any opening tags it recognises. If it succeeds, it marks the tag with a control character, then runs a slower regex in a loop which scans for a complete open and close tag set. When it has something it knows is valid and which does not contain a nested tag set (as determined by negating the control character), it executes the relevant code and substitutes the return value into the document. The loop continues until there are no more matching sets to process.

        The first phase negates the forward slash to prevent it from picking up and marking the close tags, so the tag you mentioned will probably be ignored by both phases and remain untouched in the final output. I'd have to check back on the actual regex used to be certain, but I'm pretty sure that is correct.
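
The two-phase process described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual aXML code (which is Perl and is not shown in this thread). The plugin names, the choice of `\x01` as the control character, and both regexes are assumptions, and tags with attributes are not handled.

```python
import re

MARK = "\x01"  # control character used to flag recognised opening tags (assumed)

# Hypothetical plugins: each receives the tag body and returns replacement text.
PLUGINS = {
    "upper": lambda body: body.upper(),
    "wrap": lambda body: "[" + body + "]",
}

NAMES = "|".join(map(re.escape, PLUGINS))
# Phase 1: recognised opening tags only; "</name>" and "<name />" do not match.
OPEN = re.compile(r"<(" + NAMES + r")>")
# Phase 2: a marked open tag, a body containing no further mark, and its close tag.
INNERMOST = re.compile(MARK + r"<(" + NAMES + r")>([^" + MARK + r"]*?)</\1>")

def render(text):
    # Phase 1: put a mark in front of every recognised opening tag.
    text = OPEN.sub(lambda m: MARK + m.group(0), text)
    # Phase 2: repeatedly replace one innermost open/close set with plugin output.
    while True:
        text, count = INNERMOST.subn(
            lambda m: PLUGINS[m.group(1)](m.group(2)), text, count=1
        )
        if count == 0:
            # Drop marks left on opening tags that never found a close tag.
            return text.replace(MARK, "")

print(render("<wrap>a <upper>b</upper> c</wrap> <upper /> <b>x</b>"))
```

In this sketch the nested `<upper>` set is processed before the enclosing `<wrap>`, because the body of a match may not contain a mark. The self-closing `<upper />` and the unknown `<b>` are never marked, so they pass through untouched.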