PerlMonks  

Re^4: aXML vs TT2

by Logicus
on Oct 21, 2011 at 19:30 UTC (#932970)


in reply to Re^3: aXML vs TT2
in thread aXML vs TT2

...It requires that I know every element name that can be found in an XML document...

Ok then, we add a new sub called something like <no_parse>, which takes a path to the XML to be included in the output. The file is loaded, held in memory, and inserted only after the parser has exited. That way it doesn't matter if the included file contains tags that match the plugins, because the parser never sees it. The original version has something similar called <ignore>, but I haven't got round to creating that functionality in the new one yet.
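A minimal sketch of that deferred-include idea, written in Python rather than Perl purely for illustration. The names `expand`, `read_file`, and the NUL-delimited placeholder tokens are assumptions made for this example, not part of aXML, and the plugin loop is only a stand-in for the real parser:

```python
import re
import uuid

def expand(template, plugins, read_file):
    """Expand plugin tags, deferring <no_parse> includes until parsing is done."""
    held = {}  # placeholder token -> raw file contents

    def stash(match):
        # Swap the directive for an opaque token that contains no tags.
        token = "\x00{}\x00".format(uuid.uuid4().hex)
        held[token] = read_file(match.group(1).strip())
        return token

    text = re.sub(r"<no_parse>(.*?)</no_parse>", stash, template, flags=re.S)

    # Stand-in for the real parser: expand each known plugin tag once.
    for name, handler in plugins.items():
        pattern = r"<{0}>(.*?)</{0}>".format(re.escape(name))
        text = re.sub(pattern, lambda m, h=handler: h(m.group(1)), text, flags=re.S)

    # The "parser" has exited; splice the raw contents back in untouched.
    for token, raw in held.items():
        text = text.replace(token, raw)
    return text
```

Because the included file is only spliced in at the end, any `<d>` inside it reaches the output verbatim, while a `<d>` in the template itself is still expanded.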

Oh, and regarding shortened tags like <inc name="header"/>: I like that, however supporting those sorts of tags as well as the standard ones would require extra compute time under the current methodology. It's perfectly possible to do, but I'm not sure the gain would be worth the overhead.

That's not to say the current methodology is the be-all and end-all. Corion once suggested writing a compiler for aXML, which would solve that problem and further improve overall performance. However, such a solution is currently over my head, and given how blazing fast the current version is, I just can't feel the desire to try to implement it (again).

Re^5: aXML vs TT2
by ikegami (Pope) on Oct 21, 2011 at 21:13 UTC

    Ok then, we add a new sub called something like <no_parse>

    That wouldn't work.

    Oh, and regarding shortened tags like <inc name="header"/>

    It wasn't shortened. It's actually three characters longer. It just seems clearer to me for inc.

    however supporting those sorts of tags as well as the standard ones

    I wasn't suggesting that you handle both <inc>header</inc> and <inc name="header"/>, just one of them. Supporting both would be confusing.

      >That wouldn't work

      But it does work. I've used this technique before and will be implementing it in the new parser when I get round to it.

      Perhaps you had some other case in mind which is different to what I'm thinking?

      About the tags, you're right: giving both standards would be confusing (and processor costly), so I'm only going to be supporting the regular tag types on my version. If anyone using it wants to change it to do both, that's entirely up to them.

        It doesn't work because it can't be used in practice. It renders every "d" inert, not just the ones that aren't aXml directives. It can't be used with your very own example, for one. Neither db_select nor d works in the following:

        <inc title="User Info">header</inc>
        <no_parse>
        <table border=0 width="100%">
          <tr>
            <th>User ID</th>
            <th>Name</th>
            <th>Email</th>
          </tr>
          <db_select>
            <query>
              SELECT * FROM user ORDER BY id
            </query>
            <mask>
              <tr>
                <td><d>id</d></td>
                <td>[hlink action="user_profile" user_id="<d>id</d>"]<d>name</d>[/hlink]</td>
                <td><d>email</d></td>
              </tr>
            </mask>
          </db_select>
        </table>
        </no_parse>
        <inc>footer</inc>

        About the tags, you're right: giving both standards would be confusing (and processor costly), so I'm only going to be supporting the regular tag types on my version. If anyone using it wants to change it to do both, that's entirely up to them.

        What are you talking about? I'm talking about using an attribute instead of a text node for inc, but you keep using plural... Wait, are you saying that aXml requires you to write

        <foo bar="moo"></foo>
        instead of
        <foo bar="moo"/>

        wtf! And because you think it would be less costly to process a whole additional tag? double wtf!
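For what it's worth, a parser that only understands paired tags could still accept the empty-element form through a cheap pre-pass. The sketch below is in Python for illustration only; `expand_self_closing` and its regex are hypothetical, not anything aXML provides, and the pattern assumes attribute values never contain `<` or `>`:

```python
import re

# Tag name, optional attributes, optional whitespace, then "/>".
SELF_CLOSING = re.compile(
    r"<([A-Za-z_][\w.-]*)((?:\s+[^<>\s/][^<>]*?)?)\s*/>"
)

def expand_self_closing(markup):
    """Rewrite <foo bar="moo"/> as <foo bar="moo"></foo>."""
    return SELF_CLOSING.sub(r"<\1\2></\1>", markup)
```

The rewrite is a single linear pass over the input, so the extra cost is one substitution before the existing parser runs.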

Re^5: aXML vs TT2
by anneli (Pilgrim) on Oct 22, 2011 at 02:43 UTC

    So what does your system do right now when it sees a tag like <tag />? Does it get ignored?

      The parser basically works in a two-phase process. First it scans with a very fast non-backtracking regex to see if it can find any opening tags it recognises. If it succeeds, it marks the tag with a control character, then runs a slower, looped regex which scans for a complete open and close tag set. When it has something it knows is valid and which does not contain a nested tag set (as determined by negating the control character), it executes the relevant code and substitutes the return value into the document. The loop continues until there are no more matching sets to process.

      The first phase negates the forward slash so that it doesn't pick up and mark the close tags, so the tag you mentioned will probably be ignored by both phases and remain untouched in the final output. I'd have to check the actual regex to be certain, but I'm pretty sure that is correct.
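The two phases described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual aXML regexes: `render`, `MARK`, and `TAG_TAIL` are hypothetical names, the control character is assumed never to occur in a template, and attribute values are assumed not to contain `<` or `>`:

```python
import re

MARK = "\x01"  # control character assumed never to appear in templates
# Optional attributes, then ">" that is not preceded by "/" (so "<d />" is skipped).
TAG_TAIL = r"(?:\s[^<>]*?)?(?<!/)>"

def render(template, plugins):
    names = "|".join(re.escape(name) for name in plugins)

    # Phase 1: one pass that marks recognised opening tags only. A tag name
    # must follow "<" directly, so close tags such as "</d>" are not marked.
    text = re.sub(rf"<(?=(?:{names}){TAG_TAIL})", "<" + MARK, template)

    # Phase 2: repeatedly find a marked open tag whose body contains no other
    # mark (an innermost set), run its plugin, and substitute the result.
    innermost = re.compile(rf"<{MARK}({names}){TAG_TAIL}([^{MARK}]*?)</\1>")
    while True:
        text, count = innermost.subn(
            lambda m: plugins[m.group(1)](m.group(2)), text, count=1
        )
        if count == 0:
            break

    # Marked tags that never found a matching close pass through unchanged.
    return text.replace(MARK, "")
```

With plugins `d` (upper-casing) and `wrap` (adding brackets), the input `<wrap>a <d>b</d></wrap> <d />` is processed innermost-first, and the `<d />` is skipped by both phases and left in the output as-is.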

        Have you considered writing a proper state machine-based lexer/parser? It would positively fly, compared to using regular expressions, and probably end up more amenable to extension (if you wanted to actually support XML, for instance).
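As a rough idea of what the lexer half of that might look like, here is a minimal two-state scanner in Python. It is only a sketch: `lex` and the token kinds are made-up names, it treats any `>` as the end of a tag (so a `>` inside an attribute value would break it), and whether it would actually outrun the regex approach is not measured here:

```python
def lex(source):
    """Split markup into (kind, value) tokens using a two-state scanner."""
    tokens = []
    state = "text"  # either "text" or "tag"
    buf = []
    for ch in source:
        if state == "text":
            if ch == "<":
                if buf:
                    tokens.append(("text", "".join(buf)))
                buf = []
                state = "tag"
            else:
                buf.append(ch)
        else:  # inside a tag, collecting everything up to ">"
            if ch == ">":
                body = "".join(buf)
                if body.startswith("/"):
                    tokens.append(("close", body[1:].strip()))
                elif body.endswith("/"):
                    tokens.append(("selfclose", body[:-1].strip()))
                else:
                    tokens.append(("open", body.strip()))
                buf = []
                state = "text"
            else:
                buf.append(ch)
    if state == "tag":
        # An unterminated "<..." is passed through as text rather than dropped.
        tokens.append(("text", "<" + "".join(buf)))
    elif buf:
        tokens.append(("text", "".join(buf)))
    return tokens
```

Each character is visited exactly once, and `<tag />` falls out as its own token kind instead of being silently ignored, which is what makes this shape easier to extend toward real XML.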
