Re^29: aXML vs TT2

by Corion (Patriarch)
on Oct 23, 2011 at 10:33 UTC


in reply to Re^28: aXML vs TT2
in thread aXML vs TT2

So it is far from trivial to safely output text from a plugin that looks like aXML but is not interpreted as aXML again.

I would reconsider this design choice, because you might end up with anything from annoyances, like not being able to use aXML to generate XML or aXML, to really big security holes, like user-generated content triggering internal aXML code.

Of course, proper documentation on how to generate various kinds of output would also help, as would documentation of the replacement rules: when aXML turns &amp; into & and &lt; into <, and when it passes them through.

Replies are listed 'Best First'.
Re^30: aXML vs TT2
by Logicus (Initiate) on Oct 23, 2011 at 11:11 UTC
    So it is far from trivial...

    Whatever is output from a tag goes back into the document and is treated exactly the same as everything else. The tags input aXML and output aXML; there is no distinction, and it's up to the designer how to deal with that. I've found through experience a set of abstractions that exist in harmony with each other, and I'll be giving those as a suggested default set because you can build just about anything with them.

    When it comes to writing your own plugins, a sound understanding of how the plugins interact and how you intend to use the new plugin and how it relates to other plugins will be needed.

    Having said that, it really isn't as hard as it may seem because the rules are very simple. Any programmer worth their salt will be able to master it in a matter of hours or maybe even minutes in some cases.



    I would reconsider this design choice...

    I wouldn't; it works great exactly as it is.



    not being able to use aXML to generate XML and aXML

    There is no problem with that at all, since you have complete control over the plugins, their names and their definitions.



    security holes like user generated content triggering internal aXML code.

    I went into this before somewhere. There are basically only two vectors; the query data and the cookies. Putting the cookies aside for a moment as there is little scope for insertion there, the solution I found to this problem was to sanitize the query data prior to starting the document.

    What I'm going to do is add a key to the conf hash in the Conf.pm file that holds an array of allowed tags. Any other tag found in the query data will be converted so that it cannot be recognised by the parser and thus passes through without the possibility of affecting anything else. I'm making the list an allowed list instead of a disallowed list, so if the list is empty it will not allow any tags through at all.
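The allow-list idea described above can be sketched roughly as follows. This is not aXML's actual code: the conf key `allowed_tags` and the function `sanitize_query_value` are hypothetical names, only angle-bracket tags are handled, and escaping to `&lt;`/`&gt;` assumes the parser leaves those entities alone on later passes.

```perl
use strict;
use warnings;

# Hypothetical conf entry; the post only says it will be an array in the conf hash.
my %conf = ( allowed_tags => [ 'b', 'i' ] );

# Neutralise every angle-bracket tag whose name is not on the allow-list.
# An empty allow-list therefore lets no tags through at all.
sub sanitize_query_value {
    my ($value, $allowed) = @_;
    my %ok = map { $_ => 1 } @$allowed;
    $value =~ s{<(/?)([A-Za-z_][\w-]*)([^<>]*)>}{
        $ok{$2} ? "<$1$2$3>" : "&lt;$1$2$3&gt;"
    }ge;
    return $value;
}

print sanitize_query_value('<b>hi</b> <db_select>x</db_select>', $conf{allowed_tags}), "\n";
```

A real implementation would also have to cover the round- and square-bracket tag forms and decide what to do with attributes on allowed tags.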



    proper documentation

    Gotta finish writing the thing first!! The new version is almost ready; I've just got a few rough edges to sort out and some more plugins to port from the old version, and then it should be just about ready to rock.



    & < etc

    In the vast majority of cases an aXML programmer won't actually need to worry about that stuff at all; they are rare special-case features which "close the circle", so to speak. Understanding them will only be required if you have an application which can't live without them, for instance feeding very specific XML data to another system which will throw exceptions unless special chars are encoded right.

      not being able to use aXML to generate XML and aXML
      There is no problem with that at all, since you have complete control over the plugins, their names and their definitions.

      Ugh. So if I discover that I have to output (or worse, pass through and modify) XML that contains <d> elements, I have to review all my aXML code to make sure that I don't use any plugin named or responding to <d>? I'm still not convinced that this is a sound design choice. That you don't immediately spot this problem makes me think that you haven't encountered it, but that is to be expected if all you are doing is generating web pages and working alone. That way, you have a good overview of the admissible names. This mechanism will fail horribly once either of these two parameters changes.

        So if I discover...

        Sort of, maybe, but no: there are various solutions.

        Let's make the example problem a little more concrete, then solve it.

        Let us say that for the URL http://acmesite.com/?action=get_foo you need to load, modify and output an XML file called foo.xml

        listing of foo.xml
        ------------------
        <foos>
          <foo>
            <a>I pity the foo</a>
            <b>foo bar</b>
            <c>foo cough</c>
            <d>baz bar foo</d>
          </foo>
          <foo>
            <a>...
            <b>...
            ...
          </foo>
          ...
          ...
        </foos>

        And just to make the problem harder still let's also say that you wish to set all the values in any instance of <b> and <d> to upper-case.

        Now as it stands with the current plugin set, none of the tags contained in foo actually correspond to any known plugin/symbol name, so if you didn't need to edit the file you would simply include it using <insertfile> (note, the plugins <inc> and <use> refer to aXML files only since they append the string '.aXML' to the path given)

        (insertfile)foo.xml(/insertfile)

        "<d>" in previous examples was metadata only and I will come back to that in a minute.

        There are two ways I can think of to do this.

        Method 1

        Since we want to edit the values contained in "b" and "d" and since these two tags are not currently defined, we simply define them in the local private subs module that is specific to this action. (you can define them in the site global subs, or in the engine global subs if you want the plugin code to be accessible in a broader scope)

        Using this method we are going to need to use the specials. I've not done huge amounts with these specials before, because I haven't used aXML much at all in the way you guys want to use it, and I've had the luxury of being able to design all the data which is put into the system. These examples you have given have made me realise there is actually a call for 6 specials not 4 as I had previously thought, but that's not a problem and I have just added a couple of extra lines to the system so it now supports all 6 specials with this new standardised schema:

        In the doc   In the output   Name
        &lab;        <               "left angle bracket"
        &rab;        >               "right angle bracket"
        &lcb;        (               "left curved bracket"
        &rcb;        )               "right curved bracket"
        &lsb;        [               "left square bracket"
        &rsb;        ]               "right square bracket"

        in your perl subs module for this action
        ----------------------------------------
        $plugins = {
            b => sub { my $data = uc($_[1]); "&lab;b&rab;$data&lab;/b&rab;" },
            d => sub { my $data = uc($_[1]); "&lab;d&rab;$data&lab;/d&rab;" }
        };

        That is all, now when you view the file, you will find that the contents of <b> and <d> have been uppercased and the remainder of the document has been untouched.
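To make the role of the specials concrete, here is a minimal sketch of what a final decoding pass might look like. It is a toy stand-in, not aXML's parser: `decode_specials` is a hypothetical name, the square-bracket specials are taken to mean literal `[` and `]` as their names indicate, and the plugin is called by hand with the tag body in `$_[1]`, as in the post's own code.

```perl
use strict;
use warnings;

# The six specials, mapped to the literal characters they stand for.
my %special = (
    lab => '<', rab => '>',
    lcb => '(', rcb => ')',
    lsb => '[', rsb => ']',
);

# Hypothetical last step: swap each special for its literal character.
sub decode_specials {
    my ($text) = @_;
    $text =~ s/&(lab|rab|lcb|rcb|lsb|rsb);/$special{$1}/g;
    return $text;
}

# The <b> plugin from Method 1.
my $plugins = {
    b => sub { my $data = uc($_[1]); "&lab;b&rab;$data&lab;/b&rab;" },
};

# First argument is unknown from the post, so undef is passed here.
my $emitted = $plugins->{b}->(undef, 'foo bar');
print decode_specials($emitted), "\n";   # prints <b>FOO BAR</b>
```

Because the plugin emits `&lab;b&rab;` rather than a literal `<b>`, its output cannot be picked up as a `<b>` tag again before the specials are decoded.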

        It would also be possible to do what ikegami suggested earlier and define a plugin which outputs the specials. If I added such a plugin to the standard set, with code like this:

        lab => sub { "&lab;$_[1]&rab;" }

        your plugin could look like this;

        $plugins = {
            b => sub { my $data = uc($_[1]); "<lab>b</lab>$data<lab>/b</lab>" },
            d => sub { my $data = uc($_[1]); "<lab>d</lab>$data<lab>/d</lab>" }
        };

        The result is slightly longer code, but it's a lot more readable.

        Method 2

        This is a simpler method, and it is probably the one I would normally have used before the extra thinking about the specials that I have just done.

        It starts off the same:

        (insertfile)foo.xml(/insertfile)

        Now we are going to define a new plugin to do the transform of the <d> and <b> tags, and wrap it around the insertfile statement so that the output of insertfile falls within its scope:

        <do_transform> (insertfile)foo.xml(/insertfile) </do_transform>

        The code for the <do_transform> sub is once again defined in the private subs file for this action

        $plugins = {
            do_transform => sub {
                my $data = $_[1]; # grab all the data
                # /g handles every instance, /s lets a value span several lines
                $data =~ s@<b>(.*?)</b>@ do { my $uc = uc($1); "<b>$uc</b>" } @ges;
                $data =~ s@<d>(.*?)</d>@ do { my $uc = uc($1); "<d>$uc</d>" } @ges;
                return $data;
            }
        };
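For trying the Method 2 transform outside aXML, a stand-alone variant might look like the sketch below. It is a variation rather than the plugin as posted: both tags are handled by one substitution with a backreference, and the sample data is the first record of foo.xml typed in by hand.

```perl
use strict;
use warnings;

# One substitution for both tags. The backreference \1 makes sure
# <b> is closed by </b> and <d> by </d>.
sub do_transform {
    my ($data) = @_;
    $data =~ s{<(b|d)>(.*?)</\1>}{ "<$1>" . uc($2) . "</$1>" }ges;
    return $data;
}

my $xml = join "\n",
    '<foo>',
    '<a>I pity the foo</a>',
    '<b>foo bar</b>',
    '<c>foo cough</c>',
    '<d>baz bar foo</d>',
    '</foo>';

print do_transform($xml), "\n";
```

Only the contents of `<b>` and `<d>` are uppercased; a regex like this does not cope with nested elements of the same name or with attributes on the tags.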

        Now let's get back to the issue with <d>, which as I mentioned is not actually a sub but just a bit of metadata that <db_select> uses.

        The problem here is that if we wanted to generate foo.xml from a database, the <d> tag in foo.xml would conflict with the <d> tag used by the <db_select> tag. Recall that outside the scope of a <db_select>, <d> has no meaning and therefore is not a problem.

        <foos>
          <db_select>
            <query> SELECT * FROM foo</query>
            <mask>
              <foo>
                <a><d>a</d></a>
                <b><d>b</d></b>
                <c><d>c</d></c>
                <d><d>d</d></d> // oh dear...
                ...
                ...

        We can solve this in a couple of ways.

        Solution 1

        You could overload the <db_select> tag with one of your own design which works with your input data, for instance you could modify it to use <col> as the column delimiter instead of <d>.

        Solution 2

        I could modify db_select to take an extra argument to determine the column delimiter (I think I will in fact be making this change).

        <foos>
          <db_select delimiter="col">
            <query> SELECT * FROM foo</query>
            <mask>
              <foo>
                <a><col>a</col></a>
                <b><col>b</col></b>
                <c><col>c</col></c>
                <d><col>d</col></d> // that's better :)
                ...
                ...
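How a configurable column delimiter could be applied to a mask is sketched below. This is only an illustration of the idea, not db_select's real code: `fill_mask` is a hypothetical helper, the row is a plain hash standing in for a database record, and the default delimiter `d` is taken from the earlier example.

```perl
use strict;
use warnings;

# Fill one mask with one row. Column placeholders look like
# <col>name</col>, where "col" is whatever delimiter was requested.
sub fill_mask {
    my ($mask, $row, $delimiter) = @_;
    $delimiter = 'd' unless defined $delimiter;   # assumed default
    my $tag = quotemeta $delimiter;
    $mask =~ s{<$tag>(\w+)</$tag>}{ exists $row->{$1} ? $row->{$1} : '' }ge;
    return $mask;
}

my $mask = '<foo><a><col>a</col></a><d><col>d</col></d></foo>';
my $row  = { a => 'I pity the foo', d => 'baz bar foo' };

print fill_mask($mask, $row, 'col'), "\n";
```

With the delimiter set to `col`, the literal `<d>` element in the mask is left alone and only the `<col>` placeholders are replaced. Escaping of special characters in the inserted values is left out of this sketch.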

        Now let's say you don't know ahead of time what tag names are going to be in the XML, and you want to make absolutely sure you're not going to get a conflict.

        What we could do is define a plugin which deletes all the plugins, and adds in a couple of new ones which you design specifically for this action.

        $plugins = {
            delete_all => sub {
                $plugins = {
                    load      => sub {... },
                    transform => sub {... }
                };
            }
        };

        Then in your document you call <delete_all>, then call on the new subs to do what you need to do (i.e. load and modify the file).

        (delete_all)(/delete_all)
        <transform> <load>foo.xml</load> </transform>

        Or something like that; use your imagination, you're a programmer.

        P.S. If there are any stupid mistakes in the above I'm sorry, but I've been looking at the screen for about 12 hours now and I'm about to go hit the sack.
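A filled-in version of the delete_all idea might look like the following sketch. The bodies of `load` and `transform` are guesses at what the elided subs would do, `insertfile` is only a placeholder for the stock plugin set, and the plugins are called by hand with the tag body in `$_[1]` rather than by the aXML engine.

```perl
use strict;
use warnings;

my $plugins;
$plugins = {
    insertfile => sub { "contents of $_[1]" },   # placeholder stock plugin
    delete_all => sub {
        # Replace the whole table with the two plugins this action needs.
        $plugins = {
            load => sub {
                my $path = $_[1];
                open my $fh, '<', $path or die "Cannot open $path: $!";
                local $/;                        # slurp mode
                my $text = <$fh>;
                return $text;
            },
            transform => sub {
                my $data = $_[1];
                $data =~ s{<(b|d)>(.*?)</\1>}{ "<$1>" . uc($2) . "</$1>" }ges;
                return $data;
            },
        };
        return '';                               # emit nothing into the document
    },
};

# Keep our own reference to the sub while it swaps the table out.
my $delete_all = $plugins->{delete_all};
$delete_all->(undef, '');

print join(', ', sort keys %$plugins), "\n";     # prints load, transform
```

After the call, none of the original tag names are registered any more, so whatever tags the loaded XML contains cannot collide with them.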

      for instance feeding very specific XML data to another system which will throw exceptions unless special chars are encoded right.

      Also known as valid xml

      XML really shouldn't be part of the name for this thing

        I would like to find a better name for it. aXML is just what I called it way back when, prior to finding out that there is a closed source program of the same name.

        Any suggestions? I was toying with calling it Logicus to be like "calculus", but I'm open to any ideas on the subject.
