
Re^4: Thanks to Ikegami, Chromatic & Corion

by ikegami (Pope)
on Nov 01, 2011 at 23:20 UTC (#935248)

in reply to Re^3: Thanks to Ikegami, Chromatic & Corion
in thread Thanks to Ikegami, Chromatic & Corion

"&" is not a metacharacter in aXML unless it is followed by "lab;" or one of the other five, so "&amp;" outputs "&amp;", and "&amp;lab;special&amp;rab;" produces "&amp;lab;special&amp;rab;".

To output "&lab;special&rab;" one needs "<special>lab</special>special<special>rab</special>".

The escape function is:

my %escapes = (
   '<' => '&lab;',
   '>' => '&rab;',
   '(' => '&lcb;',
   ')' => '&rcb;',
   '[' => '&lsb;',
   ']' => '&rsb;',
   '&lab;' => '<special>lab</special>',
   '&lcb;' => '<special>lcb</special>',
   '&lsb;' => '<special>lsb</special>',
   '&rab;' => '<special>rab</special>',
   '&rcb;' => '<special>rcb</special>',
   '&rsb;' => '<special>rsb</special>',
);

#my $escapes_pat = join '|', map quotemeta, keys %escapes;
#my $escapes_re = qr/$escapes_pat/;
my $escapes_re = qr/[<>()\[\]]|&[lr][acs]b;/;  # Manually tweaked.

sub escape(_) {
   my ($s) = @_;
   $s =~ s/($escapes_re)/$escapes{$1}/g;
   return $s;
}

These are probably better choices:

my %escapes = qw(
   &  &AMP;
   <  &LAB;
   >  &RAB;
   (  &LCB;
   )  &RCB;
   [  &LSB;
   ]  &RSB;
);

sub escape(_) {
   my ($s) = @_;
   $s =~ s/([&<>()\[\]])/$escapes{$1}/g;
   return $s;
}

Or using v5.14's s///r:

my %escapes = qw(
   &  &AMP;
   <  &LAB;
   >  &RAB;
   (  &LCB;
   )  &RCB;
   [  &LSB;
   ]  &RSB;
);

sub escape(_) { $_[0] =~ s/([&<>()\[\]])/$escapes{$1}/gr }

Why is "parenthesis" abbreviated to "c"? I keep reading the "c" as "curly", but "{" and "}" are the curly brackets.

Re^5: Thanks to Ikegami, Chromatic & Corion
by Logicus on Nov 02, 2011 at 02:12 UTC

    c as in "curved"

    Changing the specials from lower to uppercase would be quite easy. Perhaps it would be better to support both? Or would that be confusing?

    I'm going to add that sub escape in the parser right now as I think it will be a lot faster than how I'm currently doing it:

    $aXML =~ s@&lab;@\<@gs;
    $aXML =~ s@&rab;@\>@gs;
    $aXML =~ s@&lcb;@\(@gs;
    $aXML =~ s@&rcb;@\)@gs;
    $aXML =~ s@&lsb;@\[@gs;
    $aXML =~ s@&rsb;@\]@gs;

    Oh, btw there is also another special and token which I haven't mentioned yet.

    pseudocode
    ----------
    $aXML =~ s@`@<backtick>@gs;

    while ( commands remain unprocessed ) {
        foreach match for <any_command> {
            substitute for <`any_command>
            set found_command boolean flag true
        }
        if (found_command) {
            while ( find match for <`command>data</command> ) {
                process if ( no nested ` chars found )
            }
        }
    }

    The reason for that is so that it forces the tags to be computed innermost to outermost, by negating the ` control char. Also the parser does not proceed to the lower priority tag types until all of the higher types have been processed.
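The innermost-to-outermost ordering described above can be sketched without the backtick marker. This is only an illustration and not the aXML parser: it swaps in a different technique (a body pattern that cannot contain "<"), and the plugin names "upper" and "reverse" are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical plugins, for illustration only.
my %plugins = (
    upper   => sub { uc $_[0] },
    reverse => sub { scalar reverse $_[0] },
);

my @order;    # records the order in which tags are evaluated

sub render {
    my ($doc) = @_;
    # The body pattern [^<]* cannot contain another tag, so only an
    # innermost tag can match. Repeat until no tag is left.
    1 while $doc =~ s{<(\w+)>([^<]*)</\1>}{
        push @order, $1;
        $plugins{$1} ? $plugins{$1}->($2) : $2;
    }e;
    return $doc;
}

print render('<upper>a<reverse>bc</reverse></upper>'), "\n";
print "@order\n";
```

The inner "reverse" tag is evaluated first and the outer "upper" tag then sees its result, which is the ordering the backtick pass is meant to enforce.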

    File inclusions restart the parser so that they can have a complete new tag hierarchy within them, which can then include another hierarchy and so on recursively.

    I suspect the backtick char (and tag) probably won't be needed at all by a proper compiler.

    I also have some ideas about an editor for aXML, as far as I can tell it should be possible to run tags in isolation right inside the editor to see what they output.

    So if we have a bit of aXML like


    And we give the editor a query in like an address bar at the top;


    Then right clicking foo could run the plugin and return the result right there.


    This would make debugging really easy and quick! If "bar" is not what you're expecting to get from the tag, then you know the problem is with the plugin. If "bar" is correct, then you can click one tag outwards to execute that and see what it does... and so on interactively.

    It would beat the hell out of having to swap between browser and editor windows constantly to see what is going on, and say a double right click could restore it back to its original state.

    Opening up an inc tag like that would load the appropriate file, ready for editing. Then, when you click back to close it again, the editor can automatically save the secondary file for you. This way you can navigate and traverse complex structures and hierarchies without having to load, save and close files manually.

      I'm going to add that sub escape in the parser right now

      The point of escape is to prevent aXML from processing that which is passed to the function. It is used by plugins, not aXML. aXML performs the *reverse* operation after the template has been fully processed.

      my %escapes = (
         '&lab;' => '<',
         '&rab;' => '>',
         '&lcb;' => '(',
         '&rcb;' => ')',
         '&lsb;' => '[',
         '&rsb;' => ']',
      );

      sub final_processing {
         my ($content) = @_;
         $content =~ s{
            (?: (&[lr][acs]b;)
            |   <special...>(...)</special>
            |   <post_include...>(...)</post_include>
            )
         }{
            if (defined($1)) {
               $escapes{$1}
            }
            elsif (defined($2)) {
               '&'.$2.';'
            }
            else {
               ...
            }
         }xeg;
         return $content;
      }
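To make the division of labour concrete, here is a minimal round-trip sketch: a plugin-side escape and the reverse step applied once at the end. It assumes the lowercase entity table from earlier in the thread; the name unescape is hypothetical, and the post_include branch of final_processing is left out because its details are not given.

```perl
use strict;
use warnings;

my %escapes = (
    '<' => '&lab;', '>' => '&rab;',
    '(' => '&lcb;', ')' => '&rcb;',
    '[' => '&lsb;', ']' => '&rsb;',
);
# Entity text that is already present is protected with a <special> tag.
$escapes{"&$_;"} = "<special>$_</special>" for qw( lab rab lcb rcb lsb rsb );

# Every value is unique, so the table can simply be inverted.
my %unescapes = reverse %escapes;

# Used by plugins: stops aXML from treating the text as markup.
sub escape {
    my ($s) = @_;
    $s =~ s/([<>()\[\]]|&[lr][acs]b;)/$escapes{$1}/g;
    return $s;
}

# Hypothetical stand-in for the final pass: one scan, so text that has
# just been restored is never examined a second time.
sub unescape {
    my ($s) = @_;
    $s =~ s{(&[lr][acs]b;|<special>[lr][acs]b</special>)}{$unescapes{$1}}g;
    return $s;
}

my $text    = 'a[0] < b and "&lab;" stays literal';
my $escaped = escape($text);
print "$escaped\n";
print unescape($escaped) eq $text ? "round trip ok\n" : "round trip FAILED\n";
```

Doing both directions in a single substitution matters: two separate passes would turn a protected "&lab;" back into "<" on the second pass.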


        I have this gut feeling, about which I am by no means certain, that if you were to build a compiler based on how you're currently thinking aXML should work, it would be very fast but would not render certain documents correctly. Specifically, documents which contain higher-level compound aXML statements.

        The only analogy I can think of right now is this: if I handed you a Lego Technic kit that you can build a transformer out of, composed of lots of lovely little pieces designed to fit together and to transform smoothly from their initial state to their final state, and you took it to pieces, added a few constraints to several of the blocks, and then put it back together again, it would no longer transform correctly.

        Optimus Prime would be a cripple.

        Look I don't know... as I said before I've only begun to scratch the surface of how aXML units can inter-operate and transform themselves and each other from the initial state to the rendered state.

        All I know is that by using the quasi-xml style markup, the transformation is well structured and guided by the structure to perform the same transition smoothly time and time again.

        When I was working with the thing at the start, speed and efficiency were not my primary concerns. The flexibility and expressivity of the language were. I figured that I could solve the efficiency issues later. (which I have now to sufficient degree for my usages)

        Now, thing is right, just because I came up with the schema doesn't mean I know everything about it or that the way I use it is necessarily the ideal way of using it. I recognise that fact and welcome anyone wanting to come up with any sort of variant on it.

        All I can say is that I've found a pattern that works for me, and is the best way to use it that I have been able to conceive of so far.

        Basically in my PSGI file, all documents start out with the same identical seed :

        my $action = sub {
            return [
                200,
                [ 'Content-Type' => 'text/aXML' ],
                [ '(use)(qd)action(/qd)(/use)' ]
            ];
        };

        This is the same as saying :

        my $action = sub {
            return [
                200,
                [ 'Content-Type' => 'text/aXML' ],
                [ process( "$doc_root/actions/$qd->{'action'}/body.aXML" ) ]
            ];
        };

        The Plack::Middleware::aXML component takes that seed and from that point the parser iterates over the document as many times as needed transforming it one step at a time into the final result.

        Typically, at least in my code, the body.aXML that is loaded contains a call to include a template, either the public page template or the private page template (for the control panel pages for instance), which then has further processing directives in it. Some of the body.aXML files also contain calls to do some processing prior to loading the template, so that the template will render correctly after something has changed, for instance the user preference for CSS colour scheme.

        If plugins are not able to return valid markup for the parser to pick up on its next iteration, the "use" statement would be useless, and processing would halt at that point because the inserted template would not contain any directives the parser can recognise.
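A minimal sketch of that dependency, assuming only the parenthesis tag form from the seed above. The in-memory %templates table, the "upper" plugin and the pass limit are invented for the illustration; the real parser and its file loading are not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the query data and the files on disk.
my %qd        = ( action => 'home', name => 'world' );
my %templates = ( home => 'Hello, (upper)(qd)name(/qd)(/upper)!' );

my %plugins = (
    qd    => sub { $qd{ $_[0] } // '' },
    use   => sub { $templates{ $_[0] } // '' },   # returns more markup
    upper => sub { uc $_[0] },
);

sub process {
    my ($doc) = @_;
    my $passes = 0;
    # Each pass evaluates every tag whose body holds no further tags.
    # Markup returned by a plugin is picked up on the next pass.
    while ( $doc =~ s{\((\w+)\)([^()]*)\(/\1\)}{ $plugins{$1} ? $plugins{$1}->($2) : '' }ge ) {
        die "too many passes\n" if ++$passes > 100;   # guard against runaway templates
    }
    return $doc;
}

print process('(use)(qd)action(/qd)(/use)'), "\n";
```

If the "use" plugin escaped its return value instead of handing back live markup, the loop would stop after the second pass with the template text untouched, which is the failure described above.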

        Autobots! Transform! .... *CRUNCH*... er...

Re^5: Thanks to Ikegami, Chromatic & Corion
by Logicus on Nov 02, 2011 at 02:27 UTC

    Erm... something ain't right, I just hacked your escapes code in and it's converted every "<" and ">" in the whole document!

    output
    ------
    &lab;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://"&rab;
    &lab;html lang="en"&rab;
    &lab;head&rab;
    &lab;link href="/css/main.css" rel="stylesheet" type="text/css"&rab;
    &lab;null&rab;
    &lab;link href="/css/colours/daytime.css" rel="stylesheet" type="text/css"&rab;
    &lab;script type="text/javascript" src="/js/ajax.js"&rab;&lab;/script&rab;
    &lab;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&rab;
    &lab;title&rab;Perl Nights&lab;/title&rab;
    &lab;/head&rab;
    &lab;body&rab;
    ...
    ...

      The escapes need to be the other way around!

      my %escapes = ( '&lab;' => '<', '&rab;' => '>', '&lcb;' => '(', '&rcb;' => ')', '&lsb;' => '[', '&rsb;' => ']' );

      I need a new escapes_re, because now it's simply destroying all the brackets!

        I need a new escapes_re, because now it's simply destroying all the brackets!


        Got it!

        my %escapes = (
            '&lab;' => '<',
            '&rab;' => '>',
            '&lcb;' => '(',
            '&rcb;' => ')',
            '&lsb;' => '[',
            '&rsb;' => ']'
        );

        my $escapes_re = qr/&[lr][acs]b;/;

        $aXML =~ s/($escapes_re)/$escapes{$1}/g;
