http://www.perlmonks.org?node_id=935485


in reply to Re^7: Thanks to Ikegami, Chromatic & Corion
in thread Thanks to Ikegami, Chromatic & Corion

The only analogy I can think of right now is if I handed you a Lego Technic kit that you can build a transformer out of, composed of lots of lovely little pieces designed to fit together and to transform smoothly from their initial state to their final state

The problem is that the little pieces don't fit together in what you handed me. There are some connectors missing. I'm slowly getting you to realise these connectors are missing.

would not render certain documents correctly.

Only if you misapply it.

Your current flow is:

plugin produces aXML code, parser takes that aXML code and produces HTML code, browser takes that HTML code and produces text.

Except your plugins don't always produce aXML code. For example, look at d. IIRC, you said it currently produces HTML. escape addresses that bug. It will convert the HTML into aXML.

text_to_aXML: The "escape" function I posted, renamed for clarity.
text_to_html: Converts & to &amp;, < to &lt;, etc.

# Current
sub d {
    my $text = ...;
    return text_to_html($text);
}

# Should be
sub d {
    my $text = ...;
    return text_to_aXML(text_to_html($text));
}
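For reference, here is a minimal sketch of what a text_to_html along those lines could look like. The body is my assumption (only the name and the "& to &amp;" description come from the post above), and text_to_aXML is left out because the "escape" function is not reproduced in this node.

```perl
use strict;
use warnings;

# Hypothetical sketch of text_to_html: escape the characters that are
# special in HTML. The ampersand is handled first; otherwise the
# ampersands of the entities produced below would be escaped again.
sub text_to_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print text_to_html('a < b && "c"'), "\n";
# prints: a &lt; b &amp;&amp; &quot;c&quot;
```

The order of the substitutions is the only subtle part: doing the ampersand last would turn &lt; into &amp;lt;.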

You seem to be considering what would happen if every plugin used it, and indeed that would make no sense.

Re^9: Thanks to Ikegami, Chromatic & Corion
by Logicus (Initiate) on Nov 02, 2011 at 20:56 UTC
    plugin produces aXML code, parser takes that aXML code and produces HTML code, browser takes that HTML code and produces text.

    You've misunderstood the flow.

    parser takes aXML code,
    parser discovers/executes valid aXML tags
    plugins produce either aXML or HTML or mix of both (or other)
    parser discovers/executes valid aXML tags
    plugins produce either aXML or HTML or mix of both (or other)
    ....
    parser can find no more valid aXML tags
    parser exits, post processing occurs
    browser takes finished resulting HTML and produces page

    Here, I just finished playing around with a question someone posted about processing some poorly formed HTML, given that they don't know (or want to know) anything about regexes.

    In aXML I would solve it like this :

    (aXMLplugin name="TD")"$_[0]",(/aXMLplugin)
    (aXMLplugin name="TR")
    <db_write>INSERT INTO captured_rows VALUES (<chop><strip_tws>$_[0]</strip_tws></chop>);
    </db_write>
    (/aXMLplugin)
    (inc)path/to/htmldata(/inc)

    The above primes two new plugins called TR and TD just for this job, then includes the HTML file specified and processes it, adding any information it can correctly identify to a database as it goes.

    The states it goes through during processing are:

    State1
    ------
    (aXMLplugin name="TD")"$_[0]",(/aXMLplugin)
    (aXMLplugin name="TR")
    <db_write>INSERT INTO captured_rows VALUES (<chop><strip_tws>$_[0]</strip_tws></chop>);
    </db_write>
    (/aXMLplugin)
    (inc)path/to/htmldata(/inc)

    State2
    ------
    (aXMLplugin name="TR")
    <db_write>INSERT INTO captured_rows VALUES (<chop><strip_tws>$_[0]</strip_tws></chop>);
    </db_write>
    (/aXMLplugin)
    (inc)path/to/htmldata(/inc)

    State3
    ------
    (inc)path/to/htmldata(/inc)

    State4
    ------
    <TR>
    <TD> Channel </TD>
    <TD> Call Letters </TD>
    <TD> Count </TD>
    <TD> Percent </TD>
    <TD> Title </TD>
    </TR>
    ...
    ...

    State5
    ------
    <TR>
    " Channel ",
    " Call Letters ",
    " Count ",
    " Percent ",
    " Title ",
    </TR>
    ...
    ...

    State6
    ------
    <db_write> INSERT INTO captured_rows VALUES (<chop><strip_tws>
    " Channel ",
    " Call Letters ",
    " Count ",
    " Percent ",
    " Title ",
    </strip_tws></chop>);</db_write>
    ...
    ...

    State7
    ------
    <db_write> INSERT INTO captured_rows VALUES (<chop> " Channel ", " Call Letters ", " Count ", " Percent ", " Title ",</chop>);</db_write>
    ...
    ...

    State8
    ------
    <db_write> INSERT INTO captured_rows VALUES ( " Channel ", " Call Letters ", " Count ", " Percent ", " Title ");</db_write>
    ...
    ...

    As I'm sure you can see there are a couple of plugins there that are not in the standard set I sent to you earlier. I think it should be quite obvious to you how they would work and how you would go about adding them to the set.

    Except maybe the "(aXMLplugin)" tag, which needs a little more explanation... the data it contains is returned interpolated. So:

    (aXMLplugin name="greet")hello world(/aXMLplugin)

    Would look like this in the plugin code it produces :

    greet => sub {"hello world"},

    The fact it's a one liner to be interpolated is taken as being implicit in the way I'm thinking about this plugin at the moment, such that :

    (aXMLplugin name="foo")"hello world"(/aXMLplugin)

    Would look like this at the code level :

    foo => sub {"\"hello world\""},

    And give this for output :

    "hello world"
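A rough sketch of how an (aXMLplugin) body could be turned into an entry in the plugin table. This is only an illustration under my own assumptions: define_plugin and %plugins are hypothetical names, and instead of real Perl interpolation the sketch substitutes only the literal token $_[0], so nothing is eval'd.

```perl
use strict;
use warnings;

my %plugins;

# Hypothetical helper: register a plugin whose body is a template.
# Only the literal token $_[0] is replaced by the tag's content.
sub define_plugin {
    my ($name, $body) = @_;
    $plugins{$name} = sub {
        my ($data) = @_;
        $data = '' unless defined $data;
        (my $out = $body) =~ s/\$_\[0\]/$data/g;
        return $out;
    };
    return;
}

define_plugin(greet => 'hello world');
define_plugin(foo   => '"hello world"');
define_plugin(TD    => '"$_[0]",');

print $plugins{greet}->(), "\n";           # hello world
print $plugins{foo}->(), "\n";             # "hello world"
print $plugins{TD}->(' Channel '), "\n";   # " Channel ",
```

The TD line reproduces the State4 to State5 step from the table example: each cell's content comes back wrapped in quotes with a trailing comma.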

      Sigh. For every step forward, there's a step backwards.

      That's not true. The output of the plugin is ALWAYS taken to be aXML. If it was any other way, aXML would look like the following (when outputting HTML):

      plugin produces aXML or HTML code,
      if the plugin returned aXML,
          the parser takes that aXML code and produces HTML code,
          browser takes that HTML code and produces text.
      if the plugin returned HTML,
          browser takes that HTML code and produces text.

      There is no such check, because such a check is impossible. So, again, what actually happens is

      plugin produces (what is taken to be) aXML code,
      parser takes that aXML code and produces HTML code,
      browser takes that HTML code and produces text.

      Plugins that don't produce aXML are buggy.

        Nope, you're looking at it the wrong way, probably my fault for comparing it with TT2 lol.

        Try this analogy, it's like self-golfing code.

        Let's take an example from that file I sent you.

        (db_mask)
        <query>
        SELECT * FROM threads WHERE threadid="(sqd)threadid(/sqd)"
        </query>
        <mask>
        <h2>[hlink action="show_section" sectionid="<d>sectionid</d>" ][db_get]sections.sectionid="<d>sectionid</d>".display_name[/db_get][/hlink] > <d>title</d></h2><hr/>
        </mask>
        (/db_mask)
        mumble mumble... make text box much wider on PerlNights.. mumble.. mumble

        The state of the document changes as it is being parsed. First the parser runs a fast regex to look for primary ( ) tags, and marks them with the ` control char:

        (`db_mask)
        <query>
        SELECT * FROM threads WHERE threadid="(`sqd)threadid(/sqd)"
        </query>
        <mask>
        <h2>[hlink action="show_section" sectionid="<d>sectionid</d>" ][db_get]sections.sectionid="<d>sectionid</d>".display_name[/db_get][/hlink] > <d>title</d></h2><hr/>
        </mask>
        (/db_mask)

        It's found 2 primary tags to compute, one is nested inside the other. Now because it has found at least one primary tag, a slower regex runs which finds the whole tag and its close, and then treats it like a subroutine call, returning the return value to the document.

        The slower regex is looped and it negates the ` char within the tag structure, so that only tags which have no nested commands get processed each time we go around the loop.
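To make the innermost-first order concrete, here is a rough sketch of such a loop. It is not the real parser: instead of the ` marker it uses a negative lookahead to skip any tag that still contains a known opening tag, and expand_primary plus both stub plugins are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical stub plugins; the real sqd and db_mask do much more.
my %plugins = (
    sqd     => sub { 1 },    # pretend the query-string value is 1
    db_mask => sub { my ($data) = @_; return "[db_mask got:$data]" },
);

sub expand_primary {
    my ($doc) = @_;
    my $names = join '|', map { quotemeta } sort keys %plugins;
    # Keep substituting until no known tag is left. A tag only matches
    # when its content holds no other known opening tag, so nested
    # tags are always resolved before the tag that encloses them.
    1 while $doc =~ s{
        \( ($names) \b ([^()]*) \)             # opening tag and attributes
        ( (?: (?! \( (?:$names) \b ) . )*? )   # content with no nested tag
        \( / \1 \)                             # matching closing tag
    }{
        my ($name, $attrs, $data) = ($1, $2, $3);
        $plugins{$name}->($data, $attrs);
    }gsex;
    return $doc;
}

print expand_primary('(db_mask)WHERE threadid="(sqd)threadid(/sqd)"(/db_mask)'), "\n";
# prints: [db_mask got:WHERE threadid="1"]
```

The first pass can only match (sqd), because (db_mask) still contains an opening tag; the second pass then hands db_mask its content with the 1 already in place.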

        The result is that the first tag to be computed is the (sqd) tag, followed immediately by the (db_mask) tag. So let's say that the (sqd) tag returned a value of 1. The db_mask tag then gets picked up and run with the following data :

        in $_[0] we have
        ----------------
        <query>
        SELECT * FROM threads WHERE threadid="1"</query>
        <mask>
        <h2>[hlink action="show_section" sectionid="<d>sectionid</d>" ][db_get]sections.sectionid="<d>sectionid</d>".display_name[/db_get][/hlink] > <d>title</d></h2><hr/>
        </mask>

        The query and mask tags are extracted, the query gets run and returns hashrefs which are then used to populate the mask. The db_mask plugin will return a copy of the mask interpolated with the specified columns for each row the query produces.
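The mask-filling step can be sketched like this. fill_mask is a hypothetical name, and the rows are an in-memory stand-in for the hashrefs a real query would return; the actual db_mask plugin also extracts the query and mask tags and talks to the database.

```perl
use strict;
use warnings;

# Hypothetical helper: return one copy of the mask per row, with each
# <d>column</d> placeholder replaced by that row's value.
sub fill_mask {
    my ($mask, @rows) = @_;
    return join '', map {
        my $row = $_;
        (my $copy = $mask) =~ s{<d>(\w+)</d>}{ defined $row->{$1} ? $row->{$1} : '' }ge;
        $copy;
    } @rows;
}

# Stand-in for the single row the example query returns.
my @rows = ( { sectionid => 1, title => 'The first thread' } );

print fill_mask('<h2>section <d>sectionid</d>: <d>title</d></h2><hr/>', @rows), "\n";
# prints: <h2>section 1: The first thread</h2><hr/>
```

With several rows the same mask is simply repeated once per row, which is what makes it usable for listings.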

        Since in this case the query will only return a single row, the return value looks like this :

        <h2>[hlink action="show_section" sectionid="1" ][db_get]sections.sectionid="1".display_name[/db_get][/hlink] >The first thread</h2><hr/>

        The parser now looks for any remaining ( ) primary tags, determines there are none left, and moves on to the secondary tags < >. Since there are none of these either (h2 is not a defined aXML tag), it skips on to the tertiary tags, [ ]

        <h2>[`hlink action="show_section" sectionid="1" ][`db_get]sections.sectionid="1".display_name[/db_get][/hlink] >The first thread</h2><hr/>

        Once again it has found 2 tags that it recognises with the fast scanner, and invokes the slower scanner which loops negating the control char, causing the tags to be run db_get first, followed by hlink.

        db_get runs first
        -----------------
        <h2>[`hlink action="show_section" sectionid="1" ]The Castle Gates[/hlink] >The first thread</h2><hr/>

        Followed by hlink
        -----------------
        <h2><a href="action.pl?action=show_section&sectionid=1">The Castle Gates</a> >The first thread</h2><hr/>

        The parser then concludes there is nothing left to do and exits to the post processing stage.
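Putting the three bracket levels together, a rough sketch of one pass over ( ), then < >, then [ ] might look like the following. Everything here is an assumption made for illustration: expand_tier, the @tiers list, and both stub plugins are hypothetical, db_get returns a canned value instead of querying anything, and the sketch does not loop back to an earlier tier if a later one emits such tags.

```perl
use strict;
use warnings;

# Hypothetical stub plugins that mimic the walk-through above.
my %plugins = (
    db_get => sub { return 'The Castle Gates' },   # canned lookup result
    hlink  => sub {
        my ($data, $attrs) = @_;
        my %attr = $attrs =~ /(\w+)="([^"]*)"/g;   # name="value" pairs
        return qq{<a href="action.pl?action=$attr{action}&sectionid=$attr{sectionid}">$data</a>};
    },
);

# Expand every known tag written with one bracket pair, innermost first.
sub expand_tier {
    my ($doc, $open, $close) = @_;
    my ($o, $c) = (quotemeta $open, quotemeta $close);
    my $names = join '|', map { quotemeta } sort keys %plugins;
    1 while $doc =~ s{
        $o ($names) \b ([^$o$c]*) $c
        ( (?: (?! $o (?:$names) \b ) . )*? )
        $o / \1 $c
    }{
        my ($name, $attrs, $data) = ($1, $2, $3);
        $plugins{$name}->($data, $attrs);
    }gsex;
    return $doc;
}

# Primary, secondary, tertiary: each tier is finished before the next starts.
my @tiers = ( [ '(', ')' ], [ '<', '>' ], [ '[', ']' ] );

my $doc = '<h2>[hlink action="show_section" sectionid="1" ]'
        . '[db_get]sections.sectionid="1".display_name[/db_get][/hlink]'
        . ' >The first thread</h2><hr/>';
$doc = expand_tier($doc, @$_) for @tiers;
print "$doc\n";
# prints: <h2><a href="action.pl?action=show_section&sectionid=1">The Castle Gates</a> >The first thread</h2><hr/>
```

Plain HTML such as h2 passes through the < > tier untouched because it is not in the plugin table, which matches the "h2 is not a defined aXML tag" remark above.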