PerlMonks  

Re^9: Thanks to Ikegami, Chromatic & Corion

by Logicus
on Nov 02, 2011 at 20:56 UTC (#935489)


in reply to Re^8: Thanks to Ikegami, Chromatic & Corion
in thread Thanks to Ikegami, Chromatic & Corion

plugin produces aXML code, parser takes that aXML code and produces HTML code, browser takes that HTML code and produces text.

You've misunderstood the flow.

parser takes aXML code,
parser discovers/executes valid aXML tags
plugins produce either aXML or HTML or mix of both (or other)
parser discovers/executes valid aXML tags
plugins produce either aXML or HTML or mix of both (or other)
....
parser can find no more valid aXML tags
parser exits, post processing occurs
browser takes finished resulting HTML and produces page
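The loop described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the actual aXML parser: the `%plugins` table, the `upper` and `greet` plugins and the `expand` function are hypothetical, and tag attributes are not handled.

```perl
use strict;
use warnings;

# Hypothetical plugin table: tag name => code ref that receives the tag body.
my %plugins = (
    upper => sub { uc $_[0] },
    # A plugin may itself return more aXML, which the next pass expands.
    greet => sub { "(upper)hello $_[0](/upper)" },
);

# Expand (tag)body(/tag) repeatedly until no known tag remains.
sub expand {
    my ($doc) = @_;
    my $names = join '|', map { quotemeta } sort keys %plugins;
    # The body may not contain parentheses, so only innermost tags match.
    1 while $doc =~ s{\(($names)\)([^()]*)\(/\1\)}{ $plugins{$1}->($2) }e;
    return $doc;
}

print expand('<p>(greet)world(/greet)</p>'), "\n";   # <p>HELLO WORLD</p>
```

Whatever a plugin returns goes back into the document and is scanned again, which is why the loop only stops when no known tag is left.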

Here, I just finished playing around with a question someone posted about processing some poorly formed HTML, given that they don't know (or want to know) anything about regexes.

In aXML I would solve it like this :

(aXMLplugin name="TD")"$_[0]",(/aXMLplugin)
(aXMLplugin name="TR")
<db_write>INSERT INTO captured_rows VALUES (<chop><strip_tws>$_[0]</strip_tws></chop>); </db_write>
(/aXMLplugin)
(inc)path/to/htmldata(/inc)

The above primes two new plugins, TR and TD, just for this job. It then includes the specified HTML file and processes it, writing whatever rows it can correctly identify into a database as it goes.

The states it goes through during processing are:

State1
------
(aXMLplugin name="TD")"$_[0]",(/aXMLplugin)
(aXMLplugin name="TR")
<db_write>INSERT INTO captured_rows VALUES (<chop><strip_tws>$_[0]</strip_tws></chop>); </db_write>
(/aXMLplugin)
(inc)path/to/htmldata(/inc)

State2
------
(aXMLplugin name="TR")
<db_write>INSERT INTO captured_rows VALUES (<chop><strip_tws>$_[0]</strip_tws></chop>); </db_write>
(/aXMLplugin)
(inc)path/to/htmldata(/inc)

State3
------
(inc)path/to/htmldata(/inc)

State4
------
<TR>
<TD> Channel </TD>
<TD> Call Letters </TD>
<TD> Count </TD>
<TD> Percent </TD>
<TD> Title </TD>
</TR>
... ...

State5
------
<TR>
" Channel ",
" Call Letters ",
" Count ",
" Percent ",
" Title ",
</TR>
... ...

State6
------
<db_write> INSERT INTO captured_rows VALUES (<chop><strip_tws> " Channel ", " Call Letters ", " Count ", " Percent ", " Title ", </strip_tws></chop>);</db_write>
... ...

State7
------
<db_write> INSERT INTO captured_rows VALUES (<chop> " Channel ", " Call Letters ", " Count ", " Percent ", " Title ",</chop>);</db_write>
... ...

State8
------
<db_write> INSERT INTO captured_rows VALUES ( " Channel ", " Call Letters ", " Count ", " Percent ", " Title ");</db_write>
... ...
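The last two transitions in the listing suggest what the two extra plugins do. Here is a rough sketch, assuming that strip_tws removes trailing whitespace and that chop removes the final character (the trailing comma). Both behaviours are inferred from the states above rather than taken from real plugin code, and the `%extra_plugins` table is hypothetical.

```perl
use strict;
use warnings;

# Assumed behaviour, inferred from the state listing:
# strip_tws drops trailing whitespace, chop drops the last character.
my %extra_plugins = (
    'strip_tws' => sub { my $s = $_[0]; $s =~ s/\s+\z//; $s },
    'chop'      => sub { my $s = $_[0]; chop $s; $s },
);

my $row      = ' " Channel ", " Title ", ';
my $stripped = $extra_plugins{'strip_tws'}->($row);   # trailing space removed
my $chopped  = $extra_plugins{'chop'}->($stripped);   # trailing comma removed

print "[$chopped]\n";   # [ " Channel ", " Title "]
```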

As I'm sure you can see, there are a couple of plugins there that are not in the standard set I sent to you earlier. I think it should be quite obvious to you how they would work and how you would go about adding them to the set.

Except maybe the "(aXMLplugin)" tag, which needs a little more explanation... the data it contains is returned interpolated. So:

(aXMLplugin name="greet")hello world(/aXMLplugin)

Would look like this in the plugin code it produces :

greet => sub {"hello world"},

The fact that it's a one-liner to be interpolated is implicit in the way I'm thinking about this plugin at the moment, such that :

(aXMLplugin name="foo")"hello world"(/aXMLplugin)

Would look like this in the code level :

foo => sub {"\"hello world\""},

And give this for output :

"hello world"
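One way to get the same effect without generating Perl source is to keep the tag body as a template and substitute the tag content into it at call time. This is a swapped-in technique for illustration, not the code generation shown above; `define_plugin` and the `%plugins` table are hypothetical names.

```perl
use strict;
use warnings;

my %plugins;

# Hypothetical helper: register a plugin whose output is its template
# with every literal "$_[0]" replaced by the content of the tag.
sub define_plugin {
    my ($name, $template) = @_;
    $plugins{$name} = sub {
        my ($content) = @_;
        (my $out = $template) =~ s/\$_\[0\]/$content/g;
        return $out;
    };
    return;
}

define_plugin(greet => 'hello world');
define_plugin(foo   => '"hello world"');
define_plugin(TD    => '"$_[0]",');

print $plugins{greet}->(''), "\n";          # hello world
print $plugins{foo}->(''), "\n";            # "hello world"
print $plugins{TD}->(' Channel '), "\n";    # " Channel ",
```

Because the template is never evaluated as Perl, quotes inside the tag body need no escaping in this variant.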


Re^10: Thanks to Ikegami, Chromatic & Corion
by ikegami (Pope) on Nov 02, 2011 at 23:58 UTC

    Sigh. For every step forward, there's a step backwards.

    That's not true. The output of the plugin is ALWAYS taken to be aXML. If it were any other way, aXML would look like the following (when outputting HTML):

    plugin produces aXML or HTML code,
    if the plugin returned aXML,
        the parser takes that aXML code and produces HTML code,
        browser takes that HTML code and produces text.
    if the plugin returned HTML,
        browser takes that HTML code and produces text.

    There is no such check, because such a check is impossible. So, again, what actually happens is

    plugin produces (what is taken to be) aXML code,
    parser takes that aXML code and produces HTML code,
    browser takes that HTML code and produces text.

    Plugins that don't produce aXML are buggy.

      Nope, you're looking at it the wrong way, probably my fault for comparing it with TT2 lol.

      Try this analogy, it's like self-golfing code.

      Let's take an example from that file I sent you.

      (db_mask)
      <query> SELECT * FROM threads WHERE threadid="(sqd)threadid(/sqd)" </query>
      <mask>
      <h2>[hlink action="show_section" sectionid="<d>sectionid</d>" ][db_get]sections.sectionid="<d>sectionid</d>".display_name[/db_get][/hlink] > <d>title</d></h2><hr/>
      </mask>
      (/db_mask)
      mumble mumble... make text box much wider on PerlNights.. mumble.. mumble

      The state of the document changes as it is being parsed. First the parser runs a fast regex to look for primary ( ) tags, and marks them with the ` control char:

      (`db_mask)
      <query> SELECT * FROM threads WHERE threadid="(`sqd)threadid(/sqd)" </query>
      <mask>
      <h2>[hlink action="show_section" sectionid="<d>sectionid</d>" ][db_get]sections.sectionid="<d>sectionid</d>".display_name[/db_get][/hlink] > <d>title</d></h2><hr/>
      </mask>
      (/db_mask)

      It's found 2 primary tags to compute, one nested inside the other. Because it has found at least one primary tag, a slower regex now runs which finds the whole tag and its close, then treats it like a subroutine call, returning the return value to the document.

      The slower regex is looped and it negates the ` char within the tag structure, so that only tags which have no nested commands get processed each time we go around the loop.
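A minimal sketch of that mark-then-resolve idea follows, assuming tags without attributes. The plugin table, the `wrap` plugin and `run_primary` are hypothetical; the regexes illustrate the technique and are not the parser's own.

```perl
use strict;
use warnings;

# Hypothetical plugins; sqd stands in for a query-string lookup.
my %plugins = (
    sqd  => sub { '1' },
    wrap => sub { "[$_[0]]" },
);
my $names = join '|', map { quotemeta } sort keys %plugins;

sub run_primary {
    my ($doc) = @_;
    # Fast pass: mark the opening tag of every known plugin with a backtick.
    # It is repeated in case a plugin's output contains new tags.
    while ( $doc =~ s/\(($names)\)/(`$1)/g ) {
        # Slow pass: a marked tag whose content holds no other mark is an
        # innermost tag; replace it with the plugin's return value.
        1 while $doc =~ s{\(`($names)\)([^`]*?)\(/\1\)}{ $plugins{$1}->($2) }e;
    }
    return $doc;
}

print run_primary('(wrap)id=(sqd)threadid(/sqd)(/wrap)'), "\n";   # [id=1]
```

Excluding the backtick from the tag content is what forces the inner tag to be resolved before the outer one sees it.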

      The result is that the first tag to be computed is the (sqd) tag, followed immediately by the (db_mask) tag. So let's say that the (sqd) tag returned a value of 1. The db_mask tag then gets picked up and run with the following data :

      in $_[0] we have
      ----------------
      <query> SELECT * FROM threads WHERE threadid="1"</query>
      <mask>
      <h2>[hlink action="show_section" sectionid="<d>sectionid</d>" ][db_get]sections.sectionid="<d>sectionid</d>".display_name[/db_get][/hlink] > <d>title</d></h2><hr/>
      </mask>

      The query and mask tags are extracted, the query gets run and returns hashrefs which are then used to populate the mask. The db_mask plugin will return a copy of the mask interpolated with the specified columns for each row the query produces.

      Since in this case the query will only return a single row, the return value looks like this :

      <h2>[hlink action="show_section" sectionid="1" ][db_get]sections.sectionid="1".display_name[/db_get][/hlink] >The first thread</h2><hr/>
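The mask-filling step can be sketched without a database. In this illustration `fill_mask` is a hypothetical function and `@rows` is a hard-coded stand-in for the hashrefs a query would return; how the real db_mask plugin extracts the query and mask is not shown.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows a query would return (no database here).
my @rows = ( { sectionid => 1, title => 'The first thread' } );

# Return one copy of the mask per row, with each <d>col</d> replaced
# by that row's value for the column (empty string if the column is missing).
sub fill_mask {
    my ($mask, @data) = @_;
    return join '', map {
        my $row = $_;
        (my $copy = $mask) =~ s{<d>(\w+)</d>}{ $row->{$1} // '' }ge;
        $copy;
    } @data;
}

my $mask = '<h2>[hlink sectionid="<d>sectionid</d>"]x[/hlink] <d>title</d></h2>';
print fill_mask($mask, @rows), "\n";
# <h2>[hlink sectionid="1"]x[/hlink] The first thread</h2>
```

The [ ] tags pass through untouched at this stage, so they are still in the return value for a later pass to pick up.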

      The parser now looks for any remaining ( ) primary tags, determines there are none left, and moves on to the secondary tags < >. Since there are none of these either (h2 is not a defined aXML tag), it skips on to the tertiary tags, [ ]

      <h2>[`hlink action="show_section" sectionid="1" ][`db_get]sections.sectionid="1".display_name[/db_get][/hlink] >The first thread</h2><hr/>

      Once again it has found 2 tags that it recognises with the fast scanner, and invokes the slower scanner which loops negating the control char, causing the tags to be run db_get first, followed by hlink.

      db_get runs first
      -----------------
      <h2>[`hlink action="show_section" sectionid="1" ]The Castle Gates[/hlink] >The first thread</h2><hr/>

      Followed by hlink
      -----------------
      <h2><a href="action.pl?action=show_section&sectionid=1">The Castle Gates</a> >The first thread</h2><hr/>

      The parser then concludes there is nothing left to do and exits to the post processing stage.
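The ordering of the three bracket styles could be sketched as one pass per tier. This is an assumption-laden illustration: `expand_tiers` and the plugin table are hypothetical, attributes are not handled, and each tier is visited only once.

```perl
use strict;
use warnings;

# Hypothetical plugins; the return values are fixed for the example.
my %plugins = (
    sqd    => sub { '1' },
    db_get => sub { 'The Castle Gates' },
);
my $names = join '|', map { quotemeta } sort keys %plugins;

sub expand_tiers {
    my ($doc) = @_;
    # Primary ( ), then secondary < >, then tertiary [ ].
    for my $pair ( [ '(', ')' ], [ '<', '>' ], [ '[', ']' ] ) {
        my ($open, $close) = map { quotemeta } @$pair;
        # Content may not contain this tier's opening bracket,
        # so innermost tags of the tier are resolved first.
        my $tag = qr{$open($names)$close((?:(?!$open).)*?)$open/\1$close}s;
        1 while $doc =~ s{$tag}{ $plugins{$1}->($2) }e;
    }
    return $doc;
}

print expand_tiers('<h2>[db_get]x[/db_get] (sqd)threadid(/sqd)</h2>'), "\n";
# <h2>The Castle Gates 1</h2>
```

As in the walkthrough, `<h2>` survives the secondary pass because it is not a defined tag name.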

        Yes, I know, I heard you the first few million times.

        If you are saying that what the plugin returns is searched for aXML tags then you actually agree with me; what a plugin returns is treated as aXML. Not HTML.

        All plugins that can return arbitrary HTML are buggy and the others are potentially buggy.
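The concern can be illustrated with a small sketch. Everything here is hypothetical (the `title` and `secret` plugins, the `%stored` data and the `expand` function); it only shows what follows if plugin output is rescanned for tags.

```perl
use strict;
use warnings;

# Hypothetical plugins: "title" returns stored user text verbatim,
# "secret" stands for any tag the page author never meant to expose.
my %stored  = ( 1 => 'Hi (secret)x(/secret)' );
my %plugins = (
    title  => sub { $stored{ $_[0] } // '' },
    secret => sub { 'TOP-SECRET' },
);

# Same rescanning loop as a parser that treats plugin output as aXML.
sub expand {
    my ($doc) = @_;
    my $names = join '|', map { quotemeta } sort keys %plugins;
    1 while $doc =~ s{\(($names)\)([^()]*)\(/\1\)}{ $plugins{$1}->($2) }e;
    return $doc;
}

print expand('<h2>(title)1(/title)</h2>'), "\n";   # <h2>Hi TOP-SECRET</h2>
```

The stored text was meant as plain data, but because it comes back through the parser, the tag inside it is executed like any other.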
