http://www.perlmonks.org?node_id=917828


in reply to rough start of an axml compiler

You say in your "offtopic epiphony" that you believe
<<a>b</a>>c</<a>b</a>>
to be unrepresentable in any kind of data structure, perl or otherwise. Here's a simple solution:
my @nodes = ( bless( { 'data' => 'c', 'tag' => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ) }, 'Node' ) );
The definition of the action to be performed on data 'c' is postponed until operation 'a' is performed on data 'b'.

I think you wrote a parser already, so I'm sure you can adapt it to produce a structure like above. Once you have the structure, generating the output is also straightforward:

package Node; use Moose; has [ qw<data tag> ] => ( is => 'rw', isa => 'Any' ); sub as_text { my ($self) = shift; my $tag = $self->tag; my $tag_processing_method = ref $tag ? $tag->as_text : $tag; return $self->$tag_processing_method( $self->data ); } # Tag Processing Methods here: sub a { "super_$_[1]" } # prepend "super_" sub b { "b_$_[1]" } # prepend "b_" sub super_b { "B_$_[1]" } # prepend "B_"
If you run it (say for map { $_->as_text } @nodes), you'll see that instead of sub b being called, super_b is called.

I'd be very inclined to add string overloading to the Node package so:

use overload q{""} => 'as_text', fallback => 1, ;
which could make the calls even simpler, (with a possible cost to debugging and maintainability). as_text becomes
sub as_text { my ($self) = shift; my $processing_method = $self->tag; return $self->$processing_method( $self->data ); }
and generating output once you have your @nodes array is simply stringification. print @nodes;

update fixed some typos

Replies are listed 'Best First'.
Re^2: rough start of an axml compiler
by Logicus (Initiate) on Aug 01, 2011 at 19:29 UTC

    I said any kind I know of, but then I am renowned for being an uneducated thick-wit who won't listen to advice of my elders and betters.

    I'm going to have to have a good think about what you've put there above. Digestion should be complete in a few days, before which any comment I make will probably be seen as another example of my stupidity.

    The first thing that is running through the vacuous hole I refer to sometimes laughingly as my brain, is how to decompress this :

    my @nodes = ( bless( { 'data' => 'c', 'tag' => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ) }, 'Node' ) );

    From the source;

    <<a>b</a>c</<a>b</a>>

    I have a pathological aversion to all things OOP, but the apparent simplicity of what you have shown above is strangely appealing. Thanks!

Re^2: rough start of an axml compiler
by Logicus (Initiate) on Aug 02, 2011 at 12:40 UTC

    Well Boldra, you've thrown a proper little spanner into my works... I'm not complaining because I really like your example!

    I was going to run a small number of regex conversions on an aXML string and turn it into classic XML to feed XML::Simple for turning into a perl structure, but I can't do that now if I want to use the method above. .o0(~Hrm~)

    One quick question though, under this schema would every tag have to have a definition? As in what would happen to tags which are just markup around and within tags which have defined roles?

    Also there is another thought that I don't know exactly how to describe I guess you could call it orphan data, for example:

    listing actions/default/body.aXML --------------------------------- <html> <head><title>acme products</title></head> <body> some orphan text that needs to be in the output <use>actions/<qd>action</qd>/main.aXML</use> some more orphan text </body> </html>

    I'm guessing that the above would be mapped to your moose solution thusly:

    package actions::default::body; my @nodes = ( bless ( { 'tag' => 'html', 'data' => [ bless ( { 'tag' => 'head', 'data' => bless ( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ), bless ( { 'tag' => 'body', 'data' => [ bless ( { 'tag' => 'orphan', 'data' => 'some orphan text that needs t +o be in the output' }, 'Node' ), bless ( { 'tag' => 'use' 'data' => [ bless ( { 'tag' => 'orphan', 'data' => 'action/'}, 'Node +' ), bless ( { 'tag' => 'qd' 'data' => 'action' }, 'Node +' ), bless ( { 'tag' => 'orphan', 'data' => '/main.aXML' }, ' +Node' ) ] }, 'Node' ) bless ( { 'tag' => 'orphan', 'data' => 'some more orphan text' ), 'Node' ) ] }, 'Node' ) ] }, 'Node' ) ); sub getNodes { return @nodes; } 1;
      Have you considered leaving the untagged content as plain text?
      my @nodes = ( bless ( { 'tag' => 'html', 'data' => [ bless ( { 'tag' => 'head', 'data' => bless ( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ), bless ( { 'tag' => 'body', 'data' => [ 'some orphan text that needs to be in the + output', bless ( { 'tag' => 'use' 'data' => [ bless ( { 'tag' => 'orphan', 'data' => 'action/'}, 'Node +' ), bless ( { 'tag' => 'qd' 'data' => 'action' }, 'Node +' ), bless ( { 'tag' => 'orphan', 'data' => '/main.aXML' }, ' +Node' ) ] }, 'Node' ) 'some more orphan text', ] }, 'Node' ) ] }, 'Node' ) );
      and it may interest you that with Moose buildargs, you can easily set up the Node constructor to expect a tag and data, e.g. Node->new( qd => 'action' );. The output of Data::Dumper would still contain the bless { }, 'Node' syntax, making it a good place to do debugging and testing.
      my @nodes = ( Node->new( html => [ Node->new( head => Node->new( title => 'acme products' ), ), Node->new( body => [ 'some orphan text that needs to be in the output', Node->new( use => [ 'actions/', Node->new( qd => 'action'), '/main.aXML', ), 'some more orphan text', ], ), ] ), );
      but then why make nodes out of plain html if you have no action planned for them? Checking whether a tag is implemented during parsing is going to save you headaches later.
      my @nodes = ( '<html> <head><title>acme products</title></head> <body> some orphan text that needs to be in the output', Node->new( use => [ 'actions/', Node->new( qd => 'action' ), ' +/main.aXML' ] ), 'some more orphan text </body> </html>', )
      with which print @nodes would just do the right thing.

        That makes life a lot easier!

        The only caveat I can think of is the refas plugin ie :

        (refas tag="user")/path/to/user.xml(/refas) <p>Welcome back <user>username</user>, you were last here on : [time f +ormat="HH:MM:SS, DD/MM/YYY"]<user>lastvisit</user>[/time].</p>

        Where <user> is not a known tag until refas creates a definition for it which maps the user tag data to the nodes in the user.xml file, and [time] takes an integer and gives back a formatted date/time string.

        I was planning on expressing the difference between the three tag types by adding an attribute called aXML_class to the tags when converting them to standard XML :

        (SQL mode="mask") <query> SELECT username,email FROM users; </query> <mask> [link action="showuser" username="<d>username</d>" ]<d>username</d>[/link], [link to="mailto:<d>email</d>"]<d>email</d>[/link] <br> </mask> (/SQL) Becomes : <SQL aXML_class="primary" mode="mask"> <query> SELECT username,email FROM users; </query> <mask> <link aXML_class="tertiary" action="showuser" username="<d>username</d>" ><d>username</d></link>, <link aXML_class="tertiary" to="mailto:<d>email</d>" ><d>email</d></link> <br> </mask> </SQL> Also when tags have tags embedded in their attributes like this : <a b="<c>d</c>">data</a> converting the expression to XML like this; <a aXML_class="standard"> <attr>b="<c aXML_class="standard">d</c>"</attr> <contents>data</contents> </a>

        The examples above would map like this :

        <SQL aXML_class="primary" mode="mask"> <query> SELECT username,email FROM users; </query> <mask> <link aXML_class="tertiary" action="showuser" username="<d>usern +ame</d>"><d>username</d></link>, <link aXML_class="tertiary" to="mailto://<d>email</d>"><d>email< +/d></link> <br> </mask> </SQL> becomes : my @nodes = ( Node->new( SQL => { aXML_class => 'primary', attr => { mode => "mask" }, contents => { '<query>SELECT * FRO +M users</query> <mask>', [ Node->new( link + => { aXML_class => 'tertiary', + attr => { action => 'showuser', + username => '<d>username<d>' }, + contents => '<d>username</d>' + } ), + + contents => '<d>username</d>' } ), Node->new( link + => { aXML_class => 'tertiary', + attr => { to => 'mailto://<d>email</d>' }, + contents => '<d>email</d>' + } ), ], '<br></mask>' } } ) ); and <a b="<c>d</c>">data</a> becomes : <a aXML_class="standard"> <attr>b="<c aXML_class="standard">d</c>"</attr> <contents>data</contents> </a> then becomes : my @nodes = ( Node->new( a => { aXML_class => 'standard', attr => { b => Node->new ( c => +{ aXML_class => 'standard', + contents => 'd' + } ) }, contents => 'data' } } ) );
Re^2: rough start of an axml compiler
by Anonymous Monk on Aug 02, 2011 at 02:34 UTC
    Corion, muba, and a few others already explained this independent of each other, he is just playing dumb, you're feeding the troll
      Look, I can feed two at once!

        Look, I can feed two at once!

        Needs more catsup!