PerlMonks  

rough start of an axml compiler

by Logicus
on Jul 21, 2011 at 06:19 UTC (#915797=perlquestion)
Logicus has asked for the wisdom of the Perl Monks concerning the following question:

Ok, here is a very rough and ugly start at how I think you're suggesting I should go about compiling aXML files. I know this code is _AWFUL_, but it does work, and it's quite fast at turning an aXML file into a bunch of print statements and axml function calls, which my existing parser system could then parse very quickly because they are very short. I guess I would want to save the output of this program on the first run so I can skip this step on subsequent runs. This only supports <> type tags.

use Modern::Perl;
use Time::HiRes qw( gettimeofday tv_interval );

my $start = [ gettimeofday ];

my $aXML_ENV = {
    qd   => { a => 'b', b => 'c', c => '42' },
    conf => { x => 'y', foo => '1', bar => '2' }
};

my $plugins = {
    qd   => '$result = $aXML_ENV->{"qd"}->{$data}',
    conf => '$result = $aXML_ENV->{"conf"}->{$data}'
};

my $aXML_string = qq@
some other data that needs to be output as well
<qd><qd><qd>a</qd></qd></qd> = 42
<conf>bar</conf>
<sometag>thatdoesnothing</sometag>
but also needs to be present in the output
<conf>foo</conf>
<someothertag>thatalsodoesnothing</someothertag>
as far as aXML is concerned, but obviously used by whatever program we are sending this data too.
@;

sub sortofcompileit {
    my $aXML_ENV    = $_[0];
    my $plugins     = $_[1];
    my $aXML_string = $_[2];

    my $compiled_string_start;
    my $compiled_string_end;
    my $compiled_string_middle;
    my @commands;
    my $command_opens_string  = "(";
    my $command_closes_string = "(";
    my $mong_string;

    while ( my ($key, $value) = each(%$plugins) ) {
        push (@commands, $key);
    }

    map { $command_opens_string  .= "<$_>|"  } @commands;
    map { $command_closes_string .= "</$_>|" } @commands;

    chop $command_opens_string;
    chop $command_closes_string;

    $command_opens_string  .= ")";
    $command_closes_string .= ")";

    #find the position of the first command
    #set everything before it to be printed
    if ($aXML_string =~ m@^(.*?)$command_opens_string@s) {
        $compiled_string_start  = 'print qq@';
        $compiled_string_start .= $1;
        $compiled_string_start .= "@;\n\n";
    }

    $mong_string  = 'use aXML;';
    $mong_string .= "\n";
    $mong_string .= $compiled_string_start;

    #find everything in the middle
    if ($aXML_string =~ m@$command_opens_string(.*)$command_closes_string@s) {
        $compiled_string_middle = "<axml>$1$2$3</axml>";
    }

    #find anything in the middle which is inbetween any type of close
    #and open and set it to be printed out
    my $replacement;
    $compiled_string_middle =~ s@$command_closes_string@`$1@gs;

    while ($compiled_string_middle =~ m@(.*?)$command_closes_string([^`]*?)$command_opens_string@gs) {
        $replacement  = "$1$2</axml>\n\n";
        $replacement .= 'print qq@';
        $replacement .= $3;
        $replacement .= "@;\n\n<axml>$4";
        $mong_string .= $replacement;
    }

    $compiled_string_middle =~ s@`@@gs;

    if ($compiled_string_middle =~ m@.*$command_opens_string(.*?)$command_closes_string</axml>$@s) {
        $mong_string .= "$2$3</axml>\n\n";
    }

    #find the position of the last close tag
    #set everything after it to be printed
    if ($aXML_string =~ m@.*$command_closes_string(.*)$@s) {
        $compiled_string_end  = 'print qq@';
        $compiled_string_end .= $2;
        $compiled_string_end .= '@;';
    }

    $mong_string .= $compiled_string_end;
    $mong_string =~ s@`@@gs;
    $mong_string =~ s/<axml>(.*?)<\/axml>/print axml\(qq\@$1\@\);/g;

    return $mong_string;
}

my $sortofcompiled_string = sortofcompileit($aXML_ENV,$plugins,$aXML_string);

say $sortofcompiled_string;

my $end = [ gettimeofday ];
my $total_elapsed = tv_interval($start,$end);
say "elapsed = $total_elapsed";

Re: rough start of an axml compiler
by Logicus on Jul 21, 2011 at 10:15 UTC

    The thing I don't like about this sort of approach, even if it were done with a lovely modern H.O.P. (Higher-Order Perl) lexer or whatever, is that it fundamentally breaks several of my plugins which exploit the runtime parsing engine to do cool things, like refas.

    I really can't be bothered carrying on with this. You can moan all you want about how wasteful of processor time my method is, how it can't be stream-parsed, and how it's a non-context-grammar-free-lispish-whatever; it doesn't detract from the coolness of aXML and how quick and easy it makes building complex sites. I just can't be bothered trying to shave a few more clock ticks off the parser, or writing a compiler which breaks the way it works, especially when the database connection (even using Apache::DBI configured in the httpd.conf file) takes up far more processor time than the parsing.

    Bottom line: this system is not that slow. It can do 30 requests a second running on a 5-year-old 1.8 GHz laptop, Apache 2 can use multiple cores, and I am confident that Moore's law will continue unabated for at least another 20 years. All in all, I think my time would be better spent writing cool new plugins and deploying new sites with the existing parsing system, which works more or less exactly as it did back in 2007.

    aXML and my parser are completely free for anyone to download and use for whatever they want, and have been available, almost completely unnoticed, right here on PerlMonks for the last 4 years.

    If anyone feels like packaging it up for putting on CPAN then go right ahead, I have more interesting things to be doing :)

      is that it fundamentally breaks several of my plugins

      Plugins shouldn't change the syntax of the language, just the functionality.

      If you follow that principle, the parser is completely independent of the plugins and therefore cannot break them.

      If you don't follow that principle, you're in for a world of hurt.

      Update: "don't" was missing in last sentence.
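To make the principle concrete, here is a minimal Perl sketch (an illustration, not aXML's actual parser) of a tokenizer that splits a document into text, open-tag and close-tag tokens without consulting any plugin list; deciding what a tag means is left to a later evaluation step. It assumes bare `<name>` / `</name>` tags with no attributes, and `tokenize` is a hypothetical name.

```perl
use strict;
use warnings;

# Split a document into text / open / close tokens without consulting any
# plugin list. Assumes bare <name> and </name> tags with no attributes.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    # A capturing group in split keeps the matched tags in the result list.
    for my $piece (split m{(</?\w+>)}, $src) {
        next if $piece eq '';
        if ($piece =~ m{^<(/?)(\w+)>$}) {
            push @tokens, [ ($1 ? 'close' : 'open'), $2 ];
        }
        else {
            push @tokens, [ 'text', $piece ];
        }
    }
    return @tokens;
}

my @tokens = tokenize('<user>first_name</user> <td>x</td>');
print join('|', map { "$_->[0]:$_->[1]" } @tokens), "\n";
```

Because `<user>` and `<td>` come out as the same kind of token, adding or removing plugins never changes how the document is split.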

        Traditionally that is true, but aXML is not traditional!

        Consider this :

        (refas tag="user")/path/to/userfile.xml(/refas)

        <table>
          <tr>
            <td>username : <user>username</user></td>
            <td>first name : <user>first_name</user></td>
            <td>surname : <user>sur_name</user></td>
          </tr>
        </table>

        The refas (short for refer as) plugin builds a new plugin on the fly called "user", which extracts values from the XML file and substitutes them in the document wherever the tag exists.
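A minimal Perl sketch of the mechanism described above, assuming plugins live in a dispatch table of code refs and a refas-style plugin adds a new entry at runtime. To stay self-contained it reads the user record from a hash instead of parsing /path/to/userfile.xml; `%plugins`, `run_tag` and the record contents are hypothetical, not aXML's real API.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: tag name => code ref producing the tag's output.
my %plugins;

# Stand-in for the contents of /path/to/userfile.xml (made-up data).
my %user_file = ( username => 'jdoe', first_name => 'Jane', sur_name => 'Doe' );

# refas-style plugin: registers a *new* tag whose handler looks values up in
# the referenced record, and produces no output of its own.
$plugins{refas} = sub {
    my ($new_tag, $record) = @_;
    $plugins{$new_tag} = sub { my ($key) = @_; return $record->{$key} // '' };
    return '';
};

# Look the tag up at call time, so tags registered earlier in the same run
# are found; unknown tags such as <td> are passed through untouched.
sub run_tag {
    my ($name, @args) = @_;
    my $handler = $plugins{$name} or return "<$name>$args[0]</$name>";
    return $handler->(@args);
}

print run_tag(refas => 'user', \%user_file);
print run_tag(td => 'first name : ' . run_tag(user => 'first_name')), "\n";
```

The key design point is that the lookup happens when a tag is evaluated, not when the plugin list is first read.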

        The compiler would split the document up like this :

        print axml(qq@(refas tag="user")/path/to/userfile.xml(/refas)@);

        print qq@
        <table>
          <tr>
            <td>username : <user>username</user></td>
            <td>first name : <user>first_name</user></td>
            <td>surname : <user>sur_name</user></td>
          </tr>
        </table>
        @;

        Bang! refas doesn't work anymore! The compiler doesn't know what a <user> tag is and ignores it alongside things like <table> and <tr>, and the refas tag has no output of its own; all it does is modify the parser's runtime variables so that the parser understands what a <user> tag is.

        The code which would work under a compiler would have to look something like this :

        (refas path="/path/to/userfile.xml")
        <table>
          <tr>
            <td>username : <d>username</d></td>
            <td>first name : <d>first_name</d></td>
            <td>surname : <d>sur_name</d></td>
          </tr>
        </table>
        (/refas)

        But that is not necessarily the best way to lay the code out from a design perspective, especially if one of the <user> tags needed to be elsewhere in the document. Also, the refas plugin would then have to take input data to be modified and give an output, or you would have to scrap the refas tag and create a plugin called user. Thus the very act of compilation places on the code exactly the constraints that I designed the parser, over successive code iterations, to overcome/eliminate.

      I really don't understand what you are trying to accomplish at PerlMonks. You came here with your templating system, saying it is the greatest thing in the world but that it is coded poorly and too slow, and asking how it can be fixed.

      You got dozens of responses with good, solid ideas on how to fix the issues and make it faster. Ideas which you ignored and argued against without taking the time to understand them.

      Now you are saying that there was nothing wrong with the code in the first place and that it was always fast enough. And you go on below to say that all the people who know how to write efficient code to solve your problem are chained to antiquated ideas, that aren't needed for your "magic" solution.

      And to top it all off, your final word is that your system is the best thing ever, but it is someone else's job to put it on CPAN.

      So what have you actually accomplished here other than to annoy people, make them waste their time trying to help you (and belittle them with profanity and insults when they do try to help), and fill up the worst nodes lists? You certainly haven't learned anything, nor have you contributed in any way. I just hope that the "more interesting things" you are going to do will be done elsewhere. I for one won't miss you, and I doubt anyone else will either.



      -pete
      "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."

        Arghhhhh!!!!!!



        I really don't understand what you are trying to accomplish at PerlMonks. You came here with your templating system saying it is the greatest thing in the world but it is coded poorly and it is too slow and how can it be fixed.

        I've said this lots of times: it's not a templating system, it's an abstraction method which can be used to build a templating system within which the programmer is completely free to define their own syntax, unlike existing templating systems where definitions are set in stone. As a result there is only one language involved, Perl, instead of things like TT2, which are one language sandwiched on top of another.

        What I am trying to accomplish is the sharing of a very neat abstraction system which I have found to bring huge savings of time and effort in everyday work.



        You got dozens of responses with good, solid ideas on how to fix the issues and make it faster. Ideas which you ignored and argued against without taking the time to understand them.

        I didn't ignore them. I do understand them; I just don't agree that they are applicable/suitable, and/or I feel they place too many constraints on what is currently a highly dynamic and totally flexible system.



        Now you are saying that there was nothing wrong with the code in the first place and that it was always fast enough.

        Fast enough is a function of how big your server budget is and the current state of hardware availability. In 2007 when I first came here looking for information on how to improve the processing speed, the efficiency problem was much greater than it is now, and much greater than it will be 4 or 8 years from now.



        And you go on below to say that all the people who know how to write efficient code to solve your problem are chained to antiquated ideas, that aren't needed for your "magic" solution.

        Sorry! Not sorry... I admit my solution is not as efficient as it could be; however, I have yet to come across a solution to that problem, and I have looked!



        And to top it all off, your final word is that your system is the best thing ever, but it is someone else's job to put it on CPAN.

        It is pretty awesome, as you would know if you actually understood what it does. As for CPAN, there is no advantage whatsoever to me personally from bundling it up on CPAN because I already have it installed on all my systems. Putting it on CPAN will only help make it easier to share with others, and if no one wants it then why would I bother?



        So what have you actually accomplished here other than to annoy people, make them waste their time trying to help you

        People need annoying from time to time, keeps them on their toes.



        (and belittle them with profanity and insults when they do try to help),

        If you can link to any profanity I have uttered, you will also find, in direct one-to-one correspondence, the rude monk who provoked it as an equal and opposite reaction. I don't start arguments, but I do give as good as I get.



        and fill up the worst nodes lists? You certainly haven't learned anything, nor have you contributed in any way.

        I've learnt a few things. aXML is my contribution, just because you don't know what it is or why I want to contribute it doesn't mean it has no value.



        I just hope that the "more interesting things" you are going to do will be done elsewhere. I for one won't miss you and I doubt anyone else will either

        Love you too.

Re: rough start of an axml compiler
by pemungkah (Priest) on Jul 21, 2011 at 16:48 UTC
    The only comment I might make is that any time you have a function you're passing craploads of parameters to, you may be better served by using an object to manage the storage. Internally (in methods inside the class), you can go ahead and keep referencing things directly for speed.

    If I understand your architecture properly, you have several invariants that are getting set up in sortofcompileit on every call; if you pulled those out into package variables and set them up once (with an init method/sub), you'd save time on every subsequent call.
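A minimal sketch of that suggestion, assuming the plugin set is fixed for the lifetime of the object: the open/close alternations that sortofcompileit rebuilds on every call are built once in the constructor. The class name `AXMLCompiler` and its `leading_text` method are hypothetical; `quotemeta` guards against plugin names that contain regex metacharacters.

```perl
package AXMLCompiler;
use strict;
use warnings;

# Hypothetical class: holds the environment and plugin table, and builds
# the per-plugin tag patterns once instead of on every compile call.
sub new {
    my ($class, %args) = @_;
    my @names  = sort keys %{ $args{plugins} };    # sorted for a stable pattern
    my $opens  = join '|', map { '<'  . quotemeta($_) . '>' } @names;
    my $closes = join '|', map { '</' . quotemeta($_) . '>' } @names;
    return bless {
        env       => $args{env},
        plugins   => $args{plugins},
        opens_re  => qr/($opens)/,
        closes_re => qr/($closes)/,
    }, $class;
}

# Text before the first plugin tag: the part sortofcompileit turns into
# its leading print statement.
sub leading_text {
    my ($self, $src) = @_;
    my $re = $self->{opens_re};
    return $src =~ /^(.*?)$re/s ? $1 : $src;
}

package main;

my $compiler = AXMLCompiler->new(
    env     => { qd => { a => 'b' }, conf => { bar => '2' } },
    plugins => { qd => 1, conf => 1 },    # values unused in this sketch
);
print $compiler->leading_text("some text <conf>bar</conf>"), "\n";
```

Every later method call reuses the precompiled `qr//` objects, so the per-call setup cost disappears.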

    And this is totally headed in a good direction: real "compilation" of the aXML code! If you memoized (cf. Memoize) the calls to sortofcompileit, you might be able to get another free speedup from Memoize's caching. As long as a given parameter set always results in the same output, Memoize will help. If there are side effects that might change the result, then it won't help (e.g., memoizing a random number generator would make it seriously unusable, if very fast!).
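One caveat worth making concrete: by default Memoize builds its cache key by joining the arguments as strings, so hash-ref arguments like $aXML_ENV and $plugins are keyed on their addresses (HASH(0x...)), not their contents. A sketch, assuming only the source string (the third argument) determines the output, uses Memoize's NORMALIZER option; `compile_stub` is a hypothetical stand-in for sortofcompileit.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts real (uncached) compilations

# Hypothetical stand-in for sortofcompileit($env, $plugins, $source).
sub compile_stub {
    my ($env, $plugins, $source) = @_;
    $calls++;
    return "print qq\@$source\@;";
}

# Key the cache on the source text only (third argument).
# Assumes $env and $plugins never change what the compiler emits.
memoize('compile_stub', NORMALIZER => sub { $_[2] });

my ($env, $plugins) = ({}, {});
my $first  = compile_stub($env, $plugins, 'hello');
my $second = compile_stub({}, {}, 'hello');    # new refs, same source: cache hit
print "compiled $calls time(s)\n";
```

If the environment can change the compiled output, the normalizer would have to include it in the key, otherwise stale results are returned.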

      The idea was to run sortofcompileit only once per page, the first time it is accessed, and to save its output so that from then on you can skip that step unless the source code has been updated. All it does is reorganise the raw aXML code into a more efficient layout which can be processed a lot faster for individual page hits.
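A minimal sketch of that compile-once step, assuming the compiled output is stored next to the source with a `.compiled` suffix (a hypothetical convention) and that a source modification time newer than the cached copy means the cache is stale. `compiled_for` is a hypothetical name, and the code ref passed in stands in for sortofcompileit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return compiled output for $src_path, recompiling only when there is no
# cached copy yet or the source is newer than the cached copy.
sub compiled_for {
    my ($src_path, $compile) = @_;
    my $cache_path = "$src_path.compiled";    # hypothetical naming convention

    if (-e $cache_path && (stat $cache_path)[9] >= (stat $src_path)[9]) {
        open my $in, '<', $cache_path or die "read $cache_path: $!";
        local $/;                              # slurp the whole file
        return scalar <$in>;
    }

    open my $in, '<', $src_path or die "read $src_path: $!";
    my $source   = do { local $/; <$in> };
    my $compiled = $compile->($source);

    open my $out, '>', $cache_path or die "write $cache_path: $!";
    print {$out} $compiled;
    close $out or die "close $cache_path: $!";
    return $compiled;
}

# Usage with a throwaway file and a trivial stand-in for sortofcompileit.
my $dir = tempdir(CLEANUP => 1);
my $src = "$dir/body.aXML";
open my $fh, '>', $src or die "write $src: $!";
print {$fh} 'hello';
close $fh or die "close $src: $!";

my $runs = 0;
my $stub = sub { $runs++; return "print qq\@$_[0]\@;" };
compiled_for($src, $stub) for 1 .. 3;
print "compiler ran $runs time(s)\n";
```

`stat` reports whole seconds here, so an edit saved in the same second as the cache write would not trigger a recompile; a content hash is a sturdier (slower) alternative.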

      I don't like it because that method, no matter how cleverly implemented, breaks certain plugins which are designed to exploit the runtime parsing setup.

      If you wanted to use aXML for a large-scale site and server overhead was a real budgeting concern, then it would be necessary to sacrifice said plugins (and the groovy effects they achieve) in order to run a compilation/optimisation scheme like the one the above code is starting to implement.

      TIMTOWTDI even with aXML/Perl

        Just as a throw-it-out-there, how about adding markup (or detecting, depending on how sophisticated you want to be) that delineates the "definitely dynamic" and "for-sure static" portions of a page? You could pre-build whatever was invariant (I seem to be using that word a lot lately...) and reserve the slower dynamic stuff for just the part(s) that needed it.
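A minimal sketch of that split, assuming a hypothetical convention in which only the named plugin tags are dynamic: the page is split once into static strings and dynamic [tag, content] pairs, and each request only evaluates the pairs. `split_page` and `render` are hypothetical names, and nested plugin tags are not handled.

```perl
use strict;
use warnings;

# Done once per page: split into static strings and dynamic [tag, content]
# pairs. Only tags listed in @dynamic count as dynamic; no nesting support.
sub split_page {
    my ($page, @dynamic) = @_;
    my $alt = join '|', map { quotemeta($_) } @dynamic;
    # Two capture groups, so split returns: static, tag, content, static, ...
    my @pieces = split m{<($alt)>(.*?)</\1>}s, $page, -1;
    my @parts;
    while (@pieces) {
        my $static = shift @pieces;
        push @parts, $static if length $static;
        last unless @pieces;
        my ($tag, $content) = splice @pieces, 0, 2;
        push @parts, [ $tag, $content ];
    }
    return \@parts;
}

# Done per request: static parts are copied, dynamic parts are evaluated.
sub render {
    my ($parts, $handlers) = @_;
    return join '', map { ref $_ ? $handlers->{ $_->[0] }->( $_->[1] ) : $_ } @$parts;
}

my $parts = split_page('<td>Hi <qd>name</qd>!</td>', 'qd');
my %env   = ( name => 'world' );
print render($parts, { qd => sub { $env{ $_[0] } } }), "\n";
```

The `-1` limit keeps an empty trailing field, so a tag with empty content at the very end of a page still yields a pair.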
Re: rough start of an axml compiler
by Boldra (Deacon) on Aug 01, 2011 at 12:02 UTC
    You say in your "offtopic epiphony" that you believe
    <<a>b</a>>c</<a>b</a>>
    to be unrepresentable in any kind of data structure, perl or otherwise. Here's a simple solution:
    my @nodes = (
        bless( {
            'data' => 'c',
            'tag'  => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' )
        }, 'Node' )
    );
    The definition of the action to be performed on data 'c' is postponed until operation 'a' is performed on data 'b'.

    I think you wrote a parser already, so I'm sure you can adapt it to produce a structure like above. Once you have the structure, generating the output is also straightforward:

    package Node;
    use Moose;

    has [ qw<data tag> ] => ( is => 'rw', isa => 'Any' );

    sub as_text {
        my ($self) = shift;
        my $tag = $self->tag;
        my $tag_processing_method = ref $tag ? $tag->as_text : $tag;
        return $self->$tag_processing_method( $self->data );
    }

    # Tag Processing Methods here:
    sub a       { "super_$_[1]" }    # prepend "super_"
    sub b       { "b_$_[1]" }        # prepend "b_"
    sub super_b { "B_$_[1]" }        # prepend "B_"
    If you run it (say for map { $_->as_text } @nodes), you'll see that instead of sub b being called, super_b is called.
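For readers without Moose installed, the same idea works with a plain blessed hash and two hand-written accessors (an assumption of this sketch: only `tag` and `data` are needed). On the @nodes structure, `a` is applied to 'b' to produce the method name `super_b`, which is then applied to 'c'.

```perl
use strict;
use warnings;

package Node;

# Plain-Perl accessors standing in for Moose's `has [ qw<data tag> ]`.
sub tag  { $_[0]{tag} }
sub data { $_[0]{data} }

sub as_text {
    my ($self) = @_;
    my $tag = $self->tag;
    # A tag that is itself a Node is evaluated first to get the method name.
    my $method = ref $tag ? $tag->as_text : $tag;
    return $self->$method( $self->data );
}

# Tag processing methods, as in the Moose version.
sub a       { "super_$_[1]" }    # prepend "super_"
sub b       { "b_$_[1]" }        # prepend "b_"
sub super_b { "B_$_[1]" }        # prepend "B_"

package main;

my @nodes = ( bless( { data => 'c', tag => bless( { data => 'b', tag => 'a' }, 'Node' ) }, 'Node' ) );
print $_->as_text, "\n" for @nodes;    # prints B_c
```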

    I'd be very inclined to add string overloading to the Node package so:

    use overload
        q{""}    => 'as_text',
        fallback => 1,
        ;
    which could make the calls even simpler (with a possible cost to debugging and maintainability). as_text becomes
    sub as_text {
        my ($self) = shift;
        my $processing_method = $self->tag;
        return $self->$processing_method( $self->data );
    }
    and generating output once you have your @nodes array is simply stringification. print @nodes;
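A plain-Perl sketch (no Moose) of the stringification idea, under two assumptions not in the post: nodes are built with a positional constructor `new($tag, $data)`, and `data` may be an array mixing plain strings and child Nodes. As written, as_text hands `data` straight to the tag method, so an array of children would arrive as an ARRAY reference; here the children are joined first. The handlers live in a hypothetical `%handler` hash rather than as methods.

```perl
use strict;
use warnings;

package Node;
use overload q{""} => 'as_text', fallback => 1;

# Hypothetical stub handlers keyed by tag name (illustration only).
my %handler = (
    qd  => sub { my %qd = ( action => 'default' ); return $qd{ $_[0] } // '' },
    use => sub { return "[included $_[0]]" },
);

# Positional constructor: Node->new( $tag => $data ).
sub new {
    my ($class, $tag, $data) = @_;
    return bless { tag => $tag, data => $data }, $class;
}

sub as_text {
    my ($self) = @_;
    my $data = $self->{data};
    # join stringifies each child: plain strings as-is, Nodes via the "" overload.
    my $text = ref $data eq 'ARRAY' ? join('', @$data) : "$data";
    my $code = $handler{ $self->{tag} } or die "no handler for <$self->{tag}>\n";
    return $code->($text);
}

package main;

my @nodes = (
    '<body> some orphan text ',
    Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
    ' some more orphan text </body>',
);
print @nodes, "\n";
```

Because children are stringified before the parent's handler runs, the innermost tags are evaluated first, the same inside-out order the nested example relies on.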

    update fixed some typos

      I said any kind I know of, but then I am renowned for being an uneducated thick-wit who won't listen to advice of my elders and betters.

      I'm going to have to have a good think about what you've put there above. Digestion should be complete in a few days, before which any comment I make will probably be seen as another example of my stupidity.

      The first thing that is running through the vacuous hole I refer to sometimes laughingly as my brain, is how to decompress this :

      my @nodes = ( bless( { 'data' => 'c', 'tag' => bless( { 'data' => 'b', 'tag' => 'a' }, 'Node' ) }, 'Node' ) );

      From the source;

      <<a>b</a>c</<a>b</a>>

      I have a pathological aversion to all things OOP, but the apparent simplicity of what you have shown above is strangely appealing. Thanks!

      Corion, muba, and a few others already explained this independently of each other. He is just playing dumb; you're feeding the troll.
        Look, I can feed two at once!

      Well Boldra, you've thrown a proper little spanner into my works... I'm not complaining because I really like your example!

      I was going to run a small number of regex conversions on an aXML string and turn it into classic XML to feed XML::Simple for turning into a perl structure, but I can't do that now if I want to use the method above. .o0(~Hrm~)

      One quick question though, under this schema would every tag have to have a definition? As in what would happen to tags which are just markup around and within tags which have defined roles?

      Also, there is another thing I don't know exactly how to describe; I guess you could call it orphan data. For example:

      listing actions/default/body.aXML
      ---------------------------------
      <html>
      <head><title>acme products</title></head>
      <body>
      some orphan text that needs to be in the output
      <use>actions/<qd>action</qd>/main.aXML</use>
      some more orphan text
      </body>
      </html>

      I'm guessing that the above would be mapped to your moose solution thusly:

      package actions::default::body;

      my @nodes = (
          bless( {
              'tag'  => 'html',
              'data' => [
                  bless( {
                      'tag'  => 'head',
                      'data' => bless( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ),
                  }, 'Node' ),
                  bless( {
                      'tag'  => 'body',
                      'data' => [
                          bless( { 'tag' => 'orphan', 'data' => 'some orphan text that needs to be in the output' }, 'Node' ),
                          bless( {
                              'tag'  => 'use',
                              'data' => [
                                  bless( { 'tag' => 'orphan', 'data' => 'actions/' },   'Node' ),
                                  bless( { 'tag' => 'qd',     'data' => 'action' },     'Node' ),
                                  bless( { 'tag' => 'orphan', 'data' => '/main.aXML' }, 'Node' ),
                              ],
                          }, 'Node' ),
                          bless( { 'tag' => 'orphan', 'data' => 'some more orphan text' }, 'Node' ),
                      ],
                  }, 'Node' ),
              ],
          }, 'Node' ),
      );

      sub getNodes { return @nodes; }

      1;
        Have you considered leaving the untagged content as plain text?
        my @nodes = (
            bless( {
                'tag'  => 'html',
                'data' => [
                    bless( {
                        'tag'  => 'head',
                        'data' => bless( { 'tag' => 'title', 'data' => 'acme products' }, 'Node' ),
                    }, 'Node' ),
                    bless( {
                        'tag'  => 'body',
                        'data' => [
                            'some orphan text that needs to be in the output',
                            bless( {
                                'tag'  => 'use',
                                'data' => [
                                    bless( { 'tag' => 'orphan', 'data' => 'actions/' },   'Node' ),
                                    bless( { 'tag' => 'qd',     'data' => 'action' },     'Node' ),
                                    bless( { 'tag' => 'orphan', 'data' => '/main.aXML' }, 'Node' ),
                                ],
                            }, 'Node' ),
                            'some more orphan text',
                        ],
                    }, 'Node' ),
                ],
            }, 'Node' ),
        );
        and it may interest you that with Moose's BUILDARGS, you can easily set up the Node constructor to expect a tag and data, e.g. Node->new( qd => 'action' );. The output of Data::Dumper would still contain the bless { }, 'Node' syntax, making it a good place to do debugging and testing.
        my @nodes = (
            Node->new( html => [
                Node->new( head => Node->new( title => 'acme products' ) ),
                Node->new( body => [
                    'some orphan text that needs to be in the output',
                    Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
                    'some more orphan text',
                ] ),
            ] ),
        );
        but then why make nodes out of plain html if you have no action planned for them? Checking whether a tag is implemented during parsing is going to save you headaches later.
        my @nodes = (
            '<html> <head><title>acme products</title></head> <body> some orphan text that needs to be in the output',
            Node->new( use => [ 'actions/', Node->new( qd => 'action' ), '/main.aXML' ] ),
            'some more orphan text </body> </html>',
        );
        with which print @nodes would just do the right thing.
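A minimal sketch of checking whether a tag is implemented during parsing, assuming well-nested tags with no attributes: a stack-based parser only opens a node for tags in a hypothetical `%implemented` set and keeps everything else (such as <html> or <body>) as literal text, merging adjacent text pieces. Nodes are plain [tag, [children]] arrays rather than Node objects to keep it short; in a method-dispatch design like Node's, UNIVERSAL::can could play the role of `%implemented`.

```perl
use strict;
use warnings;

# Tags with an implementation; everything else stays literal text.
my %implemented = map { $_ => 1 } qw(use qd);

# Parse into plain strings and [tag, [children]] nodes.
# Assumes well-nested tags without attributes.
sub parse {
    my ($src) = @_;
    my @stack = ( [ undef, [] ] );    # bottom frame collects top-level items
    for my $piece (split m{(</?\w+>)}, $src) {
        next if $piece eq '';
        my $kids = $stack[-1][1];
        if ($piece =~ m{^<(\w+)>$} && $implemented{$1}) {
            push @stack, [ $1, [] ];                    # open an implemented tag
        }
        elsif ($piece =~ m{^</(\w+)>$} && $implemented{$1}) {
            my $node = pop @stack;
            die "mismatched </$1>\n" unless defined $node->[0] && $node->[0] eq $1;
            push @{ $stack[-1][1] }, $node;             # attach to parent
        }
        elsif (@$kids && !ref $kids->[-1]) {
            $kids->[-1] .= $piece;                      # merge adjacent text
        }
        else {
            push @$kids, $piece;
        }
    }
    die "unclosed <$stack[-1][0]>\n" if @stack > 1;
    return @{ $stack[0][1] };
}

my @items = parse('<body>text <use>actions/<qd>action</qd>/main.aXML</use> more</body>');
```

Only <use> and <qd> become nodes here; the surrounding HTML survives as plain strings, so unimplemented tags never need a definition.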

Node Type: perlquestion [id://915797]
Approved by ww