
Node Parser too slow?

by BioHazard (Pilgrim)
on Nov 04, 2002 at 16:07 UTC (#210206=perlquestion)
BioHazard has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

At present I am building a web-based intranet post/reply system which will use an amended version of the node syntax shown by "displaytype=xml".
Below are an example node and the method I use to parse these nodes.
As you can see, whenever a "child" is detected, parseNode is called recursively for that child and returns its rendered template output. At the end, the main node and all its children are combined in the correct order and the HTML output can be printed.
But now I am not sure whether these recursive calls to parseNode() will be too slow (e.g. for a post with 25-30 replies). I tried returning only the template object to save some RAM, but that doesn't seem to help. At the moment I cannot test my script thoroughly, but I hope it will be fast enough, at least on the local area network.
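To get a first impression before a proper load test, one rough approach is to time the render on a thread of about that size. A minimal sketch using the core Time::HiRes module; `time_it` is a hypothetical helper, and `render_stub` with its in-memory `%thread` is a hypothetical stand-in for parseNode() that skips XML parsing and file I/O entirely.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $code $iterations times and return the average wall-clock seconds per call.
sub time_it {
    my ($code, $iterations) = @_;
    $iterations ||= 1;
    my $start = time;
    $code->() for 1 .. $iterations;
    return (time - $start) / $iterations;
}

# Hypothetical stand-in for parseNode(): node 432 with 30 replies,
# rendered recursively from a hash instead of XML files.
my %thread = (432 => [ 433 .. 462 ]);

sub render_stub {
    my ($id) = @_;
    my $html = "<div id=\"$id\">";
    $html .= render_stub($_) for @{ $thread{$id} || [] };
    return $html . '</div>';
}

my $avg = time_it(sub { render_stub(432) }, 100);
printf "%.6f s per render\n", $avg;
```

With the real code, passing `sub { parseNode(432) }` to `time_it` would include the XML parsing and template cost; the core Benchmark module (`timethese`, `cmpthese`) is an alternative when comparing two versions of the sub.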

Do you have any suggestions for integrating the child nodes more elegantly?

Thank you for your help and sorry about my bad English.

P.S.: I am using XML::Simple and HTML::Template.
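use XML::Simple;
use HTML::Template;

sub parseNode {
    my $node_id = shift();

    # Load nodes/<id>.xml; <field> and <child> are forced into arrays
    # (which XML::Simple then folds into hashes on 'name' and 'id')
    my $node = XMLin(
        "$node_id.xml",
        searchpath => ['nodes'],
        forcearray => [ 'field', 'child' ]
    );

    # One template file per node type, e.g. 1.html for "Thread"
    my $template = HTML::Template->new( filename => "$node->{type}{id}.html" );
    $template->param(
        node_id  => $node->{id},
        type     => $node->{type}{content},
        type_id  => $node->{type}{id},
        title    => $node->{title},
        created  => parseTimeBySeconds( $node->{created} ),    # helper defined elsewhere
        updated  => parseTimeBySeconds( $node->{updated} ),
        owner    => $node->{owner}{content},
        owner_id => $node->{owner}{id}
    );

    # Copy every <field> into a template parameter of the same name
    foreach ( keys %{ $node->{data}{field} || {} } ) {
        $template->param( $_ => $node->{data}{field}{$_}{content} );
    }

    # Render each child recursively; sort the ids numerically because
    # keys() returns them in no particular order
    my $children = '';
    foreach ( sort { $a <=> $b } keys %{ $node->{data}{child} || {} } ) {
        $children .= parseNode( $node->{data}{child}{$_}{content} );
    }
    $template->param( children => $children );

    return $template->output();
}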
Here's the node:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<node id="432" title="Title of page" created="1001562046" updated="1002582217">
  <type id="1">Thread</type>
  <owner id="31337">BioHazard</owner>
  <data>
    <field name="doctext">This is test content</field>
    <child id="1">433</child>
    <child id="2">435</child>
  </data>
</node>
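One thing worth checking in the code above: with XML::Simple's default key folding, the `<child>` elements end up in a hash keyed by their `id` attribute, and Perl gives no ordering guarantee for `keys()`, so iterating over it directly can emit replies out of order. A minimal sketch, assuming the folded structure looks like the hand-written `%data` below (it is typed in by hand, not produced by XMLin(), and the third child is hypothetical): sorting the keys numerically restores the document order.

```perl
use strict;
use warnings;

# Hand-written approximation of the folded <data> element
# (an assumption about XMLin() output, not real output).
my %data = (
    field => { doctext => { content => 'This is test content' } },
    child => {
        1  => { content => 433 },
        2  => { content => 435 },
        10 => { content => 512 },    # hypothetical third reply
    },
);

# keys() order is unspecified, so sort the child ids numerically;
# a plain string sort would put "10" before "2".
my @child_ids = map { $data{child}{$_}{content} }
                sort { $a <=> $b } keys %{ $data{child} || {} };

print "@child_ids\n";    # prints "433 435 512"
```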

reading between the lines is my real pleasure

Re: Node Parser too slow?
by grantm (Parson) on Nov 05, 2002 at 07:10 UTC

    If you're using XML::Simple and it's slow, the first thing to check is which parser you're using. If you have version 1.08_01 of XML::Simple and XML::SAX is installed, you may be using XML::SAX::PurePerl, which is slow. Look for ParserDetails.ini in the site_lib/XML/SAX directory: all your installed SAX parsers are listed in this file, and the last one listed is used by default. If you don't already have XML::LibXML installed, consider installing it: it's the fastest XML parser available, and XML::Simple will use it if you have SAX installed.
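A quick way to see which parser you'd get, assuming XML::SAX is installed: `XML::SAX::ParserFactory->parser()` returns an instance of the default parser class, so printing its class name answers the question without reading ParserDetails.ini by hand. A minimal sketch; `default_sax_parser` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the class name of the parser XML::SAX hands out by default,
# or a message if XML::SAX is missing or cannot create a parser.
sub default_sax_parser {
    my $class = eval {
        require XML::SAX::ParserFactory;
        ref( XML::SAX::ParserFactory->parser() );
    };
    return $class || 'XML::SAX not available';
}

print default_sax_parser(), "\n";    # e.g. XML::SAX::PurePerl
```

If this reports XML::SAX::PurePerl, that is the slow case described above. XML::Simple's documentation also describes a `$XML::Simple::PREFERRED_PARSER` variable for choosing a parser explicitly.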

    I'd also suggest a couple of tweaks to your code to squeeze a little more efficiency out of it. Both the <field> and <child> elements are being folded into hashes (on the 'name' and 'id' attributes respectively). You are then processing every element in each of these hashes by iterating over keys. A more efficient way would be to set the option keyattr => {} on the call to XMLin() to disable folding. Then you can iterate through the resulting arrays (rather than hashes) with:

    foreach ( @{$node->{data}{field}} ) { $template->param( $_->{name} => $_->{content} ); }
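To make the difference concrete: with `keyattr => {}` folding is disabled, so (assuming `forcearray` stays as in the original call) `<field>` and `<child>` should come back as arrays of hashes in document order, with `name` and `id` kept as ordinary keys. A minimal sketch over a hand-written structure of that shape (not actual XML::Simple output); `render_children` is a hypothetical helper, and the recursive call is passed in as a code ref so the snippet runs without any XML files.

```perl
use strict;
use warnings;

# Hand-written approximation of XMLin(..., keyattr => {}) for node 432:
# arrays in document order instead of folded hashes.
my $node = {
    data => {
        field => [ { name => 'doctext', content => 'This is test content' } ],
        child => [ { id => 1, content => 433 }, { id => 2, content => 435 } ],
    },
};

# Template parameters from <field> elements (name => content).
my %params = map { ( $_->{name} => $_->{content} ) } @{ $node->{data}{field} || [] };

# Render children in document order; $render stands in for parseNode().
sub render_children {
    my ($n, $render) = @_;
    return join '', map { $render->( $_->{content} ) } @{ $n->{data}{child} || [] };
}

my $children = render_children( $node, sub { "[reply $_[0]]" } );
print "$params{doctext}\n$children\n";
```

Since the arrays already preserve document order, no sorting step is needed, and `$template->param(%params)` could set all fields in one call.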

    For maximum efficiency, you'll want to ditch XML::Simple altogether and refactor your code into a SAX handler.
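For the SAX route, a rough outline of what such a handler could look like, assuming the PerlSAX2 event interface used by XML::SAX (`start_element`, `characters` and `end_element` receive hash refs, and attributes are keyed as `{}name` when there is no namespace). `NodeHandler` is a hypothetical class that only collects the node id, the fields and the child ids of one node in document order; rendering would stay as before.

```perl
use strict;
use warnings;

package NodeHandler;    # hypothetical handler class

sub new {
    my $class = shift;
    return bless { fields => {}, children => [], text => '' }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    $self->{text} = '';    # start collecting character data afresh
    my $attrs = $el->{Attributes} || {};
    $self->{current_name} = $attrs->{'{}name'}{Value} if $el->{Name} eq 'field';
    $self->{id}           = $attrs->{'{}id'}{Value}   if $el->{Name} eq 'node';
}

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};    # text may arrive in several chunks
}

sub end_element {
    my ($self, $el) = @_;
    if ( $el->{Name} eq 'field' ) {
        $self->{fields}{ $self->{current_name} } = $self->{text};
    }
    elsif ( $el->{Name} eq 'child' ) {
        push @{ $self->{children} }, $self->{text};    # keeps document order
    }
}

package main;
```

With XML::SAX installed it could be driven with something like `my $h = NodeHandler->new; XML::SAX::ParserFactory->parser(Handler => $h)->parse_uri('nodes/432.xml');`, after which `$h->{children}` lists the nodes to recurse into.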

Re: Node Parser too slow?
by joe++ (Friar) on Nov 04, 2002 at 16:59 UTC
    Hi BioHazard,

    From what I understand from the snippet, your script takes a "node" (xml file), renders html based on its content and recursively calls itself for every "child" element encountered (the child element references another xml file).

    At the end, your sub returns a flattened HTML string (if I interpret HTML::Template's output() method correctly). Both the XML::Simple data structure and the HTML::Template object then go out of scope.

    That means that during the recursion, the maximum amount of memory used depends on the total length of the HTML output, plus (the recursion depth) x ((the memory needed for one XML::Simple data structure) + (one HTML::Template instance)). Judging from your example XML snippet, this doesn't look like a big deal. I'm not familiar with XML::Simple's internals, but since it is based on a SAX parser (expat), I guess that after parsing, mostly the hash of hashes/lists in $node remains.

    Whether this implementation is too slow depends largely on the nesting depth and your definition of "slow", but to save on both memory and CPU, you could render these discussion threads to static files in a background process, either at regular intervals or as soon as a new item has been added to the discussion. Or even on first request, from a nifty 404 error handler ;-)
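The static-file idea can be sketched with core modules only: write the rendered HTML to a temporary file in the target directory and rename() it into place, so readers never see a half-written page, and regenerate only when one of the XML files is newer than the cached HTML. A minimal sketch under those assumptions; `is_stale` and `publish_thread` are hypothetical names, and `$render` would wrap parseNode().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# True if $html_file is missing or older than any of the given XML files.
sub is_stale {
    my ($html_file, @xml_files) = @_;
    return 1 unless -e $html_file;
    my $html_mtime = ( stat $html_file )[9];
    for my $xml (@xml_files) {
        my $mtime = ( stat $xml )[9];
        return 1 if defined $mtime && $mtime > $html_mtime;
    }
    return 0;
}

# Rebuild $html_file from $render->() if stale; returns 1 if it was rebuilt.
sub publish_thread {
    my ($html_file, $render, @xml_files) = @_;
    return 0 unless is_stale( $html_file, @xml_files );

    # Temp file in the same directory, so rename() stays on one filesystem.
    my ($fh, $tmp) = tempfile( DIR => dirname($html_file), UNLINK => 0 );
    print {$fh} $render->() or die "write $tmp: $!";
    close $fh               or die "close $tmp: $!";
    chmod 0644, $tmp        or die "chmod $tmp: $!";    # tempfile() creates 0600
    rename $tmp, $html_file or die "rename $tmp: $!";
    return 1;
}
```

Calling `publish_thread("threads/432.html", sub { parseNode(432) }, @xml_files)` after each new comment, or from a 404 handler, would give the behaviour described above; collecting `@xml_files` for a whole thread is left open here.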

    Just my $0.02

    Cheers, Joe

Re: Node Parser too slow?
by gjb (Vicar) on Nov 04, 2002 at 19:28 UTC

    This doesn't answer your question directly, but I think it is relevant nevertheless.

    Actually, this is a job for XSLT, the XML transformation language. The idea is that one describes a transformation of XML into "something else" which can be either XML, HTML, plain text, basically whatever you like.

    If the XSLT modules on CPAN are worth their salt, they should be quite efficient, so maybe this is an occasion not to reinvent the wheel ;-)

    XSLT has been a W3C standard for a couple of years now, so the technology is quite mature.

    Hope this helps, -gjb-

      My XSLT experience is fairly light so I'm keen to find out about real world applications. In the original poster's code he would encounter an element like this:

      <child id="2">435</child>

      At this point his code would then (recursively) parse down into the file called '435.xml'. I realise recursive processing is no problem for XSLT, but how would you load in another file and process it?

        Oops, I should have read the code more carefully. I thought it was just an XML file that was converted to HTML without recursion.

        Recursion complicates matters to the point that you would have to do major hacking to get this to work with XSLT, if it is possible at all.

        Thanks for pointing this out, -gjb-

Re: Node Parser too slow?
by BioHazard (Pilgrim) on Nov 08, 2002 at 11:57 UTC
    Hi Monks,

    Thank you for your suggestions. I am now trying to create a static HTML file of a thread whenever a comment is written. This has the advantage that not all XML files have to be parsed each time someone reads the thread, so it should be a bit faster. Thanks to grantm, I am now going to use the version without the keys() function.

    Perhaps some of you are willing to test my system when I'm a bit further :)

