PerlMonks  

Why so slow from CGI, but not command line?

by ajt (Prior)
on Apr 22, 2002 at 10:47 UTC ( #161028=perlquestion )
ajt has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm currently doing some RDF/RSS scripting. I've written a very simple script that takes an RSS file or URI, reads it from disk or fetches it from the web server, converts it to RSS 0.91 format (so the XSL-T is easier), then loads the result into Matt's excellent XML::LibXML and the XSL-T file into XML::LibXSLT, and finally dumps the output.

Called from the command line with local RSS and XSL files, it takes about 2 seconds to run on a low-end NT box. The same script called via Apache/CGI on the same machine takes a LOT longer: about 4 minutes.

I put in some print statements and an exit, and it seems that the XML parser is taking forever to run when called via CGI; the rest is done in less than 1 second. I've done XSL-T using this module before on the same box via CGI and it ran fine.
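The print-statement approach described above can be made a little more systematic by timing each stage. This is a minimal sketch, not the code from the original script: the helper name timed and the stand-in stages are hypothetical, and the real parse and transform calls would go inside the closures. Output goes to STDERR, which under Apache/CGI normally ends up in the server's error log.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # sub-second resolution for time()

# Hypothetical helper: run one stage, report how long it took on STDERR,
# and pass the stage's return value through. The stage is always called
# in list context; in scalar context the first returned value is used.
sub timed {
    my ($label, $code) = @_;
    my $start   = time;
    my @result  = $code->();
    my $elapsed = time - $start;
    printf STDERR "%-12s %.3f s\n", $label, $elapsed;
    return wantarray ? @result : $result[0];
}

# Stand-in stages; in the real script these closures would wrap calls
# such as parse_string() and transform().
my $doc = timed('parse'     => sub { 'parsed' });
my $out = timed('transform' => sub { uc $doc });
```

Wrapping each call separately shows which single step eats the time, without having to move an exit around the script.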

The RSS files are short, the XSL-T file is short and simple, and it runs fine from the command line. What am I doing that's so stupid as to screw it up under CGI?

Code Snip: uses strict, CGI, LibXML, LibXSLT etc etc

# Normalise the feed to RSS 0.91 so one stylesheet can handle every input
$rss->parse($xml);
$rss->{output} = '0.91';
$xml = $rss->as_string;

my $xslt       = XML::LibXSLT->new();
my $xml_parser = XML::LibXML->new();
$xml_parser->keep_blanks(0);
$xml_parser->load_ext_dtd(0);

# Parse the feed and the stylesheet, then transform and print the result
my $source_xml = $xml_parser->parse_string($xml);
my $style_xsl  = $xml_parser->parse_file($style);
my $stylesheet = $xslt->parse_stylesheet($style_xsl);
my $result_xml = $stylesheet->transform($source_xml);
print $stylesheet->output_string($result_xml);

Here is an example RSS file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="C:\web\home\xml\xsl\rss-rdf.xsl"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://my.netscape.com/rdf/simple/0.9/">

  <channel>
    <title>freshmeat.net</title>
    <link>http://freshmeat.net</link>
    <description>the one-stop-shop for all your Linux software needs</description>
  </channel>

  <image>
    <title>freshmeat.net</title>
    <url>http://freshmeat.net/images/fm.mini.jpg</url>
    <link>http://freshmeat.net</link>
  </image>

  <item>
    <title>Geheimnis 0.59</title>
    <link>http://freshmeat.net/news/1999/06/21/930004162.html</link>
  </item>

  <item>
    <title>Firewall Manager 1.3 PRO</title>
    <link>http://freshmeat.net/news/1999/06/21/930004148.html</link>
  </item>

  <textinput>
    <title>quick finder</title>
    <description>Use the text input below to search the fresh meat application database</description>
    <name>query</name>
    <link>http://core.freshmeat.net/search.php3</link>
  </textinput>

</rdf:RDF>

Here is a simple XSL-T:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  exclude-result-prefixes="xsl">

  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <div>
      <xsl:apply-templates select="rss"/>
    </div>
  </xsl:template>

  <xsl:template match="rss">
    <xsl:apply-templates select="channel"/>
  </xsl:template>

  <xsl:template match="channel">
    <xsl:variable name="link" select="link"/>
    <xsl:variable name="description" select="description"/>
    <h1><a href="{$link}" title="{$description}"><xsl:value-of select="title" /></a></h1>
    <hr />
    <ul><xsl:apply-templates select="item"/></ul>
  </xsl:template>

  <xsl:template match="item">
    <xsl:variable name="item_link" select="link"/>
    <xsl:variable name="item_title" select="description"/>
    <li><a href="{$item_link}" title=""><xsl:value-of select="title"/></a></li>
  </xsl:template>

</xsl:stylesheet>

Re: Why so slow from CGI, but not command line?
by kappa (Chaplain) on Apr 22, 2002 at 11:39 UTC
    Could this be a problem? Try using an http URL as the value for href.
    <?xml-stylesheet type="text/xsl" href="C:\web\home\xml\xsl\rss-rdf.xsl"?>
    Update: ajt tried it, and it doesn't work.
Re: Why so slow from CGI, but not command line?
by Sifmole (Chaplain) on Apr 22, 2002 at 12:26 UTC
    I can't tell from the code snippet you posted: are you sending the file from the browser's computer to the server, or is the file being processed already on the server? If you are sending a file, the transfer time adds to your total. Would it account for four minutes? Well, that would depend on the size of the file and your transfer rate. Also, and this is just a reach, you might try setting $|=1; perhaps it is just taking the browser some time to realize that the CGI is done.
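    For reference, the $|=1 suggestion would look roughly like this in a CGI script. It is only a sketch with placeholder output; whether it changes anything depends on where the delay really is, and the web server may still do buffering of its own.

```perl
use strict;
use warnings;

# Turn on autoflush for STDOUT so every print is handed to the web server
# immediately instead of sitting in Perl's buffer until it fills or the
# script exits.
$| = 1;

print "Content-Type: text/plain\r\n\r\n";
print "starting\n";
# ... the slow work (parsing, transforming) would go here ...
print "done\n";
```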
Re: Why so slow from CGI, but not command line?
by ajt (Prior) on Apr 22, 2002 at 12:46 UTC
    The solution is.....

    Windows NT doing strange things! When I originally wrote the script I ran it at home; however, at work, where I started to test it this morning, we have a firewall and MS Proxy...

    Even though the MS Winsock proxy is supposed to deal with things transparently, it clearly doesn't work for processes running under CGI, probably a user/permission problem. Anyway, when I run it under CGI, Matt's LibXML waits around trying to access the external DTD, which it can't reach, so it eventually gives up; hence the minutes of waiting. If you remove the DTD then it runs just as fast as it does on the command line.

    I now have to remove the DTD from the input XML files or make Matt's LibXML parser ignore the DTD. I thought that was what $xml_parser->load_ext_dtd(0); did, but I was wrong.
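    The two options mentioned here (removing the DTD from the input, or telling the parser to leave it alone) might look roughly like the sketch below. This is not the fix actually used in this thread: strip_doctype and parse_offline are hypothetical helper names, the regex only copes with simple DOCTYPE declarations, the DTD URL in the sample is made up, and the no_network parser option is documented for current XML::LibXML releases, so it may not have been available in the version used here.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a DOCTYPE declaration from an XML string so the
# parser has no external DTD to go looking for. Handles the simple forms
# only (with or without an internal subset in [...]).
sub strip_doctype {
    my ($xml) = @_;
    (my $copy = $xml) =~ s/<!DOCTYPE\s[^>\[]*(?:\[.*?\]\s*)?>\s*//s;
    return $copy;
}

# Hypothetical helper: parse with external DTD loading and network access
# switched off. XML::LibXML is loaded lazily, so the rest of the script
# still compiles on a machine where the module is not installed.
sub parse_offline {
    my ($xml) = @_;
    require XML::LibXML;
    my $parser = XML::LibXML->new(load_ext_dtd => 0, no_network => 1);
    return $parser->parse_string(strip_doctype($xml));
}

# Illustrative input; the DTD URL is invented for this example.
my $feed = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Example//DTD RSS 0.91//EN" "http://example.com/rss-0.91.dtd">
<rss version="0.91"><channel><title>t</title></channel></rss>
XML

my $clean = strip_doctype($feed);
print $clean;
```

    Stripping the declaration before parsing works regardless of parser version, at the cost of losing any entities the DTD would have defined.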

    Many thanks to kappa: though it was the wrong suggestion, it did make me think external, and hence track down the answer!

    I should have known that if it runs okay from the command line but not under CGI, it is almost certainly a permissions problem....!

    As ever thanks in advance......!

      Heh, my first thought was about fetching DTD or something :) But you almost clearly state that you run Apache on the same machine! Ah, ok, I am glad you solved your problem!
      And Windows NT looks innocent in this case, doesn't it? :)))

      As a handy debugging tip, any time you have code that's doing something `networky' (name resolution, creating a TCP connection) that mysteriously hangs for ~1-2 minutes it's often fruitful to use strace (well, not on NT of course :) or a packet sniffer (tcpdump, ethereal) to watch what exactly your code is doing when you see it hang (i.e. "Hrrm, it's sent a packet to the name server and it's now stuck . . .").

Node Type: perlquestion [id://161028]
Approved by broquaint