Using FastCGI

jonc has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a very slow search engine made with a CGI script, and it has been suggested to use FastCGI. I am just wondering if where you put the while loop matters, and if there is a better way/other options.

I have looked at mod_perl, but can't put anything on the server, so it won't do. Not sure if Plack is a better option.

Also, would it matter using readdir or File::Find with speed? Would it be faster to use one file which includes all of the files being looped through??

Here is the important code (I think):

#!/usr/bin/perl -wT
#What is the T for in -wT?
use strict;
use CGI qw(:standard);
use FCGI;
use File::Find;
require '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/verbTen
+seChanger.pl';


#my $search_key = param('query');
my $search_key = "move";

# --- Different forms of the Searchword --- #
my @temp_changeverbforms = map changeVerbForm( $search_key, 0, $_ ), 1
+..4;
my @verbforms;
push (@verbforms, $search_key); # Watch extra for loop
foreach my $temp_changeverbforms (@temp_changeverbforms)
    {
    push (@verbforms, $temp_changeverbforms) unless ($temp_changeverbf
+orms eq "");
    }

#my $c_verb = param('verb_only'); my $c_enabling = param('enabling'); 
+my $c_subject = param('subject');
#my $c_object = param('object'); my $c_prep = param('prep'); my $c_adj
+ = param('adjective');
#my $c_modnoun = param('modnoun');


my $category_id;

# --- Variables for required info from parser --- #

my $chapternumber; my $sentencenumber; my $sentence;
my $grammar_relation; my $argument1; my $argument2;

my @all_matches; ## RESULTS OF SEARCH

#if ($c_verb eq 'on')
#    {
#    if ($c_enabling eq 'on')
#        {$category_id = 'xcomp';}
#    elsif ($c_subject eq 'on')
#        {$category_id = 'subj';}
#    elsif ($c_object eq 'on')
#        {$category_id = 'obj';}
#    elsif ($c_prep eq 'on')
#        {$category_id = 'prep';}
$category_id = 'subj';



# --Files -- #
#To change, keep curly at beginning, comment File::Find ,) at end, unc
+omment open
    
## readdir ##    ##or  glob ##  ##or  File::Find ##
my $dir = '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/';
opendir(my $dh, $dir) or die $!; # Use a lexical directory handle.
my @files = 
    grep { -f } #Just to check if file
    map  { "$dir/$_" } #Just to check if file
    grep { /^parsed.*\.txt$/ } #if parsed text
    readdir($dh);


 ################    FCGI attempt    ####################
while (FCGI::accept >= 0 ) {


for my $file (@files)
        {
        open(my $parse_corpus, '<', "$file") or die $!;
        my @entirechapter = <$parse_corpus>;
        
        my $entirechapter = join ('', @entirechapter); ##Flatten file 
+(make one big string)
        
        #To get each sent. and info in one string:
        my @sentblocks = split (/Parsing\s/, $entirechapter); ##Remove
+ "Parsing" which is on the line of the chptnumber
        
        $chapternumber = $1 if ($sentblocks[1] =~ /file:\s(\S+)\.txt/)
+;
        
        foreach my $sentblock (@sentblocks)
            {
            foreach my $verbform (@verbforms)
                {    
                ##blah blah regex's, conditions, subroutine
## output ##
print header();
print start_html();

##blah blah

    #print statements
    
    
    } #End of file loop Could be moved up for speed?
       
print "</ol><br>";
print end_html();
} ##END of FCGI while loop!
[download]

p.s. The while loop is after I get all the files I want to loop through.

Thanks a lot, and don't hesitate to let me know if this was asked badly, or unclear, or if I have to make it more concise.

Comment on Using FastCGI Download Code

Replies are listed 'Best First'.
Re: Using FastCGI by davido (Cardinal) on Jun 14, 2011 at 21:13 UTC
FastCGI can help a lot, but where it shines is when you have many requests over a short period of time (ie, high traffic). Every time a CGI script (done the old fashioned way) is invoked, the Perl interpreter fires up, loads all the modules, and runs your script. That startup time can be more significant than a trivial script. FastCGI improves this situation dramatically (as does mod_perl and webserver API integration). But there's only so much improvement you can get there. You next have to start looking at the algorithms. Anywhere you find yourself creating nested loops, or creating multiple sequential loops to deal with the same data set, you have to ask if there's a better way to do it. Profiling is a first step toward improving code already written, but even before the first step is planning and composing efficient code. I found a few areas where you could eliminate sequential loops, but I would need to know what goes on in the regexp-comparison loop to see if there's room for further efficiency improvements. And without profiling it's very difficult to know where to focus attention. Read more... (2 kB) This is untested since I don't know what to fill into the blanks. But it has eliminated a few either sequential or nested loops. See comments for indication of where loops have been refined. It's hard to know the impact it will have without knowing size of data sets, quantity of files handled, and so on, what is happening inside the regexp-matching loop, etc. But it could be a start in the right direction. By the way, the `perl -T` switch is for Taint Mode, which is described in brief in perlrun. Dave	[reply] [d/l] [select]
Re^2: Using FastCGI by jonc (Beadle) on Jun 15, 2011 at 04:40 UTC
Oh, well this was an amazing help. Should I include my full code? It seemed like a lot to post, so I omitted it. `grep { -f && /^parsed.\.txt$/` caused an error, so I used `grep { '-f' && /^parsed.\.txt$/}` with single quotes, until this is tested, I just won't check if it's a file. Your post definitely gave me hope that I can find a way to optimize the rest of the script. I was really impressed by the use of $/ to eliminate so much. I hope I can begin to think that way. It may help to amalgamate all the files into one before hand to avoid that one loop Thanks!	[reply] [d/l] [select]
Re^3: Using FastCGI by davido (Cardinal) on Jun 15, 2011 at 04:53 UTC
What error did it cause? I just tested the following code: `my @found = grep { -f && /\.pl/ } @array;` [download] ... and it did work. You are probably not invoking the script from within the working directory. If you have proper permissions, you can 'chdir', or you could specify the full path like this: `my @files = grep { -f && /^parsed.\.txt$/ } map { "$dir/$_" } readdir($dh);` [download] A subtle change. From an efficiency standpoint, slightly* worse, as map now acts on every item returned by readdir. But on the other hand, it works if your path is not your current directory. Besides, even if your directory has a thousand files in it, the map and grep aren't costing you much time. Again, this is where profiling comes in. :) Look at the CPAN module Devel::Profile. Dave	[reply] [d/l] [select]
Re^4: Using FastCGI by jonc (Beadle) on Jun 15, 2011 at 05:19 UTC
Re^5: Using FastCGI by davido (Cardinal) on Jun 15, 2011 at 05:30 UTC
Re: Using FastCGI by moritz (Cardinal) on Jun 14, 2011 at 19:45 UTC
If you want to speed up a script, don't make changes based on hearsay. Use a profiler (like Devel::NYTProf) to find the things that are slow, and try to improve those. I guess in your case it's reading the files that is slow, but don't rely on my guessing - do the analysis yourself. For searching things, there are various fast, pre-made solutions. For example KinoSearch lets you build a so-called "inverted index" from documents in which searching for words is much faster than reading all the documents when you want to search them. Perl 6 - second systems done right	[reply]
Re^2: Using FastCGI by jonc (Beadle) on Jun 15, 2011 at 02:18 UTC
Thanks a lot, I had never heard of this 'profiling', should definitely get me started in the right direction. I was thinking of eliminating the loop to read files, by just joining all the files together, so we'll see.	[reply]
Re: Using FastCGI by sundialsvc4 (Abbot) on Jun 14, 2011 at 20:05 UTC
I agree. If you are faced with a quantifiable problem, then that problem must have a quantifiable solution, and you should strictly limit yourself to that solution ... nothing more. “Right now, you are barking up the wrong tree.” If the fundamental statement of your problem is that “you have a slow search engine,” then it probably does not matter in the slightest how that “too-slow search engine” is presently being invoked. Therefore, changes to “how it is being invoked (and nothing more...)” probably won’t help the problem in the slightest. What you need to focus your (full...) attentions on is this: “why is my search-engine slow?” Sure... when you come to that... “Plack is great.” But, “(Plack \| CGI \| FastCGI \| whatever) is merely a subroutine-call.” If your fundamental algorithm can’t produce the expected results in a reasonable amount of time, then it really does not matter how it gets called.
Re^2: Using FastCGI by jonc (Beadle) on Jun 15, 2011 at 02:21 UTC
Sorry, I was a little lost as to what these speed improvements did, guess I was being a too optimistic that it would fix the problem. I will get a more efficient algorithm and then probably come back for further help. Thanks a lot.	[reply]
Re: Using FastCGI by Anonymous Monk on Jun 14, 2011 at 22:08 UTC
The idea is, if you know your program is slow, return a progress report to the user until the search is complete `searching 1 searching 2 searching 3 searching 4 done, see your results` [download] See Watching long processes through CGI (Aug 02) for a technique	[reply] [d/l]


Syntactic Confectionery Delight
	PerlMonks