Using FastCGI

by jonc (Beadle)
on Jun 14, 2011 at 19:02 UTC ( [id://909628] )

jonc has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a very slow search engine built from a CGI script, and it has been suggested that I use FastCGI. I am just wondering whether it matters where you put the while loop, and whether there is a better way or other options.

I have looked at mod_perl, but I can't install anything on the server, so it won't do. I'm not sure whether Plack is a better option.

Also, does it matter for speed whether I use readdir or File::Find? And would it be faster to loop through one file that combines all of the files?

Here is the important code (I think):

#!/usr/bin/perl -wT
#What is the T for in -wT?
use strict;
use CGI qw(:standard);
use FCGI;
use File::Find;
require '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/verbTenseChanger.pl';

#my $search_key = param('query');
my $search_key = "move";

# --- Different forms of the search word --- #
my @temp_changeverbforms = map changeVerbForm( $search_key, 0, $_ ), 1..4;
my @verbforms;
push (@verbforms, $search_key);
# Watch extra for loop
foreach my $temp_changeverbforms (@temp_changeverbforms) {
    push (@verbforms, $temp_changeverbforms) unless ($temp_changeverbforms eq "");
}

#my $c_verb    = param('verb_only');
my $c_enabling = param('enabling');
my $c_subject  = param('subject');
#my $c_object  = param('object');
my $c_prep     = param('prep');
my $c_adj      = param('adjective');
#my $c_modnoun = param('modnoun');
my $category_id;

# --- Variables for required info from parser --- #
my $chapternumber;
my $sentencenumber;
my $sentence;
my $grammar_relation;
my $argument1;
my $argument2;
my @all_matches;    ## RESULTS OF SEARCH

#if ($c_verb eq 'on') {
#    if    ($c_enabling eq 'on') { $category_id = 'xcomp'; }
#    elsif ($c_subject  eq 'on') { $category_id = 'subj';  }
#    elsif ($c_object   eq 'on') { $category_id = 'obj';   }
#    elsif ($c_prep     eq 'on') { $category_id = 'prep';  }
$category_id = 'subj';

# --- Files --- #
# To change, keep curly at beginning, comment File::Find at end, uncomment open
## readdir ##
## or glob ##
## or File::Find ##
my $dir = '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/';
opendir(my $dh, $dir) or die $!;    # Use a lexical directory handle.
my @files = grep { -f }                  # just to check it is a file
            map  { "$dir/$_" }           # build the full path
            grep { /^parsed.*\.txt$/ }   # only parsed text files
            readdir($dh);

################ FCGI attempt ####################
while (FCGI::accept >= 0) {
    for my $file (@files) {
        open(my $parse_corpus, '<', "$file") or die $!;
        my @entirechapter = <$parse_corpus>;
        my $entirechapter = join ('', @entirechapter);    ## Flatten file (make one big string)

        # To get each sentence and its info in one string:
        my @sentblocks = split (/Parsing\s/, $entirechapter);    ## Remove "Parsing", which is on the line of the chapter number
        $chapternumber = $1 if ($sentblocks[1] =~ /file:\s(\S+)\.txt/);

        foreach my $sentblock (@sentblocks) {
            foreach my $verbform (@verbforms) {
                ## blah blah regex's, conditions, subroutine
            }
        }

        ## output ##
        print header();
        print start_html();
        ## blah blah print statements
    }    # End of file loop. Could be moved up for speed?

    print "</ol><br>";
    print end_html();
}    ## END of FCGI while loop!

P.S. The while loop comes after I have gathered all the files I want to loop through.

Thanks a lot, and don't hesitate to tell me if this question was badly asked or unclear, or if I should make it more concise.

Re: Using FastCGI
by davido (Cardinal) on Jun 14, 2011 at 21:13 UTC

    FastCGI can help a lot, but where it shines is when you have many requests over a short period of time (i.e., high traffic). Every time a CGI script (done the old-fashioned way) is invoked, the Perl interpreter fires up, loads all the modules, and runs your script. For a trivial script, that startup time can outweigh the actual work. FastCGI improves this situation dramatically (as do mod_perl and webserver API integration).
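    To make that concrete, here is a minimal sketch of a FastCGI request loop using CGI::Fast, the convenience wrapper around FCGI that hands you a fresh CGI object per request (load_file_list and its glob pattern are placeholders, not your code). Everything expensive runs once, before the loop; only per-request work goes inside it.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use CGI::Fast;

        # One-time setup: pay the compile/load cost once per process,
        # not once per request.
        my @files = load_file_list();    # placeholder helper

        while ( my $q = CGI::Fast->new ) {
            # Per-request work only: the interpreter, loaded modules,
            # and @files are all reused across requests.
            print $q->header, $q->start_html('Search');
            my $search_key = $q->param('query');
            # ... run the search for $search_key against @files here ...
            print $q->end_html;
        }

        sub load_file_list { return glob('/path/to/parsed*.txt'); }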

    But there's only so much improvement you can get there. You next have to start looking at the algorithms. Anywhere you find yourself creating nested loops, or creating multiple sequential loops over the same data set, you have to ask if there's a better way to do it. Profiling is the first step toward improving code already written, but even before that comes planning and composing efficient code.
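    For instance, one concrete case in the posted code (a sketch, untested against the real data): the inner foreach over @verbforms can be folded into a single alternation regex compiled once, so each sentence block is scanned in one pass instead of once per verb form.

        # Build one pattern from all verb forms, outside the loops.
        my $verb_re = do {
            my $alt = join '|', map { quotemeta } @verbforms;
            qr/\b(?:$alt)\b/;
        };

        for my $sentblock (@sentblocks) {
            next unless $sentblock =~ $verb_re;    # one scan per block
            # ... record the match ...
        }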

    I found a few areas where you could eliminate sequential loops, but I would need to know what goes on in the regexp-comparison loop to see if there's room for further efficiency improvements. And without profiling it's very difficult to know where to focus attention.

    This is untested since I don't know what to fill into the blanks, but it eliminates a few sequential and nested loops; see the comments for where the loops have been refined. It's hard to know the impact without knowing the size of the data sets, the number of files handled, what happens inside the regexp-matching loop, and so on. But it could be a start in the right direction.
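    In outline (the combined grep filter plus slurping each file via $/; the blanks remain blanks):

        # One pass over readdir: file test and name filter in a single grep.
        # The pattern matches the basename at the end of the full path.
        opendir( my $dh, $dir ) or die $!;
        my @files = grep { -f && m{/parsed[^/]*\.txt$} }
                    map  { "$dir/$_" }
                    readdir($dh);
        closedir $dh;

        while ( FCGI::accept() >= 0 ) {
            for my $file (@files) {
                open( my $parse_corpus, '<', $file ) or die $!;
                # Slurp the whole file in one read; no array, no join().
                my $entirechapter = do { local $/; <$parse_corpus> };
                close $parse_corpus;

                my @sentblocks      = split /Parsing\s/, $entirechapter;
                my ($chapternumber) = $sentblocks[1] =~ /file:\s(\S+)\.txt/;

                # ... regex matching against each sentence block ...
            }
            # ... print the response ...
        }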

    By the way, the perl -T switch is for Taint Mode, which is described in brief in perlrun.
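    In short, any value arriving from outside the program (CGI parameters, %ENV, file input) is marked tainted, and Perl will refuse to let it reach a shell or modify the filesystem until you launder it through a regex capture. A minimal illustration (the allowed character class and the grep command are only for demonstration):

        #!/usr/bin/perl -T
        use strict;
        use warnings;
        use CGI qw(:standard);

        $ENV{PATH} = '/bin:/usr/bin';    # taint mode also insists on a trusted PATH

        my $query = param('query');      # tainted: it came from the outside world

        # Untaint by capturing only the characters we explicitly allow.
        my ($clean) = $query =~ /\A([\w-]+)\z/
            or die "unexpected characters in query\n";

        # Without the capture above, -T would refuse to run this:
        system( 'grep', '-F', '--', $clean, '/data/corpus.txt' );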


    Dave

      Oh, well this was an amazing help. Should I include my full code? It seemed like a lot to post, so I omitted it.

      grep { -f && /^parsed.*\.txt$/ } caused an error, so I used grep { '-f' && /^parsed.*\.txt$/ } with single quotes. Until this is tested, I just won't check whether it's a file.

      Your post definitely gave me hope that I can find a way to optimize the rest of the script. I was really impressed by the use of $/ to eliminate so much; I hope I can begin to think that way. It may also help to amalgamate all the files into one beforehand, to avoid that loop entirely.

      Thanks!

        What error did it cause? I just tested the following code:

        my @found = grep { -f && /\.pl/ } @array;

        ... and it did work.

        You are probably not invoking the script from within the directory you are scanning, so the bare filenames returned by readdir don't resolve for -f. If you have the proper permissions you can chdir there, or you could specify the full path like this:

        my @files = grep { -f && m{/parsed[^/]*\.txt$} } map { "$dir/$_" } readdir($dh);

        A subtle change. From an efficiency standpoint, *slightly* worse, since map now acts on every item returned by readdir (and the pattern has to match the basename at the end of the full path). But on the other hand, it works when the directory you're reading is not your current directory. Besides, even if your directory has a thousand files in it, the map and grep aren't costing you much time. Again, this is where profiling comes in. :)

        Look at the CPAN module Devel::Profile.
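        If memory serves, you enable it from the command line rather than editing the script (the script name is a stand-in), and the timings land in prof.out:

            perl -d:Profile yourscript.pl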


        Dave

Re: Using FastCGI
by moritz (Cardinal) on Jun 14, 2011 at 19:45 UTC
    If you want to speed up a script, don't make changes based on hearsay. Use a profiler (like Devel::NYTProf) to find the things that are slow, and try to improve those.
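    You can profile a single run of a CGI script straight from the shell (the script name here is a stand-in):

        perl -d:NYTProf search.cgi
        nytprofhtml --open    # turn the collected data into browsable HTML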

    I guess in your case it's reading the files that is slow, but don't rely on my guessing - do the analysis yourself.

    For searching, there are various fast, pre-made solutions. For example, KinoSearch lets you build a so-called "inverted index" of your documents; searching that index for words is much faster than reading through all the documents on every search.
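    A rough sketch of that approach using the KinoSearch::Simple wrapper (the paths and field names are made up, and the API details are from memory, so check the module's documentation):

        use strict;
        use warnings;
        use KinoSearch::Simple;

        # Build the inverted index once, offline.
        my $index = KinoSearch::Simple->new(
            path     => '/Users/jon/search_index',    # where the index lives
            language => 'en',
        );
        for my $file ( glob('/path/to/parsed*.txt') ) {
            open my $fh, '<', $file or die $!;
            my $content = do { local $/; <$fh> };
            $index->add_doc( { file => $file, content => $content } );
        }

        # At query time, search the index instead of rereading every file.
        my $hit_count = $index->search( query => 'move' );
        while ( my $hit = $index->fetch_hit_hashref ) {
            print "$hit->{file}\n";
        }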

      Thanks a lot. I had never heard of this 'profiling'; it should definitely get me started in the right direction. I was thinking of eliminating the file-reading loop by just joining all the files together beforehand, so we'll see.
Re: Using FastCGI
by sundialsvc4 (Abbot) on Jun 14, 2011 at 20:05 UTC

    I agree. If you are faced with a quantifiable problem, then that problem must have a quantifiable solution, and you should strictly limit yourself to that solution ... nothing more. “Right now, you are barking up the wrong tree.”

    If the fundamental statement of your problem is that “you have a slow search engine,” then it probably does not matter in the slightest how that “too-slow search engine” is presently being invoked. Therefore, changes to “how it is being invoked (and nothing more...)” probably won't help the problem in the slightest.

    What you need to focus your (full...) attention on is this: “why is my search-engine slow?”

    Sure... when you come to that... “Plack is great.” But “(Plack | CGI | FastCGI | whatever) is merely a subroutine-call.” If your fundamental algorithm can't produce the expected results in a reasonable amount of time, then it really does not matter how it gets called.

      Sorry, I was a little lost as to what these speed improvements actually did; I guess I was a little too optimistic that they would fix the problem. I will work out a more efficient algorithm and then probably come back for further help. Thanks a lot.
