PerlMonks  

Re: Search a database for every permutation of a list of words

by ariels (Curate)
on Jun 18, 2002 at 07:16 UTC (#175297)


in reply to Search a database for every permutation of a list of words

If you just want to rank documents by how many of the query terms (@term = qw{hairline receding} in your example) each document contains, you don't need to do any permuting.

Note that there are various trade-offs of memory, disk space, and search time that you can make. You'll need to analyze carefully what a "typical" query is like, and what a "bad" query is like, and take appropriate measures.

Here's A Way To Do It. I assume you have (or will manufacture) a list of all search terms.

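  • Create for each search term a list of the document IDs in which it appears.
  • For each query, count how many of the query terms each document contains. (Untested, of course...)
    my %hit;
    foreach my $t (@term) {
        # Assumes each value of %documents_mentioning is a reference
        # to an array of document IDs; unknown terms match nothing.
        my @doc = @{ $documents_mentioning{$t} || [] };
        $hit{$_}++ for @doc;
    }
    # Filter (keys %hit) by # of hits (at least 2), then sort.
    my @res = sort { $hit{$b} <=> $hit{$a} }
              grep { $hit{$_} >= 2 } keys %hit;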
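
The first step above (building the per-term lists) might look roughly like the sketch below. It is only an illustration under assumptions: the %text_of hash, its sample contents, and the rank_documents helper are made up for the example, documents are assumed to fit in memory, and "words" are simply lowercased runs of \w characters with no stemming.

```perl
use strict;
use warnings;

# Hypothetical sample collection: document ID => text.
my %text_of = (
    1 => 'Receding hairline treatments',
    2 => 'A hairline crack in the wall',
    3 => 'Receding glaciers and a receding hairline',
);

# Build the index: term => reference to an array of document IDs.
my %documents_mentioning;
for my $id (sort { $a <=> $b } keys %text_of) {
    my %seen;
    for my $word (lc($text_of{$id}) =~ /\w+/g) {
        next if $seen{$word}++;    # list each document once per term
        push @{ $documents_mentioning{$word} }, $id;
    }
}

# Rank documents by the number of query terms they contain,
# keeping only those with at least $min_hits matching terms.
# Ties are broken by document ID so the order is stable.
sub rank_documents {
    my ($min_hits, @term) = @_;
    my %hit;
    for my $t (@term) {
        $hit{$_}++ for @{ $documents_mentioning{lc $t} || [] };
    }
    return sort { $hit{$b} <=> $hit{$a} || $a <=> $b }
           grep { $hit{$_} >= $min_hits } keys %hit;
}

my @res = rank_documents(2, qw{hairline receding});
print "@res\n";    # prints "1 3"
```

Document 2 mentions only "hairline", so it is dropped by the threshold of 2; lowering the threshold to 1 would append it after the two full matches. For a large collection the index would presumably live on disk (a DBM file or a database table) rather than in a hash.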

"Interesting" parts include dealing with very large indexes (if your collection of documents is large), stemming words, and selecting the relevant words.

A good alternative might be to get a 3rd-party search engine instead of writing your own.

