PerlMonks  

Re: Refining a 'vector space search'.

by choocroot (Friar)
on Jun 29, 2003 at 02:42 UTC


in reply to Refining a 'vector space search'.

If you cannot use fixed-length vectors because the full set of documents is too big, or is unknown and keeps evolving (because it's a stream of documents), then you could work with hashes instead. For each document, you store its terms with their frequencies. In effect, this keeps only the terms with non-zero frequencies. In a database, that could be represented as one "huge" table with document id, term, and frequency columns:

docid | term | freq.
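A minimal sketch of that storage scheme, assuming Python's standard `sqlite3` module in place of whatever database you actually use. The table name `index_table` comes from the query in this post; the tokenizer (lowercase runs of letters and digits), the function names and the sample texts are all hypothetical. Each document becomes a sparse term-to-frequency hash, and only its non-zero entries are written as `docid | term | freq` rows:

```python
import re
import sqlite3
from collections import Counter


def term_frequencies(text):
    """Return a sparse {term: count} hash for one document.

    Assumption: terms are lowercase runs of letters/digits; a real
    indexer would likely add stemming, stop-word removal, etc.
    """
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def create_index(conn):
    # One row per (document, term) pair; absent terms are simply not stored.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_table ("
        " docid INTEGER NOT NULL,"
        " term TEXT NOT NULL,"
        " freq INTEGER NOT NULL,"
        " PRIMARY KEY (docid, term))"
    )


def add_document(conn, docid, text):
    # Documents can arrive at any time: no fixed vocabulary is required.
    rows = [(docid, term, freq) for term, freq in term_frequencies(text).items()]
    conn.executemany(
        "INSERT INTO index_table (docid, term, freq) VALUES (?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    add_document(conn, 1, "the cat sat on the mat")   # hypothetical sample
    add_document(conn, 2, "the dog chased the cat")   # hypothetical sample
    for row in conn.execute(
        "SELECT docid, term, freq FROM index_table ORDER BY docid, term"
    ):
        print(row)
```

Adding a new document only inserts rows; nothing already stored has to be resized or rewritten, which is the point of using hashes rather than fixed-length vectors.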

Then, for the search, you retrieve the document hash from the db and expand it into a vector over all the terms: build the "full vector" from 'SELECT DISTINCT term FROM index_table' with every entry set to zero, then place your document's term/frequency pairs in it. Once both vectors cover the same terms in the same order, you can compute your cosine similarity.
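The retrieve-expand-compare steps might look like this, again as a Python/`sqlite3` sketch under the same assumptions (the `index_table` layout from above; the function names and sample rows are hypothetical). The term list is fetched once, with an `ORDER BY` added so every expanded vector uses the same positions:

```python
import math
import sqlite3


def doc_hash(conn, docid):
    """Fetch one document's sparse {term: freq} hash from the index."""
    return dict(conn.execute(
        "SELECT term, freq FROM index_table WHERE docid = ?", (docid,)))


def all_terms(conn):
    """Every distinct indexed term, in a fixed (sorted) order."""
    return [t for (t,) in conn.execute(
        "SELECT DISTINCT term FROM index_table ORDER BY term")]


def expand(sparse, terms):
    """Full vector: zero for every term, except the document's own frequencies."""
    return [sparse.get(t, 0) for t in terms]


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_u * norm_v)


def sparse_cosine(p, q):
    """Same value computed straight from the hashes: only terms present in
    both documents contribute to the dot product."""
    dot = sum(f * q[t] for t, f in p.items() if t in q)
    norm_p = math.sqrt(sum(f * f for f in p.values()))
    norm_q = math.sqrt(sum(f * f for f in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Hypothetical sample index; in practice the table is already populated.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE index_table (docid INTEGER, term TEXT, freq INTEGER)")
    conn.executemany(
        "INSERT INTO index_table VALUES (?, ?, ?)",
        [(1, "the", 2), (1, "cat", 1), (1, "sat", 1), (1, "on", 1), (1, "mat", 1),
         (2, "the", 2), (2, "cat", 1), (2, "dog", 1), (2, "chased", 1)],
    )
    terms = all_terms(conn)  # fetched once so both vectors share one term order
    d1, d2 = doc_hash(conn, 1), doc_hash(conn, 2)
    print(cosine(expand(d1, terms), expand(d2, terms)))
    print(sparse_cosine(d1, d2))
```

Because terms missing from either document contribute zero to the dot product, `sparse_cosine` returns the same value as the expanded version. That lets you skip the `SELECT DISTINCT` and the full vectors entirely when the vocabulary gets large.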
