Re: Refining a 'vector space search'.


PerlMonks

by choocroot (Friar)

on Jun 29, 2003 at 02:42 UTC ([id://269955])

in reply to Refining a 'vector space search'.

If you cannot use fixed-length vectors because the full set of documents (and therefore of terms) is too big or unknown (for example, because it is a stream of documents) and can keep evolving, then you could work with hashes. For each document you store its terms with their frequencies; in effect, you keep only the terms with non-zero frequencies. In a database, that could be represented with a "huge" table with document id, term, and frequency columns:

docid | term | freq.
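A minimal sketch of this storage scheme, written in Python with the standard-library `sqlite3` module standing in for "the database" (the post itself does not name a language or a DB). The table name `index_table` comes from the `SELECT DISTINCT` query in the next paragraph; the tokenizer, the composite primary key, and the helper names `term_frequencies`, `add_document` and `load_document` are hypothetical. Each document is reduced to a sparse `{term: freq}` hash and written as one row per distinct term, so nothing about the rest of the collection has to be known when a new document arrives.

```python
import re
import sqlite3
from collections import Counter

def term_frequencies(text):
    """Sparse {term: freq} hash: only the terms that actually occur."""
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))

def add_document(conn, docid, text):
    """Store one document as (docid, term, freq) rows, one row per distinct term."""
    conn.executemany(
        "INSERT INTO index_table (docid, term, freq) VALUES (?, ?, ?)",
        [(docid, term, freq) for term, freq in term_frequencies(text).items()],
    )

def load_document(conn, docid):
    """Rebuild the sparse hash for one document from its rows."""
    rows = conn.execute("SELECT term, freq FROM index_table WHERE docid = ?", (docid,))
    return dict(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_table ("
    " docid INTEGER NOT NULL,"
    " term  TEXT    NOT NULL,"
    " freq  INTEGER NOT NULL,"
    " PRIMARY KEY (docid, term))"  # assumption: one row per term per document
)

# Documents can arrive one at a time (e.g. from a stream); no global
# vocabulary or vector length has to be fixed in advance.
add_document(conn, 1, "Perl hash, Perl vector")
add_document(conn, 2, "vector space search")
```

Here `load_document(conn, 1)` returns `{'perl': 2, 'hash': 1, 'vector': 1}` (key order may vary), which is exactly the per-document hash described above.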

Then, for the search, you retrieve the document hash from the db and expand it to a vector over all the terms: build the "full vector" from 'SELECT DISTINCT term FROM index_table' with every entry set to zero, then place your document's term/freq pairs in it, so you can compute the cosine as usual ...
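The expansion step might look like this, again sketched in Python with an in-memory `sqlite3` table standing in for the real index. Table and column names follow the post; `load_document`, `vocabulary`, `full_vector`, `cosine` and `sparse_cosine` are hypothetical helpers, and the `ORDER BY term` is an added assumption so that every expanded vector uses the same term positions. The cosine is the dot product of the two vectors divided by the product of their lengths.

```python
import math
import sqlite3

def load_document(conn, docid):
    """Retrieve one document's sparse {term: freq} hash from the index."""
    rows = conn.execute("SELECT term, freq FROM index_table WHERE docid = ?", (docid,))
    return dict(rows)

def vocabulary(conn):
    """All distinct terms in the index, sorted so every vector uses the same positions."""
    rows = conn.execute("SELECT DISTINCT term FROM index_table ORDER BY term")
    return [term for (term,) in rows]

def full_vector(sparse, vocab):
    """Expand a sparse hash into a dense list aligned with vocab."""
    position = {term: i for i, term in enumerate(vocab)}
    vec = [0] * len(vocab)  # every term starts at zero
    for term, freq in sparse.items():
        # Terms missing from vocab (e.g. never-indexed query words) are skipped;
        # they could only multiply against zeros in the dot product.
        if term in position:
            vec[position[term]] = freq
    return vec

def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sparse_cosine(a, b):
    """The same cosine computed directly on the hashes, without expansion."""
    dot = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm = (math.sqrt(sum(f * f for f in a.values()))
            * math.sqrt(sum(f * f for f in b.values())))
    return dot / norm if norm else 0.0

# Tiny in-memory index with two documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE index_table (docid INTEGER, term TEXT, freq INTEGER)")
conn.executemany(
    "INSERT INTO index_table (docid, term, freq) VALUES (?, ?, ?)",
    [(1, "perl", 2), (1, "hash", 1), (2, "perl", 1), (2, "vector", 3)],
)

vocab = vocabulary(conn)  # ['hash', 'perl', 'vector']
doc1, doc2 = load_document(conn, 1), load_document(conn, 2)
score = cosine(full_vector(doc1, vocab), full_vector(doc2, vocab))
```

Because a term missing from either hash contributes zero to the dot product, `sparse_cosine(doc1, doc2)` gives the same value as the expanded version; the expansion mainly matters if the rest of your code expects fixed-length vectors.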

In Section Seekers of Perl Wisdom