in reply to Re: Re: Re: Refining a 'vector space search'. in thread Refining a 'vector space search'.
Is it enough to represent the values as 0/nonzero like that? That is, do we just need to know that "places 4, 11, 145, 519, 1238 are all nonzero"? In the vector I displayed as the example above, the values appear to have been broken down into what look like cosines themselves(?), and everything is either 0 or a point between 0 and 1. So when performing a calculation on your example above, would "places 1, 5 and 8 are positive" be just as fruitful as "place 1 is 0.02453, place 5 is 0.42128 and place 8 is 0.242112"?
Re: Re: Re: Re: Re: Refining a 'vector space search'.
by gjb (Vicar) on Jun 28, 2003 at 19:45 UTC

This depends on the algorithm. Some information retrieval algorithms just work with boolean values, others keep track of the frequency of a term in a document.
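To make the difference concrete, here is a minimal Python sketch, assuming cosine similarity as the comparison and reusing the three values from the question. The `cosine` and `to_boolean` helpers and the two one-term queries are made up for illustration; they are not taken from any particular module.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {position: weight}."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def to_boolean(vec):
    """Keep only the 0/nonzero information: every nonzero weight becomes 1."""
    return {p: 1.0 for p, w in vec.items() if w != 0}

# The document vector from the question (places 1, 5 and 8 are nonzero).
doc = {1: 0.02453, 5: 0.42128, 8: 0.242112}

# Two hypothetical one-term queries, hitting place 1 and place 5.
query_a = {1: 1.0}
query_b = {5: 1.0}

# With weights, the query on place 5 scores much higher than the one on place 1.
weighted_a = cosine(doc, query_a)
weighted_b = cosine(doc, query_b)

# With booleans, both queries score exactly the same.
boolean_a = cosine(to_boolean(doc), to_boolean(query_a))
boolean_b = cosine(to_boolean(doc), to_boolean(query_b))

print(weighted_a, weighted_b)
print(boolean_a, boolean_b)
```

In this sketch the boolean form can still tell you whether a term matches, but it can no longer rank a match on a heavily weighted place above a match on a lightly weighted one.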
If you want to keep track of the frequencies, you can either store position/frequency pairs or use two lists, one for the positions and the other for the frequencies. The former approach is cleaner, the latter should be faster.
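The two layouts might look like this in Python. This is only a sketch: the positions are the ones from the question, the frequencies and the query are invented, and both dot-product functions assume the positions are kept in ascending order. It shows the shape of the data and does not measure which layout is faster.

```python
# Layout 1: a single list of (position, frequency) pairs.
doc_pairs = [(4, 2), (11, 1), (145, 3), (519, 1), (1238, 4)]
query_pairs = [(11, 1), (519, 2), (700, 1)]

def dot_pairs(a, b):
    """Dot product of two sparse vectors stored as sorted (position, frequency) pairs."""
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        pos_a, freq_a = a[i]
        pos_b, freq_b = b[j]
        if pos_a == pos_b:
            total += freq_a * freq_b
            i += 1
            j += 1
        elif pos_a < pos_b:
            i += 1
        else:
            j += 1
    return total

# Layout 2: two parallel lists, one of positions and one of frequencies.
def split_pairs(pairs):
    """Convert layout 1 into layout 2."""
    positions = [p for p, _ in pairs]
    frequencies = [f for _, f in pairs]
    return positions, frequencies

def dot_parallel(pos_a, freq_a, pos_b, freq_b):
    """Same merge walk as dot_pairs, but over parallel lists."""
    i = j = 0
    total = 0
    while i < len(pos_a) and j < len(pos_b):
        if pos_a[i] == pos_b[j]:
            total += freq_a[i] * freq_b[j]
            i += 1
            j += 1
        elif pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return total

doc_pos, doc_freq = split_pairs(doc_pairs)
query_pos, query_freq = split_pairs(query_pairs)

print(dot_pairs(doc_pairs, query_pairs))
print(dot_parallel(doc_pos, doc_freq, query_pos, query_freq))
```

Both functions return the same value for the same data; the only difference is whether each position travels together with its frequency or sits in a separate list.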
Hope this helps, gjb
