Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Re: Re: Re: Refining a 'vector space search'.

by Seumas (Curate)
on Jun 28, 2003 at 18:56 UTC ( #269917=note: print w/ replies, xml ) Need Help??


in reply to Re: Re: Re: Refining a 'vector space search'.
in thread Refining a 'vector space search'.

Is it enough to represent the values in 0/non-zero like that? That is, do we just need to know that "places 4, 11, 145, 519, 1238 are all non-zero"? I mean, in the vector I displayed as the example above, you see that they are broken down into what appears to be cos themselves(?) and everything is either 0 or a point between 0 and 1. So when performing a calculation on your above example, would "places 1,5 and 8 are positive" be just as fruitful as "place 1 is 0.02453, place 5 is 0.42128 and place 8 is 0.242112"?


Comment on Re: Re: Re: Re: Refining a 'vector space search'.
Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Refining a 'vector space search'.
by gjb (Vicar) on Jun 28, 2003 at 19:45 UTC

    This depends on the algorithm. Some information retrieval algorithms just work with boolean values, others keep track of the frequency of a term in a document.

    If you want to keep track of the frequencies, you can either store position/frequency pairs or use two lists, one for the position, the other for the frequencies. The former approach is cleaner, the latter should be faster.

    Hope this helps, -gjb-

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://269917]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (13)
As of 2015-07-28 23:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (260 votes), past polls