
Re: Re: Re: Re: Refining a 'vector space search'.

by Seumas (Curate)
on Jun 28, 2003 at 18:56 UTC

in reply to Re: Re: Re: Refining a 'vector space search'.
in thread Refining a 'vector space search'.

Is it enough to represent the values as zero/non-zero like that? That is, do we only need to know that "places 4, 11, 145, 519, 1238 are all non-zero"? In the vector I showed as the example above, the values appear to be cosines themselves(?), and every entry is either 0 or a value between 0 and 1. So when performing a calculation on your example above, would "places 1, 5 and 8 are positive" be just as fruitful as "place 1 is 0.02453, place 5 is 0.42128 and place 8 is 0.242112"?

Re: Re: Re: Re: Re: Refining a 'vector space search'.
by gjb (Vicar) on Jun 28, 2003 at 19:45 UTC

    This depends on the algorithm. Some information retrieval algorithms just work with boolean values, others keep track of the frequency of a term in a document.
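
To make the difference concrete, here is a minimal sketch that compares the two treatments using cosine similarity, which is an assumed scoring function and not necessarily what the original search used. The query reuses the three weights quoted in the question; `doc_a`, `doc_b`, `cosine` and `to_boolean` are hypothetical names and values made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {position: weight} dicts."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def to_boolean(vec):
    """Keep only the fact that a position is non-zero."""
    return {p: 1.0 for p, w in vec.items() if w != 0}

# Weights taken from the question; the two documents are invented.
query = {1: 0.02453, 5: 0.42128, 8: 0.242112}
doc_a = {1: 0.9, 5: 0.05, 8: 0.05}    # same places, very different emphasis
doc_b = {1: 0.03, 5: 0.40, 8: 0.25}   # same places, similar emphasis

weighted_a = cosine(query, doc_a)
weighted_b = cosine(query, doc_b)
boolean_a = cosine(to_boolean(query), to_boolean(doc_a))
boolean_b = cosine(to_boolean(query), to_boolean(doc_b))

print("weighted:", weighted_a, weighted_b)
print("boolean: ", boolean_a, boolean_b)
```

With only "places 1, 5 and 8 are positive", both documents look identical to the query. With the actual weights, `doc_b` scores much higher than `doc_a`. Whether that extra discrimination matters depends on the algorithm, as noted above.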

    If you want to keep track of the frequencies, you can either store position/frequency pairs or use two lists, one for the positions and the other for the frequencies. The former approach is cleaner; the latter should be faster.
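
The two layouts can be sketched as follows. This is an illustration only: the dense vector is made up from the weights quoted in the question, and `dot_pairs` and `dot_parallel` are hypothetical helper names. Both functions assume the positions are stored in ascending order.

```python
# A short dense vector; almost every entry is zero.
dense = [0, 0.02453, 0, 0, 0, 0.42128, 0, 0, 0.242112, 0]

# Layout 1: one list of (position, frequency) pairs.
pairs = [(i, w) for i, w in enumerate(dense) if w != 0]

# Layout 2: two parallel lists, one for positions and one for frequencies.
positions = [i for i, w in enumerate(dense) if w != 0]
weights = [w for w in dense if w != 0]

def dot_pairs(a, b):
    """Dot product of two pair lists, walking both in position order."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        pos_a, w_a = a[i]
        pos_b, w_b = b[j]
        if pos_a == pos_b:
            total += w_a * w_b
            i += 1
            j += 1
        elif pos_a < pos_b:
            i += 1
        else:
            j += 1
    return total

def dot_parallel(pos_a, w_a, pos_b, w_b):
    """Same merge walk, but over the parallel-list layout."""
    i = j = 0
    total = 0.0
    while i < len(pos_a) and j < len(pos_b):
        if pos_a[i] == pos_b[j]:
            total += w_a[i] * w_b[j]
            i += 1
            j += 1
        elif pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return total

print(pairs)
print(positions, weights)
print(dot_pairs(pairs, pairs), dot_parallel(positions, weights, positions, weights))
```

Both layouts hold the same information and give the same result. Which one is actually faster depends on the language and how the lists are implemented, so it is worth benchmarking rather than assuming.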

    Hope this helps, -gjb-
