PerlMonks
Re: Catching Cheaters and Saving Memory
by BrowserUk (Patriarch)
on Oct 13, 2006 at 09:49 UTC ( [id://578096]=note )
Given those numbers, it's time to be a little smart about what you need to know.

Since you're looking for cheaters, your first task is to decide at what level a mutually beneficial voting pattern can be attributed to cheating rather than coincidence. For example, if two users have voted once each, in the only thread they have both posted to, you're probably not going to read that as a pattern of malicious intent.

With a million users, a billion threads and 5 billion posts, each user will have an average of 5 posts. Set a threshold for the minimum number of votes a user has to have cast before you will start vetting them for cheating (say, 20 votes), then make your first pass over the data, accumulating vote counts against users. This will require a hash of 1 million numerical scalar values, around 50 MB.

You then delete any entries in the hash that have fewer than your minimum_votes_cast threshold. Set at 20, this is likely to discard 80 or 90% of the users.

You can then make your second pass, accumulating a hash of pairs showing for whom each of the qualifying voters cast their votes. As suggested elsewhere, if you use the pairs of userids as the keys in a single hash, and skip any voters that no longer exist in your first-pass hash, then this will likely take less space than the original hash did before the discard step.

It might look something like this (untested):
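The listing that belonged here is not reproduced above, so what follows is only a minimal sketch of the two passes as described, not the original code. It assumes a hypothetical input format of one vote per line, `voter_id recipient_id`, separated by whitespace; the sub names `qualifying_voters` and `vote_pairs` and the `voter:recipient` key layout are likewise assumptions made for illustration.

```perl
use strict;
use warnings;

# The "say 20 votes?" figure from the text; tune to taste.
use constant MIN_VOTES_THRESHOLD => 20;

# Pass 1: count votes cast per voter, then discard voters under the threshold.
# Assumes each input line is "voter_id recipient_id" (hypothetical format).
sub qualifying_voters {
    my ($fh, $min) = @_;
    my %cast;
    while (my $line = <$fh>) {
        my ($voter) = split ' ', $line;
        next unless defined $voter;          # skip blank lines
        $cast{$voter}++;
    }
    # Hash-slice delete of every voter below the minimum.
    delete @cast{ grep { $cast{$_} < $min } keys %cast };
    return \%cast;
}

# Pass 2: count "voter:recipient" pairs, skipping voters dropped in pass 1.
sub vote_pairs {
    my ($fh, $qualified) = @_;
    my %pairs;
    while (my $line = <$fh>) {
        my ($voter, $recipient) = split ' ', $line;
        next unless defined $recipient;      # skip blank or malformed lines
        next unless exists $qualified->{$voter};
        $pairs{"$voter:$recipient"}++;
    }
    return \%pairs;
}

# Runs only when a data file is named on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $qualified = qualifying_voters($fh, MIN_VOTES_THRESHOLD);
    seek $fh, 0, 0 or die "Cannot rewind $file: $!";   # reread for pass 2
    my $pairs = vote_pairs($fh, $qualified);
    close $fh;
    printf "%s\t%d\n", $_, $pairs->{$_} for sort keys %$pairs;
}
```

Keeping a single flat hash keyed on the joined pair avoids the per-hash overhead of a hash of hashes, which is the point of the "pairs of userids as the keys" suggestion. Spotting mutual voting is then a matter of comparing the counts for `a:b` and `b:a`.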
On my hardware, the two passes would take around 20 hours. The memory consumed will depend upon the level at which you set MIN_VOTES_THRESHOLD, but should be under 300 MB if you set it at a reasonable level.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
In Section: Seekers of Perl Wisdom