
Re^3: Finding Nearly Identical Sets (Updated:4200/sec)

by BrowserUk (Pope)
on Sep 29, 2016 at 14:08 UTC (#1172930)


in reply to Re^2: Finding Nearly Identical Sets (Updated:4200/sec)
in thread Finding Nearly Identical Sets

Unfortunately, a Boolean yes/no regarding whether it has been seen before, without being able to retrieve the near matches, isn't going to be practical.

This is just the first, very fast, filter. The same method that generates *all the near matches* for *all* the known sets, in this filter, can be used again in a second pass, on individual near matches, to find the number(s) they match to.
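A minimal sketch of that two-pass idea, in Python rather than Perl, and not the code from this thread. Everything specific here is an assumption made for illustration: elements are small integers drawn from a hypothetical `UNIVERSE`, a "near match" is taken to mean "same size, exactly one element swapped", and a plain Python `set` stands in for the bitmap. The names `near_matches`, `build` and `find_matches` are made up.

```python
from itertools import product

UNIVERSE = range(10)  # assumption: elements are small integers 0..9


def near_matches(s):
    """Yield every set that differs from s by swapping exactly one element."""
    s = frozenset(s)
    for old, new in product(s, UNIVERSE):
        if new not in s:
            yield (s - {old}) | {new}


def build(known):
    """Return (seen, index) for a dict of {set_id: set_of_elements}.

    seen  -- the first-pass filter: every known set plus all its near matches
    index -- exact known set -> its id, used by the second pass
    """
    index = {frozenset(s): set_id for set_id, s in known.items()}
    seen = set(index)
    for s in index:
        seen.update(near_matches(s))
    return seen, index


def find_matches(query, seen, index):
    """Pass 1: cheap yes/no lookup. Pass 2: recover which known sets matched."""
    q = frozenset(query)
    if q not in seen:  # fast filter: most queries stop here
        return []
    # The swap relation is symmetric, so regenerating the near matches of
    # the query and looking them up finds every known set it is close to.
    candidates = [q, *near_matches(q)]
    return sorted(index[c] for c in candidates if c in index)
```

With `known = {1: {1, 2, 3}, 2: {4, 5, 6}, 3: {1, 2, 4}}`, the query `{1, 2, 5}` passes the filter and the second pass returns `[1, 3]`; `{7, 8, 9}` is rejected by the filter alone. The expensive regeneration only runs for queries that already passed the yes/no test.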


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Finding Nearly Identical Sets (Updated:4200/sec)
by Limbic~Region (Chancellor) on Oct 04, 2016 at 14:15 UTC
    BrowserUk,
    True. I am not sure if that will be viable in the overall application or not. As I mentioned to you previously, I'm not sure an in-memory solution will work because of parallel processing. It is definitely food for thought.

    Cheers - L~R

      I'm not sure an in-memory solution will work because of parallel processing.

      Hm. The primary reason -- there are others -- for using parallel processing is: speed.

      I pretty much guarantee that you will not be able to achieve 500/s using a disk-based file or DB, let alone 5000/s -- disk access is at least 100,000 times slower than memory -- which means you now need 10 processors instead of one just to get back to par.

      And if 5000/s isn't enough? Put the bitmaps in shared memory (NOT threads::shared) and run multiple threads...

      Anyway, good luck with the project, whichever way you choose to go :)


