Re: Comparing two arrays

by educated_foo (Vicar)
on Dec 15, 2013 at 12:57 UTC ( #1067230=note )

in reply to Comparing two arrays

In addition to what others have said -- use packed bit-arrays -- you can sort the X arrays first, then do a binary search to find the closest X for each Y (~16 comparisons apiece).
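A minimal sketch of the sort-then-binary-search idea, under assumptions the thread does not confirm: each array is packed into a single 64-bit key (most significant bit first, so numeric order matches lexicographic order of the bits), and "closest" means the smallest absolute numeric difference. The function name `closest_key` is hypothetical. A binary search needs roughly log2(n) comparisons, so about 16 would correspond to around 65,536 sorted entries.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumption: one array == one 64-bit key, and "closest" == smallest
// absolute numeric difference. Neither is specified in the thread.
// Precondition: xs is sorted ascending and is not empty.
std::uint64_t closest_key(const std::vector<std::uint64_t>& xs, std::uint64_t y) {
    // First key that is >= y, found in O(log n) comparisons.
    auto it = std::lower_bound(xs.begin(), xs.end(), y);
    if (it == xs.begin()) return *it;        // y is below every key
    if (it == xs.end()) return *(it - 1);    // y is above every key
    std::uint64_t above = *it;
    std::uint64_t below = *(it - 1);
    // Pick the nearer neighbour; ties go to the lower key.
    return (y - below <= above - y) ? below : above;
}
```

Sort once with `std::sort(xs.begin(), xs.end())`, then call `closest_key(xs, y)` for each Y. If "closest" instead means "most 1's in common", a sorted order does not help directly, because that score is not monotonic in the key.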

Re^2: Comparing two arrays
by BrowserUk (Pope) on Dec 15, 2013 at 13:54 UTC
    you can sort the X arrays first, then do a binary search to find the closest X for each Y

    Sort by what criteria? "closest" by what criteria?

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Sort by what criteria? "closest" by what criteria?
      That obviously depends upon what the OP is trying to do. Maybe he wants lexicographic sorting, or maybe some other mapping; it's not specified in the question, so I didn't guess.
        Yeah, I probably wasn't explicit about that. But if I'm looking for the top 10 matches, that implies I would first need to compare each x with each y to generate some score, which I would then use as my sort key, and that defeats the purpose. I could do a triangulation between the x's and then "sort" them into buckets so that x's from different buckets have no 1's in common; then I could say that if x1<=>y has 30 1's in common, x2<=>y will have at most (total 1's) - 30, and go from there. But in the end it usually turns out that this slows the whole thing down rather than speeding it up.

        As far as bit-strings go: I just implemented that version of my program in Perl and the speed has increased amazingly. However, my C++ version shows a slight decrease in runtime performance compared to the version where I used character arrays instead. I guess this is tied to optimization (or to my bad implementation, which is most probably the true reason), but I guess this is more or less the max that I can get out of my machine, given my programming abilities.
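For reference, one way the packed-bit scoring could look in C++. This is only a sketch: the `Bits` alias and the `pack` and `common_ones` helpers are hypothetical names, and it assumes the score is the number of positions where both arrays hold a 1, as described above. It uses `std::bitset<64>::count()` so that it does not depend on C++20 or compiler intrinsics.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation: one array is a vector of 64-bit words.
using Bits = std::vector<std::uint64_t>;

// Pack a string of '0'/'1' characters: character i goes to bit (i % 64)
// of word (i / 64). Any character other than '1' is treated as 0.
Bits pack(const std::string& s) {
    Bits words((s.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '1') words[i / 64] |= std::uint64_t{1} << (i % 64);
    }
    return words;
}

// Count the positions where both arrays have a 1: AND the words,
// then count the set bits, 64 positions per step.
std::size_t common_ones(const Bits& a, const Bits& b) {
    std::size_t n = std::min(a.size(), b.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += std::bitset<64>(a[i] & b[i]).count();
    }
    return count;
}
```

Whether this beats a character-array loop depends on the compiler, the flags, and the array length. Packing inside the inner loop instead of once up front is one plausible way to lose the advantage, though nothing in the post says that is what happened here.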

        However, I cannot shake the feeling that, because I need only the top 10 matches, I could somehow use this to implement a good heuristic. I just cannot wrap my mind around the problem.
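One possible pruning rule, offered as a sketch and not as something proposed in this thread: the number of 1's an x shares with y can never exceed the number of 1's in x itself. If the x's are visited in order of decreasing bit count, the scan can stop as soon as that upper bound can no longer beat the worst score currently in the top k. All names here (`Bits`, `ones`, `common_ones`, `top_matches`) are hypothetical.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Bits = std::vector<std::uint64_t>;          // packed bit array
using Hit = std::pair<std::size_t, std::size_t>;  // (score or weight, index)

// Number of 1 bits in a.
std::size_t ones(const Bits& a) {
    std::size_t c = 0;
    for (std::uint64_t w : a) c += std::bitset<64>(w).count();
    return c;
}

// Number of positions where both a and b have a 1.
std::size_t common_ones(const Bits& a, const Bits& b) {
    std::size_t n = std::min(a.size(), b.size()), c = 0;
    for (std::size_t i = 0; i < n; ++i) c += std::bitset<64>(a[i] & b[i]).count();
    return c;
}

// Returns up to k (score, index into xs) pairs, best score first.
// Ties in score are resolved arbitrarily.
std::vector<Hit> top_matches(const std::vector<Bits>& xs, const Bits& y, std::size_t k) {
    std::vector<Hit> out;
    if (k == 0) return out;

    // Order candidates by their own bit count, heaviest first.
    // With many y's, this list would be built once and reused.
    std::vector<Hit> by_weight;
    for (std::size_t i = 0; i < xs.size(); ++i) by_weight.emplace_back(ones(xs[i]), i);
    std::sort(by_weight.begin(), by_weight.end(), std::greater<Hit>());

    // Min-heap: top() is the worst score among the best k seen so far.
    std::priority_queue<Hit, std::vector<Hit>, std::greater<Hit>> best;
    for (const Hit& cand : by_weight) {
        // common_ones(x, y) <= ones(x): if even the bound cannot beat the
        // worst kept score, no remaining (lighter) candidate can either.
        if (best.size() == k && cand.first <= best.top().first) break;
        Hit hit(common_ones(xs[cand.second], y), cand.second);
        if (best.size() < k) {
            best.push(hit);
        } else if (hit.first > best.top().first) {
            best.pop();
            best.push(hit);
        }
    }
    while (!best.empty()) { out.push_back(best.top()); best.pop(); }
    std::reverse(out.begin(), out.end());
    return out;
}
```

How much this saves depends entirely on the data: if most x's have similar bit counts, the bound rarely triggers and the extra sort is pure overhead, which would be consistent with the experience described above that such schemes can end up slower.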


        thank you

