PerlMonks  

Re^3: Comparing two arrays

by educated_foo (Vicar)
on Dec 15, 2013 at 14:25 UTC (#1067235)


in reply to Re^2: Comparing two arrays
in thread Comparing two arrays

Sort by what criteria? "closest" by what criteria?
That obviously depends upon what the OP is trying to do. Maybe he wants lexicographic sorting, or maybe some other mapping; it's not specified in the question, so I didn't guess.


Re^4: Comparing two arrays
by baxy77bax (Chaplain) on Dec 15, 2013 at 15:28 UTC
    Yeah, I probably wasn't explicit about that. But if I'm looking for the top 10 matches, this implies that I would first need to compare each x with each y to generate some score, which I would then use as my sort key, and that defeats the purpose. I could do triangulation between the x's and "sort" them into buckets, so that x's in different buckets have no 1's in common; then if x1<=>y has 30 1's in common, I know that x2<=>y will have at most (total 1's in y) - 30, and go from there. But in the end it usually turns out that this slows the whole thing down rather than speeding it up.
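
The bucket bound described above can be checked with a small sketch. It assumes each item is stored as a Python integer used as a bit-string, and that x1 and x2 share no 1's; the function names are made up for illustration.

```python
def popcount(v: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(v).count("1")


def common_ones(a: int, b: int) -> int:
    """Number of positions where both bit-strings have a 1."""
    return popcount(a & b)


def bound_for_disjoint(y: int, x1: int) -> int:
    """Upper bound on common_ones(x2, y) for any x2 sharing no 1's with x1.

    The 1's of y already matched by x1 cannot also be matched by x2.
    """
    return popcount(y) - common_ones(x1, y)


# Hypothetical toy data: x1 and x2 have no 1's in common.
x1 = 0b11110000
x2 = 0b00001111
y = 0b10110110

bound = bound_for_disjoint(y, x1)
print(common_ones(x1, y), common_ones(x2, y), bound)
```

The bound only prunes anything when x1 already covers most of y's 1's, which may be why the bookkeeping tends to cost more than it saves.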

    As far as bit-strings go: I just implemented the bit-string version of my program in Perl and the speed has increased amazingly. However, my C++ version shows a slight decrease in runtime performance compared to the version where I used character arrays instead. I guess this is tied to optimization, maybe (or to my bad implementation, which is most probably the true reason), but I guess this is more or less the max that I can get out of my machine (with respect to my programming abilities).
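
For comparison, here is a minimal sketch of the two representations mentioned above, written in Python rather than Perl or C++. It assumes each item is a string of '0'/'1' characters of equal length; the function names are hypothetical, and this is not the code from either implementation.

```python
def common_ones_chars(a: str, b: str) -> int:
    """Character-array version: walk both strings position by position."""
    return sum(1 for p, q in zip(a, b) if p == "1" and q == "1")


def pack_bits(chars: str) -> int:
    """Pack a '0'/'1' string into a single integer bit-string."""
    return int(chars, 2)


def common_ones_bits(a: int, b: int) -> int:
    """Bit-string version: one AND over the whole item, then count the 1's."""
    return bin(a & b).count("1")


a = "10110010"
b = "10010110"
print(common_ones_chars(a, b))
print(common_ones_bits(pack_bits(a), pack_bits(b)))
```

Both functions return the same count; which one is faster depends on the language and on how the packing and bit counting are implemented.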

    However, I cannot shake the feeling that, because I need only the top 10 matches, I could somehow use this to implement a good heuristic. I just cannot wrap my mind around the problem.
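
One way the "only top 10" requirement might be exploited is sketched below. It assumes the score is the number of shared 1's and that items are integers used as bit-strings. An x can never share more 1's with y than it has in total, so once the candidates are sorted by their own 1-count, the scan can stop as soon as no remaining candidate can beat the current 10th-best score. `top_k_matches` is a hypothetical name, not something from the thread.

```python
import heapq


def popcount(v: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(v).count("1")


def top_k_matches(y, xs, k=10):
    """Return up to k (score, index) pairs with the most 1's in common with y."""
    counts = [popcount(x) for x in xs]
    # Visit candidates with the most 1's first; this order does not depend on y.
    order = sorted(range(len(xs)), key=lambda i: counts[i], reverse=True)
    heap = []  # min-heap holding the best k (score, index) pairs seen so far
    for i in order:
        # score <= counts[i], and every later candidate has an even smaller count,
        # so nothing left can strictly beat the current k-th best score.
        if len(heap) == k and counts[i] <= heap[0][0]:
            break
        score = popcount(xs[i] & y)
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    return sorted(heap, reverse=True)
```

The sort by 1-count can be done once and reused for every y. With very sparse data, where most items have a similar number of 1's, the early exit may rarely trigger, so this is worth measuring before relying on it.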

    anyway,

    thank you

    baxy

      however my c++ version shows a slight decrease in runtime performance when compared to version where i used character arrays instead. I guess this is tied to optimization , maybe (or my bad implementation -

      You could post (the relevant bit of) the C++ code. Maybe we could see something to help...


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      I just re-read your original post, and was reminded of this:
      there are about 500 times less 1's then 0's
      So for the average item, only 30 features are true, which changes things. You still can't do an all-to-all pairwise comparison in any reasonable amount of time, but you may be able to e.g. recursively partition your data by the feature closest to a 50/50 distribution, then find closest matches once you cut the comparison set down to something reasonable. If something simple like that won't work, you should look into dimensionality reduction, e.g. via random projection or PCA.
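
A rough sketch of the recursive partitioning idea, assuming each item is stored sparsely as a set of the indices of its true features (which suits data where only about 30 features per item are set). The tree layout, `max_leaf`, and the function names are illustrative assumptions.

```python
from collections import Counter


def best_split_feature(items):
    """Feature whose frequency is closest to half of the items, or None."""
    counts = Counter(f for item in items for f in item)
    if not counts:
        return None
    half = len(items) / 2
    # Ties are broken by the smaller feature index to keep the result deterministic.
    return min(counts, key=lambda f: (abs(counts[f] - half), f))


def partition(items, max_leaf=4):
    """Recursively split items until each group has at most max_leaf members."""
    if len(items) <= max_leaf:
        return {"leaf": items}
    f = best_split_feature(items)
    with_f = [it for it in items if f is not None and f in it]
    without_f = [it for it in items if f is None or f not in it]
    if not with_f or not without_f:
        return {"leaf": items}  # no feature separates this group any further
    return {
        "feature": f,
        "with": partition(with_f, max_leaf),
        "without": partition(without_f, max_leaf),
    }


def find_leaf(tree, query):
    """Follow the query's features down to the small group it falls into."""
    while "leaf" not in tree:
        tree = tree["with"] if tree["feature"] in query else tree["without"]
    return tree["leaf"]
```

The closest matches would then be searched only within the group returned by `find_leaf`. This is a heuristic: the true best match can sit in a neighbouring branch, so the result is approximate.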
