
Re^4: Comparing two arrays

by baxy77bax (Deacon)
on Dec 15, 2013 at 15:28 UTC ( [id://1067238]=note )


in reply to Re^3: Comparing two arrays
in thread Comparing two arrays

Yeah, I probably wasn't explicit about that. But if I'm looking for the top 10 matches, this implies that I would first need to compare each x with each y to generate some score, which I would then use as my key for sorting, and that defeats the purpose. I could do triangulation between the x's and then "sort" them into buckets, so that x's in different buckets have no 1's in common. Then I could say: if x1<=>y has 30 1's in common, I know that x2<=>y will have at most (total 1's) - 30, and go from there. But in the end it usually turns out that this actually slows the whole thing down rather than speeding it up.
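To make the bucket bound concrete, here is a minimal sketch in Python (chosen for brevity; the thread itself is about Perl and C++). It assumes "(total 1's)" means the number of 1's in y, and it uses plain integers as bitsets. The helper names `ones`, `shared` and `bound_for_disjoint` are made up for this illustration.

```python
def ones(bits: str) -> int:
    """Pack a string of '0'/'1' characters into an int used as a bitset."""
    return int(bits, 2)

def popcount(n: int) -> int:
    """Number of 1 bits in n."""
    return bin(n).count("1")

def shared(a: int, b: int) -> int:
    """Number of positions where both a and b have a 1."""
    return popcount(a & b)

def bound_for_disjoint(x1: int, y: int) -> int:
    """Upper bound on shared(x2, y) for any x2 with no 1's in common with x1.

    Assumption: "total 1's" is read as the number of 1's in y. The 1's of y
    already matched by x1 cannot be matched by a disjoint x2.
    """
    return popcount(y) - shared(x1, y)

x1 = ones("11110000")
x2 = ones("00001110")   # shares no 1's with x1
y = ones("11011010")

bound = bound_for_disjoint(x1, y)
actual = shared(x2, y)
print(f"bound={bound} actual={actual}")
```

The bound only prunes anything when x1 already covers most of y's 1's, which may be part of why the bookkeeping ends up costing more than it saves.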

As far as bit-strings go: I just implemented that version of my program in Perl and the speed has increased amazingly. However, my C++ version shows a slight decrease in runtime performance compared to the version where I used character arrays instead. I guess this is tied to optimization, maybe (or to my bad implementation, which is most probably the true reason), but I guess this is more or less the max that I can get out of my machine (with respect to my programming abilities).
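For readers following along, the two representations being compared can be sketched like this in Python. This is an illustration only, not the code from either the Perl or the C++ program; the function names are hypothetical.

```python
def common_ones_chars(a: str, b: str) -> int:
    """Character-array style: walk both strings and count shared '1's."""
    return sum(1 for ca, cb in zip(a, b) if ca == "1" and cb == "1")

def pack(bits: str) -> int:
    """Pack a string of '0'/'1' characters into an int used as a bit-string."""
    return int(bits, 2)

def common_ones_bits(a: int, b: int) -> int:
    """Bit-string style: one AND over the whole vector, then count set bits."""
    return bin(a & b).count("1")

x = "0010010001"
y = "0010000001"

by_chars = common_ones_chars(x, y)
by_bits = common_ones_bits(pack(x), pack(y))
print(by_chars, by_bits)
```

Both functions return the same count. Any speed difference comes from how the AND and the bit count are carried out, so in a compiled language the result depends heavily on how the packed words are stored and counted.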

However, I cannot shake the feeling that, because I need only the top 10 matches, maybe I could somehow use this to implement a good heuristic. I just cannot wrap my mind around the problem.
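One possible way to exploit "only the top 10" is sketched below in Python, as an illustration rather than a known-good answer. It keeps a bounded min-heap instead of sorting every score, and it skips candidates whose own count of 1's already caps their score below the current k-th best. The function name `top_k_matches` and the choice of bound are assumptions made for this sketch.

```python
import heapq

def popcount(n: int) -> int:
    """Number of 1 bits in n."""
    return bin(n).count("1")

def top_k_matches(xs, y, k=10):
    """Return up to k (score, index) pairs with the most shared 1's, best first.

    xs and y are ints used as bitsets. A candidate x can share at most
    min(popcount(x), popcount(y)) 1's with y, so candidates are visited in
    order of decreasing popcount and the scan stops once that bound can no
    longer beat the weakest match kept so far.
    """
    order = sorted(range(len(xs)), key=lambda i: popcount(xs[i]), reverse=True)
    y_ones = popcount(y)
    heap = []  # min-heap of (score, index); heap[0] is the weakest kept match
    for i in order:
        bound = min(popcount(xs[i]), y_ones)
        if len(heap) == k and bound <= heap[0][0]:
            break  # every remaining candidate has an equal or smaller bound
        score = popcount(xs[i] & y)
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    return sorted(heap, reverse=True)

xs = [0b1111, 0b1100, 0b0011, 0b1000, 0b0001]
y = 0b1110
best = top_k_matches(xs, y, k=2)
print(best)
```

Whether the early exit triggers often enough to matter depends on how the counts of 1's are distributed. If every x has roughly the same number of 1's, the bound is weak and this degrades to a plain scan with a heap.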

anyway,

thank you

baxy

Replies are listed 'Best First'.
Re^5: Comparing two arrays
by BrowserUk (Patriarch) on Dec 15, 2013 at 15:40 UTC
    however my c++ version shows a slight decrease in runtime performance when compared to version where i used character arrays instead. I guess this is tied to optimization , maybe (or my bad implementation -

    You could post (the relevant bit of) the C++ code. Maybe we could see something to help...


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Comparing two arrays
by educated_foo (Vicar) on Dec 18, 2013 at 00:57 UTC
    I just re-read your original post, and was reminded of this:
    there are about 500 times less 1's then 0's
    So for the average item, only 30 features are true, which changes things. You still can't do an all-to-all pairwise comparison in any reasonable amount of time, but you may be able to e.g. recursively partition your data by the feature closest to a 50/50 distribution, then find closest matches once you cut the comparison set down to something reasonable. If something simple like that won't work, you should look into dimensionality reduction, e.g. via random projection or PCA.
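A rough sketch of the recursive partitioning idea, in Python, might look like the following. It is only an illustration under assumptions: items are ints used as bitsets, `max_leaf` is an arbitrary stopping size, and the function names are invented here.

```python
def most_balanced_feature(items, n_features):
    """Index of the feature whose count of 1's is closest to half the items."""
    half = len(items) / 2

    def imbalance(f):
        count = sum((item >> f) & 1 for item in items)
        return abs(count - half)

    return min(range(n_features), key=imbalance)

def partition(items, n_features, max_leaf=2):
    """Recursively split items on the most balanced feature.

    Returns either a leaf (a list of items) or a tuple
    (feature, subtree_with_feature, subtree_without_feature).
    """
    if len(items) <= max_leaf:
        return items
    f = most_balanced_feature(items, n_features)
    with_f = [x for x in items if ((x >> f) & 1) == 1]
    without_f = [x for x in items if ((x >> f) & 1) == 0]
    if not with_f or not without_f:
        return items  # no feature separates these items any further
    return (f,
            partition(with_f, n_features, max_leaf),
            partition(without_f, n_features, max_leaf))

items = [0b0011, 0b0101, 0b1001, 0b1010]
tree = partition(items, n_features=4)
print(tree)
```

A query would then descend the tree by its own feature values and compare only against the leaf it lands in. That makes the search approximate, since a good match can sit in the other branch, and with 1's this rare even the "most balanced" feature may be far from 50/50.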
