Re^5: Comparing two arrays

by educated_foo (Vicar)
on Dec 18, 2013 at 00:57 UTC

in reply to Re^4: Comparing two arrays
in thread Comparing two arrays

I just re-read your original post, and was reminded of this:
there are about 500 times less 1's than 0's
So the average item has only about 30 true features, which changes things. You still can't do an all-to-all pairwise comparison in any reasonable amount of time, but you may be able to, e.g., recursively partition your data on the feature whose true/false split is closest to 50/50, then find the closest matches once each partition is down to a manageable size. If something that simple won't work, you should look into dimensionality reduction, e.g., via random projection or PCA.
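To make the recursive-partitioning idea concrete, here is a minimal Python sketch. It assumes each item is stored sparsely as a frozenset of the indices of its true features, and that "closest" means smallest Hamming distance (size of the symmetric difference). The names `partition`, `closest_pairs` and the `max_bucket` threshold are hypothetical, chosen for illustration only. Note that this is approximate: two items that differ only in a split feature end up in different buckets and are never compared.

```python
from collections import Counter
from itertools import combinations


def hamming(a: frozenset, b: frozenset) -> int:
    """Hamming distance between two binary vectors stored as sets of true-feature indices."""
    return len(a ^ b)


def partition(items, max_bucket=4):
    """Recursively split item ids on the feature nearest a 50/50 split.

    items: dict mapping item id -> frozenset of true-feature indices (assumed layout).
    Returns a list of buckets (lists of item ids), each of size <= max_bucket
    unless no remaining feature can split it further.
    """
    ids = list(items)
    if len(ids) <= max_bucket:
        return [ids]
    # Count how many items in this subset have each feature set.
    counts = Counter(f for i in ids for f in items[i])
    n = len(ids)
    # Only features present in some but not all items actually divide the subset.
    candidates = [(abs(c - n / 2), f) for f, c in counts.items() if 0 < c < n]
    if not candidates:
        return [ids]  # every item has identical features; cannot split further
    _, feature = min(candidates)  # feature closest to a 50/50 split
    with_f = {i: items[i] for i in ids if feature in items[i]}
    without_f = {i: items[i] for i in ids if feature not in items[i]}
    return partition(with_f, max_bucket) + partition(without_f, max_bucket)


def closest_pairs(items, max_bucket=4):
    """Map each item id to (nearest id, distance), searching only within its bucket."""
    best = {}
    for bucket in partition(items, max_bucket):
        # All-to-all comparison is affordable once buckets are small.
        for a, b in combinations(bucket, 2):
            d = hamming(items[a], items[b])
            for x, y in ((a, b), (b, a)):
                if x not in best or d < best[x][1]:
                    best[x] = (y, d)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: five items, each a set of true-feature indices.
    items = {
        "a": frozenset({1, 5, 9}),
        "b": frozenset({1, 5, 10}),
        "c": frozenset({2, 7}),
        "d": frozenset({2, 7, 8}),
        "e": frozenset({3}),
    }
    print(closest_pairs(items, max_bucket=2))
```

With `max_bucket=2` the toy data splits into buckets `a,b`, `c,d` and `e`, so `e` gets no match at all. In practice you would pick `max_bucket` large enough that the all-to-all comparison inside each bucket is still cheap, and accept (or patch up, e.g. by also searching sibling buckets) the occasional missed neighbour.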
