Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

It is, at best, very rude to say “total garbage” in response to a post ... it doesn’t make you look like a genius, merely a boor.   (I searched the entire thread for the word “best” and did not find it.   Perhaps your eyes are better than mine.)

That being said, i-f the requirement is for “a matching entry,” then the technique that I described will work extremely well.   Strictly speaking, yes, it is possible for (say) a hash-collision to occur, but with a strong algorithm like SHA1 it basically isn’t going to happen.

Any exploitable “predictable fact” about the actual data can be profitably used to reduce the search-space, and (IMHO) in a situation such as this, pragmatically must be.   Even something as basic as “the total number of 1’s” can be pressed into service.   If you can “reasonably predict” that the differences between the searched-for array and the “best fit” that will be found will consist, let’s say, of a change-of-state of no more than (some) n positions, then even a brute-force search could be limited to consider only the candidates which fall into that range, perhaps starting with any exact-matches and then working outward x for x in (1..n).   This does, of course, open up the possibility of a statistical Type-2 Error (concluding that no best-match exists when in fact one does), but this might be judged to be either acceptable or necessary.   (Or not ...)

If necessity really must become the mother of invention, and once again if you know that there are exploitable characteristics of the data, it is also possible to apply hashing or ones-counting to slices of the total vector.   Instead of merely counting all the 1’s, count them in every (say) thousand bits.   Apply some useful heuristic to this vector of sums to decide whether you choose to examine the whole thing.

In the end, the problem won’t be completely-abstract, nor will be its solution.   There must be something, in the real world, that stipulates what is and what is not “best,” or even “plausible.”   It’s my opinion that you must solve this problem, at least in significant part, by reducing the total number of vectors that you consider, and by selecting for consideration only those which are “most likely.”   Representational optimizations such as the use of bit-vectors may also be important, but even these can’t be applied in a brute-force way.   You’ll simply never get the work done.


In reply to Re: Comparing two arrays by sundialsvc4
in thread Comparing two arrays by baxy77bax

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others making s'mores by the fire in the courtyard of the Monastery: (7)
    As of 2014-07-11 01:40 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      When choosing user names for websites, I prefer to use:








      Results (217 votes), past polls