PerlMonks  

Re: Bidirectional lookup algorithm? (Updated: further info.)

by polettix (Vicar)
on Jan 05, 2015 at 22:37 UTC [id://1112243]


in reply to Bidirectional lookup algorithm? (Updated: further info.)

While I was reading through the question and some of the answers, I couldn't stop thinking that a redis data structure server might be something to investigate. brian_d_foy wrote something about it recently, and that might be a handy starting point.

And yes, I might be totally off track!
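To make the suggestion concrete, a minimal sketch of a bidirectional lookup on top of redis might look like the following. It assumes the CPAN Redis module and a redis-server listening on the default local port; the "name:"/"id:" key prefixes and the sample data are purely illustrative.

```perl
# Sketch: bidirectional lookup via two redis key namespaces.
# Assumes the CPAN 'Redis' module and a local redis-server;
# the "name:"/"id:" prefixes are illustrative, not a convention.
use strict;
use warnings;
use Redis;

my $r = Redis->new;    # defaults to 127.0.0.1:6379

sub bidi_set {
    my ($name, $id) = @_;
    $r->set("name:$name" => $id);    # forward direction
    $r->set("id:$id"     => $name);  # reverse direction
}

bidi_set('alpha', 1001);
print $r->get('name:alpha'), "\n";   # lookup by name
print $r->get('id:1001'),    "\n";   # lookup by id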

perl -ple'$_=reverse' <<<ti.xittelop@oivalf

Io ho capito... ma tu che hai detto?

Re^2: Bidirectional lookup algorithm? (no redis )
by Anonymous Monk on Jan 05, 2015 at 23:15 UTC
    See http://redis.io/topics/benchmarks (esp. "Pitfalls and misconceptions"); it's too slow for this problem (it's IPC):
    $ redis-benchmark -t set -r 100000 -n 1000000
    ====== SET ======
      1000000 requests completed in 13.86 seconds
      50 parallel clients
      3 bytes payload
      keep alive: 1

    A simple Perl program adds 1 million key/value pairs to a hash in 2.6875 seconds on a laptop from 2006.

    The OP doesn't need concurrency, so redis is not needed.
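A rough sketch of the kind of timing quoted above: insert one million key/value pairs into a plain Perl hash and report the wall time. (Absolute numbers will of course vary by machine; the key names are illustrative.)

```perl
# Time the insertion of 1,000,000 key/value pairs into a plain hash.
use strict;
use warnings;
use Time::HiRes qw(time);

my %h;
my $t0 = time;
$h{"key$_"} = $_ for 1 .. 1_000_000;
printf "inserted %d pairs in %.4f seconds\n", scalar keys %h, time - $t0;
```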

      Although the OP gave no specific quality parameters about space/time constraints, my understanding is that space efficiency improvements could come at the expense of (some) speed; hence the suggestion.

      Any benchmark should probably consider both aspects.

        my understanding is that space efficiency improvements could come at the expense of (some) speed,

        You are correct in that. The ultimate desire is to retain (as much as possible) the speed of Perl's hashes, whilst reducing the space requirement necessitated by the need for bidirectional lookup.

        My initial brief suggested that a move from O(1) to O(log N) (with a very small constant) lookup time would be acceptable, if a 4:1 (75%) reduction in space / 1:4 (400%) increase in capacity (or greater) was achieved.
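One possible shape of that O(log N) trade-off, as a rough sketch: keep a single array of [key, value] pairs sorted by key, plus a value-sorted index array, and binary-search each direction instead of hashing. The data and names below are illustrative only, and assume values are unique (as bidirectional lookup implies).

```perl
# Two binary-searched views over one pair array: less memory than
# two hashes, O(log N) lookups instead of O(1). Illustrative sketch.
use strict;
use warnings;

my @pairs  = sort { $a->[0] cmp $b->[0] }
             map  { [ "key$_", $_ * 7 ] } 1 .. 1000;
my @by_val = sort { $pairs[$a][1] <=> $pairs[$b][1] } 0 .. $#pairs;

sub find_by_key {
    my ($key) = @_;
    my ($lo, $hi) = (0, $#pairs);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = $pairs[$mid][0] cmp $key;
        return $pairs[$mid][1] if $c == 0;
        $c < 0 ? ($lo = $mid + 1) : ($hi = $mid - 1);
    }
    return undef;
}

sub find_by_val {
    my ($val) = @_;
    my ($lo, $hi) = (0, $#by_val);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = $pairs[$by_val[$mid]][1] <=> $val;
        return $pairs[$by_val[$mid]][0] if $c == 0;
        $c < 0 ? ($lo = $mid + 1) : ($hi = $mid - 1);
    }
    return undef;
}

print find_by_key('key10'), "\n";   # value for key10
print find_by_val(70),      "\n";   # key for value 70
```

The index array costs one integer per pair rather than a second full hash, which is where the space saving would come from.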

        However, anonymonk is also correct in that external storage -- whether disk/SSD or remote RAM via sockets -- is too slow. A simple test will show that the minimum turnaround time of sending a request to a 'remote' (even where 'remote' equates to the loopback address) and retrieving the reply takes 3 times as long as a Perl hash lookup; and that's when you completely exclude the actual lookup and have the request and response consist of a single byte each way.

        No matter how fast the actual lookup at the remote, adding that constant to it crushes the overall goal.
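A rough, self-contained sketch of that measurement: time N hash lookups against N one-byte round-trips over a loopback socket. Absolute numbers vary by machine; only the ratio matters.

```perl
# Compare in-process hash lookups with one-byte loopback round-trips.
use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes qw(time);

my $N = 10_000;
my %h = map { ("key$_" => $_) } 1 .. $N;

my $t0  = time;
my $sum = 0;
$sum += $h{"key$_"} for 1 .. $N;
printf "hash:   %.6fs for %d lookups\n", time - $t0, $N;

# One-byte echo server forked onto the loopback address.
my $srv = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # let the OS pick a free port
) or die "listen: $!";
my $port = $srv->sockport;

my $pid = fork;
die "fork: $!" unless defined $pid;
if (!$pid) {                   # child: echo one byte at a time
    my $c = $srv->accept;
    my $b;
    syswrite($c, $b, 1) while sysread($c, $b, 1);
    exit 0;
}

my $cli = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
) or die "connect: $!";

$t0 = time;
for (1 .. $N) {
    syswrite($cli, 'x', 1);
    sysread($cli, my $b, 1);
}
printf "socket: %.6fs for %d round-trips\n", time - $t0, $N;

close $cli;
waitpid $pid, 0;
```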


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re^2: Bidirectional lookup algorithm? (Updated: further info.)
by sundialsvc4 (Abbot) on Jan 05, 2015 at 23:06 UTC

    With due deference to Brian, I must be candid in expressing my extreme doubts as to whether “an entirely separate server” could possibly be apropos in the present case. (I understand the situation that he is speaking to, and I don't think that the present situation qualifies.) To my admittedly simple-minded way of thinking, the present problem is one that quite obviously could readily be solved by the use of two in-memory indexes to an in-memory data structure, and/or by two indexes on an SQL table.

    To arrive at an appropriate solution to this problem, we do not need to climb the mountain of exotica. We can satisfy ourselves just as well with the perfectly mundane. The only decision that we must now arrive at is this: “exactly what are the ‘ruling constraints’ to this problem?” Then: “what is the simplest solution that will satisfy them?”

    The only constraint that has so far presented itself is the availability of RAM. However, the validity of this constraint as presented rests entirely upon the supposition that “the entire data structure must reside completely in (physical!) RAM.” If this were the case, then the answer would consist of simple arithmetic. However, I perceive that this is not the only practicable solution. This problem can be solved in a satisfactory manner regardless of memory size. (We've been managing such things for fifty-plus years.)

    “In-memory” approaches are unquestionably fast, if “as much memory as one might desire” is in fact available, on production machines as well as developer boxes, “without delay.” In the real world, alas, this is not the case, and therefore we are obliged to resort more or less to files. (And, once we do that, our physical-RAM working-set requirements are dramatically reduced, thereby redefining the problem entirely.)

    The only “in-memory” solution to this problem is actually trivial: you must define two indexes to the data structure, in addition to the data structure itself, and therefore you must possess enough RAM to hold all three simultaneously and to access all three of them without any fear of virtual-memory paging delays. Either you do have such luxury, or you don't. Nothing more need be debated nor discussed.
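For reference, the two-index in-memory approach described above is the familiar two-hash idiom sketched below (names illustrative; it assumes values are unique, as bidirectional lookup implies):

```perl
# The memory-hungry baseline: one hash per lookup direction.
use strict;
use warnings;

my (%fwd, %rev);

sub bidi_insert {
    my ($k, $v) = @_;
    $fwd{$k} = $v;   # key -> value
    $rev{$v} = $k;   # value -> key
}

bidi_insert('alpha', 1001);
print $fwd{alpha}, "\n";   # lookup by key
print $rev{1001},  "\n";   # lookup by value
```

Note that this roughly doubles the storage of a single hash, which is exactly the cost the thread is trying to reduce.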

    If “an embarrassment of RAM riches” cannot be 100% counted upon, then one is pushed rather decisively toward the use of a database, and thus toward an approach that must regard every query to that structure as being “more or less costly.” The “cost-equals-zero proposition” of a pure-RAM solution is eliminated, as are all of its theoretical advantages. “Welcome to the Real World ...”

      “exactly what are the ‘ruling constraints’ to this problem?”

      The ruling constraints are clearly and concisely laid out in the OP; you're either too lazy to read them properly; or too dumb to understand them.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      A reply falls below the community's threshold of quality. You may see it by logging in.
