Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Bidirectional lookup algorithm? (Updated: further info.)

by roboticus (Chancellor)
on Jan 05, 2015 at 19:11 UTC ( [id://1112219]=note: print w/replies, xml ) Need Help??


in reply to Bidirectional lookup algorithm? (Updated: further info.)

BrowserUk:

I have some couple questions:

  1. Is there an even mix of lookup by number and by name?
  2. Is there any bias to the names/values looked up? In other words, are certain values more likely to be looked up than others? If so, is there a way of knowing which ones are more likely before your lookup pass?
  3. Similarly: are recently-looked-up values more likely to be looked up again soon, or is it a totally random distribution?
  4. As I understand your question, the data is loaded first, then used frequently afterwards. Is my understanding correct?

There are likely other questions I could/should ask, if only I could think of them...

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^2: Bidirectional lookup algorithm? (Updated: further info.)
by BrowserUk (Patriarch) on Jan 05, 2015 at 20:02 UTC

    1. Is there an even mix of lookup by number and by name?

      The algorithm is a GA exploration, thus effectively random and approximately evenly distributed across the full ranges of both domains. At least in the early rounds.

      As the selection progresses towards a minima, the visited domains shrink, but the lookups in both domains remain roughly equal.

      In addition, after each iteration, many new pairs are selected (from the lookup tables), with an equivalent number old one discarded (from the actively considered subset, not the lookup tables), thus potentially opening up the full ranges again.

    2. Is there any bias to the names/values looked up? In other words, are certain values more likely to be looked up than others? If so, is there a way of knowing which ones are more likely before your lookup pass?

      As you'll gather from the above, no on all counts.

    3. Similarly: are recently-looked-up values more likely to be looked up again soon, or is it a totally random distribution?

      Yes, for brief periods of the total run, but then a different set will be active, then another. And there is no way to predict which subsets will be required at any given time.

      There is no mileage in only having a subset available at any given time.

    4. As I understand your question, the data is loaded first, then used frequently afterwards. Is my understanding correct?

      Actually generated rather than "loaded"; but yes.

      The dataset is generated as a random subset of a truly huge domain of possibilities.

      The larger the subset that can be generated -- limited by the size of the lookup table(s) and physical memory -- the more statistically valid the results.

      I'm currently limited to runs of 35e6 -- ~6GB of ram -- which represents ~0.000000003% of the total possibilities.

    I'm seeking a way to reduce the memory footprint without too much loss of performance, in order that I might increase the statistical validity of the simulation.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      BrowserUk:

      Hmmm ... I've been thinking about it off and on and haven't come up with anything interesting.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1112219]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (7)
As of 2024-03-19 02:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found