Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

I do appreciate you taking the time to look it over. I've had my own concerns as well, and value your input.

Let's talk about the random number generator for a moment because I have concerns too. Let me start by saying that I agree 100% that it's an unfortunate compromise, and I would be interested in hearing alternatives. As you pointed out it's unlikely that anyone is holding onto 3.2GB of integers in memory, but I needed a source of data. The original problem stated the integers were held in files. It would probably be worthwhile for me to alter the benchmark to pull in integers from a file instead. But then I'm testing IO, not the algorithm. Nevertheless, the random number generator is a poor alternative.

Before posting, I was already dissatisfied enough with it that I profiled the code to see what affect it was having on the overall algorithms. The calls to the random number generator were consuming about 1/3rd of the time spent in the Perl array-method comparison sub. In the hash method, it was a nearly equal number of seconds spent in the random number generator, so the ratio was lower. The ratio was about about 1/3rd of the time for the XS comparison too, and that alone pretty clearly demonstrated that when Perl called the random number generator it was more expensive than when C did. I considered posting profiling data as well, but really a better approach is needed altogether.

However, what we do have is consistency in the Perl functions. The array method is calling the same random generator, with the same overhead as the hash method. I'm going to create an example here using contrived numbers from my own recollection since I am not looking directly at my original post or the profiling data right now. Let's say that we spend 1/3rd of 12.5 seconds in generating random numbers (that's pretty accurate). That's a little over 4 seconds. In the XS version of the test we spent about 2/3rds of a second. Let's factor that out. That gives us 20 seconds per iteration for the hash method, 8.5 seconds for the array method, and 1.4 seconds for each iteration of the XS method. Add back in whatever constant time you want for disk IO: 60 seconds per test iteration? The hash approach takes 80 seconds now, the array approach takes 72.5, and the XS approach takes 61.4. None is fast, but considering the relative ease with which the quicker approaches can be applied isn't the improvement significant enough to investigate?

The data set was large enough that it would be a freak of improbability for a well-performing random generator to not at least return some values at the top most and bottom most positions in the range. I was using the tests to ensure that I had filled my allocated array from 1 to 999 without undershooting or overshooting it. If the tests showed I didn't hit the top and bottom, there were enough data pieces that it would be worth checking whether the generator was flawed. I wanted to also verify that when I hit the top of the array I didn't overshoot it. If I had, rather than the test failing I might have just crashed, but even if that happened, the tests served their purpose in exercising the code at its edges, and would have indicated to me where I was when such an event took place. Bad testing is none at all. This falls way short of good testing, but it was what I needed to check for some off-by-one possibilities. The key/value test was redundant though. And the tests were far from comprehensive. There was a narrow set of issues I wanted to double check.


In reply to Re^2: "Just use a hash": An overworked mantra? by davido
in thread "Just use a hash": An overworked mantra? by davido

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    [Lady_Aleena]: This evening, I thought to myself, "I would love to search Netflix for 100 randomfilms from my database to see if they have them." So I went to CPAN to see if there were a modiule which could do such a thing. There is, but GUH! ...

    How do I use this? | Other CB clients
    Other Users?
    Others contemplating the Monastery: (5)
    As of 2017-01-21 02:28 GMT
    Find Nodes?
      Voting Booth?
      Do you watch meteor showers?

      Results (180 votes). Check out past polls.