Re^6: Curious find while comparing grep, map, and smart match...

by dbuckhal (Chaplain)
on Mar 27, 2013 at 17:19 UTC ( [id://1025760] )


in reply to Re^5: Curious find while comparing grep, map, and smart match...
in thread Curious find while comparing grep, map, and smart match...

Yes, you added an efficient random number generator, but how do you think those "shuffled" results relate to what the other results represent? What do you think I am benchmarking? How do you interpret the results you posted?

You are focusing on one specific facet of my code rather than looking at my code as a whole.

Replies are listed 'Best First'.
Re^7: Curious find while comparing grep, map, and smart match...
by BrowserUk (Patriarch) on Mar 27, 2013 at 17:45 UTC
    those "shuffled" results relate to what the other results represent?

    They achieve the same results -- an array of 100 random integers in the range 1 .. 120 -- 20 times more efficiently than your best attempt and nearly 100 times more efficiently than your worst; whilst saving 100MB of memory and the setup costs.

    And the random selection of the values in that array is statistically fair with my method and not with yours.
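    For reference, the shuffle-and-slice idea described above can be sketched like this (a minimal, self-contained version; the variable names and the 100-from-1..120 parameters are taken from the thread, everything else is assumed):

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

my $uSize = 100;    # how many unique values we want (from the thread)
my $range = 120;    # values are drawn from 1 .. $range (from the thread)

# Shuffle the whole range once, then take the first $uSize elements.
# Each value can appear at most once, so no duplicate filtering is
# needed, and every $uSize-element subset is equally likely.
my @picked = ( shuffle( 1 .. $range ) )[ 0 .. $uSize - 1 ];
```

    Because the shuffle touches only the 120-element range rather than repeatedly drawing and re-checking random values, there is no retry loop and no large candidate pool to hold in memory.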


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      If my goal was to simply generate an array of unique random numbers, then you are correct and I completely understand, and appreciate, your solution. The memory cost saving ability of shuffle is definitely invaluable. But, that was not my goal. My goal was to benchmark the filtering/processing methods of grep, map, and ~~ against the same data set.

      Can you see where I did not think your results directly related to the other results? If not, then that's fine.

        But, that was not my goal. My goal was to benchmark the filtering/processing methods of grep, map, and ~~ against the same data set.

        That implies that you have an application for filtering values against an array that is currently too slow; so you chose to benchmark alternatives. That's good.

        But rather than benchmarking the actual application, you made up this 'unique random number selection' problem and used that as the basis of your benchmark. That's less good.

        The chances are that if you posted a benchmark for the actual application, then one of the monks would see an alternative approach to that application that would similarly avoid the need to do O(N) processing of a huge list.

        For example, for simple unique filtering of small lists of values, using a hash is way more efficient:

        sub hashGen {
            my $idx = 0;
            my %mArray;
            ++$mArray{ $nums[ ++$idx ] } while keys %mArray < $uSize;
            return keys %mArray;
        }
        __END__
        C:\test>junk
                       Rate   grepGen    mapGen  firstGen  smartGen   hashGen shuffleEm
        grepGen      45.2/s        --      -54%      -79%      -96%      -99%     -100%
        mapGen       97.9/s      116%        --      -54%      -91%      -98%     -100%
        firstGen      214/s      374%      119%        --      -80%      -97%      -99%
        smartGen     1074/s     2274%      997%      401%        --      -83%      -96%
        hashGen      6500/s    14275%     6540%     2932%      506%        --      -75%
        shuffleEm   25619/s    56551%    26068%    11849%     2286%      294%        --
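        A self-contained version of the hash-based dedup idea above can be sketched as follows (the `@nums` pool and `$uSize` are assumed from the thread's setup; the pool size is an arbitrary choice for illustration):

```perl
use strict;
use warnings;

# Assumed setup from the thread: a pre-generated pool of random values
# and the number of unique values wanted.
my @nums  = map { 1 + int rand 120 } 1 .. 100_000;
my $uSize = 100;

# Hash keys are unique by construction, so inserting values until the
# hash holds $uSize keys discards duplicates as a side effect -- no
# explicit "have I seen this?" scan of a list is ever needed.
my %seen;
my $idx = 0;
++$seen{ $nums[ $idx++ ] } while keys %seen < $uSize;

my @unique = keys %seen;
```

        Each insertion and duplicate check is an O(1) hash operation, which is why this approach sits between the O(N) list scans and the shuffle method in the table above.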

Re^7: Curious find while comparing grep, map, and smart match...
by space_monk (Chaplain) on Mar 27, 2013 at 17:46 UTC

    The output of your program is meant to be a set of unique random numbers. How it is achieved is surely irrelevant (well almost). The code produced by BrowserUk delivers the same output as the previous 3 functions in a much shorter time.

    I hope this is regarded as informative and educational. TBH, I didn't think of looking at the problem in the same way as BrowserUk did until he posted his idea. It often happens, especially round here, that people look at a problem completely differently from the way you do and achieve a much better result. It's happened a few times to me as well. I believe it's called an XY problem: people come on here asking how to do Y when they should really be asking about X, the step that got them to Y; and it's the person who first spots that who often produces the most interesting answer.

    If you are arguing that what he achieved was "against the rules" in some way, and you expected each function to have to sort through duplicates, then knock yourself out....:-)

    A Monk aims to give answers to those who have none, and to learn from those who know more.
      How it is achieved is surely irrelevant (well almost).

      Actually, it is absolutely relevant, because that is the core of my benchmark: comparing how fast grep, map, and ~~ process data. It is the generation of the data set that is irrelevant and probably can be replaced with BrowserUk's method, because it probably does not matter whether the values each of those functions process are unique or not.
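      Isolating the filtering step as described above might look something like this (a minimal sketch, not the actual benchmark from the thread: the data and names are assumptions, and `~~` is omitted here since smartmatch is experimental; a hash lookup stands in as the fast baseline):

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Build the shared data set ONCE, outside the timed subs, so only the
# filtering strategy differs between benchmark entries.
my @haystack   = 1 .. 500;
my %in_hay     = map { $_ => 1 } @haystack;
my @candidates = map { 1 + int rand 1000 } 1 .. 200;

# Linear scan: for each candidate, grep the whole haystack.
sub by_grep {
    return grep { my $c = $_; grep { $_ == $c } @haystack } @candidates;
}

# Hash lookup: O(1) membership test per candidate.
sub by_hash {
    return grep { $in_hay{$_} } @candidates;
}

cmpthese( 100, {
    grepFilter => \&by_grep,
    hashFilter => \&by_hash,
} );
```

      Because the data set is built before `cmpthese` runs, the reported rates reflect only the lookup strategies -- which is the comparison being argued about in this subthread.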

      note: this posting spree has really helped me improve on my posting markups!
