Beefy Boxes and Bandwidth Generously Provided by pair Networks Bob
more useful options
 
PerlMonks  

Curious find while comparing grep, map, and smart match...

by dbuckhal (Scribe)
on Mar 27, 2013 at 00:25 UTC ( #1025608=perlquestion: print w/ replies, xml ) Need Help??
dbuckhal has asked for the wisdom of the Perl Monks concerning the following question:

I was using the smart match operator to help generate an array of unique, random numbers. For legacy purposes, I used map in scalar context to do the same thing. Then I saw grep being used in a recent thread in the same manner as my map usage, so I also incorporated that and benchmarked their performances.

Smart match blew the other two away, but what I did find interesting was the resulting difference in the unique arrays generated.

First, to level the playing field for testing, I started with generating a 1_000_000 element array of random numbers to be used as the "pool" that my generator methods would pick against. So, each generator utilized the same pool, rather than another set of random numbers. In the end, it was the generated arrays that were interesting.

Here is a sample of the three unique arrays generated and sorted:

Benchmark: timing 2500 iterations of grepGen, mapGen, smartGen... grepGen: 34 wallclock secs (34.35 usr + 0.00 sys = 34.35 CPU) @ 72 +.78/s (n=2500) mapGen: 14 wallclock secs (14.09 usr + 0.00 sys = 14.09 CPU) @ 17 +7.47/s (n=2500) smartGen: 1 wallclock secs ( 0.95 usr + 0.00 sys = 0.95 CPU) @ 26 +28.81/s (n=2500) Rate grepGen mapGen smartGen grepGen 72.8/s -- -59% -97% mapGen 177/s 144% -- -93% smartGen 2629/s 3512% 1381% -- loop counts - grep: 805000, map: 805000, smart: 495000 array sizes - num: 1000000; grep: 100; map: 100; smart: 100 grep: 1 12 15 18 19 20 21 22 23 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 + 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 + 64 65 66 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 + 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 104 105 106 107 108 +109 110 111 112 113 114 116 117 118 119 map: 1 12 15 18 19 20 21 22 23 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 + 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 + 64 65 66 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 + 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 104 105 106 107 108 +109 110 111 112 113 114 116 117 118 119 smart: 0 1 2 3 5 7 8 9 12 13 15 16 18 19 21 22 23 25 26 27 28 29 30 31 32 33 +34 35 36 37 38 39 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 59 60 +61 62 63 64 65 66 68 69 70 71 72 73 74 75 76 77 78 79 81 83 84 85 86 +87 89 90 91 92 93 94 95 96 97 98 100 101 102 104 105 106 107 108 109 +110 111 112 113 116 117 118 119

The grep and map arrays are the same, but smart match picked values that the other methods did not! Each method picked from the exact same pool.

This is the curious find that I mentioned in my subject. So, I hope that others can check my code and maybe find the results curious, also.

code

#!/usr/bin/perl use strict; use warnings; use Benchmark qw( timethese cmpthese ); my @doCount = (0); # total loops until unique array generate +d my $range = 120; # numbers range from 0 to $range my $aSize = 1000000; # random pool array size my $uSize = 100; # unique array size my ( @grep, @map, @smart ); my @nums = arrGen(); # generate huge pool of random numbers my $r = timethese( 2500, { grepGen => sub{ @grep = grepGen(); }, mapGen => sub{ @map = mapGen(); }, smartGen => sub{ @smart = smartGen(); }, } ); # show results cmpthese $r; print "loop counts - grep: $doCount[0], map: $doCount[1], smart: $doCo +unt[2]\n"; print "array sizes - num: ", scalar(@nums), # should all == $uSize "; grep: ", scalar(@grep), "; map: ", scalar(@map), "; smart: ", scalar(@smart), "\n"; # show sorted generated arrays to check for a rare duplicate print "grep:\n@{[sortIt(@grep)]}\nmap:\n@{[sortIt(@map)]}\nsmart:\n@{[ +sortIt(@smart)]}\n"; sub arrGen { my @nArray; for ( 1..$aSize ) { my $rInt = int(rand($range)); push @nArray, $rInt; } return @nArray; } sub sortIt { return sort { $a <=> $b } @_; } sub grepGen { my $idx = 0; my @gArray; for ( 1..$uSize ) { $doCount[0]++; my $rInt = $nums[$idx++]; redo if ( @gArray && grep { /$rInt/ } @gArray ); push @gArray, $rInt; } return @gArray; } sub mapGen { my $idx = 0; my @mArray; for ( 1..$uSize ) { $doCount[1]++; my $rInt = $nums[$idx++]; redo if ( @mArray && map { /$rInt/ } @mArray ); push @mArray, $rInt; } return @mArray; } sub smartGen { my $idx = 0; my @sArray; for ( 1..$uSize ) { $doCount[2]++; my $rInt = $nums[$idx++]; redo if ( @sArray && $rInt ~~ @sArray ); push @sArray, $rInt; } return @sArray; }

Comment on Curious find while comparing grep, map, and smart match...
Select or Download Code
Re: Curious find while comparing grep, map, and smart match... (substring)
by tye (Cardinal) on Mar 27, 2013 at 01:00 UTC

    3145 =~ /14/ but not 14 ~~ 3145.

    - tye        

      That makes sense, hence many single digit values are rarely present.

      Good candidate for a "golf"-ish answer!

      UPDATE: adding word boundaries helped, now the arrays generated are equal.

      map { /\b$rInt\b/ } @mArray

      Thanks!

Re: Curious find while comparing grep, map, and smart match...
by jwkrahn (Monsignor) on Mar 27, 2013 at 02:13 UTC
    redo if ( @gArray && grep { /$rInt/ } @gArray ); ... redo if ( @mArray && map { /$rInt/ } @mArray ); ... redo if ( @sArray && $rInt ~~ @sArray );

    The three are not equivalent.    grep and map use regular expressions and have code blocks while the other does not.    Try it like this:

    redo if @gArray && grep /$rInt/, @gArray; ... redo if @mArray && map /$rInt/, @mArray; ... redo if @sArray && /$rInt/ ~~ @sArray;

      Wow, that, and using word boundaries from tye's hint, made a significant difference!

      before:

      Benchmark: timing 2500 iterations of grepGen, mapGen, smartGen... grepGen: 23 wallclock secs (22.51 usr + 0.00 sys = 22.51 CPU) @ 11 +1.06/s (n=2500) mapGen: 11 wallclock secs (11.42 usr + 0.00 sys = 11.42 CPU) @ 21 +8.93/s (n=2500) smartGen: 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ 23 +92.34/s (n=2500)

      after:

      Benchmark: timing 2500 iterations of grepGen, mapGen, smartGen... grepGen: 14 wallclock secs (13.70 usr + 0.00 sys = 13.70 CPU) @ 18 +2.52/s (n=2500) mapGen: 13 wallclock secs (13.57 usr + 0.00 sys = 13.57 CPU) @ 18 +4.20/s (n=2500) smartGen: 2 wallclock secs ( 1.19 usr + 0.00 sys = 1.19 CPU) @ 21 +07.93/s (n=2500)

      That really narrowed the gap between grep and map.

Re: Curious find while comparing grep, map, and smart match...
by BrowserUk (Pope) on Mar 27, 2013 at 02:31 UTC

    All of your methods of picking 100 random numbers between 1 & 120 are horribly complicated and inefficient:

    use List::Util qw[ shuffle ]; my @rands = ( shuffle 1 .. 120 )[ 0 .. 99 ];

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Explain.

        Explain what?

        The code I posted; or why your code is inefficient?

        I will make an effort to interpret BrowserUKs terse comment for you :-)

        In grepGen/mapGen/smartGen you have a redo in the event of a collision, whereas if you really want unique random numbers, they could be generated much more simply by the shuffle method he suggested. His method does not involve repeated retries and removes the need for any of these functions. His one liner does what each of the three methods you proposed does, but much more efficiently....

        my @rands = ( shuffle # shuffle, like a pack of cards 1 .. 120 # a series of numbers from 1..120 ) [ 0 .. 99 ]; # take the first 100 of these # the resulting @rands is a 100 element array of numbers # in the range 1..120, all of which are unique
        A Monk aims to give answers to those who have none, and to learn from those who know more.
      Just for fun, here a poor mans shuffle:

      DB<228> sort { (-1)**(rand(2)%2) } 1..10 => (5, 8, 7, 6, 9, 10, 1, 4, 2, 3)

      Surely neither efficient, nor well mixed and pretty pointless since List::Util is core. =)

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      update

      for completeness:

      > nor well mixed

      The distribution is perfect iff the list has 2^n elements

      DB<483> for (1..100000) {$x=0;$g[$x++]{$_}++ for sort {(-1) ** int r +and 2} a..d } => "" DB<484> \@g => [ { a => 25302, b => 25028, c => 24852, d => 24818 }, { a => 25062, b => 24925, c => 24773, d => 25240 }, { a => 24773, b => 24944, c => 25203, d => 25080 }, { a => 24863, b => 25103, c => 25172, d => 24862 }, ]
        Surely neither efficient, nor well mixed and pretty pointless ...

        Agreed.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1025608]
Approved by kcott
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (4)
As of 2014-04-19 18:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (483 votes), past polls