Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

$str to %hash to @ary

by abaxaba (Hermit)
on Jul 22, 2004 at 21:02 UTC ( #376716=perlquestion: print w/replies, xml ) Need Help??
abaxaba has asked for the wisdom of the Perl Monks concerning the following question:

So, I've got a colon-delimited string, that looks like this:
my $str="17:43:33:21:23:19:27:6";
which is really a stringified hash, that looks like this:
%hash=(17=>43, 33=>21, 23=>19, 27=>6);
This hash represents proportion-weights for advertisement roatation. That is, 17% of the time, ad #43 shows. 33% is ad #21, and so on. The way I am codifying this is to have an array, 0..99, where

So, the whole thing looks something like this:

my $str="17:43:33:21:23:19:27:6"; my %hash=split/:/,$str; foreach my $k(keys %hash) { push (@ary, map{$hash{$k}}(1..$k)); } my $adId=$ary[int(rand(100))];

I've ran some timed tests on this, and am not pleased with the results. Testing has determined that the part where the array is populated (via the push/map while foreaching over keys %hash) is where much of my load is coming in.

Any thoughts on how to streamline that (or any other part of the code) to make run faster?

"It is a very mixed blessing to be brought back from the dead." -- Kurt Vonnegut

Replies are listed 'Best First'.
Re: $str to %hash to @ary
by ccn (Vicar) on Jul 22, 2004 at 21:17 UTC
    my $str="17:43:33:21:23:19:27:6"; my @aux = split /:/, $str; my @ary = map {($aux[2*$_+1]) x $aux[2*$_]} 0 .. @aux/2-1;

    You can try the following code also:

    my $str="17:43:33:21:23:19:27:6"; my %hash = split /:/, $str; my $adno; my $rand = rand 100; my $sum = 0; for (keys %hash) { # there is no need of sorted keys $adno = $hash{$_}; last if ($sum += $_) > $rand; } print $adno;

    UPDATE: sort keys eliminated in 'for'

Re: $str to %hash to @ary
by duff (Vicar) on Jul 23, 2004 at 03:30 UTC

    Here's another method that doesn't use a hash at all (since the hash appears to be a synthetic variable not related to the problem you're trying to solve).

    my $str="17:43:33:21:23:19:27:6"; my @a = split/:/,$str; @a % 2 && die; # not an even number of items while (my ($p,$ad) = splice @a,0,2) { push @ary, ($ad) x $p; } my $adId=$ary[int(rand(100))];

    Basically it just splits the string into an array, then grabs elements 2 at a time from the array until the array is exhausted. The code also makes use of the repetition operator in list context to get the ads repeated the proper number of times.

Re: $str to %hash to @ary
by pg (Canon) on Jul 22, 2004 at 21:32 UTC

    At first sight, I spotted a problem right the way. Your code will never work base on your description of the data. Try this (it is your code, with slightly modified testing data):

    use Data::Dumper; my $str="27:43:33:21:23:19:27:6"; my %hash=split/:/,$str; foreach my $k(keys %hash) { push (@ary, map{$hash{$k}}(1..$k)); } my $adId=$ary[int(rand(100))]; print Dumper(\%hash);

    Your code does not handle the situation where two ads has the same percentage of show time. Well, the fix is not difficult, so I just leave it to you, just switch key/value, use show number as key.

Re: $str to %hash to @ary
by Eimi Metamorphoumai (Deacon) on Jul 22, 2004 at 21:21 UTC
    I can't say about the performance per se (I'm actually quite surprised that would be noticably slow unless you're running it a lot of times, since you'll only create a total of 100 entries), but I can say that your design appears to have some serious problems. One of the main ones is that you're using the percentages as keys to a hash, so no two advertisers can have the same percentage (or the second one would disappear entirely). Once you create @ary, do you only use it once? If so, then you might look at Weighted random numbers generator. If you are going to use it several times, you still might do better to reverse the direction of your hash, if you even need the hash at all--is there a reason for it?
Re: $str to %hash to @ary
by Limbic~Region (Chancellor) on Jul 22, 2004 at 22:34 UTC
    • Repeatedly doing hash lookups is slower than doing it once
    • The order appears reversed since as is you can't have two ads with the same weight nor lookup the weight of an add.
    Assuming it is ok to reverse the order, I have presented the following unique solution that does not require an array:
    #!/usr/bin/perl use strict; use warnings; my $lookup; my %hash = split /:/ , '43:17:21:33:19:23:6:27'; while (my ($key, $val) = each %hash) { $lookup .= pack("C*", ($key) x $val); } print unpack("C", substr($lookup, rand 100, 1));

    Cheers - L~R

Re: $str to %hash to @ary
by QM (Parson) on Jul 22, 2004 at 21:27 UTC
    At the expense of making the lookup a little slower:
    my $str="17:43:33:21:23:19:27:6"; my %hash=split/:/,$str; my $count; my %ad_lookup; foreach my $k(keys %hash) { $count += $hash{$k}; $ad_lookup{$count} = $k; } my $rand = rand(100); my $adid; foreach ( sort {$a <=> $b} keys %ad_lookup ) { $adid = $ad_lookup{$_} unless defined( $adid); if ( $_ <= $rand ) { $adid = $ad_lookup{$_}; } else { last; } }
    Of course, if you want the lookup to be faster, use a binary search instead.

    Quantum Mechanics: The dreams stuff is made of

      Unfortunately, your code doesn't actually work; as written it lumps the last two outputs together, and I see it outputting "6" 60% of the time, and "19" never.

      I'll go post my test code elsewhere and you can see if I've somehow miscopied your code.

      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
        You're correct. I didn't test the code well, and seem to have gotten several things backward or out of whack. Here's properly tested code (I hope):
        #!/your/perl/here use strict; use warnings; my $str="17:43:33:21:23:19:27:6"; my %hash=split/:/,$str; my $count; my %ad_lookup; foreach my $k(keys %hash) { $count += $k; $ad_lookup{$count} = $hash{$k}; } my @ad_lookup_sorted_keys = sort {$b <=> $a} keys %ad_lookup; my %seen; for my $rand ( 1..$count ) { # my $rand = rand($count); my $adid; foreach my $key ( @ad_lookup_sorted_keys ) { if ( $rand <= $key ) { $adid = $ad_lookup{$key}; } else { last; } } # print "$adid\n"; $seen{$adid}++; } foreach my $k ( sort {$a <=> $b} keys %seen ) { print "\$seen{$k}: $seen{$k}\n"; }
        Which outputs:
        $seen{6}: 27 $seen{19}: 23 $seen{21}: 33 $seen{43}: 17
        [There would be a slight shift going back to rand, because of the 0..99 and 1..100 difference. I believe you just need to change:
        $ad_lookup{$count} = $hash{$k};
        $ad_lookup{$count-1} = $hash{$k};

        I should note there are several ways to do this and get it right, and this is but one. Also, the key sort is only done once. A binary search on the keyspace would be faster, if there are many keys, but I'd estimate you would need a hundred keys or so to make it worth while. [Feel free to test that if you like]

        Quantum Mechanics: The dreams stuff is made of

Re: $str to %hash to @ary
by NetWallah (Abbot) on Jul 23, 2004 at 00:10 UTC
    If you want to keep your hash structure, here are 2 alternatives that do not lose data:

    The first one uses the ad# as the key (inverse of your example).
    The second one uses the percent as key, but stores the Ad#'s as Values in a HOA.

    #First way - Ad# as key: my $str='27:43:33:21:23:19:27:6'; my( $temp,%h); %h= map {if($temp){ my $y=$temp; $temp=undef; $_=>$y }else{ $temp=$_; undef()=>undef; } } split/:/,$str; delete $h{undef()}; print qq($_=>$h{$_}\n) for keys %h; #Second way - using a HOA my $str='27:43:33:21:23:19:27:6'; my( $temp,%h); for ( split/:/,$str){ if($temp){ push @{$h{$temp}},$_; $temp=undef; }else{ $temp=$_; } }; print qq($_=> @{$h{$_}}\n) for keys %h;

        Earth first! (We'll rob the other planets later)

Re: $str to %hash to @ary
by fizbin (Chaplain) on Jul 23, 2004 at 14:42 UTC
    This is a variant on duff's method that's a bit faster, and I have here performance numbers to demonstrate it:
    my @a = split/:/,$str; @a % 2 && die; # not an even number of items my $r = int(rand(100)); my ($adId, $p); while (($p,$adId) = splice @a,0,2) { if ($r < $p) {last;} $r -= $p; }
    Now, the performance numbers; it's pretty easy to test all the variants given so far with
    Rate origCode duff QM L~R ccn fizbin origCode 4029/s -- -34% -77% -82% -87% -89% duff 6137/s 52% -- -65% -73% -81% -83% QM 17360/s 331% 183% -- -24% -46% -53% L~R 22908/s 469% 273% 32% -- -29% -38% ccn 32106/s 697% 423% 85% 40% -- -12% fizbin 36690/s 811% 498% 111% 60% 14% --
    I have the benchmark code, but it's basically just a bunch of cuts and pastes of the code that's been posted on this thread, so I'll only post it if asked.

    Update: Fixed oops in code, updated performance figures.

    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
      After finding that oops above in my code, I thought I better put correctness tests in and publish the code I was using so that other people could tell me I was nuts if necessary.

      So here again is the output of the benchmark code:

      ==== Correctness tests ==== Probabilities are: {6 => 27.000, 19 => 23.000, 21 => 33.000, 43 => 17. +000} origCode yielded {6 => 26.955, 19 => 22.825, 21 => 33.142, 43 => 17.07 +8} duff yielded {6 => 27.111, 19 => 22.956, 21 => 33.008, 43 => 16.925} fizbin yielded {6 => 27.117, 19 => 23.057, 21 => 32.862, 43 => 16.964} L~R yielded {6 => 26.929, 19 => 22.946, 21 => 33.151, 43 => 16.974} QM yielded {6 => 59.902, 19 => 0.000, 21 => 17.062, 43 => 23.036} QM failed ccn yielded {6 => 26.991, 19 => 22.882, 21 => 33.064, 43 => 17.063} ==== Speed tests ==== Rate origCode duff QM L~R ccn fizbin origCode 4101/s -- -34% -78% -83% -87% -89% duff 6191/s 51% -- -66% -75% -81% -83% QM 18348/s 347% 196% -- -26% -43% -50% L~R 24743/s 503% 300% 35% -- -24% -32% ccn 32411/s 690% 424% 77% 31% -- -11% fizbin 36567/s 792% 491% 99% 48% 13% --
      And here's the code I used to make that determination. The formatting is from being run through perltidy, since it was even uglier before.
        You have missed that I reversed the ordering of the string. As is, there is no way to have two adds with the same percentage nor is there a straight forward way of determining the percentage of any given add. I was trying to be creative, provide a significant speed increase, and maintain manageability.

        Additionally, your code is unfair. In your benchmark, all solutions are being timed for creating a structure to return an add and then returning that add each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy it but your solution does. You would need to alter the bench to be fair.

        Cheers - L~R

        nice test
        you may try also this one:

        ccn_fast => sub { my $rand = rand 100; my $mystr = $str; # this line is needed for repetitive test on +ly 1 while $mystr =~ /([^:]+):([^:]+):?/g and (($rand -= $1) > 0) +; $2; },

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://376716]
Approved by ccn
Front-paged by broquaint
and the monastery is silent...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (5)
As of 2016-12-03 14:09 GMT
Find Nodes?
    Voting Booth?
    On a regular basis, I'm most likely to spy upon:

    Results (56 votes). Check out past polls.