Top and bottom 10 percent elements of an array

by sesemin (Beadle)
on Apr 29, 2010 at 04:35 UTC (#837442=perlquestion)
sesemin has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need to sort an array; ascending or descending does not matter. Then I want to pick the top and bottom 10 percent of the data, replace the top values with A and the bottom values with B, and replace everything else with "-". Finally, I want to put the data back in its original order, something like the following. The 10 percent is hypothetical; it can be any percentage.

@array       = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7)
@sortedarray = (20, 18, 13, 12, 9, 8, 7, 4, 3, 2)
After replacement (A, A, -, -, -, -, -, -, B, B)
@finalarray  = (B, -, B, -, -, -, -, A, A, -)

I know how to sort by index, as in the following code, but I am wondering if you can help me learn how to do the replacement. Maybe the map function is the way to go.

#! perl -slw
use strict;

my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
my @orderedIndeces = sort { $array[$b] <=> $array[$a] } 0 .. $#array;
my $n = scalar @array;
my $twentyperc = $n * 0.2;

for my $i (0 .. $#orderedIndeces) {
    if ($i < $twentyperc) {
        $array[$orderedIndeces[$i]] = "A";
        print "$array[$orderedIndeces[$i]]\t";
    }
    elsif ($i >= $n - $twentyperc) {
        $array[$orderedIndeces[$i]] = "B";
        print "$array[$orderedIndeces[$i]]\t";
    }
    else {
        $array[$orderedIndeces[$i]] = "-";
        print "$array[$orderedIndeces[$i]]\t";
    }
    print "\n";
}

Re: Top and bottom 10 percent elements of an array
by nagalenoj (Friar) on Apr 29, 2010 at 05:23 UTC

    This works for the given elements and gives the result you need. See the documentation for splice to learn more about it.

    my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
    my $percent = (scalar @array * 0.20);
    my (@resultA, @resultB);

    my @ordered = sort { $a <=> $b } @array;
    @resultB = splice(@ordered, 0, $percent);
    @resultA = splice(@ordered, (scalar @ordered - $percent));

    print "@array", "\n";
    for (my $i = 0; $i < scalar @array; $i++) {
        if ( grep { $array[$i] eq $_ } @resultA ) {
            $array[$i] = 'A';
        }
        elsif ( grep { $array[$i] eq $_ } @resultB ) {
            $array[$i] = 'B';
        }
        else {
            $array[$i] = '-';
        }
    }
    print "@array", "\n";
      Thanks, easy to implement and easy to understand.
Re: Top and bottom 10 percent elements of an array
by samarzone (Pilgrim) on Apr 29, 2010 at 05:48 UTC

    Note that your logic may fail if there are duplicate entries in the array. Depending on your implementation, you may get more or fewer "A"s, "B"s or "-"s than the required percentage.
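To make the duplicate problem concrete, here is a small sketch. The data set is made up for illustration (20 appears three times), and the variable names are mine, not from any post in this thread. It compares matching by value against marking by sorted index.

```perl
use strict;
use warnings;

# Hypothetical data with duplicates: 20 appears three times.
my @array = (2, 20, 3, 20, 9, 12, 13, 20, 18, 7);
my $keep  = int(@array * 0.20);    # 2 elements wanted in the top group

# Value-based marking: collect the top values, then match every element by value.
my @ordered  = sort { $a <=> $b } @array;
my %is_top   = map { $_ => 1 } @ordered[ -$keep .. -1 ];
my @by_value = map { $is_top{$_} ? 'A' : '-' } @array;

# Index-based marking: label exactly $keep positions, whatever their values.
my @idx      = sort { $array[$a] <=> $array[$b] } 0 .. $#array;
my @by_index = ('-') x @array;
$by_index[$_] = 'A' for @idx[ -$keep .. -1 ];

my $count_value = grep { $_ eq 'A' } @by_value;    # 3
my $count_index = grep { $_ eq 'A' } @by_index;    # 2
print "by value: @by_value ($count_value A's)\n";
print "by index: @by_index ($count_index A's)\n";
```

Matching by value labels all three 20s even though only two were asked for. Marking by index always labels exactly $keep positions, although which of the tied elements get the label depends on how sort orders equal values.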

      Good point, thanks.

      I will try ikegami's or codeacrobat's solutions (below); they seem more reliable. I was thinking that sorting the indexes was the way to go too, but I was not smart enough to finish the task.

Re: Top and bottom 10 percent elements of an array
by ikegami (Pope) on Apr 29, 2010 at 05:53 UTC

    It's simpler if you sort the indexes instead of the values.

    my $portion = 0.20;
    my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);

    my $keep = int(@array * $portion);
    my @sorted_idxs = sort { $array[$a] <=> $array[$b] } 0..$#array;

    my @final = ('-') x @array;
    $final[$sorted_idxs[$_]] = 'B' for 0..$keep-1;
    $final[$sorted_idxs[$_]] = 'A' for -$keep..-1;

    print("@final\n");
    B - B - - - - A A -

    Update: Fixed off-by-one error.

Re: Top and bottom 10 percent elements of an array
by codeacrobat (Chaplain) on Apr 29, 2010 at 06:20 UTC
    How about a solution with map?
    @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
    my $c = 0;
    @sort_a_pos = sort { $a->[1] <=> $b->[1] } map { [$c++ => $_] } @array;
    $pct = @array / 10;
    $_->[1] = "B" for @sort_a_pos[0 .. $pct];
    $_->[1] = "-" for @sort_a_pos[$pct+1 .. $#array-$pct-1];
    $_->[1] = "A" for @sort_a_pos[$#array-$pct .. $#array];
    @finalarray = map { $_->[1] } sort { $a->[0] <=> $b->[0] } @sort_a_pos;
    print "@finalarray";

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
      $_->[1] = "B" for @sort_a_pos[0 .. $pct];
      $_->[1] = "-" for @sort_a_pos[$pct+1 .. $#array-$pct-1];
      $_->[1] = "A" for @sort_a_pos[$#array-$pct .. $#array];

      should be

      $_->[1] = "B" for @sort_a_pos[0 .. $pct-1];
      $_->[1] = "-" for @sort_a_pos[$pct .. $#array-$pct-1];
      $_->[1] = "A" for @sort_a_pos[$#array-$pct .. $#array];

      You probably got confused (like me) by the OP's weird math of 10 * 10% = 2. The output he gave was for 20%.
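One more observation on the ranges: in the corrected version quoted above, the "A" slice still runs from $#array-$pct to $#array, which covers $pct + 1 elements, while the "B" slice covers $pct. A sketch with symmetric ranges might look like the following. It assumes $pct is taken with int() and uses 0.20 only because that matches the OP's sample output.

```perl
use strict;
use warnings;

my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
my $pct   = int(@array * 0.20);    # elements per tail (assumed 20%)

# Pair each value with its original position: [position, value].
my $c = 0;
my @sort_a_pos = sort { $a->[1] <=> $b->[1] } map { [ $c++, $_ ] } @array;

# Lowest $pct, then the middle, then the highest $pct. The ranges do not overlap.
$_->[1] = 'B' for @sort_a_pos[ 0 .. $pct - 1 ];
$_->[1] = '-' for @sort_a_pos[ $pct .. $#array - $pct ];
$_->[1] = 'A' for @sort_a_pos[ $#array - $pct + 1 .. $#array ];

# Restore the original order and keep only the labels.
my @finalarray = map { $_->[1] } sort { $a->[0] <=> $b->[0] } @sort_a_pos;
print "@finalarray\n";    # B - B - - - - A A -
```

With $pct = 2 this labels two elements at each end and leaves six dashes, matching the @finalarray in the question.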

Re: Top and bottom 10 percent elements of an array
by JavaFan (Canon) on Apr 29, 2010 at 07:46 UTC
    The 10 percent is hypothetical; it can be any percentage.
    So, what should the array look like if the percentage is 75?
      or 100%
        or 99%...

        Good point. Also, the last two solutions do not work well if the number of array elements is odd. For instance, if we have 11 unique numbers, 5 should be A, 5 should be B and 1 should be a dash. But both of them print 6 "A"s and 4 "B"s and no dash. I tried to play around with the ranges but could not resolve it. I think 50% is the maximum we would want to use, so that the population of numbers can be divided into two categories. Therefore, I am not worried about 75% or higher unless there is another use for this little script.
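For the odd-count and over-50% cases, one possible sketch is to wrap the index-sorting approach in a sub that rejects portions above 0.5 and rounds the tail size down. The name label_tails and the choice to die on a bad portion are my assumptions, not something proposed in this thread.

```perl
use strict;
use warnings;

# label_tails is a hypothetical helper name.
# Returns a list of 'A', 'B' and '-' labels in the original order of @values.
sub label_tails {
    my ($portion, @values) = @_;
    die "portion must be between 0 and 0.5\n"
        if $portion < 0 || $portion > 0.5;

    # int() rounds down, so the two tails can never overlap.
    my $keep = int(@values * $portion);
    my @idx  = sort { $values[$a] <=> $values[$b] } 0 .. $#values;

    my @final = ('-') x @values;
    $final[$_] = 'B' for @idx[ 0 .. $keep - 1 ];
    $final[$_] = 'A' for @idx[ @idx - $keep .. $#idx ];
    return @final;
}

# Eleven unique numbers at 50%: five B's, one dash, five A's.
my @labels = label_tails(0.5, 1 .. 11);
print "@labels\n";    # B B B B B - A A A A A
```

Because the tail size is rounded down, an odd count at 50% leaves the middle element as a dash. Whether 75% should be an error or should mean something else is a design decision the thread leaves open.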

Re: Top and bottom 10 percent elements of an array
by Limbic~Region (Chancellor) on Apr 29, 2010 at 18:08 UTC
    sesemin,
    Why do you need to do this? Is the data you presented representative of your actual data? I ask because sorting to find the top/bottom N is typically simpler but less efficient than, say, using a heap. This sounds like it might be a fun and interesting problem, but I really wouldn't want to spend time beyond the solution ikegami provided without knowing more.

    Cheers - L~R
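On the heap remark: as far as I know, core Perl does not ship a heap module, so the following is only a sketch of the same idea and not a real heap. It keeps a small ordered buffer of the best $k indexes seen so far instead of sorting every element, which costs roughly n * k comparisons in the worst case. The helper name top_k_indexes and its comparator argument are hypothetical.

```perl
use strict;
use warnings;

# top_k_indexes is a hypothetical helper, standing in for a heap.
# $better->($x, $y) returns true when value $x should rank ahead of $y.
sub top_k_indexes {
    my ($k, $better, $values) = @_;
    my @best;    # indexes, best first, never longer than $k
    return @best if $k < 1;

    for my $i (0 .. $#$values) {
        # Skip elements that cannot enter a full buffer.
        next if @best == $k
            && !$better->($values->[$i], $values->[ $best[-1] ]);

        # Walk back to the insertion point that keeps @best ordered.
        my $pos = @best;
        $pos-- while $pos > 0
            && $better->($values->[$i], $values->[ $best[ $pos - 1 ] ]);

        splice @best, $pos, 0, $i;
        pop @best if @best > $k;
    }
    return @best;
}

my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
my $keep  = int(@array * 0.20);

my @final = ('-') x @array;
$final[$_] = 'B' for top_k_indexes($keep, sub { $_[0] < $_[1] }, \@array);
$final[$_] = 'A' for top_k_indexes($keep, sub { $_[0] > $_[1] }, \@array);
print "@final\n";    # B - B - - - - A A -
```

For ten elements this is no faster than a plain sort. It would only start to matter for large arrays with a small $keep, which is why knowing more about the real data matters.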
