PerlMonks  

Top and bottom 10 percent elements of an array

by sesemin (Beadle)
on Apr 29, 2010 at 04:35 UTC (#837442=perlquestion)
sesemin has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need to sort an array (ascending or descending does not matter), then choose the top and bottom 10 percent of the data and replace them with A (for the tops) and B (for the lows). The remaining elements should be replaced by "-". Then I want to put the data back in their original order, something like the following. The 10 percent is hypothetical; it can be any percentage.

    @array            = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7)
    @sortedarray      = (20, 18, 13, 12, 9, 8, 7, 4, 3, 2)
    After replacement:  (A, A, -, -, -, -, -, -, B, B)
    @finalarray       = (B, -, B, -, -, -, -, A, A, -)

I know how to sort by index, as in the following code, but I was wondering if you could help me learn how to do the replacement. Maybe the map function is the way to go.

    #! perl -slw
    use strict;

    my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);

    # Indexes of @array ordered by value, largest first
    my @orderedIndeces = sort { $array[$b] <=> $array[$a] } 0 .. $#array;
    my $n = scalar @array;
    my $twentyperc = $n * 0.2;

    for my $i (0 .. $#orderedIndeces) {
        if ($i < $twentyperc) {                  # top 20%
            $array[$orderedIndeces[$i]] = "A";
            print "$array[$orderedIndeces[$i]]\t";
        }
        elsif ($i >= $n - $twentyperc) {         # bottom 20%
            $array[$orderedIndeces[$i]] = "B";
            print "$array[$orderedIndeces[$i]]\t";
        }
        else {                                   # everything in between
            $array[$orderedIndeces[$i]] = "-";
            print "$array[$orderedIndeces[$i]]\t";
        }
        print "\n";
    }

Re: Top and bottom 10 percent elements of an array
by nagalenoj (Friar) on Apr 29, 2010 at 05:23 UTC

    This works for the given elements and gives the result you need. See the documentation for splice to learn how it works.

    my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
    my $percent = (scalar @array * 0.20);
    my (@resultA, @resultB);
    my @ordered = sort { $a <=> $b } @array;
    @resultB = splice(@ordered, 0, $percent);
    @resultA = splice(@ordered, (scalar @ordered - $percent));
    print "@array", "\n";
    for (my $i = 0; $i < scalar @array; $i++) {
        if (grep { $array[$i] eq $_ } @resultA) {
            $array[$i] = 'A';
        }
        elsif (grep { $array[$i] eq $_ } @resultB) {
            $array[$i] = 'B';
        }
        else {
            $array[$i] = '-';
        }
    }
    print "@array", "\n";
      Thanks, easy to implement and easy to understand.
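splice removes elements from an array in place and also returns them, which is how the reply above peels the smallest and largest values off the sorted copy. A small illustration on the sample data (already sorted ascending), with a hypothetical count of 2 per end; the second call uses a negative offset, which is an alternative to the reply's `scalar @ordered - $percent`:

```perl
use strict;
use warnings;

my @ordered = (2, 3, 4, 7, 8, 9, 12, 13, 18, 20);   # sample data, sorted ascending
my $n       = 2;                                    # hypothetical count per end

my @lowest  = splice(@ordered, 0, $n);   # removes and returns the first $n elements
my @highest = splice(@ordered, -$n);     # negative offset: removes the last $n

print "lowest:  @lowest\n";    # lowest:  2 3
print "highest: @highest\n";   # highest: 18 20
print "left:    @ordered\n";   # left:    4 7 8 9 12 13
```

Note that @ordered shrinks with each call, so the second splice already sees the shortened array.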
Re: Top and bottom 10 percent elements of an array
by samarzone (Pilgrim) on Apr 29, 2010 at 05:48 UTC

    Note that your logic may fail if the array contains duplicate entries: depending on the implementation, you may get more or fewer "A"s, "B"s or "-"s than the required percentage.

      Good point, thanks.

      I will try ikegami's or codeacrobat's solutions (below); they seem more reliable. I was thinking that sorting the indexes was the way to go too, but I couldn't figure out how to finish the task.
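To make samarzone's point about duplicates concrete, here is a small sketch with made-up data (three tied minimum values, 20% of ten elements) that compares matching by value, as the grep approach does, with labelling by sorted index position:

```perl
use strict;
use warnings;

# Made-up data with ties at the bottom: three 1s, but 20% of 10 is only 2.
my @array   = (1, 5, 1, 6, 1, 7, 8, 9, 10, 11);
my $keep    = int(@array * 0.20);              # 2
my @ordered = sort { $a <=> $b } @array;
my @lows    = @ordered[0 .. $keep - 1];        # (1, 1)

# Value-based: every element equal to some "low" value becomes B.
my @by_value = map {
    my $v = $_;
    (grep { $_ == $v } @lows) ? 'B' : '-';
} @array;

# Index-based: only the first $keep sorted positions become B.
my @idx      = sort { $array[$a] <=> $array[$b] } 0 .. $#array;
my @by_index = ('-') x @array;
$by_index[ $idx[$_] ] = 'B' for 0 .. $keep - 1;

my $count_value = grep { $_ eq 'B' } @by_value;   # scalar context gives a count
my $count_index = grep { $_ eq 'B' } @by_index;
print "by value: @by_value ($count_value Bs)\n";   # three Bs
print "by index: @by_index ($count_index Bs)\n";   # two Bs
```

With ties, the index-based version still labels exactly $keep elements; which of the tied 1s get the B is then down to the sort order.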

Re: Top and bottom 10 percent elements of an array
by ikegami (Pope) on Apr 29, 2010 at 05:53 UTC

    It's simpler if you sort the indexes instead of the values.

    my $portion = 0.20;
    my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
    my $keep = int(@array * $portion);
    my @sorted_idxs = sort { $array[$a] <=> $array[$b] } 0..$#array;
    my @final = ('-') x @array;
    $final[$sorted_idxs[$_]] = 'B' for 0..$keep-1;
    $final[$sorted_idxs[$_]] = 'A' for -$keep..-1;
    print("@final\n");
    B - B - - - - A A -

    Update: Fixed off-by-one error.
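Two ideas keep the version above short: sorting the indexes rather than the values, and using negative subscripts to reach the end of that index list. A small sketch that prints the intermediate values for the sample data:

```perl
use strict;
use warnings;

my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);

# Sorting the indexes leaves @array untouched: each entry of @sorted_idxs
# says where the next-smallest value lives in @array.
my @sorted_idxs = sort { $array[$a] <=> $array[$b] } 0 .. $#array;
print "@sorted_idxs\n";              # 0 2 1 9 3 4 5 6 8 7

# Negative subscripts count from the end, so -2 .. -1 picks the last two
# sorted positions, i.e. the indexes of the two largest values.
my $keep    = 2;
my @top_idx = @sorted_idxs[ -$keep .. -1 ];
print "@top_idx\n";                  # 8 7
print "@array[@top_idx]\n";          # 18 20
```

When $keep is 0, both 0..-1 and -0..-1 are empty ranges, so nothing is labelled.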

Re: Top and bottom 10 percent elements of an array
by codeacrobat (Chaplain) on Apr 29, 2010 at 06:20 UTC
    How about a solution with map?
    @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
    my $c = 0;
    @sort_a_pos = sort { $a->[1] <=> $b->[1] } map { [$c++ => $_] } @array;
    $pct = @array / 10;
    $_->[1] = "B" for @sort_a_pos[0 .. $pct];
    $_->[1] = "-" for @sort_a_pos[$pct+1 .. $#array-$pct-1];
    $_->[1] = "A" for @sort_a_pos[$#array-$pct .. $#array];
    @finalarray = map { $_->[1] } sort { $a->[0] <=> $b->[0] } @sort_a_pos;
    print "@finalarray";

      $_->[1] = "B" for @sort_a_pos[0 .. $pct];
      $_->[1] = "-" for @sort_a_pos[$pct+1 .. $#array-$pct-1];
      $_->[1] = "A" for @sort_a_pos[$#array-$pct .. $#array];
      should be
      $_->[1] = "B" for @sort_a_pos[0 .. $pct-1];
      $_->[1] = "-" for @sort_a_pos[$pct .. $#array-$pct];
      $_->[1] = "A" for @sort_a_pos[$#array-$pct+1 .. $#array];

      You probably got confused (like me) by the OP's weird math of 10 * 10% = 2. The output he gave was for 20%.
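Putting corrected ranges together, a complete version of the map-based approach might look like the sketch below. It assumes the percentage is given as a fraction ($portion, a name introduced here) and rounds the count down with int, so both ends always get the same whole number of elements:

```perl
use strict;
use warnings;

my @array   = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
my $portion = 0.20;                      # fraction to label at each end (assumed)
my $pct     = int(@array * $portion);    # whole number of elements per end

# Decorate: pair each value with its original position, then sort by value.
my $c = 0;
my @sort_a_pos = sort { $a->[1] <=> $b->[1] } map { [ $c++, $_ ] } @array;

# Label the sorted pairs: exactly $pct Bs, exactly $pct As, dashes between.
$_->[1] = '-' for @sort_a_pos;
$_->[1] = 'B' for @sort_a_pos[ 0 .. $pct - 1 ];
$_->[1] = 'A' for @sort_a_pos[ @sort_a_pos - $pct .. $#sort_a_pos ];

# Undecorate: restore the original order and keep only the labels.
my @finalarray = map { $_->[1] } sort { $a->[0] <=> $b->[0] } @sort_a_pos;
print "@finalarray\n";    # B - B - - - - A A -
```

Filling every slot with "-" first avoids having to compute the middle range at all.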

Re: Top and bottom 10 percent elements of an array
by JavaFan (Canon) on Apr 29, 2010 at 07:46 UTC
    The 10 percent is hypothetical; it can be any percentage.
    So, what should the array look like if the percentage is 75?
      or 100%
        or 99%...

        Good point. Also, the last two solutions do not work well if the number of array elements is odd. For instance, with 11 unique numbers, 5 should be A, 5 B and 1 dash, but both of them print 6 "A"s and 4 "B"s and no dash. I tried to play around with the ranges but could not resolve it. I think 50% is the maximum that makes sense, since the goal is to divide the population of numbers into two categories. Therefore, I am not worried about 75% or higher unless there is another use for this little script.
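For the odd-length case and the question about percentages above 50%, one option is to cap the portion at one half and round down, so the two ends can never overlap. A sketch along the lines of ikegami's index sort; label_extremes is a made-up name, and dying on an out-of-range portion is just one possible policy:

```perl
use strict;
use warnings;

# Hypothetical helper: returns labels in the original order, 'A' for the
# top $portion of values, 'B' for the bottom $portion, '-' for the rest.
sub label_extremes {
    my ($values, $portion) = @_;
    die "portion must be between 0 and 0.5\n"
        if $portion < 0 || $portion > 0.5;

    my $n    = @$values;
    my $keep = int($n * $portion);    # rounds down, so 2 * $keep <= $n
    my @idx  = sort { $values->[$a] <=> $values->[$b] } 0 .. $n - 1;

    my @labels = ('-') x $n;
    $labels[ $idx[$_] ] = 'B' for 0 .. $keep - 1;
    $labels[ $idx[$_] ] = 'A' for $n - $keep .. $n - 1;
    return @labels;
}

my @odd    = (5, 1, 9, 3, 7, 11, 2, 8, 4, 10, 6);    # 11 unique values
my @labels = label_extremes(\@odd, 0.5);
print "@labels\n";    # 5 Bs, 5 As and one dash (for the median, 6)
```

With 11 elements and 0.5, int(5.5) is 5, which leaves exactly one element unlabelled in the middle.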

Re: Top and bottom 10 percent elements of an array
by Limbic~Region (Chancellor) on Apr 29, 2010 at 18:08 UTC
    sesemin,
    Why do you need to do this? Is the data you presented representative of your actual data? I ask because sorting to find the top/bottom N is typically simpler, but less efficient, than using, say, a heap. This sounds like it might be a fun and interesting problem, but I really wouldn't want to spend time beyond the solution ikegami provided without knowing more.

    Cheers - L~R
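For readers curious about the heap alternative, here is a rough sketch: keep a min-heap of at most k indexes while scanning the array once, so finding the top k costs O(n log k) rather than a full O(n log n) sort. Perl has no heap in core, so this hand-rolls one; top_k_indexes is a hypothetical name, and the bottom k would use the reversed comparison:

```perl
use strict;
use warnings;

# Hypothetical helper: indexes of the $k largest values in @$values,
# kept in a bounded binary min-heap. Hand-rolled for illustration only.
sub top_k_indexes {
    my ($values, $k) = @_;
    my @heap;    # indexes; $values->[$heap[0]] is the smallest value kept

    # true if the value at index $_[0] is smaller than the value at index $_[1]
    my $less = sub { $values->[ $_[0] ] < $values->[ $_[1] ] };

    # move the last element up until its parent is not larger
    my $sift_up = sub {
        my $i = $#heap;
        while ($i > 0) {
            my $p = int(($i - 1) / 2);
            last unless $less->($heap[$i], $heap[$p]);
            @heap[$i, $p] = @heap[$p, $i];
            $i = $p;
        }
    };

    # move the root down until neither child is smaller
    my $sift_down = sub {
        my $i = 0;
        while (1) {
            my ($l, $r, $m) = (2 * $i + 1, 2 * $i + 2, $i);
            $m = $l if $l <= $#heap && $less->($heap[$l], $heap[$m]);
            $m = $r if $r <= $#heap && $less->($heap[$r], $heap[$m]);
            last if $m == $i;
            @heap[$i, $m] = @heap[$m, $i];
            $i = $m;
        }
    };

    for my $i (0 .. $#$values) {
        if (@heap < $k) {
            push @heap, $i;
            $sift_up->();
        }
        elsif ($k > 0 && $less->($heap[0], $i)) {
            $heap[0] = $i;    # evict the smallest kept value
            $sift_down->();
        }
    }
    return @heap;    # heap order, not sorted
}

my @array = (2, 4, 3, 8, 9, 12, 13, 20, 18, 7);
my @top   = sort { $a <=> $b } top_k_indexes(\@array, 2);
print "@top\n";    # 7 8
```

Whether this beats a plain sort in practice depends on how large n is relative to k; for ten elements, sorting is simpler and fast enough.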

Node Type: perlquestion [id://837442]
Approved by Old_Gray_Bear