
grouping numbers

by ag4ve (Monk)
on Jul 11, 2013 at 11:49 UTC (#1043700=perlquestion)
ag4ve has asked for the wisdom of the Perl Monks concerning the following question:

I'm cross-posting from the beginners mailing list.

Basically, I want to see the most active events in logs.

Technically, I don't even want a static average - if a group is 5,6,7, I'd like to see it the same as 8,10,12,14,16 but probably have the latter rank higher based on $distance/$numbers. As it is, I can't even get this to work, so...

    use strict;
    use warnings;
    use Data::Dumper;

    my $arr = [ 10, 7, 5, 10, 50, 70, 75, 72, 79, 80 ];
    my $avg;
    foreach my $i ($#$arr) {
        $avg = ($i + ($avg ? $avg : $i)) / 2;
    }
    print "avg [$avg]\n";

    @$arr = sort {$a <=> $b} @$arr;
    my $store;
    foreach my $i (0 .. $#$arr) {
        $store->[$i]{num} = $arr->[$i];
        $store->[$i]{thresh} = $avg;
        foreach my $store_i (0 .. $#$store) {
            if (abs($arr->[$i] - $store->[$store_i]{num}) <= $store->[$i]{thresh}) {
                push @{$store->[$i]{group}}, $arr->[$i];
            }
        }
    }
    print Dumper($store);

Replies are listed 'Best First'.
Re: grouping numbers
by QM (Parson) on Jul 11, 2013 at 14:31 UTC
    This is suspicious:
    foreach my $i ($#$arr) { $avg = ($i + ($avg ? $avg : $i)) / 2; }

    This only runs once, with $i taking the value of the last index of $arr. You probably meant @$arr.
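
A quick way to see the difference, as a minimal sketch: the arrays @seen_index and @seen_value are illustrative names that do not appear in the original post, and simply record what each loop visits.

```perl
use strict;
use warnings;

my $arr = [ 10, 7, 5, 10, 50, 70, 75, 72, 79, 80 ];

# $#$arr is a single scalar (the last index), so the loop body runs once.
my @seen_index;
foreach my $i ($#$arr) { push @seen_index, $i; }

# @$arr is the list of elements, so the loop body runs once per element.
my @seen_value;
foreach my $i (@$arr) { push @seen_value, $i; }

print "looping over \$#\$arr: ", scalar(@seen_index), " iteration(s), \$i = @seen_index\n";
print "looping over \@\$arr: ", scalar(@seen_value), " iteration(s)\n";
```

The first loop sees only the index of the last element, never an element value, which is why the original $avg has nothing to do with the data.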

    The next problem is that this isn't an average in the normal sense; it's a weighted average where the last element gets much more weight than the first, more like:

    $avg = (...(($arr->[0] + $arr->[1])/2 + $arr->[2])/2 + ... + $arr->[$#$arr])/2;
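
To make the weighting concrete, here is a small sketch that computes the effective weight each element receives under the recurrence avg = (x + avg)/2 when the average is seeded with the first element, as in the loop quoted above. The helper name decay_weights is hypothetical, and the sketch ignores the edge case where the running average is 0 (which the original `$avg ? $avg : $i` test would treat as unset).

```perl
use strict;
use warnings;

# Hypothetical helper: effective weight of each element under the recurrence
# avg = (x + avg) / 2, seeded with the first element.
sub decay_weights {
    my ($n) = @_;
    return () if $n < 1;
    my @w = (1);                        # one element: it carries all the weight
    for my $step (2 .. $n) {
        @w = map { $_ / 2 } @w;         # every earlier weight is halved...
        push @w, 0.5;                   # ...and the newest element gets one half
    }
    return @w;
}

my @x = (10, 7, 5, 10, 50, 70, 75, 72, 79, 80);
my @w = decay_weights(scalar @x);

# Apply the weights directly instead of running the recurrence.
my $weighted = 0;
$weighted += $w[$_] * $x[$_] for 0 .. $#x;

printf "weight of first element: %g\n", $w[0];
printf "weight of last element:  %g\n", $w[-1];
printf "weighted sum: %s\n", $weighted;
```

With ten elements the last one carries half of the total weight and the first carries 1/512, so the result is dominated by whatever happens to come last in the list.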

    So perhaps you meant a decaying average, but it's not clear from the context. Here's something equivalent, with output:

    my @x = (10, 7, 5, 10, 50, 70, 75, 72, 79, 80);

    sub avg {
        my @arr = @_;
        my $avg;
        foreach my $i (@arr) {
            print "$i: $avg\n";
            $avg = ($i + ($avg ? $avg : $i))/2;
        }
        return $avg;
    }

    print avg(@x),"\n";
    10:     # undef
    7: 10
    5: 8.5
    10: 6.75
    50: 8.375
    70: 29.1875
    75: 49.59375
    72: 62.296875
    79: 67.1484375
    80: 73.07421875
    76.537109375

    print avg(reverse @x),"\n";
    80:     # undef
    79: 80
    72: 79.5
    75: 75.75
    70: 75.375
    50: 72.6875
    10: 61.34375
    5: 35.671875
    7: 20.3359375
    10: 13.66796875
    11.833984375

    Making a change to produce the uniformly weighted average:

    my @x = (10, 7, 5, 10, 50, 70, 75, 72, 79, 80);

    sub avg2 {
        my @arr = @_;
        my $sum;
        foreach my $i (@arr) {
            print "$i: $sum\n";
            $sum += $i;
        }
        return $sum/@arr;
    }

    print avg2(@x),"\n";
    10:     # undef
    7: 10
    5: 17
    10: 22
    50: 32
    70: 82
    75: 152
    72: 227
    79: 299
    80: 378
    45.8

    print avg2(reverse @x),"\n";
    80:     # undef
    79: 80
    72: 159
    75: 231
    70: 306
    50: 376
    10: 426
    5: 436
    7: 441
    10: 448
    45.8

    where the forward and reverse lists produce the same average.

    A lot of the syntax would be less cumbersome if you started with my @arr = (...).
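
As a rough illustration of that point, the sketch below uses a plain array and the core List::Util module's sum function for the uniform average. The helper name mean is hypothetical and not from the original code.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @arr = (10, 7, 5, 10, 50, 70, 75, 72, 79, 80);

# Hypothetical helper: uniformly weighted mean of a non-empty list.
sub mean {
    my @values = @_;
    die "mean() needs at least one value\n" unless @values;
    return sum(@values) / scalar(@values);
}

# No dereferencing needed: sort, index and interpolate the array directly.
my @sorted = sort { $a <=> $b } @arr;
print "mean [", mean(@arr), "]\n";
print "sorted [@sorted]\n";
```

Compared with the array reference version, `$#$arr` becomes `$#arr`, `$arr->[$i]` becomes `$arr[$i]`, and `@$arr` becomes `@arr`.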

    Quantum Mechanics: The dreams stuff is made of

Re: grouping numbers
by ww (Archbishop) on Jul 11, 2013 at 12:44 UTC

    Serious Error: "can't even get this to work" is not recognized as a valid error description.

    Seriously, you'll need to tell us more about what you expected or want; why the output deviates from your heart's desire; and -- tho none appear when I test or execute your code -- error messages or warnings received from code you haven't shown.

    If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.

      Seriously? The code runs fine for me.

      Do you mean the expected output of the code, or what I'd like? Since I'm not sure I'm on the right track with the code, here is what I'd like: three sets, (5,7,10,10), (50), and (70,72,75,79,80).

      I'm not sure how else to explain this - between the code I'm stuck on, the use case, and abstract....?

        What is significant about those groups of numbers?

        Perl can't mind read.


        Not sure if this is any use, but it does give the expected output. It loops through the sorted list, working out the effect of adding each value to the current group's average. If the change is greater than the value of $diff, it starts a new group.

        #!perl
        use strict;
        use warnings;

        my @data = (10,7,5,10,50,70,75,72,79,80);
        my @arr = sort {$a<=>$b} @data;
        my $diff = 2; # group closeness
        my @avg;      # [count,sum] per group
        my @grp;
        my $g = 0;

        for my $i (0..$#arr){
            my $val = $arr[$i];
            #print "value $val\n";
            if ($i == 0){
                push @{$grp[$g]},$val;
                $avg[$g] = [1,$val];
            } else {
                # work out new average with this element
                my ($n,$sum) = @{$avg[$g]};
                #print "count $n sum $sum\n";
                my $avg = $sum/$n;
                my $new_avg = ($sum+$val)/($n+1);
                if (abs($new_avg - $avg) < $diff){
                    # join group
                    push @{$grp[$g]},$val;
                    $avg[$g] = [$n+1,$sum+$val];
                } else {
                    # start new group
                    ++$g;
                    push @{$grp[$g]},$val;
                    $avg[$g] = [1,$val];
                }
            }
        }

        # parenthesize join so "\n" is not joined in as an extra element
        for (@grp) {
            print join(',',@$_),"\n";
        }
Re: grouping numbers
by 5mi11er (Deacon) on Jul 11, 2013 at 12:18 UTC
    I think you need to explain this better; I don't understand what you're trying to accomplish.


      I want to see if someone sends a ton of packets that I DROP in a short time vs someone that, over time, ends up sending the same number (or more) DROPped packets (or DENY, etc). Technically, I'm looking at a time stamp and converting it to epoch, but the general theory holds up to what I presented.
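
One possible reading of this requirement, as a sketch only: sort the epoch timestamps, start a new burst whenever the gap to the previous timestamp exceeds a threshold, then rank bursts by events per second. The names group_by_gap and density, the gap threshold of 10, the inclusive span (last - first + 1, assuming whole-second timestamps) and the choice to ignore single-event groups are all assumptions made for illustration, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split numbers into groups wherever the gap between
# sorted neighbours exceeds $max_gap.
sub group_by_gap {
    my ($max_gap, @values) = @_;
    my @sorted = sort { $a <=> $b } @values;
    my @groups;
    for my $v (@sorted) {
        if (@groups && $v - $groups[-1][-1] <= $max_gap) {
            push @{ $groups[-1] }, $v;      # close enough: extend current group
        }
        else {
            push @groups, [$v];             # too far (or first value): new group
        }
    }
    return @groups;
}

# Hypothetical ranking: events per second over the inclusive span of a group.
sub density {
    my ($group) = @_;
    my $span = $group->[-1] - $group->[0] + 1;
    return scalar(@$group) / $span;
}

my @groups = group_by_gap(10, 10, 7, 5, 10, 50, 70, 75, 72, 79, 80);

# Assumption: a lone event is not a burst, so only rank groups of two or more.
my @bursts = sort { density($b) <=> density($a) } grep { @$_ >= 2 } @groups;

for my $g (@bursts) {
    printf "%-16s density %.3f\n", join(',', @$g), density($g);
}
```

With the sample data this splits the list into the three sets asked for earlier in the thread, and a short, dense group ranks above a longer, sparser one. Real log data would need a gap threshold chosen to suit the traffic.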

Re: grouping numbers
by mtmcc (Hermit) on Jul 11, 2013 at 12:27 UTC

    Agreed... More info please!

