http://www.perlmonks.org?node_id=1230028


in reply to Re: Useful heuristics for analyzing arrays of data to determine column header
in thread Useful heuristics for analyzing arrays of data to determine column header

Making some progress on the module. Here is some sample data for one column, showing the raw count and the "cardinality" value (that value's share of all rows in the column) for each unique value:

$VAR1 = {
          'ACTIVE' => {
                        'count' => 1941,
                        'value_card' => '0.631630328669053'
                      },
          'INACTIVE' => {
                          'value_card' => '0.233322486169867',
                          'count' => 717
                        },
          'RETIRED' => {
                         'count' => 414,
                         'value_card' => '0.134721770257078'
                       },
          'STATUS' => {
                        'count' => 1,
                        'value_card' => '0.000325414904002603'
                      }
        };

So in this simple case, the 'STATUS' value appears exactly once in this column and is clearly an outlier compared with the other three values. But in fuzzier situations, how would I determine whether 'STATUS' is "1 standard deviation" away from the cardinality values of the other entries?
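
To make the question concrete, here is a rough sketch of one way to measure it: take the mean and sample standard deviation of the other values' value_card figures and see how many standard deviations the candidate sits from that mean. The helper name z_score_vs_others is made up for illustration, and the choice to leave the candidate out of the mean and to use the sample (n - 1) standard deviation are both assumptions, not settled parts of the module. With only three other values, the result is a coarse indicator at best.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample data copied from the dump above.
my %col = (
    ACTIVE   => { count => 1941, value_card => 0.631630328669053 },
    INACTIVE => { count => 717,  value_card => 0.233322486169867 },
    RETIRED  => { count => 414,  value_card => 0.134721770257078 },
    STATUS   => { count => 1,    value_card => 0.000325414904002603 },
);

# Hypothetical helper: how many sample standard deviations $key's
# value_card lies from the mean of all the *other* keys' value_card.
# Returns undef when there are too few other values, or no spread at all.
sub z_score_vs_others {
    my ($data, $key) = @_;

    my @others = map  { $data->{$_}{value_card} }
                 grep { $_ ne $key } keys %$data;
    return undef if @others < 2;

    my $mean = sum(@others) / @others;
    my $var  = sum(map { ($_ - $mean) ** 2 } @others) / (@others - 1);
    my $sd   = sqrt($var);
    return undef if $sd == 0;

    return ($data->{$key}{value_card} - $mean) / $sd;
}

my $z = z_score_vs_others(\%col, 'STATUS');
printf "STATUS z-score: %.2f\n", $z;
print "STATUS is more than 1 standard deviation away\n" if abs($z) > 1;
```

A negative score means the candidate is rarer than the others on average. The cutoff of 1 is only the figure from the question; whether that threshold is sensible for fuzzier columns is exactly what still needs working out.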

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;