Re: Empirically solving complex problems

Re^2: Empirically solving complex problems by oakbox (Chaplain) on Mar 06, 2005 at 17:53 UTC
Let me be specific :)

I have built a psychometric testing system. Let's say I have a test that measures only one trait, Extroversion, for example. (Tests usually measure several traits at the same time, but I'll try to keep this to the essentials.)

Now, let's say that I measure a big group of people: men, women, lawyers, sales people, technologists, various education levels, etc. Over time, I am able to come up with some 'norms' for the various groups. The mean and standard deviation of 'Extroversion' for the group 'Lawyers' are different from the mean and standard deviation of 'Extroversion' for the group 'Technologists'. (Again, this is a massive oversimplification; most of psychometric test validation lies in searching for which groups differ significantly from each other.)

NOW, a group of people inside ABC Corp take this test and I'm able to derive a mean and standard deviation of 'Extroversion' scores for this new group. The goal of the exercise here is to find out which of MY norms most closely matches the ABC Corp group.

This all leads to the purpose of my matching script. I have two groups, each represented by a mean and standard deviation, and I have to find out HOW alike they are. Preferably, I want to end up with an easy-to-understand representation of that match, a percentage, for example.

Finding out how a particular individual's scores match up is trivial; I actually use stanine representations of the scores. (The 1.75 sigma boundaries aren't arbitrary: they are the upper limit of a score of 8 and the lower limit of a score of 2 on the stanine scale.)

oakbox
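For readers unfamiliar with stanines, here is a minimal Perl sketch of the mapping oakbox mentions. It assumes the standard stanine cut points at every half sigma from -1.75 to +1.75; the function name `stanine` and the handling of a score that lands exactly on a cut point are illustrative choices, not taken from oakbox's system.

```perl
use strict;
use warnings;

# Standard stanine cut points in z-score (sigma) units.
my @CUTS = (-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75);

# Map a raw score to a stanine (1..9) given a norm group's mean and SD.
# A score exactly on a cut point falls in the lower band (an arbitrary choice).
sub stanine {
    my ($score, $mean, $sd) = @_;
    die "sd must be positive\n" unless $sd > 0;
    my $z    = ($score - $mean) / $sd;
    my $band = 1;
    for my $cut (@CUTS) {
        $band++ if $z > $cut;    # one band higher for each cut point passed
    }
    return $band;
}

print stanine(50, 50, 10), "\n";    # score at the mean: prints 5
print stanine(69, 50, 10), "\n";    # z = 1.9, above the 1.75 cut: prints 9
```

Anything above +1.75 sigma gets a 9 and anything below -1.75 sigma gets a 1, which is why those two boundaries mark the edges of scores 8 and 2.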
Re^3: Empirically solving complex problems by chas (Priest) on Mar 06, 2005 at 21:20 UTC
I'm certainly no expert on that kind of problem. You could search the web for the subject "Gaussian Mixtures" and might find some relevant information. (In any case I understand what you are doing now; sorry about the confusion.)

chas
Re^3: Empirically solving complex problems by etcshadow (Priest) on Mar 08, 2005 at 19:28 UTC
It sounds like what you are after is a standard statistical method called the "t-test". You feed it two distributions, and it tells you how alike the two distributions are. In fact, it's built into Excel (called simply "TTEST").

`------------ :Wq Not an editor command: Wq`
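A t-test needs one thing the thread has not mentioned: the size of each group. As a rough sketch, assuming each group's sample size is known, the Welch (unequal-variance) form of the statistic can be computed directly from means and standard deviations. The function name `welch_t` and the example numbers are made up for illustration. Note that the result answers "could these two means differ by chance alone?", which is a somewhat different question from "what percentage do these two curves share?".

```perl
use strict;
use warnings;

# Welch's t statistic and approximate degrees of freedom from summary
# statistics. $n1 and $n2 (group sizes) are assumed to be available.
sub welch_t {
    my ($m1, $sd1, $n1, $m2, $sd2, $n2) = @_;
    my $v1 = $sd1 * $sd1 / $n1;    # squared standard error of mean 1
    my $v2 = $sd2 * $sd2 / $n2;    # squared standard error of mean 2
    my $t  = ($m1 - $m2) / sqrt($v1 + $v2);
    # Welch-Satterthwaite approximation for the degrees of freedom.
    my $df = ($v1 + $v2)**2 / ($v1**2 / ($n1 - 1) + $v2**2 / ($n2 - 1));
    return ($t, $df);
}

# Hypothetical numbers: a norm group of 400 versus a company group of 35.
my ($t, $df) = welch_t(5.2, 1.8, 400, 4.9, 2.1, 35);
printf "t = %.3f, df = %.1f\n", $t, $df;
```

Turning t and df into a p-value requires the t distribution's CDF, which this sketch leaves out.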
Re^2: Empirically solving complex problems by fizbin (Chaplain) on Mar 07, 2005 at 06:07 UTC
It looks as though what oakbox is after is the integral of the function:

    min(p1(x),p2(x)) dx

where p1 and p2 are the two probability distributions. That is, he's trying to find the area under both curves. Now, this isn't actually all _that_ hard, though the answer will include some calls to `erf`. Let's see.... (ten minutes of scribbling on paper later, accompanied by some looking up of things on MathWorld)

Ok, well, it's ugly, but this _should_ get the same results as the given procedure:

    use Math::Libm qw(erf erfc M_SQRT2);

    sub compare_bell_curves {
        my ($self,$m1,$sd1,$m2,$sd2) = @_;
        if ($sd1 > $sd2) {
            ($m1,$sd1,$m2,$sd2)=($m2,$sd2,$m1,$sd1);
        } elsif ($sd1 == $sd2) {
            # stupid corner case
            my $dist = abs($m1-$m2)/$sd1;
            return erfc($dist/2/M_SQRT2);
        }
        $m2 -= $m1; $m1 = 0;
        # Some terms omitted since $m1 = 0
        my $sd2s = $sd2*$sd2;
        my $sd1s = $sd1*$sd1;
        my $A = ($sd2s - $sd1s);
        my $B = 2*($m2*$sd1s);
        my $C = 2*(log($sd1)-log($sd2))*$sd1s*$sd2s - $m2*$m2*$sd1s;
        my $disc = $B*$B - 4*$A*$C;
        my $rdisc = sqrt($disc);
        my $lower = (-$B - $rdisc)/(2*$A);
        my $upper = (-$B + $rdisc)/(2*$A);
        my $p1 = 0.5 + erf(($lower-$m2)/$sd2/M_SQRT2)/2;
        my $p2 = (erf($upper/$sd1/M_SQRT2)-erf($lower/$sd1/M_SQRT2))/2;
        my $p3 = erfc(($upper-$m2)/$sd2/M_SQRT2)/2;
        $p1+$p2+$p3;
    }

Note that it took much, much longer to write this note and get the code working than to do the math. (Mostly, that was tracking down transcription errors in going from paper to code.) The math itself was a matter of finding the intersections (which boils down to just solving a quadratic equation in x, albeit with messy coefficients), and then using the fact that the cumulative distribution function for a normal distribution is as given in equation 9 of http://mathworld.wolfram.com/NormalDistribution.html.

True, there are many problems which cannot be solved or even vaguely approached analytically, but this isn't one of them.
`-- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/`
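For comparison, here is a standalone sketch of the integral of min(p1, p2) as the prose of the post describes it. It is not fizbin's code: the names `norm_cdf` and `overlap_area` are invented here, and it uses the `erf`/`erfc` exported by core `POSIX` (available from Perl 5.22) instead of Math::Libm. One thing worth checking against the posted routine: as transcribed, its general branch pairs the wider curve with the two tails and the narrower curve with the middle, which picks the larger density in each region, so its sum appears to come out as 2 minus the overlap. The sketch pairs them the other way round.

```perl
use strict;
use warnings;
use POSIX qw(erf erfc);    # erf/erfc in POSIX need Perl 5.22 or later

my $SQRT2 = sqrt(2);

# Normal CDF with mean $m and standard deviation $sd.
sub norm_cdf {
    my ($x, $m, $sd) = @_;
    return 0.5 * (1 + erf(($x - $m) / ($sd * $SQRT2)));
}

# Area under min(p1, p2) for two normal densities; result lies in (0, 1].
sub overlap_area {
    my ($m1, $sd1, $m2, $sd2) = @_;
    # Make curve 1 the narrower one so the quadratic's leading term is positive.
    ($m1, $sd1, $m2, $sd2) = ($m2, $sd2, $m1, $sd1) if $sd1 > $sd2;
    if ($sd1 == $sd2) {
        # Equal widths: the curves cross once, halfway between the means.
        return erfc(abs($m1 - $m2) / $sd1 / 2 / $SQRT2);
    }
    my ($v1, $v2) = ($sd1**2, $sd2**2);
    # Crossing points solve A*x^2 + B*x + C = 0 (from log p1(x) = log p2(x)).
    my $A = $v2 - $v1;
    my $B = 2 * ($m2 * $v1 - $m1 * $v2);
    my $C = $m1**2 * $v2 - $m2**2 * $v1 + 2 * $v1 * $v2 * log($sd1 / $sd2);
    my $root = sqrt($B**2 - 4 * $A * $C);
    my ($lo, $hi) = ((-$B - $root) / (2 * $A), (-$B + $root) / (2 * $A));
    # The narrow curve is the smaller one in both tails,
    # the wide curve is the smaller one between the crossing points.
    return norm_cdf($lo, $m1, $sd1)
         + norm_cdf($hi, $m2, $sd2) - norm_cdf($lo, $m2, $sd2)
         + 1 - norm_cdf($hi, $m1, $sd1);
}

printf "%.3f\n", overlap_area(0, 1, 0, 2);    # same mean, SD 1 vs SD 2: about 0.68
printf "%.3f\n", overlap_area(3, 2, 3, 2);    # identical curves: 1.000
```

Because the arguments are reordered by width before anything else happens, swapping the two groups gives the same answer, and the result never exceeds 1, so it can be read directly as a percentage.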
Re^3: Empirically solving complex problems by tilly (Archbishop) on Mar 07, 2005 at 07:32 UTC
"True, there are many problems which cannot be solved or even vaguely approached analytically, but this isn't one of them."

There are more problems that a given person cannot solve, or even vaguely approach, analytically than problems that cannot be solved or approached analytically at all.

I think that the original poster laid out a pretty good case that he couldn't tackle this analytically, while leaving it open as to whether someone else might succeed in doing so. Therefore your success with an analytical approach takes nothing away from the value of the non-analytical approach in this case.
Re^4: Empirically solving complex problems by fizbin (Chaplain) on Mar 07, 2005 at 14:18 UTC
While I see your point in general, I think that in this specific case the original poster gave up on the analytical approach too quickly, and came up with a solution that I don't think does what he thinks it does. (As I said, this particular problem didn't involve difficult, advanced mathematics; it involved being able to solve a quadratic equation and plug the results into a black box pulled off the MathWorld site.)

Among other things, the original solution is not symmetric with respect to the distributions: if I reverse distribution 1 and distribution 2, I should get the same answer, right? If not, I can hardly claim to be measuring something related to the union of two distributions.

Also, I wonder whether measuring the area under two curves is in fact the appropriate thing to do given the problem.

`-- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/`
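The symmetry objection is easy to test mechanically. The sketch below, with the invented helper `is_symmetric`, calls a similarity measure with the two groups in both orders and compares the results. The two measures it exercises are hypothetical stand-ins chosen only to show one failing and one passing case; neither is oakbox's procedure.

```perl
use strict;
use warnings;

# Return true if $measure gives the same value with the two curves swapped.
# $cases is an array ref of [mean1, sd1, mean2, sd2] parameter sets.
sub is_symmetric {
    my ($measure, $cases, $tol) = @_;
    $tol //= 1e-9;
    for my $c (@$cases) {
        my ($m1, $sd1, $m2, $sd2) = @$c;
        my $fwd = $measure->($m1, $sd1, $m2, $sd2);
        my $rev = $measure->($m2, $sd2, $m1, $sd1);
        return 0 if abs($fwd - $rev) > $tol;
    }
    return 1;
}

# Hypothetical measures, for illustration only.
# Distance between means in units of the FIRST curve's SD (order matters).
my $one_sided = sub { my ($m1, $sd1, $m2, $sd2) = @_; abs($m1 - $m2) / $sd1 };
# Distance between means in units of a pooled SD (order does not matter).
my $pooled = sub {
    my ($m1, $sd1, $m2, $sd2) = @_;
    abs($m1 - $m2) / sqrt(($sd1**2 + $sd2**2) / 2);
};

my @cases = ([0, 1, 1, 2], [5, 2, 4, 0.5]);
print is_symmetric($one_sided, \@cases) ? "symmetric" : "asymmetric", "\n";  # asymmetric
print is_symmetric($pooled,    \@cases) ? "symmetric" : "asymmetric", "\n";  # symmetric
```

Any candidate matching routine can be passed in as the first argument, as long as it takes the four numbers in that order.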


	PerlMonks