PerlMonks  

Re: Empirically solving complex problems

by chas (Priest)
on Mar 05, 2005 at 22:20 UTC ( [id://436965]=note )


in reply to Empirically solving complex problems

I'm a mathematician (I teach statistics, among other things), but I really don't understand what you mean by "union of two normal distributions." (What does "union" mean in this context?) Distributions are probability measures, and the sum of two probability measures is not a probability measure, since its total mass is 2. If you mean that you want the distribution of the sum of two normal random variables, that can't be determined unless you know the joint distribution. If, for example, they are independent (so the joint distribution is a product), then the sum is again normal, with mean equal to the sum of the means and variance equal to the sum of the variances. From the code you exhibited, it looks like that's what you meant, but then that's the answer, and I don't see what else you want to do.
(Update: Actually, after looking at what you did some more, I think maybe you didn't want the distribution of the sum...I guess I don't understand...sorry.)
If you can clarify, I'll try to give a better answer.
chas
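As a small illustration of the independent case described above, here is a minimal sketch. The function name and the example numbers are made up for illustration, and independence of the two variables is assumed throughout.

```perl
use strict;
use warnings;

# Parameters of X + Y for independent X ~ N($m1, $sd1**2) and Y ~ N($m2, $sd2**2).
# Independence is an assumption; correlated variables add a covariance term.
sub sum_of_independent_normals {
    my ($m1, $sd1, $m2, $sd2) = @_;
    my $mean = $m1 + $m2;
    my $sd   = sqrt($sd1**2 + $sd2**2);   # variances add, standard deviations do not
    return ($mean, $sd);
}

my ($mean, $sd) = sum_of_independent_normals(50, 3, 20, 4);
printf "mean = %.1f, sd = %.1f\n", $mean, $sd;   # prints "mean = 70.0, sd = 5.0"
```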
Re^2: Empirically solving complex problems
by oakbox (Chaplain) on Mar 06, 2005 at 17:53 UTC
    Let me be specific :)
    I have built a psychometric testing system. Let's say I have a test that measures only one trait, Extroversion, for example. (Tests usually measure several traits at the same time, but I'll try to keep this to the essentials.) Now, let's say that I measure a big group of people: men, women, lawyers, salespeople, technologists, various education levels, etc. Over time, I am able to come up with some 'norms' for the various groups. The mean and standard deviation of 'Extroversion' for the group 'Lawyers' differ from the mean and standard deviation of 'Extroversion' for the group 'Technologists'. (Again, this is a massive oversimplification; most of psychometric test validation lies in searching for which groups differ significantly from each other.)

    NOW, a group of people inside ABC Corp take this test and I'm able to derive a mean and standard deviation of 'Extroversion' scores for this new group. The goal of the exercise here is to find out which of MY norms most closely matches the ABC Corp group.

    This all leads to the purpose of my matching script. I have two groups, each represented by a mean and standard deviation, and I have to find out HOW alike they are. Preferably, I want to end up with an easy-to-understand representation of that match, such as a percentage.

    Finding out how a particular individual's scores match up is trivial; I actually use stanine representations of the scores. (The 1.75 sigma boundaries aren't arbitrary: they are the upper limit of a score of 8 and the lower limit of a score of 2 on the stanine scale.)
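
    For readers unfamiliar with stanines, here is a minimal sketch of the mapping from a z-score to a stanine, assuming the usual bands half a standard deviation wide and centred on 5. The function names and the mean-100, sd-15 norm are hypothetical, and assigning an exact boundary value to the higher band is a convention chosen for this sketch.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Map a z-score to a stanine (1..9). Each band is 0.5 sd wide and
# stanine 5 covers -0.25 <= z < 0.25, so the outer cut points are +/- 1.75.
sub stanine {
    my ($z) = @_;
    my $s = floor(2 * $z + 5.5);
    $s = 1 if $s < 1;   # everything below -1.75 sd is stanine 1
    $s = 9 if $s > 9;   # everything from +1.75 sd upward is stanine 9
    return $s;
}

# Convert a raw score using a (hypothetical) norm group's mean and sd.
sub stanine_for_score {
    my ($score, $mean, $sd) = @_;
    return stanine(($score - $mean) / $sd);
}

print stanine_for_score(100, 100, 15), "\n";   # prints 5
```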

      I'm certainly no expert on that kind of problem. You could search on the web for the subject "Gaussian Mixtures" and might find some relevant information. (In any case I understand what you are doing now; sorry about the confusion.)
      chas
      It sounds like what you are after is a standard statistical method called the "t-test". You feed it two samples, and it tells you whether their means differ by more than chance would explain. In fact, it's built into Excel (called simply "TTEST").
      ------------ :Wq Not an editor command: Wq
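
      A minimal sketch of how a t statistic could be computed from summary statistics alone, using Welch's unequal-variance form. Note the assumptions: the group sizes $n1 and $n2 are needed in addition to the means and standard deviations, the function name is made up for this example, and the result speaks to a difference in means rather than to how much two curves overlap.

```perl
use strict;
use warnings;

# Welch's t statistic and approximate degrees of freedom (Welch-Satterthwaite)
# from each group's mean, standard deviation and size.
sub welch_t {
    my ($m1, $sd1, $n1, $m2, $sd2, $n2) = @_;
    my $v1 = $sd1**2 / $n1;   # squared standard error of mean 1
    my $v2 = $sd2**2 / $n2;   # squared standard error of mean 2
    my $t  = ($m1 - $m2) / sqrt($v1 + $v2);
    my $df = ($v1 + $v2)**2 / ($v1**2 / ($n1 - 1) + $v2**2 / ($n2 - 1));
    return ($t, $df);
}

my ($t, $df) = welch_t(10, 2, 16, 8, 2, 16);
printf "t = %.3f, df = %.1f\n", $t, $df;
```

      Turning the statistic into a p-value additionally requires the cumulative distribution function of Student's t, which is left out here.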
Re^2: Empirically solving complex problems
by fizbin (Chaplain) on Mar 07, 2005 at 06:07 UTC
    It looks as though what oakbox is after is the integral of the function:
    min(p1(x),p2(x)) dx
    where p1 and p2 are the two probability density functions. That is, he's trying to find the area that lies under both curves at once. Now, this isn't actually all _that_ hard, though the answer will include some calls to erf.
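
    Before any algebra, the integral of min(p1, p2) can be estimated numerically, which gives something to cross-check a closed form against. This is only a sketch: the function names, the integration range of 8 standard deviations either side of each mean, and the step count are arbitrary choices, not part of the original procedure.

```perl
use strict;
use warnings;

my $PI = 4 * atan2(1, 1);

# Density of a normal distribution with mean $m and standard deviation $sd.
sub normal_pdf {
    my ($x, $m, $sd) = @_;
    my $z = ($x - $m) / $sd;
    return exp(-0.5 * $z * $z) / ($sd * sqrt(2 * $PI));
}

# Trapezoid-rule estimate of the integral of min(p1(x), p2(x)).
sub overlap_numeric {
    my ($m1, $sd1, $m2, $sd2, $steps) = @_;
    $steps ||= 20_000;
    # Integrate over a range wide enough to cover both curves.
    my ($lo,  $hi)  = ($m1 - 8 * $sd1, $m1 + 8 * $sd1);
    my ($lo2, $hi2) = ($m2 - 8 * $sd2, $m2 + 8 * $sd2);
    $lo = $lo2 if $lo2 < $lo;
    $hi = $hi2 if $hi2 > $hi;
    my $h   = ($hi - $lo) / $steps;
    my $sum = 0;
    for my $i (0 .. $steps) {
        my $x  = $lo + $i * $h;
        my $p1 = normal_pdf($x, $m1, $sd1);
        my $p2 = normal_pdf($x, $m2, $sd2);
        my $f  = $p1 < $p2 ? $p1 : $p2;                       # min of the two densities
        $sum  += ($i == 0 || $i == $steps) ? $f / 2 : $f;     # endpoints get half weight
    }
    return $sum * $h;
}

printf "%.4f\n", overlap_numeric(0, 1, 2, 1);
```

    Identical curves should give a value close to 1, and swapping the two curves should not change the result.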

    Let's see.... (ten minutes of scribbling on paper later, accompanied by some looking up of things on MathWorld)

    Ok, well, it's ugly, but this _should_ get the same results as the given procedure:

    use Math::Libm qw(erf erfc M_SQRT2);

    sub compare_bell_curves {
      my ($self,$m1,$sd1,$m2,$sd2) = @_;
      # Arrange for curve 1 to be the narrower of the two
      if ($sd1 > $sd2) {
        ($m1,$sd1,$m2,$sd2)=($m2,$sd2,$m1,$sd1);
      } elsif ($sd1 == $sd2) {
        # stupid corner case
        my $dist = abs($m1-$m2)/$sd1;
        return erfc($dist/2/M_SQRT2);
      }
      # Shift everything so that curve 1 is centred on zero
      $m2 -= $m1; $m1 = 0;
      # Some terms omitted since $m1 = 0
      my $sd2s= $sd2*$sd2;
      my $sd1s= $sd1*$sd1;
      # Coefficients of the quadratic whose roots are the two crossing points
      my $A = ($sd2s - $sd1s);
      my $B = 2*($m2*$sd1s);
      my $C = 2*(log($sd1)-log($sd2))*$sd1s*$sd2s - $m2*$m2*$sd1s;
      my $disc = $B*$B - 4*$A*$C;
      my $rdisc = sqrt($disc);
      my $lower = (-$B - $rdisc)/(2*$A);
      my $upper = (-$B + $rdisc)/(2*$A);
      # Outside [$lower,$upper] the narrower curve 1 is the smaller of the two;
      # between the crossing points the wider curve 2 is the smaller
      my $p1 = 0.5 + erf($lower/$sd1/M_SQRT2)/2;
      my $p2 = (erf(($upper-$m2)/$sd2/M_SQRT2)-erf(($lower-$m2)/$sd2/M_SQRT2))/2;
      my $p3 = erfc($upper/$sd1/M_SQRT2)/2;
      $p1+$p2+$p3;
    }
    Note that it took much, much longer to write this note and get the code working than to do the math. (Mostly, that was tracing down transcription errors in going from paper to code) The math itself was a matter of finding the intersections (which boils down to just solving a quadratic equation in x, albeit with messy coefficients), and then using the fact that the cumulative distribution function for a normal distribution is as given in equation 9 of http://mathworld.wolfram.com/NormalDistribution.html.
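
    The steps just described can be written out as follows. This is a sketch of the derivation under the same conventions as the code above: the first mean has been shifted to zero, curve 1 is the narrower one, and the symbols x_- and x_+ for the two roots are introduced here for illustration.

```latex
% Assumes \mu_1 = 0 (after shifting) and \sigma_1 < \sigma_2.
\begin{align*}
p_i(x) &= \frac{1}{\sigma_i\sqrt{2\pi}}
          \exp\!\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right),
\qquad
F_i(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu_i}{\sigma_i\sqrt{2}}\right)\right] \\[4pt]
p_1(x) = p_2(x)
  \;&\Longleftrightarrow\;
  (\sigma_2^2 - \sigma_1^2)\,x^2 + 2\mu_2\sigma_1^2\,x
  + 2\sigma_1^2\sigma_2^2 \ln\frac{\sigma_1}{\sigma_2} - \mu_2^2\sigma_1^2 = 0 \\[4pt]
\int_{-\infty}^{\infty} \min\bigl(p_1(x), p_2(x)\bigr)\,dx
  &= F_1(x_-) + \bigl[F_2(x_+) - F_2(x_-)\bigr] + \bigl[1 - F_1(x_+)\bigr]
\end{align*}
```

    Here x_- < x_+ are the two roots of the quadratic. The narrower curve is the lower one in both tails, and the wider curve is the lower one between the roots, which is why F_1 appears in the outer terms and F_2 in the middle term.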

    True, there are many problems which cannot be solved or even vaguely approached analytically, but this isn't one of them.

    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
      True, there are many problems which cannot be solved or even vaguely approached analytically, but this isn't one of them.

      There are more problems that a given person cannot solve, or even vaguely approach, analytically than there are problems that nobody can solve or even vaguely approach analytically. I think that the original poster laid out a pretty good case that he couldn't tackle this analytically, while leaving it open as to whether someone else might succeed in doing so.

      Therefore your success with an analytical approach takes nothing away from the value of the non-analytical approach in this case.

        While I see your point in general, I think that in this specific case the original poster gave up on the analytical approach too quickly, and came up with a solution that I don't think does what he thinks it does. (As I said, this particular problem didn't involve difficult, advanced mathematics -- it involved being able to solve a quadratic equation and plug the results into a black box pulled off the mathworld site)

        Among other things, the original solution is not symmetric with respect to the distributions: if I reverse distribution 1 and distribution 2, I should get the same answer, right? If not, I can hardly claim to be measuring something related to the union of two distributions. Also, I wonder whether measuring the area under two curves is in fact the appropriate thing to do given the problem.

        -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
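
        The symmetry requirement mentioned above is easy to test mechanically. The following is a minimal sketch: is_symmetric and the two toy measures are hypothetical helpers written for this example, not the original poster's procedure, and the test cases are arbitrary.

```perl
use strict;
use warnings;

# Returns 1 if $measure gives the same value (within $tol) when the two
# (mean, sd) pairs are swapped, for every case in @$cases; otherwise 0.
sub is_symmetric {
    my ($measure, $cases, $tol) = @_;
    $tol = 1e-9 unless defined $tol;
    for my $case (@$cases) {
        my ($m1, $sd1, $m2, $sd2) = @$case;
        my $forward  = $measure->($m1, $sd1, $m2, $sd2);
        my $reversed = $measure->($m2, $sd2, $m1, $sd1);
        return 0 if abs($forward - $reversed) > $tol;
    }
    return 1;
}

# Two toy measures, used only to exercise the check.
my $scaled_by_first = sub { my ($m1, $sd1, $m2, $sd2) = @_; abs($m1 - $m2) / $sd1 };
my $scaled_by_both  = sub { my ($m1, $sd1, $m2, $sd2) = @_; abs($m1 - $m2) / sqrt($sd1**2 + $sd2**2) };

my @cases = ([0, 1, 2, 3], [5, 2, 5, 4], [-1, 0.5, 1, 0.5]);
print is_symmetric($scaled_by_first, \@cases) ? "symmetric\n" : "not symmetric\n";   # not symmetric
print is_symmetric($scaled_by_both,  \@cases) ? "symmetric\n" : "not symmetric\n";   # symmetric
```

        Any candidate matching routine can be passed in as the first argument in place of the toy measures.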
