http://www.perlmonks.org?node_id=701877


in reply to Test Survey Results

First of all, well done!

I like the topic and I know from experience how laborious gathering statistics can be.

I would also like to take this opportunity to elaborate a little on the sample size (the 167 participants).

Sample size

When undertaking any sample survey (a collection of information from only part of a population), your estimates are subject to what statistics calls “sampling error”.

Sampling error can be defined as the difference between the estimate derived from a sample survey and the “true” value that would result if a census (a survey of 100% of the population) were taken under the same conditions.

The sample of 167 people seems (very) small to me. How many programmers are out there? What sample size is necessary to be able to reliably say something about the population?
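For a rough sense of what sample size would be needed, here is a minimal Python sketch of the textbook formula for estimating a proportion, n = z²·p(1−p)/E², where E is the desired margin of error. It assumes simple random sampling (which a self-selected web poll is not), a 95% confidence level (z = 1.96) and the worst case p = 0.5. The function names and the optional finite-population correction are my own illustration, not anything from the survey.

```python
import math

def required_sample_size(margin, p=0.5, z=1.96, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumes simple random sampling; z=1.96 corresponds to ~95% confidence.
    If population is given, apply the finite population correction.
    """
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: a small population needs fewer samples
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the ~95% confidence interval for a proportion from n answers."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n=167 gives roughly +/- {margin_of_error(167):.1%}")
print("n for +/- 5%:", required_sample_size(0.05))
print("n for +/- 3%:", required_sample_size(0.03))
```

Under these assumptions, 167 answers give a margin of roughly ±7.6 percentage points, ±5% needs 385 answers and ±3% needs 1068. For a population as large as "all programmers" the finite population correction barely changes these numbers, so the size of the population matters far less than the way the sample is drawn.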

I like your idea of expanding the test and putting it on O’Reilly.

Re^2: Test Survey Results
by amarquis (Curate) on Aug 04, 2008 at 12:57 UTC

    The spread from sample size is likely dwarfed by selection bias in this case.*

    "For example, fully 66% of respondents claimed that their primary code base has a test suite but it's been my experience that the number is far smaller." is where the selection bias shows. I agree that the real number is much smaller, and what we are seeing is that people who like to talk about test suites are likely to have test suites.

    * - The simple way to estimate your standard deviation from a sample is the bootstrap idea: you assume the population looks like your sample and just use the sample's own deviation. The standard deviation of a rate (a mean, a percentage, etc.) scales as the standard deviation of a single draw divided by the square root of the number of draws; for a yes/no answer with rate p, that is sqrt(p(1-p)/n). So, in this case, about 3.7%. Even if you had n=1000, your spread would be ~1.5% or so. As a poll taker, your greatest challenge is much more often how you select your sample than the sample size.
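    To make the footnote's arithmetic concrete, here is a minimal Python sketch. It assumes each respondent gave a yes/no answer and reconstructs a 0/1 sample from the 66% figure quoted above (110 "yes" out of 167; this is an assumption, not the actual survey data). It compares the analytic standard error sqrt(p(1-p)/n) with a resampling bootstrap, which draws the sample with replacement many times and measures how much the resampled rate varies. The function names are illustrative.

```python
import math
import random

def proportion_se(p, n):
    """Analytic standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def bootstrap_se(sample, reps=2000, seed=0):
    """Bootstrap standard error of the mean of a 0/1 sample.

    Resamples the data with replacement `reps` times and returns the
    standard deviation of the resampled means.
    """
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(reps):
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(sum(resample) / n)
    center = sum(means) / reps
    variance = sum((m - center) ** 2 for m in means) / (reps - 1)
    return math.sqrt(variance)

# Hypothetical 0/1 data rebuilt from the 66% figure: 110 of 167 said "yes"
n = 167
yes = round(0.66 * n)
sample = [1] * yes + [0] * (n - yes)

print(f"analytic SE,  n=167:  {proportion_se(yes / n, n):.2%}")
print(f"bootstrap SE, n=167:  {bootstrap_se(sample):.2%}")
print(f"analytic SE,  n=1000: {proportion_se(yes / n, 1000):.2%}")
```

    Both estimates come out near 3.7% for n = 167, and the analytic formula gives about 1.5% for n = 1000. The bootstrap figure moves slightly with the seed and the number of resamples.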

Re^2: Test Survey Results
by Herkum (Parson) on Aug 04, 2008 at 15:08 UTC

    You should also point out the sample distribution. I have worked with about 20-30 Perl programmers, and none of them has ever written tests. Nor has any of them ever visited perlmonks.org, use.perl.org, or any other organized mailing list. They consider CPAN some kind of spam site (maybe I am exaggerating a little).

    The point is that the people who would see and take the poll are people who are more active in the community, and therefore more likely to have better habits than someone who 'already knows' how to program and so never consults the web for advice. This distribution will royally screw up any estimate you are trying to get from your poll.
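    A small simulation illustrates why this kind of self-selection is not fixed by polling more people. This is a sketch with entirely made-up parameters: a hypothetical population in which 30% of programmers have test suites, and those who test are five times as likely to see and answer the poll (5% versus 1%). None of these numbers come from the survey; they only show the mechanism. Analytically, the expected share of "yes" among respondents is 0.3·0.05 / (0.3·0.05 + 0.7·0.01) ≈ 0.68, however large the population.

```python
import random

def simulate_poll(pop_size, true_rate, p_respond_tests, p_respond_no_tests, seed=0):
    """Return (estimated rate, number of respondents) for a self-selected poll.

    Each member of a hypothetical population has a test suite with probability
    true_rate; the chance that they answer the poll depends on that same trait.
    """
    rng = random.Random(seed)
    yes = answered = 0
    for _ in range(pop_size):
        has_tests = rng.random() < true_rate
        p_respond = p_respond_tests if has_tests else p_respond_no_tests
        if rng.random() < p_respond:
            answered += 1
            yes += has_tests  # bool counts as 0 or 1
    return yes / answered, answered

# Made-up parameters: 30% truly test; testers are 5x as likely to respond.
for pop_size in (50_000, 500_000):
    estimate, answered = simulate_poll(pop_size, 0.30, 0.05, 0.01)
    print(f"{answered:6d} respondents -> estimated rate {estimate:.2f} (true rate 0.30)")
```

    More respondents only tighten the estimate around the wrong value; the gap between roughly 0.68 and 0.30 comes entirely from who chooses to answer.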