Hi everyone,

I think this is a question about repeated experiments.

On Day 1, 25% of 1,000 teenagers polled said they brushed their teeth that morning. I can calculate the confidence interval from that information.

On Day 2, I ask the same 1,000 teenagers and this time the percentage is 25% again.

Across both days, I have a percentage of 25% and a sample of 1,000. But somehow I feel more confident about the average across both days than I do about each day individually.

My basic confidence interval calculation has inputs for percentage and sample size, but there is no way to account for the repeated experiment. Is there a multiplying factor of some kind? Is there an alternative formula to use?

Jim
There is not enough information to answer the question. In order to come up with an answer, we would have to know something about the probability structure of the tooth-cleaning process in teenagers.

I'll try to illustrate with two diametrically opposed possibilities:

(i) Each individual teenager is either a teeth cleaner or a non-cleaner. If they clean their teeth on one morning, they will clean them on every other morning; if they don't, they never will. We draw a sample of 1,000 teenagers to estimate the proportion of cleaners versus non-cleaners.

(ii) Each individual teenager either cleans their teeth or does not randomly with a probability of p, independent of any other teenager in the sample and independent of whether they cleaned their teeth on any other day.

In case (i) the second-day experiment adds no information beyond what the day-one experiment already gave. Confidence intervals are unchanged.

In case (ii) the second-day experiment does gain you more information; in fact the repeat sample acts as a completely new sample because of the posited independence, so the effective sample size doubles to 2,000. Confidence intervals will be shorter by a factor of √2 (about 71% of their original width).
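
To put numbers on the two cases, here is a minimal Python sketch. It assumes the "basic" calculation is the usual normal-approximation (Wald) interval at 95% confidence, which the thread does not actually specify; the function name `wald_ci` is made up for illustration.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion; z=1.96 gives ~95%."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

p_hat, n = 0.25, 1000

# Case (i): day 2 repeats day 1 exactly, so the effective sample size stays at n.
ci_case_i = wald_ci(p_hat, n)

# Case (ii): day 2 is independent of day 1, so the effective sample size is 2n.
ci_case_ii = wald_ci(p_hat, 2 * n)

width_i = ci_case_i[1] - ci_case_i[0]
width_ii = ci_case_ii[1] - ci_case_ii[0]
width_ratio = width_i / width_ii  # sqrt(2), up to floating-point rounding

print(ci_case_i, ci_case_ii, width_ratio)
```

Real behaviour will usually sit somewhere between these extremes: if a teenager's answers on the two days are positively but not perfectly correlated, the interval shrinks by a factor between 1 and √2.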

Updated: April 14, 2016
