The Student Room Group

S2 sampling

In OCR S2, when do you divide standard deviation by sample size? Is it when you're trying to find population probabilities?
Sounds like the standard deviation of the mean of a sample.

If X~N(mu, sd^2) then Xbar ~ N(mu, sd^2/n), where Xbar is the mean of a sample of size n.

I'm not sure what comes up in OCR S2, but the standard error is s (the sample standard deviation) / square root of n.
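As a rough sketch of that standard error calculation, here it is in Python using only the standard library. The sample values are made up for illustration and do not come from any exam paper.

```python
import math
import statistics

# Hypothetical sample of completion times in minutes (illustrative only)
sample = [42.0, 38.5, 45.1, 40.2, 39.8, 43.6, 41.3, 37.9]

n = len(sample)
s = statistics.stdev(sample)        # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)   # estimated standard deviation of the sample mean

print(f"s = {s:.3f}, standard error = {standard_error:.3f}")
```

Note that the division is by the square root of n for the standard deviation, which is the same thing as dividing the variance by n.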
I do OCR MEI, so what happens is:

You get X ~ N(mu, s^2).
If the distribution of X bar is needed:

X bar ~ N(mu, s^2/n)

So when finding a probability about the sample mean: P(X bar < x) = P(Z < (x - mu) / square root of (s^2/n))

Hope that makes sense. Anyone, please correct me if I'm wrong.


image.png

So from this question, why do I not divide by the sample size in part (i) but do in part (ii)? I'm quite confused by the wording here.

It's from the June 2014 OCR S2 paper.

In part (i) you are asked a question about the distribution of the times that the candidates took. So to answer this, assuming that the times are normally distributed, you need to use the mean and the standard deviation of the distribution of those times.

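On the other hand, in part (ii), you are asked a question about the distribution of the mean of the times taken rather than the distribution of the times taken. This is why you divide by n here: the variance of the sample mean is the variance of the individual times divided by n, so the standard deviation is divided by the square root of n.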

To understand a little more about the distribution of the mean, think about doing this experiment a large number of times - so repeated samples of 50 candidates each given the test and each time you calculate the mean time taken to complete. These mean times will have a distribution, and it is this that we are concerned with here.

It turns out that if the original population of times is normal (imagine every student there is and every time that they would take if they sat the test) with mean μ and variance σ², then the distribution of means of repeated samples of size n will be normal with mean μ and variance σ²/n.
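The repeated-sampling experiment described above can be tried out with a small simulation. This is only a sketch: the population mean and standard deviation (MU, SIGMA) and the number of repeats are assumed values chosen for illustration, not figures from the exam question.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 40.0, 6.0   # hypothetical population mean and standard deviation
N = 50                  # sample size, as in the thread
REPEATS = 20_000        # number of repeated samples (assumed)

# Draw REPEATS samples of size N and record the mean of each one
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPEATS)
]

observed_var = statistics.pvariance(sample_means)
theoretical_var = SIGMA ** 2 / N

print(f"mean of sample means:     {statistics.fmean(sample_means):.3f}")
print(f"variance of sample means: {observed_var:.4f}")
print(f"sigma^2 / n:              {theoretical_var:.4f}")
```

The variance of the simulated means should come out close to σ²/n, and much smaller than the variance σ² of the individual times.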

Thank you for the detailed response! I know it's simplified, but to make my understanding clearer: is it safe to say that the first question was just about the sample they took, and the second question is concerned with the population, so to calculate it the standard deviation has to be divided by the sample size?

No, not quite. In both cases, you are using facts about the sample to estimate facts about the population. In using a sample of 50 candidates to estimate information about all possible candidates, you end up with your estimates having a degree of uncertainty.

So, in the first part, you use the distribution of times that the candidates in the sample took to estimate the distribution that all candidates would take. The simplifying assumption is made that all times are normally distributed (and thus the distributions are uniquely specified by two parameters, the mean and the variance). It turns out that the mean and variance of the sample are the "best" estimates of the mean and variance of the underlying population. Therefore these allow you to obtain the best estimates of the proportion of the population that would not finish in time. A more advanced question would go on to ask you to quantify how uncertain your estimates are based on this probabilistic set up!

In the second part, you are using the mean of the sample to infer things about the mean of the population; to do this, you need to know the probability distribution of the mean of samples drawn from the underlying population. It is here (because you are now dealing with means) that dividing the variance by n comes in.
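To make the contrast between the two parts concrete, here is a minimal Python sketch. All the numbers (mean, standard deviation, threshold) are hypothetical, since the figures from the actual question are not reproduced in this thread.

```python
from statistics import NormalDist

# Hypothetical estimates from a sample (not the figures from the exam question)
mean_time = 40.0   # estimated population mean, minutes
sd_time = 6.0      # estimated population standard deviation, minutes
n = 50             # sample size
limit = 42.0       # a time threshold, minutes

# Part (i) style: one candidate's time, so use the full standard deviation
individual = NormalDist(mean_time, sd_time)
p_individual = 1 - individual.cdf(limit)

# Part (ii) style: the mean of n times, so the variance is divided by n
sample_mean = NormalDist(mean_time, sd_time / n ** 0.5)
p_mean = 1 - sample_mean.cdf(limit)

print(f"P(single time > {limit}) = {p_individual:.4f}")
print(f"P(mean of {n} times > {limit}) = {p_mean:.4f}")
```

The only difference between the two calculations is the standard deviation passed to the normal distribution; the mean is the same in both.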
