The Student Room Group

A level stats challenge question - help needed

I'll post a pic of the question below. I get that you need to use a normal distribution approximation, but what I don't get is why the approximation doesn't use the sample mean thing where you divide the variance by n.
Reply 1
the question
Original post by FM1/FP1
Not totally sure what you're confused about, but the usual formulae for the mean and variance when approximating a binomial with a normal are np and np(1-p). Those are the values you use to calculate the critical region(s).

If you assume that the samples are drawn from a normal distribution with known population variance, then the distribution associated with the sample mean estimate is normal with variance = population variance / n.
Original post by FM1/FP1
The "divide the variance by n" thing applies when you form a sample by randomly selecting n members of a normally distributed population and are interested in properties of the sample mean. However, what you're doing here is simply using the normal distribution N(np, np(1-p)) to model the binomial distribution B(n, p).
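As a rough check of the N(np, np(1-p)) model described above, the sketch below compares an exact binomial probability with its normal approximation. The values n=300 and p=0.53 are the ones quoted later in this thread; the cut-off k=150 and the helper name `binomial_cdf` are arbitrary choices for illustration, and the +0.5 is the usual continuity correction.

```python
from math import comb
from statistics import NormalDist

n, p = 300, 0.53  # values quoted later in the thread

mean = n * p                # np, about 159
variance = n * p * (1 - p)  # np(1-p), about 74.73 (NOT divided by n)
approx = NormalDist(mu=mean, sigma=variance ** 0.5)


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p). Hypothetical helper for this sketch."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


k = 150  # arbitrary cut-off chosen for illustration
exact = binomial_cdf(k, n, p)
normal = approx.cdf(k + 0.5)  # continuity correction

print(f"exact  P(X <= {k}) = {exact:.4f}")
print(f"normal P(X <= {k}) = {normal:.4f}")
```

The two probabilities should agree closely, which is why the count X can be tested directly against N(np, np(1-p)) with no division by n.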
Reply 4
Original post by mqb2766
OK, I get this, but wouldn't this mean the hypothesis test should be in terms of p, i.e. H0: p = 0.53, H1: p ≠ 0.53? The solution bank writes the test in terms of the mean, i.e. H0: mean = 159, H1: mean ≠ 159. Wouldn't that mean you are now conducting a hypothesis test on the mean of the sample, so you would need to divide the variance by n? Or does that not apply because we know that the sample is binomially distributed and not normally distributed?
Original post by FM1/FP1
Can you upload the solution bank you're talking about? The mean of the normal approximation is np = 159. The hypothesis test for both the binomial and the normal approximation is in terms of np, as the random variable counts the number of "successes".

It's worth noting that both the binomial and its normal approximation are defined on the domain 0..n, whereas if you wanted to think of the random variable as an estimate of the probability p, it would be defined on the domain 0..1. So applying a linear transformation to the random variable (dividing by n), you get the usual results for the new (normal) random variable
mean = p
variance = p(1-p)/n
where the variance now decreases as 1/n, as you note happens with the usual mean estimator of a normal distribution. Plotting binomial distributions, you should see that the width decreases relative to the domain 0..n as n increases.

So interpreting the binomial as a mean estimator by dividing the random variable by n is consistent with the formula for the more usual normal mean estimator. They are different distributions/random variables though, hence the different formulae.

So if n=10 and p=0.5, the 10 trials behind a binomial sample might look like
0,1,1,0,1,0,0,0,1,0
and 10 samples from a normal with mu=0.5 and std dev = 0.1 might be
0.55,0.42,0.61,0.35,0.49,0.45,0.7,0.53,0.57,0.43
Their averages are estimates of p or mu (0.5), and the variance of those estimates decreases as 1/n.
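The 1/n behaviour described above can be checked with a small simulation. This is only a sketch: the seed, the number of repeats and the choice of n values are arbitrary assumptions, and p=0.5 is taken from the example above.

```python
import random
from statistics import pvariance

p = 0.5          # success probability from the example above
repeats = 2000   # arbitrary number of simulated samples per n
rng = random.Random(0)  # fixed seed so the run is repeatable

results = {}
for n in (10, 100, 400):
    # Each simulated sample is n success/failure trials; dividing the
    # count of successes by n gives an estimate of p.
    proportions = [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(repeats)
    ]
    observed = pvariance(proportions)
    predicted = p * (1 - p) / n
    results[n] = (observed, predicted)
    print(f"n={n:4d}  observed={observed:.5f}  p(1-p)/n={predicted:.5f}")
```

The observed variance of the estimates should track p(1-p)/n, shrinking as n grows, whereas the variance of the raw count np(1-p) grows with n.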
Reply 6
Original post by mqb2766
https://www.physicsandmathstutor.com/pdf-pages/?pdf=https%3A%2F%2Factiveteach-prod.resource.pearson-intl.com%2Fr00%2Fr0071%2Fr007110%2Fr00711050%2Fcurrent%2Falevelsb_sm2_revex1.pdf

This is the solution bank; the question is challenge question 2, the very last question.


Ohhhh, I think I get why we don't divide by n now. It's because we are finding a range for the number of successes and not a range for the mean of those successes. Thank you, but I still don't really get the H0 and H1 bit being in terms of the mean.
Original post by FM1/FP1
To try and summarise, the binomial distribution is defined on 0..n, where in this case n=300. The mean and variance are np and np(1-p). We can approximate this with a normal distribution with the same mean and variance, and we use this for the hypothesis test as per the solution bank. The main body of the distribution (either binomial or normal approximation) lies between 144 and 173, and outside that the tails correspond to the critical regions. The location of the critical regions depends on the number of data points, as you'd expect with a binomial distribution (and therefore its normal approximation), because the random variable is the number of successes out of n.
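A minimal sketch of where those boundaries come from is below. The significance level is NOT stated in this thread: alpha = 0.10 (two-tailed) is an assumption, picked only because it roughly reproduces the 144 and 173 quoted above. The continuity correction is ignored to keep the sketch short.

```python
from statistics import NormalDist

n, p0 = 300, 0.53
alpha = 0.10  # ASSUMPTION: two-tailed significance level, not given in the thread

mean = n * p0                     # np, about 159
sd = (n * p0 * (1 - p0)) ** 0.5   # sqrt(np(1-p)), no division by n

# z-value leaving alpha/2 in each tail of the standard normal
z = NormalDist().inv_cdf(1 - alpha / 2)

lower = mean - z * sd
upper = mean + z * sd
print(f"main body of the distribution: about {lower:.1f} to {upper:.1f}")
print("critical regions: counts below the lower value or above the upper value")
```

Under that assumed alpha, the boundaries land close to the 144 and 173 mentioned in the post; the exact integer cut-offs depend on the continuity correction and on the significance level actually used in the question.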

If we divided the random variable by n (a scaling), the mean of the transformed normal approximation would be p and the variance would be p(1-p)/n. So now the variance will decrease as we increase n. This is as you'd expect if you had a normally distributed population with a mean of p and a variance sigma^2. Your sample mean estimator would again have a mean of p and a variance sigma^2/n. The variances of both the scaled normal approximation and the sample mean estimator are inversely proportional to n, as you should expect.

Both of these variance formulae are derived from the usual
E((x - xbar)^2)
so the average squared deviation from the mean. The different assumptions (binomial / normal with known variance) give the different formulae.
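To connect this with the question about H0 being written in terms of the mean rather than p, the sketch below standardises the same observation on both scales. The observed count x = 170 is a hypothetical value chosen for illustration; n and p0 are the values from the thread.

```python
from math import sqrt, isclose

n, p0 = 300, 0.53
x = 170  # HYPOTHETICAL observed number of successes

# Count scale: H0 written as mean = np0, variance np0(1-p0)
z_count = (x - n * p0) / sqrt(n * p0 * (1 - p0))

# Proportion scale: H0 written as p = p0, variance p0(1-p0)/n
z_prop = (x / n - p0) / sqrt(p0 * (1 - p0) / n)

print(f"z on the count scale:      {z_count:.4f}")
print(f"z on the proportion scale: {z_prop:.4f}")
print("same statistic:", isclose(z_count, z_prop, rel_tol=1e-9))
```

Both scales give the same test statistic, so "mean = 159" and "p = 0.53" are the same hypothesis. The division by n only appears once the random variable itself has been divided by n.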
Reply 8
Original post by mqb2766
thanks for the additional info, I don't get everything but it was enough for me to understand my original question and some 😁
