The Student Room Group

Converting between Variances in S2

Hey all, I do OCR so I'm not sure if you're required to do it in other specs.

Basically I am very unsure when to use the different conversions of variances between the population and sample. I'm not even sure whether I fully grasp the concepts of which variance represents what.

The ones that I am aware of so far are

s^2 = Var(Xbar) * n/(n-1)
and
Var(Xbar) = sigma^2 /n

I hope these look familiar since I'm not even sure whether I'm using the right notation.

I thought I had a grasp of these concepts until recently, when a question gave the sample variance. The mark scheme then wants you to find the population variance and then convert back to the sample variance, but using the two different equations. If you're converting there and back, what's the point in converting?

My teacher didn't explain these concepts properly and I am struggling to find any explanations online, so I would really appreciate it if someone could explain what each variance represents, and why and when you use it.

Thank you!
(edited 5 years ago)
Reply 1
Could you post the question?
Reply 2


I think you're misreading the mark scheme. It shows that you get B2 B1 M1 M1 A1, then there are two different ways of getting the next M1 A1 A1.

In the first way, you get M1 for "z = (6.2-6.1)/sqrt(0.643/80)", and then A1 for correctly working this out as 1.115, or for correctly working out the probability as 0.1325; then the final A1 is for either comparing z: 1.115 < 1.645, or for comparing p: 0.1325 > 0.05. In this way, you can see that the first way divides into two sub-methods: one using z, and the other using p.

Then there's the second way, in which you get M1 for "6.1 + 1.645*sqrt(0.643/80)", A1 for working this out as 6.247, and the final A1 for comparing 6.2 < 6.247.

You do only one of these two ways, not both, so you don't "convert and then convert back". After scoring this M1 A1 A1 in one of the two ways, you then get the final M1 A1 for the conclusion.
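
For anyone who wants to check the arithmetic, here is a rough sketch of both ways in Python. The figures 6.2, 6.1, 0.643, 80 and 1.645 are the ones quoted from the mark scheme above; the variable names are my own, and I am assuming a one-tailed test at the 5% level, which is what the 1.645 suggests.

```python
from math import sqrt
from statistics import NormalDist

# Figures quoted from the mark scheme; names are illustrative only.
sample_mean = 6.2      # observed sample mean
mu_0 = 6.1             # mean under the null hypothesis
var_estimate = 0.643   # unbiased estimate of the population variance
n = 80                 # sample size
alpha = 0.05           # assumed one-tailed significance level

# Standard deviation of the sample mean (this is where the /80 comes in).
std_error = sqrt(var_estimate / n)

# Way 1: test statistic, compared either as z or as a probability.
z = (sample_mean - mu_0) / std_error
p = 1 - NormalDist().cdf(z)
z_crit = NormalDist().inv_cdf(1 - alpha)

# Way 2: critical value for the sample mean.
critical_mean = mu_0 + z_crit * std_error

print(round(z, 3), round(p, 4), round(critical_mean, 3))
print(z < z_crit, p > alpha, sample_mean < critical_mean)
```

All three comparisons lead to the same decision, which is why you only need one of them. The values should match the mark scheme's up to table rounding.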
Reply 4
There are quite a few comparisons of sample and population variance online; just google and see which is the most readable for you.

I always think about what happens when you have one or two points in a sample.

When you have a single point N=1,
* Sample variance: not defined, because you divide by N-1=0. You use the single point to estimate the mean, so there is nothing left to calculate the variance: the "error" from the sample mean (x-xbar) is zero because the mean and the point are the same.
* Population variance: you know the mean, so the single point can be used to calculate (x-mu)^2, and as only a single point is used in the summation, you divide by 1.

When you have two points N=2
* Sample variance: the mean is the average of the two points. The two "errors" from the mean are always the same size (with opposite signs), so really you have just one bit of information when you estimate the variance, so you divide by N-1=1.
* Population variance: you know the mean, the two points are independent, as are the squared errors (x_i-mu)^2, so to estimate the variance, add them up and divide by 2.

Not necessarily rigorous but it is easy to remember and sounds plausible.
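
If it helps, the one-point and two-point cases can be tried out with a small sketch like this. The function names, the data values and the "known" mean of 4 are all made up for illustration.

```python
def variance_known_mean(data, mu):
    """Average squared deviation from a known mean mu (divide by N)."""
    return sum((x - mu) ** 2 for x in data) / len(data)


def variance_estimated_mean(data):
    """Squared deviations from the sample mean, divided by N - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two points when the mean is estimated")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


mu = 4.0  # hypothetical known population mean

# N = 1: fine with a known mean, undefined with an estimated one.
one_point = [3.0]
print(variance_known_mean(one_point, mu))   # (3 - 4)^2 / 1
try:
    variance_estimated_mean(one_point)
except ValueError as err:
    print("undefined:", err)

# N = 2: deviations from the sample mean 5 are -2 and +2.
two_points = [3.0, 7.0]
print(variance_known_mean(two_points, mu))  # ((3 - 4)^2 + (7 - 4)^2) / 2
print(variance_estimated_mean(two_points))  # ((-2)^2 + 2^2) / 1
```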

Original post by Quarkboi

Basically I am very unsure when to use the different conversions of variances between the population and sample. I'm not even sure whether I fully grasp the concepts of which variance represents what.

A good source for the details is this subsection of the Wikipedia article on variance (https://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance). Let me just say a few words about the concepts.

The idea of the population variance is to give you a measure of the spread of the data around its mean value. Hence you get the standard formula of a sum (or integral) of squared deviations from the mean. If you are given the fully specified probability distribution of the population, then it's a matter of algebra to work out the population variance.

If you draw a sample from the population, then the principles are the same - you can calculate the sample mean and the sample variance as measures of the location and spread of the sample. But what if you wanted to use your sample to estimate things about the population? Here you're in the situation that you don't know the underlying population distribution at all - all you have is the sample. What you'd really like are unbiased estimators of the underlying population parameters. What does this mean? It means that if you were able to take repeated samples from the underlying population, and for each of these samples calculate estimates of the population mean and variance, then the mean values of those estimates would equal the underlying population mean and variance respectively.

For an unbiased estimator of the mean, life is simple. The mean of the sample is an unbiased estimator of the population mean. For variance, the situation is a little more tricky, and the trickiness stems from the fact that when you work out the sample variance, you have to use the sample mean as an estimate of the population mean in the variance formula. When you work through the algebra, it turns out that you need that factor of (n - 1) to get the estimate of the population variance to be unbiased.
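
The "repeated samples" idea can be tried out numerically. This is only a sketch under assumptions of my own: a normal population with variance 4, samples of size 5, and a fixed random seed. It averages the divide-by-n and divide-by-(n - 1) versions of the variance over many samples.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
true_sd = 2.0            # hypothetical population: normal with variance 4
n = 5                    # size of each sample
trials = 20000           # number of repeated samples

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from the sample mean
    sum_div_n += ss / n
    sum_div_n_minus_1 += ss / (n - 1)

mean_div_n = sum_div_n / trials
mean_div_n_minus_1 = sum_div_n_minus_1 / trials
print(mean_div_n, mean_div_n_minus_1)
```

The divide-by-n average should settle near 4 * (n - 1)/n = 3.2, which is too small, while the divide-by-(n - 1) average should settle near the true value of 4.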
Reply 6
Thanks, this helps a lot. One thing I'd like to know is what is the difference between the unbiased estimate of the variance and the Var(Xsample) = sigma^2 /n?

Original post by Prasiortle
I think you're misreading the mark scheme. It shows that you get B2 B1 M1 M1 A1, then there are two different ways of getting the next M1 A1 A1.

In the first way, you get M1 for "z = (6.2-6.1)/sqrt(0.643/80)", and then A1 for correctly working this out as 1.115, or for correctly working out the probability as 0.1325; then the final A1 is for either comparing z: 1.115 < 1.645, or for comparing p: 0.1325 > 0.05. In this way, you can see that the first way divides into two sub-methods: one using z, and the other using p.

Then there's the second way, in which you get M1 for "6.1 + 1.645*sqrt(0.643/80)", A1 for working this out as 6.247, and the final A1 for comparing 6.2 < 6.247.

You do only one of these two ways, not both, so you don't "convert and then convert back". After scoring this M1 A1 A1 in one of the two ways, you then get the final M1 A1 for the conclusion.


Thanks for taking the effort to explain, but that's not the bit I meant about having to convert there and back again. Earlier in the question you are required to use the sample variance to get an unbiased estimate of the population variance [by multiplying by n/(n-1)], but once you have this population variance you then need to get the sample variance by dividing the population variance by n (this is where the /80 appears in the mark scheme, and no subsequent marks are awarded if the 80 is not included). I am quite confused about why you need to go to the population variance and back if you already have the sample variance. It seems an important distinction, seeing as the following 5 marks are not allowed even if the only thing omitted is the /80, but I have no idea why.
Original post by Quarkboi
Thanks, this helps a lot. One thing I'd like to know is what is the difference between the unbiased estimate of the variance and the Var(Xsample) = sigma^2 /n?


Above I talked about estimating the variance of the underlying population by using the variance of a sample. Now we're talking about estimating the variance of the mean (which gives you the formula sigma^2/n).

So, if you draw a random sample from an underlying population, you can calculate the mean of that sample. Now do it again with a different sample, and again and again and again... Each time you will get a value for the mean of a particular sample, and these means are likely to all be slightly different. In other words, the means of these samples will have their own sampling distribution. The variance of that sampling distribution, which quantifies how spread out those means are, is sigma^2/n.
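
You can see this in a quick simulation. The sketch below assumes a made-up normal population with mean 10 and sigma = 2, samples of size 16, and a fixed seed; none of these numbers come from the question.

```python
import random
from statistics import pvariance

rng = random.Random(1)   # fixed seed so the run is repeatable
sigma = 2.0              # hypothetical population standard deviation
n = 16                   # size of each sample
repeats = 20000          # how many samples we draw

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(repeats):
    sample = [rng.gauss(10.0, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Spread of the recorded means, next to the theoretical sigma^2 / n.
spread_of_means = pvariance(sample_means)
print(spread_of_means, sigma ** 2 / n)
```

The two printed numbers should be close to each other, and both are much smaller than the population variance of 4.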
Reply 8
Oh I see! Thanks a lot. Would you mind explaining why we must use this step in the example I provided? I now understand what it is, but not exactly why we use it (especially why we use it in conjunction with the unbiased estimate of the variance).
Original post by Quarkboi
Oh I see! Thanks a lot. Would you mind explaining why we must use this step in the example I provided? I now understand what it is, but not exactly why we use it (especially why we use it in conjunction with the unbiased estimate of the variance).


Because the question asks you to perform a statistical test on the value of the mean - so you need to know what the sampling distribution of the mean is. To do this, you need to know the variance of the mean (sigma^2/n), and to do this you need to have an estimate of the population variance (sigma^2).
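
Put as a chain of steps, it might look like the sketch below. The 6.1, 6.2 and 80 are the figures from the mark scheme quoted earlier in the thread. The sample variance of 0.635 is a hypothetical value I have picked for illustration, since the question itself was not posted as text.

```python
from math import sqrt

n = 80                   # sample size
mu_0 = 6.1               # mean under the null hypothesis
sample_mean = 6.2        # observed sample mean
sample_variance = 0.635  # hypothetical divide-by-n variance of the sample

# Step 1: unbiased estimate of the population variance sigma^2.
sigma2_estimate = sample_variance * n / (n - 1)

# Step 2: variance of the sample mean, sigma^2 / n.
var_of_mean = sigma2_estimate / n

# Step 3: standardise the observed mean using the spread of the mean.
z = (sample_mean - mu_0) / sqrt(var_of_mean)
print(round(sigma2_estimate, 3), round(z, 3))
```

So the two steps are not a round trip: step 1 estimates how spread out individual values are, and step 2 turns that into how spread out sample means are.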
