# S1 - Central Limit Theorem

#1
I really am not sure what I want to ask because I'm so confused about CLT.

Here is what I think

If you take lots of samples of a particular size and find the means of each of those samples then all the means form a normal distribution?

The bigger the sample size, the nearer to a normal distribution?

Sample sizes of 30 are ok unless the population is very skewed?

This is now where I get worried. When I use the X(bar) in the formula

Z = (X(bar) - population mean)/(sigma/root sample size)

is X(bar) the mean of one of the samples? So that X(bar) value could be anywhere in relation to the mean of the population?

The above formula would always be used when a sample is mentioned regardless of whether the sample has been taken from a normal distribution or not?

The CL theorem is NOT the formula but just something that shows you that the formula works?

In questions that ask you where the CLT was used, you do not say part a) just because you used the formula in part a) but because.....because what?

When you do the confidence limit what you are trying to work out is how confident you are that the mean you are using is close to the population mean?

It's all a bit of a ramble, this. But I just don't get what I'm aiming to do. I sort of do the maths by rote, but as for the question about when to use the CLT, I have no idea, because I have so many loose ends in my thinking.
6 years ago
#2
To get your Z value you need to normalise/standardise the data. You do this by subtracting the mean and dividing by the standard deviation. This method ONLY works for the normal distribution. So, in order to allow you to use the Normal Distribution (and therefore your Z value), you need to be able to justify that the data is normally distributed.

That's where the central limit theorem comes into play. Using this theorem lets you justify that the data is normally distributed and therefore you can use this method.
#3
(Original post by claret_n_blue)
To get your Z value you need to normalise/standardise the data. You do this by subtracting the mean and dividing by the standard deviation. This method ONLY works for the normal distribution. So, in order to allow you to use the Normal Distribution (and therefore your Z value), you need to be able to justify that the data is normally distributed.

That's where the central limit theorem comes into play. Using this theorem lets you justify that the data is normally distributed and therefore you can use this method.
I misunderstand then. I thought the CLT tells us that the means of samples tend to lead to a normal distribution rather than the data itself being normal.

I'll have to try again to get to grips with this.
6 years ago
#4
(Original post by maggiehodgson)
I misunderstand then. I thought the CLT tells us that the means of samples tend to lead to a normal distribution rather than the data itself being normal.

I'll have to try again to get to grips with this.
You're right. The Central Limit Theorem states that if we take a collection of samples from any distribution, then the means of those samples will themselves look like a collection of samples from a normal distribution. This is true regardless of the original distribution (although it is exactly true, not just "true for big n", when the original distribution was normal).
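
If it helps to see this happen, here is a rough simulation sketch rather than anything from the syllabus. The exponential population (mean 1, standard deviation 1), the sample size of 50 and the helper names are all assumptions picked for illustration. It draws many samples from a very skewed population and checks that the sample means cluster around the population mean with spread close to sigma/root n.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, seed=0):
    """Return the means of num_samples samples, each made of sample_size draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


# Assumed population: exponential with mean 1 and standard deviation 1 (strongly skewed).
def draw_exponential(rng):
    return rng.expovariate(1.0)


means = sample_means(draw_exponential, sample_size=50, num_samples=5000)

centre = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to sigma / sqrt(n) = 1 / sqrt(50)

# Fraction of sample means within 1.96 standard errors of the population mean.
# If the sample means were exactly normal this would be about 0.95.
standard_error = 1 / 50 ** 0.5
within = sum(abs(m - 1) < 1.96 * standard_error for m in means) / len(means)

print(centre, spread, within)
```

A histogram of `means` should look roughly bell-shaped even though the population itself is nothing like a bell; making `sample_size` smaller should make the skew show through again.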
#5
(Original post by Smaug123)
You're right. The Central Limit Theorem states that if we take a collection of samples from any distribution, then the means of those samples will themselves look like a collection of samples from a normal distribution. This is true regardless of the original distribution (although it is exactly true, not just "true for big n", when the original distribution was normal).
Thanks for that.

If you find yourself with some spare time, I wonder if you would go through my original post and check out and correct my thinking. No worries if that's not possible, what you have already said is a big help.
6 years ago
#6
(Original post by maggiehodgson)
Thanks for that.

If you find yourself with some spare time, I wonder if you would go through my original post and check out and correct my thinking. No worries if that's not possible, what you have already said is a big help.
X(bar) could indeed be anywhere in relation to the true mean - but the CLT tells us that it is approximately normally distributed around the true mean, so it's quite likely to be near to the true mean.
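
As a small sketch of what "quite likely to be near" means in numbers: the code below standardises a sample mean with the Z formula from the first post, then works out the chance that X(bar) lands within a given distance of the true mean, treating X(bar) as normal. The function names and the figures (mean 100, sigma 15, n = 36) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Standardise a sample mean: Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


def prob_mean_within(distance, sigma, n):
    """Approximate P(|X_bar - mu| < distance), treating X_bar as normal."""
    z = distance / (sigma / sqrt(n))
    standard_normal = NormalDist()
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


# Hypothetical numbers: population mean 100, sigma 15, samples of size 36.
z = z_for_sample_mean(x_bar=104, mu=100, sigma=15, n=36)
p = prob_mean_within(distance=5, sigma=15, n=36)
print(z, p)
```

With these numbers the standard error is 15/6 = 2.5, so a sample mean of 104 gives Z = 1.6, and X(bar) falls within 5 of the true mean (two standard errors) a little over 95% of the time.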

You "use the CLT" whenever you approximate the distribution of the sample means as a normal distribution. In practice, it's wherever you used the formula - I'd say something like "Part a, because in part a we approximated the sample means to a normal distribution (as in the line <line where the formula was used>)".

Not sure what you mean by "confidence limit" - could you give me an example? (If it were "confidence interval", I could give an answer…)
#7
(Original post by Smaug123)
X(bar) could indeed be anywhere in relation to the true mean - but the CLT tells us that it is approximately normally distributed around the true mean, so it's quite likely to be near to the true mean.

You "use the CLT" whenever you approximate the distribution of the sample means as a normal distribution. In practice, it's wherever you used the formula - I'd say something like "Part a, because in part a we approximated the sample means to a normal distribution (as in the line <line where the formula was used>)".

Not sure what you mean by "confidence limit" - could you give me an example? (If it were "confidence interval", I could give an answer…)

Yes, it is confidence interval.

So if you were told that the population from which the sample was taken was normally distributed, you would still use the formula for confidence intervals not a different one?

When do you say that you've not used CLT?

I thought I'd got it but still not there am I.
6 years ago
#8
(Original post by maggiehodgson)
Yes, it is confidence interval.

So if you were told that the population from which the sample was taken was normally distributed, you would still use the formula for confidence intervals not a different one?

When do you say that you've not used CLT?

I thought I'd got it but still not there am I.
What is your formula for confidence intervals?

Sorry, "*not* used CLT"? You could say you've not used the CLT whenever you *haven't* approximated a collection of sample means as following a normal distribution.

A 95% confidence interval for a mean (say) is an interval [a,b] such that Probability(a < mean < b) = 95%. That's true whether or not you use the CLT. The CLT is used in actually finding that Probability(a<mean<b); you use it to turn a complicated distribution (a collection of sample means) into a simple distribution (a normal one).
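
One way to get a feel for the "95%" is to simulate it. This is only a sketch under assumed numbers (a normal population with mean 20 and sigma 4, samples of 25, and a helper I've called `coverage`): it builds the interval x(bar) +/- z × sigma/root n for many samples and counts how often the true mean falls inside.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, level=0.95, trials=4000, seed=1):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    # z cuts off (1 - level) / 2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # The population here is normal, so no CLT approximation is involved.
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials


rate = coverage(mu=20, sigma=4, n=25)
print(rate)
```

The printed rate should come out close to 0.95. Swapping `rng.gauss` for a skewed distribution would be the situation where you lean on the CLT, and the rate would then only be approximately 0.95.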
#9
(Original post by Smaug123)
What is your formula for confidence intervals?

Sorry, "*not* used CLT"? You could say you've not used the CLT whenever you *haven't* approximated a collection of sample means as following a normal distribution.

A 95% confidence interval for a mean (say) is an interval [a,b] such that Probability(a < mean < b) = 95%. That's true whether or not you use the CLT. The CLT is used in actually finding that Probability(a<mean<b); you use it to turn a complicated distribution (a collection of sample means) into a simple distribution (a normal one).
The formula is x(bar) +/- z × (sd / root sample size).

I've found a question that you asked for to illustrate my problem. It's AQA May 2006, Q4.

The weights of packets of sultanas may be assumed to be normally distributed with a standard deviation of 6 grams.

It then gives you the weights of a random sample of 10 packets.

Then it asks for a 99% confidence interval for the mean weight of packets. That CI formula is used.

Then it asks "State why, in calculating your confidence interval, use of the CLT was NOT necessary." The mark scheme says "weights of packets can be assumed to be normally distributed"
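
To make the confidence interval part of that question concrete, here is a rough sketch of the calculation with sigma = 6, n = 10 and a 99% level. The ten weights in the code are made up for illustration and are NOT the data from the exam paper; the function name is also just an assumption.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_confidence_interval(x_bar, sigma, n, level):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n), sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Made-up weights in grams: NOT the data from the exam paper.
weights = [498, 503, 507, 495, 501, 510, 499, 504, 496, 507]

x_bar = fmean(weights)
lower, upper = mean_confidence_interval(x_bar, sigma=6, n=len(weights), level=0.99)
half_width = (upper - lower) / 2
print(x_bar, lower, upper)
```

Whatever the sample mean turns out to be, the half-width with sigma = 6 and n = 10 at 99% is about 4.89 grams, because z is about 2.5758 and 6/root 10 is about 1.897.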
6 years ago
#10
(Original post by maggiehodgson)
Then it asks "State why, in calculating your confidence interval, use of the CLT was NOT necessary." The mark scheme says "weights of packets can be assumed to be normally distributed"
Ah, that's because you didn't need to *approximate* the sample means, because the underlying distribution was normal so the sample means *exactly* follow the normal distribution. CLT says that it comes to approximate a normal distribution, but you already know it *is* a normal distribution, so you don't need to bother with the CLT.
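
For reference, the standard result behind this (a textbook fact, not something quoted from the mark scheme) can be written out as follows, assuming the individual weights X_1, ..., X_n are independent and each follows N(mu, sigma^2):

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\quad \text{exactly, for every } n,
\qquad\text{so}\qquad
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,\,1).
```

For a non-normal population with finite variance the same two statements hold only approximately, and only for large n, which is the step where the CLT would be needed.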
#11
(Original post by Smaug123)
Ah, that's because you didn't need to *approximate* the sample means, because the underlying distribution was normal so the sample means *exactly* follow the normal distribution. CLT says that it comes to approximate a normal distribution, but you already know it *is* a normal distribution, so you don't need to bother with the CLT.

Super.

So, let me just check.

If the population is normally distributed and a confidence interval for a sample is asked for, you use exactly the same formula as you would for calculating the CI for a non-normally distributed population's sample?

The CLT is quoted as being used when calculating CI for samples whose population has not been stated as being normally distributed but where the sample size is > 30?
6 years ago
#12
(Original post by maggiehodgson)
Super.

So, let me just check.

If the population is normally distributed and a confidence interval for a sample is asked for, you use exactly the same formula as you would for calculating the CI for a non-normally distributed population's sample?

The CLT is quoted as being used when calculating CI for samples whose population has not been stated as being normally distributed but where the sample size is > 30?
Yep, I think that's right. Be aware, though, that they might expect you to infer that something "can be assumed to be normally distributed" - heights of people, for instance, are normally distributed but the question might not tell you so. If you make sure that you put down in your answer "assuming __ is normally distributed…" whenever you do assume something like that, you should be fine, I imagine.
#13
(Original post by Smaug123)
Yep, I think that's right. Be aware, though, that they might expect you to infer that something "can be assumed to be normally distributed" - heights of people, for instance, are normally distributed but the question might not tell you so. If you make sure that you put down in your answer "assuming __ is normally distributed…" whenever you do assume something like that, you should be fine, I imagine.

Thank you so much. This CLT thing has bugged me for months and has stopped me liking statistics. Perhaps I can get a little fonder of the subject now.

I'm an adult learner with no teacher, so being able to ask TSR for help is a real big help.
6 years ago
#14
(Original post by maggiehodgson)
Thank you so much. This CLT thing has bugged me for months and has stopped me liking statistics. Perhaps I can get a little fonder of the subject now.

I'm an adult learner with no teacher, so being able to ask TSR for help is a real big help.
Ah, right, no problem!