# The central limit theorem and its importance to statistical estimation


**Fyer1234**:

Hi guys

I'm having trouble with this quote:

"The central limit theorem is the cornerstone of statistical estimation."

I can see how this could be true when trying to estimate certain things about a big population, but what about samples that are taken from small populations?

I've been asked to comment on the veracity of the statement, but I'm not really sure what I can comment on besides the distribution becoming normal as the sample size increases. Can anyone suggest anything else I should consider when answering this question?

After posting this, I've realised that the reason I'm struggling to answer it is that I find it hard to see the point or importance of having a normal distribution. I'm new to stats and my brain is working overtime on this subject. I know the point I just mentioned is quite important, so I have to find a way to clarify it. The normal distribution is there to show an average... So the central limit theorem helps to find that average from a population/sample that might not have a normal distribution...

But take an example like the average height of players in a basketball league. Without doing a census, and just using a sample in which the heights range from 1m to 3m, is the normal distribution always going to show 2m? How can that be? What if 80% of the heights were just under 3m; how does the CLT help? The graph would not be normal no matter how big the sample was. Or does the CLT estimate that the average height will always be 2m?

I know I might be missing a lot of concepts here, but there is so much to take in and I am really struggling.



**Smaug123**:

On samples from small populations: this is precisely why essentially all statistics taken on small populations have large variances. The CLT is key to getting estimates that don't have really large variances. It tells you where the mean of a distribution lies, to within a spread that shrinks as the sample size grows, *whatever* the distribution (as long as it does have a mean and variance). As you say, it does require taking lots of samples if the underlying distribution isn't well-behaved; however, in the cases where the CLT is not useful, it's often true that statistics simply can't help much. For instance, if you have only two samples, there's very little you can say about the distribution.
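A quick simulation sketch of the "fairly small variance" point. It assumes a skewed Exponential(1) population (mean 1, standard deviation 1), chosen purely for illustration. The standard deviation of the sample means shrinks roughly like σ/√n, which is the spread the CLT's normal approximation uses, so two data points leave a wide spread while a hundred pin the mean down much more tightly:

```python
import math
import random
import statistics


def spread_of_sample_means(sample_size, repeats=10_000, seed=42):
    """Standard deviation of `repeats` sample means, each from `sample_size` Exponential(1) draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


# The population sd is 1, so sigma / sqrt(n) is simply 1 / sqrt(n).
results = {n: spread_of_sample_means(n) for n in (2, 10, 100)}
for n, sd in results.items():
    print(f"n = {n:>3}: sd of sample means = {sd:.3f}, sigma/sqrt(n) = {1 / math.sqrt(n):.3f}")
```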

Actually, I'm not sure you understand the statement of the CLT. It says that if you repeatedly take samples of a fixed size (the larger, the better), then the means of those samples will approximately follow a normal distribution centred on the original distribution's mean. It doesn't say that if the sample means all lie between 1m and 3m then the normal distribution will have mean 2m. Consider the case where all humans but one are 2.5m tall: nearly all the samples are then simply lists of 2.5s, so the sample means are all very close to 2.5, and the normal distribution is centred on 2.5.
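To make this concrete, here is a small simulation sketch. It assumes a hypothetical league in which 80% of players are just under 3m and the rest are spread evenly between 1m and 3m (the scenario from the first post; the exact numbers, the sample size of 30 and the random seeds are made up for illustration). The sample means cluster around the population mean of about 2.7m, not around the 2m midpoint of the range, and roughly 95% of them fall within 1.96 standard errors of it, as the CLT's normal approximation predicts:

```python
import math
import random
import statistics

SAMPLE_SIZE = 30  # illustrative choice


def build_population(size=10_000, seed=0):
    """Hypothetical league: 80% of heights just under 3 m, the rest spread evenly over 1-3 m."""
    rng = random.Random(seed)
    tall = [rng.uniform(2.8, 3.0) for _ in range(int(size * 0.8))]
    spread = [rng.uniform(1.0, 3.0) for _ in range(size - len(tall))]
    return tall + spread


def sample_means(population, sample_size, repeats=5_000, seed=1):
    """Repeatedly draw fixed-size samples (with replacement) and record each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=sample_size)) for _ in range(repeats)]


population = build_population()
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
means = sample_means(population, SAMPLE_SIZE)

# CLT: the sample means should be roughly Normal(pop_mean, pop_sd / sqrt(SAMPLE_SIZE)).
half_width = 1.96 * pop_sd / math.sqrt(SAMPLE_SIZE)
coverage = sum(abs(m - pop_mean) <= half_width for m in means) / len(means)

print(f"midpoint of range:    {(min(population) + max(population)) / 2:.2f} m")
print(f"population mean:      {pop_mean:.3f} m")
print(f"mean of sample means: {statistics.fmean(means):.3f} m")
print(f"share of sample means within 1.96 standard errors: {coverage:.3f}")
```

Changing `SAMPLE_SIZE` changes how tightly the sample means bunch together, but not where their bell curve peaks.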



**Fyer1234**:

Thanks for your reply.

I think I'm going to have to go over things a bit more, as I'm still really lost. When I think of a normal distribution, I see the bell-shaped curve. But does the bell curve always peak in the middle of the two measurements, in this case 1m and 3m?

I can see that your explanation says no, but I can't see how the distribution can be bell-shaped unless it peaks in the middle of the two measurements... Can it start its rise a little further in? (Oh dear, that last sentence makes me sound so simple.)


**Smaug123**:

Consider the standard normal distribution: mean 0, variance 1. Imagine I take two samples from it and get the measurements 0.5 and 1.6. That tells me very little about the mean: in particular, it doesn't tell me enough to say that the mean is halfway between 0.5 and 1.6. All I can say is that the mean is unlikely to be horribly far away from 1.05.

If I do this again, I might get the measurements 0.1 and -4. This time the mean I'd predict is -1.95, which isn't very close to the true mean of 0.

Suppose I did this "pick a pair, take the mean of the pair" procedure several times and got 1.05, -1.95, 0.2, 0.5, -0.3. It becomes clear that the distribution mean is approximately 0 (in fact, the mean of these sample means is -0.1). It's the Central Limit Theorem that has allowed me to say that. Without the CLT, I can't go from "I drew lots of samples, and their means clustered around 0" to "the distribution mean is around 0", and that's a crucial part of statistics: without it, it's really hard to say anything about the distribution's mean, even if we have taken lots of samples and know their means.
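The arithmetic in this example can be checked in a few lines. This is a sketch that assumes, as in the post, that the data come from N(0, 1), so the standard deviation of 1 is known; the 1.96 multiplier is the usual normal-approximation value for a 95% interval:

```python
import math
import statistics

# The first two pairs of measurements quoted in the post.
pairs = [(0.5, 1.6), (0.1, -4.0)]
first_two_means = [statistics.fmean(pair) for pair in pairs]

# All five pair means quoted in the post.
pair_means = [1.05, -1.95, 0.2, 0.5, -0.3]
overall_mean = statistics.fmean(pair_means)

# Five pairs = 10 draws in total, so with a known sd of 1 the overall mean has
# standard error 1 / sqrt(10), compared with 1 / sqrt(2) for a single pair.
standard_error = 1 / math.sqrt(2 * len(pair_means))
low = overall_mean - 1.96 * standard_error
high = overall_mean + 1.96 * standard_error

print("first two pair means:", [round(m, 2) for m in first_two_means])
print(f"mean of the five pair means: {overall_mean:.2f}")
print(f"approximate 95% interval for the true mean: ({low:.2f}, {high:.2f})")
```

The interval comfortably contains the true mean of 0, whereas a single pair such as (0.1, -4) gives an estimate far from it.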


**Fyer1234**:

I'll have to keep re-reading this. I kind of understand, but again, I've been at it all day, so I'm a bit fuzzy at the moment. I need to sleep on it.


**Fyer1234**:

So basically, if I'm correct, the CLT lets us estimate the population parameters from the sampling distribution?

If the true population parameter is known, then what is the point of sampling? This is what I'm confused about. If the CLT is so useful for predictions about samples taken from a population, then why sample?

Take the example of a population of teachers within a city. I choose to sample the teachers who teach maths and want to find out how many of them drive a red car. Can someone please explain how the CLT would be any good here?


**Smaug123**:

If I sample ten teachers and one drives a red car, then my guess for p, the proportion who drive a red car, is 1/10. If I re-sample and get 5/10, then the CLT tells me that the true value of p is not too far from 3/10. I still don't know the true value: that's what statistics does, inferring true values from measured values.

If I re-sample more and get 1/10, 5/10, 2/10, 3/10, 1/10, then the CLT tells me that p is roughly 0.24 (12 red-car drivers out of 50 teachers sampled): that is, if there are 1,000 maths teachers in the city, approximately 240 of them drive red cars. Without the CLT, I can't make that statement.
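The red-car numbers can be turned into an estimate with an error bar. This is a sketch under two assumptions not stated in the thread: the five samples of ten are independent, and the city has 1,000 maths teachers (which is what the step from 0.24 to 240 implies). The interval uses the standard normal-approximation (Wald) formula for a proportion, which is where the CLT comes in:

```python
import math

# Red-car drivers found in each of the five samples of ten maths teachers (from the post).
red_car_counts = [1, 5, 2, 3, 1]
SAMPLE_SIZE = 10
ASSUMED_POPULATION = 1_000  # assumption: number of maths teachers in the city

n_total = SAMPLE_SIZE * len(red_car_counts)  # 50 teachers sampled in total
p_hat = sum(red_car_counts) / n_total       # pooled estimate of p

# By the CLT, p_hat is roughly Normal(p, p(1-p)/n); plug in p_hat for p (Wald interval).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n_total)
low = p_hat - 1.96 * standard_error
high = p_hat + 1.96 * standard_error

print(f"estimate of p: {p_hat:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
print(f"estimated red-car drivers: {p_hat * ASSUMED_POPULATION:.0f} "
      f"(roughly {low * ASSUMED_POPULATION:.0f} to {high * ASSUMED_POPULATION:.0f})")
```

With only 50 teachers sampled the interval is wide; because the standard error scales like 1/√n, sampling four times as many teachers would roughly halve it.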


**Fyer1234**:

Thanks, mate.

So is it more of a statement than a calculation that is actually used in statistics?

Or is it only used when the population distribution is not known...?


**Fyer1234**:

Mmm, I'm starting to get it, thanks.

Put simply, when we take a sample, the results we get only reflect that sample, e.g. brown-haired people in London who live in a flat with a cat. The more we sample, the more the theorem lets us say that the mean of these samples will be close to the true value for the population.

Although it would be very hard to sample all the brown-haired people etc., the CLT enables us to make a good estimate without doing a census...?

Am I starting to get the gist?


**Smaug123**:

Yep, exactly.


**Fyer1234**:

Once again, Smaug, thanks for your help.

Hopefully stats will start to get a bit easier soon, as at the moment I'm so stressed out, lol.
