# Sample and population variance and SD

#1
I am reposting this part of an answer I gave in order to get an explanation from a stats expert (hopefully).

My confusion is with this Wikipedia entry, where the variance is said to be unbiased whereas the SD is biased. The entry says

"...the sample variance is an unbiased estimator for the population variance, but its square root, the sample standard deviation, is a biased estimator for the population standard deviation."

My understanding is that the bias comes about because observations in a sample are more likely to lie near the sample mean than is the case with the full population. The n-1 compensates for that.

In the Wikipedia entry I can't see how the variance can be unbiased whilst its root is then biased. Is this correct? In stats it is so often a case of what words mean rather than what numbers mean.
#2
(Original post by nerak99)
I am reposting this part of an answer I gave in order to get an explanation from a stats expert (hopefully).

My confusion is with this Wikipedia entry, where the variance is said to be unbiased whereas the SD is biased. The entry says

"...the sample variance is an unbiased estimator for the population variance, but its square root, the sample standard deviation, is a biased estimator for the population standard deviation."
Yes, this is one of those nasty little facts that creeps up behind you and bops you on the head. One of the things that it tells you is that finding unbiased estimators for things can be hard; a second thing that is less often appreciated is that unbiased estimators may not be the nirvana one is searching for. If one is doing predictive estimation, it is not unusual for a biased predictor to give a lower RMS prediction error than an unbiased one.
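To illustrate that last point, here is a quick Monte Carlo sketch (Python, standard library only; the sample size and trial count are chosen arbitrarily). For Normal data, the variance estimator that divides the sum of squares by n+1 is biased, yet it has a smaller mean squared error than both the unbiased n-1 version and the maximum-likelihood n version:

```python
import math
import random

random.seed(0)

def mse_of_divisor(divisor, n=5, sigma2=1.0, trials=200_000):
    """Mean squared error of the variance estimator sum((x - xbar)^2) / divisor."""
    total = 0.0
    for _ in range(trials):
        xs = [random.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        total += (ss / divisor - sigma2) ** 2
    return total / trials

n = 5
mse_unbiased = mse_of_divisor(n - 1)  # classic unbiased estimator
mse_mle      = mse_of_divisor(n)      # maximum-likelihood estimator (biased)
mse_min      = mse_of_divisor(n + 1)  # biased, but minimum MSE for Normal data

print(mse_unbiased, mse_mle, mse_min)
```

The theoretical values for n = 5, sigma2 = 1 are 0.5, 0.36 and 1/3 respectively, so the simulation should reproduce that ordering: the "best" divisor by MSE is n+1, not n-1.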

It's a nice problem in mathematical statistics (which is sometimes used to torture students) to show what the expected value for the sample standard deviation is when dealing with Normal random variables.
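For what it's worth, the answer to that torture problem: for a Normal sample of size n, E[s] = c4(n) * sigma, where c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2) < 1, so s underestimates sigma on average. A simulation sketch (Python, standard library, parameters chosen for illustration) checking the formula:

```python
import math
import random
import statistics

random.seed(1)

def c4(n):
    """Exact value of E[s] / sigma for a Normal sample of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

n, sigma, trials = 10, 1.0, 100_000

# Average the sample standard deviation over many Normal samples.
mean_s = statistics.fmean(
    statistics.stdev([random.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(trials)
)

print(mean_s, c4(n) * sigma)  # the two should agree closely, both below sigma
```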

My understanding is that the bias comes about because observations in a sample are more likely to lie near the sample mean than is the case with the full population. The n-1 compensates for that.
I'm not sure what you mean by saying that a sample is "near the mean" - the expected value of the empirical distribution function of a random sample is the population distribution function. The usual explanation of the n-1 correction is that you're using the sample mean, rather than the (unknown) population mean, in the calculation of the sample variance - and that substitution needs a correction to take it into account.
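The effect of that correction is easy to see by simulation. A minimal sketch (Python, standard library; sigma and n are arbitrary choices): the average of sum((x - xbar)^2) over many samples comes out near (n-1) * sigma^2 rather than n * sigma^2, so dividing by n-1 recovers sigma^2 while dividing by n falls short:

```python
import random

random.seed(2)

n, sigma2, trials = 5, 4.0, 200_000  # sigma^2 = 4, i.e. sigma = 2

sum_ss = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(xs) / n                          # sample mean, not population mean
    sum_ss += sum((x - xbar) ** 2 for x in xs)  # sum of squares about xbar

mean_ss = sum_ss / trials            # ~ (n-1) * sigma2 = 16
est_with_n_minus_1 = mean_ss / (n - 1)  # ~ sigma2 (unbiased)
est_with_n = mean_ss / n                # ~ sigma2 * (n-1)/n (systematically small)

print(est_with_n_minus_1, est_with_n)
```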

In the Wikipedia entry I can't see how the variance can be unbiased whilst its root is then biased. Is this correct? In stats it is so often a case of what words mean rather than what numbers mean.
The basic point here is that E[f(X)] need not be equal to f(E[X]) for a general function f - here f is the square root and X the sample variance. Just write the equations down and you'll see that there's no reason to expect them to be equal.

If f is linear, all is sweetness and light, and if X is specified, it's sometimes possible to calculate how these two will differ.
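Here is a concrete check of exactly this asymmetry, with f the square root (Python sketch, standard library; sigma, n and the trial count are arbitrary). Over many Normal samples, the average sample variance lands on sigma^2, while the average sample standard deviation falls below sigma:

```python
import math
import random

random.seed(3)

n, sigma, trials = 5, 3.0, 200_000

sum_var, sum_sd = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
    sum_var += s2
    sum_sd += math.sqrt(s2)                          # sample standard deviation

mean_s2 = sum_var / trials  # ~ sigma^2 = 9: unbiased
mean_s = sum_sd / trials    # < sigma = 3: biased low, since sqrt is concave

print(mean_s2, mean_s)
```

That E[sqrt(X)] <= sqrt(E[X]) for non-degenerate X is Jensen's inequality applied to the concave square root, which is why the bias always goes downward here.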