    • Thread Starter
    Offline

    14
    My phone is acting up at the moment, sorry. I'll get back to this thread once I've sorted it out.
    • Thread Starter
    Offline

    14
    1.) [Attached image: image.jpg]
    2.) [Attachment 459705459707]
    For standard deviation, there are two methods, one for each of the two formulas above: the longer one (pic 1) and the shorter one (pic 2). Why do they give me slightly different answers? Is that supposed to happen? Shouldn't the answers be the same?
    Offline

    11
    ReputationRep:
    Try replacing (n-1) by n in the first formula.
    • Thread Starter
    Offline

    14
    (Original post by metaltron)
    Try replacing (n-1) by n in the first formula.
    Wow, thank you. It's exactly the same now. How come the formulas say n-1?
    Offline

    11
    ReputationRep:
    (Original post by Questioness)
    Wow thank you. It's exactly the Same now. How come the formulas say n-1?
    In short, the two formulas have slightly different uses. I think you should use the first one (with the (n-1)), as this gives an estimate of the variance of X when you have taken a sample x_1, x_2, ..., x_n of values of X. I'll try to write out something explaining why the two formulas differ in a bit.
    Offline

    11
    ReputationRep:
    (Original post by Questioness)
    Wow thank you. It's exactly the Same now. How come the formulas say n-1?
    I'll talk about variance mainly, which is just the square of the standard deviation.

    I'll first explain how you might derive a formula for the variance, which is designed to measure how spread out your data is. Suppose we measure something n times and get values x_1, x_2, ..., x_n.
    Let  \bar{x} = (x_1 + x_2 + ... + x_n)/n  be their mean.

    We want to work out how spread out the data is about the mean, so the quantities  x_i - \bar{x}  for 1 <= i <= n will be important. We could take the mean of these as a possible measure of spread (the average difference between a value and the mean). However,  \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x}) = 0  (persuade yourself that this is true!), so this is a rubbish measure of spread!
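
    If you want to check that claim, here is one way to see it: a short worked step that uses only the definition of  \bar{x}  given above.

```latex
\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})
  = \frac{1}{n}\sum_{i=1}^n x_i - \frac{1}{n}\cdot n\bar{x}
  = \bar{x} - \bar{x}
  = 0
```

    The positive and negative differences always cancel exactly, whatever the data is.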

    The easy fix is to take the average squared difference between each value and the sample mean. The squares cannot cancel each other out, so this does give a measure of spread: the higher the variance, the bigger the spread about the sample mean.

     \displaystyle \sigma^2 = \sum_{i=1}^n \frac{(x_i - \bar{x})^2}{n}

    Since we squared each difference when calculating the variance, its units are the square of the original units (e.g. cm^2 if the data is in cm). This is why the standard deviation is also useful: taking the square root gives you back the original units.

    Now notice that:

     \displaystyle \sigma^2 = \sum_{i=1}^n \frac{(x_i - \bar{x})^2}{n} = \sum_{i=1}^n \frac{x_i^2 - 2x_i \bar{x} + \bar{x}^2}{n} = \sum_{i=1}^n \frac{x_i^2}{n} - 2\bar{x} \sum_{i=1}^n \left( \frac{x_i}{n} \right) + \bar{x}^2 = \sum_{i=1}^n \frac{x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2 = \sum_{i=1}^n \frac{x_i^2}{n} - \bar{x}^2

    So we have the alternative formula:

     \displaystyle \sigma^2 = \sum_{i=1}^n \frac{x_i^2}{n} - (\frac {\sum_{i=1}^n x_i}{n}) ^2
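
    A quick numerical check of the two formulas, as a minimal Python sketch. The data values are made up purely for illustration. It shows that the long and short formulas agree when both divide by n, and that dividing by n - 1 instead gives a slightly larger number, which is the kind of mismatch asked about at the start of the thread.

```python
# Made-up data, purely for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

# Long formula: average squared difference from the mean (divide by n)
var_definition = sum((x - mean) ** 2 for x in data) / n

# Short formula: mean of the squares minus the square of the mean
var_shortcut = sum(x ** 2 for x in data) / n - mean ** 2

# Same sum of squared differences, but divided by n - 1 instead of n
var_n_minus_1 = sum((x - mean) ** 2 for x in data) / (n - 1)

print(var_definition, var_shortcut, var_n_minus_1)
```

    The first two printed values match; the third is larger by a factor of n/(n-1).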

    So now suppose we have a random variable X. Then X will have a mean and a variance; call them  \mu and V. However, we usually don't know what their values are. For example, if X is human height, we don't know the exact mean or variance of everyone's height; we can only get a rough idea by taking a sample of humans.

    So say we have a sample x_1, x_2 , ... , x_n of values from X. Then if we want to estimate  \mu it turns out that the sample mean  \displaystyle \bar{x} = \frac {\sum_{i=1}^n x_i}{n} is a good estimate (in that its expected value is equal to  \mu ) . We say that  \bar{x} is an unbiased estimator of the mean.

    Now let  \displaystyle \sigma^2 = \sum_{i=1}^n \frac{(x_i - \bar{x})^2}{n} be the sample variance. It turns out that this is not an unbiased estimator of the actual variance of X: on average it comes out slightly too low (its expected value is  \frac{n-1}{n} V ). This is because the sample mean is the point that minimises the sum of squared differences for that sample, so the data is never more spread about the sample mean (which is what the sample variance measures) than about the actual mean.

    However it turns out that:

     \displaystyle s^2 = \frac{n}{n-1} \sigma^2 =  \sum_{i=1}^n \frac{(x_i - \bar{x})^2}{n-1}

    is an unbiased estimator of the variance of X (i.e.  E(s^2) = V ).
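
    To see the bias concretely, here is a small exact check in Python. It is a sketch under made-up assumptions: X is taken to be equally likely to be 1, 2 or 3, and the sample size is n = 2, so every possible sample can be listed and the expected values worked out exactly with fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy distribution: X is equally likely to be 1, 2 or 3
values = [Fraction(1), Fraction(2), Fraction(3)]
mu = sum(values) / len(values)
V = sum((v - mu) ** 2 for v in values) / len(values)  # actual variance of X

n = 2  # sample size
samples = list(product(values, repeat=n))  # all 9 equally likely samples


def sum_sq_diff(sample):
    """Sum of squared differences from the sample mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)


# Expected value = plain average over the equally likely samples
expected_div_n = sum(sum_sq_diff(s) / n for s in samples) / len(samples)
expected_div_n_minus_1 = sum(sum_sq_diff(s) / (n - 1) for s in samples) / len(samples)

print(V, expected_div_n, expected_div_n_minus_1)
```

    In this toy case the divide-by-n version averages to half of V, while the divide-by-(n-1) version averages to exactly V.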

    Therefore if you have a sample x_1, x_2 , ... , x_n of values from X you should:

    1) Use the formula with (n-1) if you want to estimate the variance of X
    2) Use the formula with n if you just want the variance of the sample values themselves (i.e. you are treating them as the whole data set).
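
    If you are working on a computer rather than by hand, the same choice shows up as two different functions. As a sketch, Python's standard `statistics` module has `pvariance`/`pstdev` (divide by n) and `variance`/`stdev` (divide by n - 1). The data below is made up for illustration.

```python
import statistics

# Made-up data, purely for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_n = statistics.pvariance(data)         # formula with n
var_n_minus_1 = statistics.variance(data)  # formula with n - 1
sd_n = statistics.pstdev(data)             # square root of the n version
sd_n_minus_1 = statistics.stdev(data)      # square root of the n - 1 version

print(var_n, var_n_minus_1)
print(sd_n, sd_n_minus_1)
```

    Calculators and spreadsheets usually offer both versions too, so it is worth checking which one a given button or function uses before comparing answers.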

    (As it happens,  \sqrt{s^2} will not give an unbiased estimate for the standard deviation of X, since the square root function is non-linear; on average it comes out slightly too low. However, there is no simple correction like the n/(n-1) factor that fixes this whatever the distribution of X is, so square rooting s^2 is about as good as we can do to estimate the standard deviation of X.)
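
    The same kind of exact check illustrates this last point. Again this is only a sketch with a made-up distribution (X equally likely to be 1, 2 or 3, sample size n = 2): it averages  \sqrt{s^2}  over every possible sample and compares the result with the actual standard deviation of X.

```python
import math
from itertools import product

# Hypothetical toy distribution: X is equally likely to be 1, 2 or 3
values = [1, 2, 3]
mu = sum(values) / len(values)
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

n = 2  # sample size
samples = list(product(values, repeat=n))  # all 9 equally likely samples


def s(sample):
    """Square root of the (n - 1) variance estimate for one sample."""
    xbar = sum(sample) / len(sample)
    return math.sqrt(sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1))


# Expected value of s = plain average over the equally likely samples
expected_s = sum(s(sample) for sample in samples) / len(samples)

print(sigma, expected_s)
```

    Here the average of  \sqrt{s^2}  is below the actual standard deviation, even though s^2 itself is unbiased for the variance.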
    • Thread Starter
    Offline

    14
    Thank you so much for this explanation. Really well explained, this clears up pretty much all the confusion I had and more!
 
 
 
Updated: September 7, 2015
