
What is variance? (A Level Maths)

So, I know that variance is the square of the standard deviation, but I can't really find an example of when or how you would use it.
Original post by Zuvio
So, I know that variance is the square of the standard deviation, but I can't really find an example of when or how you would use it.


The standard deviation and variance are two different but closely related mathematical concepts. The variance is needed to calculate the standard deviation. These numbers help traders and investors determine the volatility of an investment and therefore allow them to make educated trading decisions. here you go
Reply 3
Original post by jademowry123
The standard deviation and variance are two different but closely related mathematical concepts. The variance is needed to calculate the standard deviation. These numbers help traders and investors determine the volatility of an investment and therefore allow them to make educated trading decisions. here you go

Yeah, but as far as I can see variance is just something you figure out on the way to finding SD. I know SD is used to identify outliers, but what does the variance do itself?
Original post by Zuvio
Yeah, but as far as I can see variance is just something you figure out on the way to finding SD. I know SD is used to identify outliers, but what does the variance do itself?

here you go found it
The variance is the average of the squared differences from the mean. To figure out the variance, first calculate the difference between each point and the mean; then, square and average the results. For example, if a group of numbers ranges from 1 to 10, it will have a mean of 5.5.
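If it helps to see that recipe worked all the way through, here's a rough Python sketch on the numbers 1 to 10 (just an illustration of the steps above, nothing official):

```python
# Variance of the numbers 1..10, following the recipe above:
# subtract the mean from each value, square, then average.
data = list(range(1, 11))                    # 1, 2, ..., 10
mean = sum(data) / len(data)                 # 5.5

squared_diffs = [(x - mean) ** 2 for x in data]
variance = sum(squared_diffs) / len(data)    # average of the squared differences
std_dev = variance ** 0.5                    # square root to get the SD

print(mean)      # 5.5
print(variance)  # 8.25
print(std_dev)   # about 2.87
```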
Original post by Zuvio
Yeah, but as far as I can see variance is just something you figure out on the way to finding SD. I know SD is used to identify outliers, but what does the variance do itself?


It's a statistical measure of how far values spread out from their average.
Original post by jademowry123
here you go found it
The variance is the average of the squared differences from the mean. To figure out the variance, first calculate the difference between each point and the mean; then, square and average the results. For example, if a group of numbers ranges from 1 to 10, it will have a mean of 5.5.



They’re asking for the reason/logic behind it, not the process.
Original post by CaptainDuckie
They’re asking for the reason/logic behind it, not the process.

I don't know, sorry for wasting your time.
Variance tells us how much the data is spread out from the average or mean value. The more spread out the data, the larger the variance, and vice versa.
Yeah, I'd think about the SD as being the square root of the variance... cos you usually calculate the variance first and then square root that to get the SD.

e.g. if you were weighing bags of crisps coming off a production line and ideally you'd want them all to be exactly the same weight - the variance and the SD are both ways of putting a number on how good or bad your process is at producing equal-weight bags of crisps. The variance would be in units of g², and while you know that a big variance is worse than a small variance, it's not immediately clear what g² actually means... and the SD turns it back into the unit you measured in the first place.
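In numbers, something like this (a quick Python sketch with made-up bag weights in grams, just to show the units point):

```python
# Made-up weights (in grams) of crisp bags coming off the line.
weights = [32.1, 29.8, 30.5, 31.2, 28.9, 30.0, 31.7, 29.4]

mean = sum(weights) / len(weights)
variance = sum((w - mean) ** 2 for w in weights) / len(weights)  # units: g^2
std_dev = variance ** 0.5                                        # back in g

print(f"mean     = {mean:.2f} g")
print(f"variance = {variance:.2f} g^2  (bigger is worse, but g^2 is hard to picture)")
print(f"SD       = {std_dev:.2f} g    (same units as the bags themselves)")
```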
They're (variance and standard deviation) obviously equivalent, as one is the square / square root of the other. Standard deviation is more natural for interpreting the width of the distribution, as it's on the same scale as the mean, so you say things like

95% of the distribution lies in the interval [μ − 2σ, μ + 2σ]

for mean μ and standard deviation σ. You use the standard deviation to get the z value, z = (x − μ)/σ, by scaling the difference from the mean.

Using the variance makes it easier to derive theory / recursive algorithms / generalisations to more than one variable, etc. It sets the quadratic "a" coefficient in the exponent of the normal distribution (the curvature of the log-density is −1/σ², so the variance is its negated inverse).

So they're basically the same. Standard deviation is easier to interpret statistically and relate to the data. Variance is more convenient for mathematical analysis.
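If you want to see the 2-sigma rule in action, here's a rough simulation sketch (NumPy, with a made-up mean and SD - the exact numbers don't matter):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 50.0, 4.0                  # made-up mean and standard deviation
x = rng.normal(mu, sigma, 100_000)     # simulated normal data

# Roughly 95% of the distribution lies within 2 SDs of the mean.
inside = np.mean((x > mu - 2 * sigma) & (x < mu + 2 * sigma))
print(f"fraction within [mu - 2*sigma, mu + 2*sigma]: {inside:.3f}")  # about 0.954

# The z value: how many SDs an observation is from the mean.
obs = 57.0
z = (obs - mu) / sigma
print(f"z = {z:.2f}")  # 1.75
```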
Variance can be analysed using tools such as ANOVA... you don't come across ANOSDE, do you? :beard:
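(If you're curious, a one-way ANOVA is only a couple of lines in scipy - the three groups of marks here are made up:)

```python
from scipy import stats

# Three made-up groups of exam marks.
group_a = [62, 71, 68, 74, 66]
group_b = [58, 64, 61, 59, 63]
group_c = [70, 75, 72, 78, 69]

# One-way ANOVA compares the variance between the groups
# with the variance within the groups.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)
```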
Original post by Zuvio
Yeah, but as far as I can see variance is just something you figure out on the way to finding SD. I know SD is used to identify outliers, but what does the variance do itself?

This is a neat question. The variance of a probability distribution is the second moment of the distribution about the mean. (See Wikipedia). It turns out that most of the probability distributions that are used in real life are characterized by the sequence of their moments - the mean is the first, the variance is the second, then you get skewness and kurtosis, and then we run out of names for them. The first moment, the mean, tells you where the "center" of the distribution is, the variance tells you how spread out around the mean it is, the skewness tells you how asymmetric the tails of the distribution are, the kurtosis tells you how heavy the tails of the distribution are relative to the center of the distribution.

But why does the second moment get so much attention? It's related to the central limit theorem: once you start looking at averages of most random variables, you end up with something that is approximately normally distributed - and the normal distribution is entirely characterized by the values of its first and second moments - the mean and variance. You don't need any more information about it once you know these. So, the mean and the variance are stuffed full of information!
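If anyone wants to poke at this, here's a rough sketch with scipy.stats (an exponential sample is just a convenient skewed example; none of these numbers are special):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=50_000)    # a skewed, very non-normal sample

# The first four moments (the last two are the standardised versions).
print("mean     ", np.mean(x))         # about 2
print("variance ", np.var(x))          # about 4
print("skewness ", stats.skew(x))      # about 2 for an exponential
print("kurtosis ", stats.kurtosis(x))  # excess kurtosis, about 6 here

# Central limit theorem: averages of 100 values at a time look nearly normal,
# even though the underlying data are nothing like normal.
means = rng.exponential(scale=2.0, size=(10_000, 100)).mean(axis=1)
print("skewness of the sample means:", stats.skew(means))  # close to 0
```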
Reply 13
Original post by Zuvio
So, I know that variance is the square of the standard deviation, but I can't really find an example of when or how you would use it.


Removing the complexities, the bottom line is that the standard deviation is useful for interpreting data whereas the variance is more useful for calculations.

I suppose we could, when performing calculations, refer to it as 'the square of the standard deviation'; yet since it's so important, it gets its own name: variance.

EDIT: It's much more nuanced than that, but that simplification gets me through A-Level stats sufficiently well
(edited 3 years ago)
Original post by Zuvio
Yeah, but as far as I can see variance is just something you figure out on the way to finding SD. I know SD is used to identify outliers, but what does the variance do itself?

There's a key property of the variance that makes it much more convenient to use than the standard deviation, even though one is simply a function of the other: there's a nice simple formula for estimating a population variance from a sample drawn from that population - you should know it well. That estimator has the vital property of being "unbiased" - that is, its expected value is equal to the population value that it is estimating.

This is not true of the "obvious" estimator of the standard deviation (taken by using the square root of the usual variance estimator) - surprisingly enough, this estimator is biased. So, all in all, the variance is a "simpler" quantity to estimate and to use in statistics than the standard deviation.
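You can see both facts in a quick simulation (a sketch assuming normal data with a known sigma, and small samples so the bias is visible):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 3.0                          # true population standard deviation
n = 5                                # small samples make the bias easy to see

samples = rng.normal(0.0, sigma, size=(200_000, n))

s2 = samples.var(axis=1, ddof=1)     # sample variance with the n-1 divisor
s = np.sqrt(s2)                      # the "obvious" estimate of the SD

print("true variance:", sigma ** 2)  # 9.0
print("mean of s^2  :", s2.mean())   # about 9.0 -> unbiased
print("true SD      :", sigma)       # 3.0
print("mean of s    :", s.mean())    # noticeably below 3.0 -> biased
```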
Original post by daan.m
Removing the complexities, the bottom line is that the standard deviation is useful for interpreting data whereas the variance is more useful for calculations.

I think you might need to spell out a little more why this is so. The major point in favour of the standard deviation is that it is on the same scale as the data itself. You take some observation values, you subtract them from a constant, you square them, you average them, and then you take the square root. So, if you do this with measurements that have units - kilograms, say - then the standard deviation will also be in units of kilograms, whereas the variance will be in units of kilograms-squared.
Reply 16
Original post by Gregorius
There's a key property of the variance that makes it much more convenient to use than the standard deviation, even though one is simply a function of the other: there's a nice simple formula for estimating a population variance from a sample drawn from that population - you should know it well. That estimator has the vital property of being "unbiased" - that is, its expected value is equal to the population value that it is estimating.

This is not true of the "obvious" estimator of the standard deviation (taken by using the square root of the usual variance estimator) - surprisingly enough, this estimator is biased. So, all in all, the variance is a "simpler" quantity to estimate and to use in statistics than the standard deviation.

This is something that has bugged me for a good few years as a non-statistics-specialist!

If the sample variance using n-1 is a "good" (in the sense of being unbiased) estimator for the population variance, then why doesn't taking the square root simply give you an "equally good" (i.e. unbiased) estimator of the SD? It seems counterintuitive that if I've got the best possible estimate for some population parameter S^2, then taking the square root of my estimate shouldn't also give me the best possible estimate for S. It's a bit like saying if I have a SUVAT formula that gives me v^2 in terms of other variables, I can't then use it to get v because there might be an error involved :smile:

In the "real world" it's the SD that interests us, as it has the same units as the original data measurements and intuitively represents the spread-out-ness of the data, whereas the variance feels more theoretical. Is it possible to construct an unbiased estimator of the SD from sample data, or is this a non-trivial problem?
Reply 17
Original post by Gregorius
I think you might need to spell out a little more why this is so. The major point in favour of the standard deviation is that it is on the same scale as the data itself. You take some observation values, you subtract them from a constant, you square them, you sum them, and then you take the square root. So, if you do this with measurements that have units - kilograms say, then the standard deviation will also be in the units of kilograms, whereas the variance will be in units of kilograms-squared.

Yeah that's so true.

The reason I didn't go into more detail is that, to be honest, the underlying principles are a bit of a grey area for me too - I have A-Level Further Stats knowledge but no more.

I left the fine detail for you (as you clearly know your stuff :smile: ) and instead shared the simplified 'summary' that I like to keep in my head when working through problems.
Original post by davros

If the sample variance using n-1 is a "good" (in the sense of being unbiased) estimator for the population variance, then why doesn't taking the square root simply give you an "equally good" (i.e. unbiased) estimator of the SD? It seems counterintuitive that if I've got the best possible estimate for some population parameter S^2, then taking the square root of my estimate shouldn't also give me the best possible estimate for S.


It boils down to the fact that the expectation operator does not commute with the square root function. If you simply write down the formula for the expectation of the sample standard deviation, it's a bit of a mess involving gamma functions!

One other way of looking at this problem is that one should not, perhaps, over-emphasize the usefulness of unbiasedness - after all, it's only one criterion of "goodness" for an estimator. In machine learning, for example, one is often much more interested in something like the root mean square error of an estimator - and you then get into the realms of what's called the "bias-variance tradeoff" - it can happen that a biased estimator has lower variance (and thus lower total mean squared error) than an unbiased estimator. Think of comparing a tight group of arrows just away from the center of a target with a loose group of arrows centered on the target: which is better? Well, it depends!


In the "real world" it's the SD that interests us, as it has the same units as the original data measurements and intuitively represents the spread-out-ness of the data, whereas the variance feels more theoretical. Is it possible to construct an unbiased estimator of the SD from sample data, or is this a non-trivial problem?


Yes, but it's a formula that varies from distribution to distribution. See here in Wikipedia, for example.
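For normal data the correction has a closed form - if I'm remembering the Wikipedia formula correctly, E[s] = c4(n)·σ with c4 built from gamma functions, so dividing by c4(n) removes the bias. A quick sketch (normal data only):

```python
import numpy as np
from math import gamma, sqrt

def c4(n):
    # For normal samples of size n, E[s] = c4(n) * sigma,
    # so s / c4(n) is an unbiased estimate of sigma (normal data only).
    return sqrt(2.0 / (n - 1)) * gamma(n / 2.0) / gamma((n - 1) / 2.0)

rng = np.random.default_rng(3)
sigma, n = 3.0, 5
s = np.sqrt(rng.normal(0.0, sigma, size=(200_000, n)).var(axis=1, ddof=1))

print("c4(5)            :", c4(5))               # about 0.94
print("mean of s        :", s.mean())            # below 3.0
print("mean of s / c4(n):", (s / c4(n)).mean())  # about 3.0
```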
Reply 19
Original post by Gregorius
It boils down to the fact that the expectation operator does not commute with the square root function. If you simply write down the formula for the expectation of the sample standard deviation, it's a bit of a mess involving gamma functions!

...


Yes, but it's a formula that varies from distribution to distribution. See here in Wikipedia, for example.

Thanks - most informative :smile:
