The Student Room Group

Confused about variance :s

I have a sample, and in relation to that sample it says

"observations of the error term are drawn from a distribution that has a constant variance"

That's what I don't get. Surely the errors are completely random, so why do they always have to come from a bell-shaped curve?

Is it because regression tries to keep the errors as small as possible, so the errors end up distributed in a bell shape, with half of the bell on each side of zero?



So the peak of the bell is at zero, which means most errors are close to 0, slightly fewer are around -1 and 1, fewer still are around -2 and 2, and very few are around -10 and 10 (etc.)? Is this correct?

Even so, I still don't get how that answers the quoted statement: how does that bell shape, which is the distribution in question (I think), imply a constant variance (which is what the statement says)?

thanks
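The shape described in the question (most errors near 0, fewer near ±1 and ±2, very few near ±10) is what a normal distribution predicts. Below is a minimal standard-library Python sketch, assuming errors ~ N(0, σ^2) with a hypothetical σ = 2 (nothing in the thread fixes a value). `prob_near` is an illustrative helper, not from the thread. It uses `statistics.NormalDist` to give the probability that an error lands within half a unit of each whole number. Because errors are continuous, "exactly 0" has probability zero, so "near 0" is the meaningful question.

```python
from statistics import NormalDist

# Hypothetical error distribution: mean 0, standard deviation sigma = 2.
errors = NormalDist(mu=0, sigma=2)

def prob_near(k, width=1.0):
    """Probability that an error falls within width/2 of the value k (hypothetical helper)."""
    return errors.cdf(k + width / 2) - errors.cdf(k - width / 2)

for k in [0, 1, 2, 5, 10]:
    # By symmetry, P(error near -k) equals P(error near k).
    print(f"P(error near +/-{k:>2}) = {prob_near(k):.6f} each")
```

Changing sigma stretches or squeezes the bell. The ordering stays the same: highest near 0, falling off symmetrically on both sides.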
Reply 1
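it is saying every error comes from a distribution with the same spread (variance), whichever observation it belongs to. the bell shape is a separate assumption: that this distribution is normal.

'completely random' is a vague term. what do you mean by it? a uniform distribution over all the real numbers? no such distribution exists, so that can't be what's going on.

most statistics assumes that the errors come from a normal distribution, i.e. for every measurement, the error ~N(0,s^2). constant variance just means s is the same for every measurement instead of changing from one measurement to the next.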
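"Constant variance" and "normal" are two separate assumptions, and a small simulation can show this. Below is a sketch in standard-library Python; the x values, the seed and the spreads are hypothetical choices for illustration only. It draws errors in three ways and compares their standard deviation for small and large x. Normal errors with a fixed s have constant variance and a bell shape. Uniform errors with a fixed spread have constant variance but are flat, not bell-shaped. Normal errors whose s grows with x are bell-shaped at each x, but their variance is not constant.

```python
import random
from statistics import stdev

random.seed(1)  # fixed seed so the run is reproducible
xs = [i / 100 for i in range(1, 2001)]  # hypothetical x values from 0.01 to 20.0

# 1. Normal errors, same s = 1 for every observation: constant variance.
normal_const = [random.gauss(0, 1) for _ in xs]
# 2. Uniform errors on [-sqrt(3), sqrt(3)]: variance 1 everywhere, but flat, not bell-shaped.
uniform_const = [random.uniform(-3 ** 0.5, 3 ** 0.5) for _ in xs]
# 3. Normal errors whose s grows with x: bell-shaped at each x, but variance NOT constant.
normal_growing = [random.gauss(0, 0.2 * x) for x in xs]

def spread_by_half(errs):
    """Standard deviation of the errors for the lower and upper half of the x range."""
    half = len(errs) // 2
    return stdev(errs[:half]), stdev(errs[half:])

for name, errs in [("normal, constant s", normal_const),
                   ("uniform, constant s", uniform_const),
                   ("normal, growing s", normal_growing)]:
    low, high = spread_by_half(errs)
    print(f"{name:22s} sd(low x) = {low:.2f}  sd(high x) = {high:.2f}")
```

The first two rows should show roughly equal spreads even though only one of them is bell-shaped. The third row is bell-shaped but its spread clearly grows with x.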
Reply 2
Chewwy
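it is saying every error comes from a distribution with the same spread (variance), whichever observation it belongs to. the bell shape is a separate assumption: that this distribution is normal.

'completely random' is a vague term. what do you mean by it? a uniform distribution over all the real numbers? no such distribution exists, so that can't be what's going on.

most statistics assumes that the errors come from a normal distribution, i.e. for every measurement, the error ~N(0,s^2). constant variance just means s is the same for every measurement instead of changing from one measurement to the next.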


but WHY do they come from a normal distribution? why is it not possible that, for example, you have 200 observations and 180 of them are high errors and only 20 of them are low (so not bell-shaped)? Why is it always that most errors are near zero, slightly fewer are around -1 and 1, fewer than that around -2 and 2, etc.?

if the explanation is too complicated I will accept 'it just IS that way :smile:'. I think I read somewhere once that distributions resemble a normal distribution if N is more than 30, so is it something to do with that?
Reply 3
redkopite
Why is it always that most errors are near zero, slightly fewer are around -1 and 1, fewer than that around -2 and 2, etc.?


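That's the shape a normal distribution describes. You're thinking that "they have to follow a normal distribution" when in reality it's more like "it happens to look like what we've defined as a normal distribution".

That's also why, as N increases, a histogram of your errors looks more and more like the underlying distribution: with more observations, chance fluctuations even out. (The 'N more than 30' rule of thumb comes from the central limit theorem, which says the average of many independent values is approximately normal; it doesn't turn any sample into a normal one.)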
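One common justification for the normal assumption is the central limit theorem. It is not the only justification, and it is not a guarantee. If each observed error is the sum of many small, independent disturbances, the total is approximately normal even when each disturbance is not. Below is a minimal standard-library sketch, assuming each error is the sum of a hypothetical 12 independent uniform disturbances on [-0.5, 0.5]. It uses a simple check. For a normal distribution, about 68.3% of values lie within one standard deviation of the mean. A single uniform disturbance gives only about 57.7%.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is reproducible

def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their own mean."""
    m, s = mean(values), pstdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)

N_ERRORS = 20_000     # hypothetical number of simulated errors
N_DISTURBANCES = 12   # hypothetical number of small effects adding up to one error

# Each error is a single uniform disturbance: flat, not bell-shaped.
single = [random.uniform(-0.5, 0.5) for _ in range(N_ERRORS)]
# Each error is the sum of 12 small independent uniform disturbances.
summed = [sum(random.uniform(-0.5, 0.5) for _ in range(N_DISTURBANCES))
          for _ in range(N_ERRORS)]

print(f"single uniform disturbance: {fraction_within_one_sd(single):.3f} within 1 sd")
print(f"sum of {N_DISTURBANCES} disturbances: {fraction_within_one_sd(summed):.3f} within 1 sd")
print("normal distribution reference: 0.683")
```

If the disturbances are not independent, or one of them dominates the rest, this argument breaks down. The normal assumption is then just an assumption to be checked.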
Reply 4
redkopite
but WHY do they come from a normal distribution? why is it not possible that, for example, you have 200 observations and 180 of them are high errors and only 20 of them are low (so not bell-shaped)? Why is it always that most errors are near zero, slightly fewer are around -1 and 1, fewer than that around -2 and 2, etc.?

if the explanation is too complicated I will accept 'it just IS that way :smile:'. I think I read somewhere once that distributions resemble a normal distribution if N is more than 30, so is it something to do with that?


The main reason you would assume the random errors are normal is simplicity: it makes the maths much easier. They are doing you a favor.

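If the errors are independent and normal, i.e. E_i ~ N(0, σ^2) for every i, then in your simple regression model Y_i = a + b x_i + E_i,
each Y_i ~ N(a + b x_i, σ^2) and Y_1, ..., Y_n are independent. Otherwise life is much more complicated. (Note that σ^2 is the same for every i: that is exactly the "constant variance" assumption.)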
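The model Y_i = a + b x_i + E_i can be simulated directly. Below is a sketch in standard-library Python with hypothetical true values a = 2, b = 0.5, σ = 1.5 and x_i = 1, ..., 200. It draws independent N(0, σ^2) errors and fits the line by ordinary least squares, using the usual closed-form slope and intercept (`ols_fit` is an illustrative helper). It then looks at the residuals. With an intercept in the model, least squares forces the residuals to average exactly zero, whatever the error distribution. Their standard deviation should come out close to σ.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical "true" parameters for the simulation.
a_true, b_true, sigma = 2.0, 0.5, 1.5
xs = list(range(1, 201))

# Y_i = a + b*x_i + E_i with independent errors E_i ~ N(0, sigma^2):
# the same sigma for every i is the constant-variance assumption.
ys = [a_true + b_true * x + random.gauss(0, sigma) for x in xs]

def ols_fit(xs, ys):
    """Closed-form ordinary least squares estimates (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

a_hat, b_hat = ols_fit(xs, ys)
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
print(f"mean residual = {mean(residuals):.2e}, residual sd = {stdev(residuals):.3f}")
```

Swapping `random.gauss` for a non-normal error with the same variance still gives residuals that average zero. This shows that the fitting step itself does not create the bell shape.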
