The Student Room Group

Question on population regression model

Hi, after doing some research and reading the textbook I still don't fully get the terms used in the regression topic. There is a difference between the population regression model and the sample regression model. But since a regression is run on a sample, surely you can't describe the scatter plot generated as the population regression model?

Thanks
Reply 1
In a nutshell, the population regression model is the theoretical model which you assume generated your data, i.e. Y = g(X) + e, where e is a random error term. The sample regression model is the estimate of this model which you produce from your data using least-squares estimators.
Reply 2
Original post by VannR
In a nutshell, the population regression model is the theoretical model which you assume generated your data, i.e. Y = g(X) + e, where e is a random error term. The sample regression model is the estimate of this model which you produce from your data using least-squares estimators.


So if you run a linear regression on two sets of data, do you call the scatter plot you construct the population regression model, since it shows the line with the equation Y = g(X) + e?

Are the regression statistics generated, which include values for R squared and SE, described as the sample regression model?
Reply 3
Original post by coconut64
So if you run a linear regression on two sets of data, do you call the scatter plot you construct the population regression model, since it shows the line with the equation Y = g(X) + e?

Are the regression statistics generated, which include values for R squared and SE, described as the sample regression model?


The EQUATION that you get is called the sample regression model.

Everything else you have is a set of statistics generated from the sample regression model which can be used to check the quality of the regression.

EDIT: Y = g(X) + e is a description of the population model. We cannot know what the population model is for certain unless we have the entire population, which we don't. That is why we're using least-squares estimators. The term e is a random error, usually assumed to be normally distributed with a mean of 0 and a constant variance (sigma)^2, which is also unknown.
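
To make the distinction concrete, here is a minimal sketch in Python (NumPy assumed). The population values a = 2, b = 0.5 and sigma = 1.5 are made up purely for illustration; with real data you never see them, only the sample.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population model: Y = a + b*x + e, with e ~ N(0, sigma^2).
# These "true" values are invented for the demo; in practice they are unknown.
a, b, sigma = 2.0, 0.5, 1.5

# Draw one sample of n observations from that population.
n = 200
x = rng.uniform(0, 10, size=n)
e = rng.normal(0, sigma, size=n)
y = a + b * x + e

# Least-squares fit: np.polyfit with degree 1 returns [slope, intercept].
b_hat, a_hat = np.polyfit(x, y, 1)

print(f"population line: Y  = {a} + {b} x")
print(f"sample line:     Y^ = {a_hat:.3f} + {b_hat:.3f} x")
```

The two lines should be close but not identical, and a different sample (a different seed) would give a slightly different sample line.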
(edited 6 years ago)
Reply 4
Original post by VannR
The EQUATION that you get is called the sample regression model.

Everything else you have is a set of statistics generated from the sample regression model which can be used to check the quality of the regression.

EDIT: Y = g(X) + e is a description of the population model. We cannot know what the population model is for certain unless we have the entire population, which we don't. That is why we're using least-squares estimators. The term e is a random error, usually assumed to be normally distributed with a mean of 0 and a constant variance (sigma)^2, which is also unknown.


Oh okay, thank you! So the error term is only present in the population regression equation? Why is ei not in the sample regression equation?

Thanks
Reply 5
Original post by coconut64
Oh okay, thank you! So the error term is only present in the population regression equation? Why is ei not in the sample regression equation?

Thanks


Here is the population regression model for regression on a single explanatory variable x:

Y = a + bx + e

e ~ N(0, (sigma)^2). Finding the expectation and variance of Y for a fixed value x:

E(Y) = E(a + bx + e) = a + bx + E(e) = a + bx
Var(Y) = Var(a + bx + e) = Var(e) = (sigma)^2

So, the population regression model models Y ~ N(a + bx, (sigma)^2)
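
As a rough numerical check of the two results above, one can simulate many values of Y at a single fixed x and compare the sample mean and variance with a + bx and (sigma)^2. This is only a sketch: the values of a, b, sigma and x below are arbitrary assumptions, not anything taken from a real data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (made-up) population parameters and one fixed value of x.
a, b, sigma = 2.0, 0.5, 1.5
x_fixed = 4.0

# Many draws of Y at the same x: only the error term e changes.
e = rng.normal(0, sigma, size=100_000)
y = a + b * x_fixed + e

print("theoretical mean:", a + b * x_fixed, " simulated:", y.mean())
print("theoretical var: ", sigma ** 2, " simulated:", y.var())
```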

The problem now is that we do not know what a and b are. Thus, we use least-squares estimators a^, b^, computed from the sample, which are unbiased: E(a^) = a and E(b^) = b.

The sample regression model is then an estimator Y^ of the mean of Y for a given value of x, such that Y^ = a^ + b^ . x, and where if we take expectations:

E(Y^) = E(a^ + b^ . x) = E(a^) + E(b^) . x = a + b.x

Since E(Y^) = a + b.x = E(Y), the fitted line is estimating the mean of Y at x, which is our original model without the error:

E(Y) = a + b.x
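
The expectation argument above can be checked by simulation: draw many samples from the same population, fit a line to each, and average the estimates. The sketch below assumes made-up values for a, b and sigma and uses the standard least-squares formulas b^ = Sxy / Sxx and a^ = ybar - b^ . xbar.

```python
import numpy as np

rng = np.random.default_rng(123)

# Assumed (made-up) population parameters.
a, b, sigma = 2.0, 0.5, 1.5

# Fixed design: the same 30 x-values are reused in every sample.
x = np.linspace(0, 10, 30)
x_bar = x.mean()
s_xx = ((x - x_bar) ** 2).sum()

# 5000 independent samples, one per row.
reps = 5000
Y = a + b * x + rng.normal(0, sigma, size=(reps, x.size))

# Least-squares estimates for every sample at once.
y_bar = Y.mean(axis=1)
b_hats = ((x - x_bar) * (Y - y_bar[:, None])).sum(axis=1) / s_xx
a_hats = y_bar - b_hats * x_bar

print("average a^:", a_hats.mean(), "(true a =", a, ")")
print("average b^:", b_hats.mean(), "(true b =", b, ")")
```

Each individual a^ and b^ misses the true value, but the averages over many samples should sit very close to a and b.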

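P.S. "Where has the error gone?" - it has not vanished, it has averaged out. We assume the error has a mean of 0, so it drops out when we take expectations, and the fitted line describes the average of Y rather than each individual observation. In the sample, the counterpart of e_i is the residual y_i - Y^_i, the gap between each observed value and the fitted line. The regression model is not perfect either! Our assumptions about the error might be dreadfully wrong. This is why regression analysis needs a lot of "quality controls" before it can be used for real-world inferences.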
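
A small sketch of what happens to the error in a sample, again with made-up population values. The residuals are what you can actually compute; the true errors e are only available here because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed (made-up) population parameters.
a, b, sigma = 2.0, 0.5, 1.5

n = 50
x = rng.uniform(0, 10, size=n)
e = rng.normal(0, sigma, size=n)  # true errors: never observed in practice
y = a + b * x + e

# Fit the sample regression line and compute the residuals.
b_hat, a_hat = np.polyfit(x, y, 1)
y_hat = a_hat + b_hat * x
residuals = y - y_hat

# With an intercept in the model, least-squares residuals sum to zero
# (up to floating-point rounding).
print("sum of residuals:", residuals.sum())

# Residuals estimate the errors but are not equal to them.
print("largest |residual - error|:", np.abs(residuals - e).max())

# Usual estimate of sigma^2 from the residuals (n - 2 degrees of freedom).
sigma2_hat = (residuals ** 2).sum() / (n - 2)
print("estimated sigma^2:", sigma2_hat, "(true value", sigma ** 2, ")")
```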

P.P.S. I'm studying mathematics at university, hence all the details. I'm not sure of your level, so if you need anything more explained just message :smile:
(edited 6 years ago)
