# Question on population regression model

#1
Hi, after doing some research and reading the textbook, I still don't fully get the terms used in the regression topic. There is a difference between the population regression model and the sample regression model. But since a regression is run on a sample, surely you can't describe the scatter plot it generates as the population regression model?

Thanks
2 years ago
#2
In a nutshell, the population regression model is the theoretical model which you assume generated your data, i.e. Y = g(X) + e, where e is a random error term. The sample regression model is the estimate of this model which you produce from your data using least-squares estimators.
#3
(In reply to VannR)
So if you run a linear regression on two sets of data (an X variable and a Y variable), do you call the scatter plot you construct the population regression model, since it shows the line with the equation Y = g(X) + e?

And are the regression statistics generated, which include values for R squared and SE, described as the sample regression model?
2 years ago
#4
(In reply to coconut64)
The EQUATION that you get is called the sample regression model.

Everything else you have is a set of statistics generated from the sample regression model which can be used to check the quality of the regression.

EDIT: Y = g(X) + e is a description of the population model. We cannot know what the population model is for certain unless we have the entire population, which we don't. That is why we estimate it from a sample using least-squares estimators. The term e is usually assumed to be a normally distributed random variable with a mean of 0 and a constant variance (sigma)^2, which need not equal 1.
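
To make the distinction concrete, here is a minimal sketch in Python. It pretends we know the population model, draws one sample from it, and fits the sample regression model by least squares. The values a = 2, b = 0.5, sigma = 1.5 and the helper name `fit_line` are made up for illustration; in a real problem the population values are unknown.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical population model: Y = 2 + 0.5 x + e, with e ~ N(0, 1.5^2).
A_TRUE, B_TRUE, SIGMA = 2.0, 0.5, 1.5

rng = random.Random(0)  # fixed seed so the sample is reproducible
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [A_TRUE + B_TRUE * x + rng.gauss(0, SIGMA) for x in xs]

# The sample regression model is the fitted equation, not the scatter plot.
a_hat, b_hat = fit_line(xs, ys)
print(f"population model: Y  = {A_TRUE} + {B_TRUE} x + e")
print(f"sample model:     Y^ = {a_hat:.3f} + {b_hat:.3f} x")
```

The fitted coefficients should land near, but not exactly on, the population values; a different sample would give a slightly different sample regression model.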
#5
(In reply to VannR)
Oh okay, thank you! So the error term is only present in the population regression equation? Why is ei not in the sample regression equation?

Thanks
2 years ago
#6
(In reply to coconut64)
Here is the population regression model for regression on a single explanatory variable:

Y = a + bx + e, where e ~ N(0, (sigma)^2)

Finding the expectation and variance of Y for a fixed value x:

E(Y) = E(a + bx + e) = a + bx + E(e) = a + bx
Var(Y) = Var(a + bx + e) = Var(e) = (sigma)^2

So, the population regression model models Y ~ N(a + bx, (sigma)^2)

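The problem now is that we do not know what a and b are. Thus, we estimate them from the sample using the least-squares estimators a^ and b^, which are unbiased: E(a^) = a and E(b^) = b.

The sample regression model is then an estimator, Y^, of the mean of Y for a given value of x: Y^ = a^ + b^ . x. Taking expectations:

E(Y^) = E(a^ + b^ . x) = E(a^) + E(b^) . x = a + b.x

Since E(Y^) = E(Y), the fitted line estimates the mean of Y at each x, which is the population model without the error term:

E(Y) = a + b.x

The error has not vanished from the data, though. Each observation still differs from the fitted line by a residual, e^_i = y_i - (a^ + b^ . x_i), which is the sample counterpart of e_i. It just isn't written into the equation of the fitted line.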
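
The claim that E(a^) = a and E(b^) = b can be checked roughly by simulation. This is only a sketch under assumed values: the population parameters (a = 2, b = 0.5, sigma = 1.5), the fixed x values and the helper `fit_line` are all hypothetical. It draws many samples from the same population model, fits each one, and averages the estimates.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


A_TRUE, B_TRUE, SIGMA = 2.0, 0.5, 1.5  # hypothetical population values
xs = list(range(20))                   # fixed x values, reused in every sample
rng = random.Random(1)

a_hats, b_hats = [], []
for _ in range(2000):
    # Each pass is a fresh sample from the same population model.
    ys = [A_TRUE + B_TRUE * x + rng.gauss(0, SIGMA) for x in xs]
    a_hat, b_hat = fit_line(xs, ys)
    a_hats.append(a_hat)
    b_hats.append(b_hat)

mean_a = sum(a_hats) / len(a_hats)
mean_b = sum(b_hats) / len(b_hats)
print(f"average a^ = {mean_a:.3f} (true a = {A_TRUE})")
print(f"average b^ = {mean_b:.3f} (true b = {B_TRUE})")
```

Any single a^ or b^ misses the true value, but the averages over many samples should sit close to a and b, which is what unbiasedness means.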

P.S. "Where has the error gone?" - the regression model is not perfect! The model assumes that the errors average out to 0, i.e. E(e) = 0, so the fitted line describes the mean of Y at each x rather than any single observation. We might be dreadfully wrong about that assumption, or about the variance being constant. This is why regression analysis needs a lot of "quality controls" before it can be used for real-world inferences.
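
In a sample, the error shows up as residuals: the gaps y_i - (a^ + b^ . x_i) between each observation and the fitted line. The sketch below, again with made-up population values (a = 2, b = 0.5, sigma = 1.5), computes them and uses them to estimate sigma. The divisor n - 2 is the usual choice when two coefficients have been estimated.

```python
import math
import random

A_TRUE, B_TRUE, SIGMA = 2.0, 0.5, 1.5  # hypothetical population values
rng = random.Random(2)
n = 30
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [A_TRUE + B_TRUE * x + rng.gauss(0, SIGMA) for x in xs]

# Least-squares fit for one explanatory variable.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b_hat = sxy / sxx
a_hat = y_bar - b_hat * x_bar

# Residuals: the sample counterpart of the error term e_i.
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

# With an intercept in the model, least squares forces both sums to zero
# (up to floating-point rounding).
sum_resid = sum(residuals)
sum_x_resid = sum(x * r for x, r in zip(xs, residuals))

# Residual standard error: an estimate of sigma from the sample.
s = math.sqrt(sum(r * r for r in residuals) / (n - 2))

print(f"sum of residuals     = {sum_resid:.2e}")
print(f"sum of x * residuals = {sum_x_resid:.2e}")
print(f"estimated sigma      = {s:.3f} (true sigma = {SIGMA})")
```

The residuals summing to zero is a property of the fitting method, not evidence that the model is right, so plots of the residuals are one of the usual quality checks.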

P.P.S. I'm studying mathematics at university, hence all the details. I'm not sure of your level, so if you need anything explained further, just message me.