Question on population regression model
Hi, after doing some research and reading the textbook I still don't fully get the terms used in the regression topic. There is a difference between the population regression model and the sample regression model. But since a regression is run on a sample, surely you can't describe the scatter plot you generate as the population regression model?
Thanks
#2
In a nutshell, the population regression model is the theoretical model which you assert generates your data, i.e. Y = g(X) + e, where e is some statistical error term. The sample regression model is the estimate of this model which you produce from your data using least-squares estimators.
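If it helps to see the distinction in practice, here is a small Python sketch (my own illustration, not from the textbook, with made-up values for the true intercept and slope): the population model is the rule that generates the data, and the sample regression model is whatever least squares recovers from one finite sample of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population (theoretical) model: Y = a + b*X + e, with e ~ N(0, sigma^2).
# These "true" values are made up for illustration - in practice we never observe them.
a_true, b_true, sigma = 2.0, 3.0, 1.5

# Draw one sample of n observations from the population model
n = 50
x = rng.uniform(0, 10, size=n)
y = a_true + b_true * x + rng.normal(0, sigma, size=n)

# Sample regression model: least-squares estimates of a and b from this one sample
b_hat, a_hat = np.polyfit(x, y, deg=1)   # polyfit returns [slope, intercept]
print(a_hat, b_hat)                      # close to 2 and 3, but not exactly equal
```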
(Original post by VannR)
In a nutshell, the population regression model is the theoretical model which you assert generates your data, i.e. Y = g(X) + e, where e is some statistical error term. The sample regression model is the estimate of this model which you produce from your data using least-squares estimators.
So if you run a linear regression on two sets of data, do you call the scatter plot you construct the population regression model, since it shows the line with the equation Y = g(X) + e?
And are the regression statistics generated, which include values for R squared and the SE, described as the sample regression model?
#4
(Original post by coconut64)
So if you run a linear regression on two sets of data, do you call the scatter plot you construct the population regression model, since it shows the line with the equation Y = g(X) + e?
And are the regression statistics generated, which include values for R squared and the SE, described as the sample regression model?
The EQUATION that you get is called the sample regression model.
Everything else you have is a set of statistics generated from the sample regression model which can be used to check the quality of the regression.
EDIT: Y = g(X) + e is a description of the population model. We cannot know what the population model is for certain unless we have the entire population, which we don't. That is why we use least-squares estimators. The term e is a normally distributed random variable with a mean of 0 and a variance of sigma^2.
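To make that split concrete, here is a rough Python sketch (my own, reusing the same made-up population model as the sketch above): the fitted equation is the sample regression model, while R squared and the standard error are computed from it afterwards as quality checks.

```python
import numpy as np

# Made-up sample from a population model Y = 2 + 3x + e, as in the earlier sketch
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.5, size=50)

# The sample regression model is just the fitted equation y^ = a^ + b^x ...
b_hat, a_hat = np.polyfit(x, y, deg=1)
y_hat = a_hat + b_hat * x

# ... while R^2 and the standard error are statistics computed FROM that fitted model
residuals = y - y_hat
ss_res = np.sum(residuals ** 2)              # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
r_squared = 1 - ss_res / ss_tot
se = np.sqrt(ss_res / (len(y) - 2))          # residual standard error, n - 2 degrees of freedom
print(r_squared, se)
```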
(Original post by VannR)
The EQUATION that you get is called the sample regression model.
Everything else you have is a set of statistics generated from the sample regression model which can be used to check the quality of the regression.
EDIT: Y = g(X) + e is a description of the population model. We cannot know what the population model is for certain unless we have the entire population, which we don't. That is why we use least-squares estimators. The term e is a normally distributed random variable with a mean of 0 and a variance of sigma^2.
Oh okay, thank you! So the error term is only present in the population regression equation? Why is ei not in the sample regression equation?
Thanks
#6
(Original post by coconut64)
Oh okay, thank you! So the error term is only present in the population regression equation? Why is ei not in the sample regression equation?
Thanks
Y = a + bx + e
e ~ N(0, (sigma)^2). Finding the expectation and variance of Y for a fixed value x:
E(Y) = E(a + bx + e) = a + bx + E(e) = a + bx
Var(Y) = Var(a + bx + e) = Var(e) = (sigma)^2
So, the population regression model models Y ~ N(a + bx, (sigma)^2)
The problem now is that we do not know what a and b are. Thus, we use least-squares estimators a^, b^ such that E(a^) = a and E(b^) = b.
The sample regression model is then an estimator of Y, Y^, for a given value of x, such that Y^ = a^ + b^x. Taking expectations:
E(Y^) = E(a^ + b^x) = E(a^) + E(b^)x = a + bx
Since E(Y^) = a + bx = E(Y), the sample regression line estimates the mean of Y under the population model, which is why it is written without an error term:
Y^ = a^ + b^x
P.S. "Where has the error gone?" - the regression model is not perfect! The error term has an expectation of 0, and the fitted line is built on that assumption, but individual errors can still be far from 0 and we might be dreadfully wrong about the model as a whole. This is why regression analysis needs a lot of "quality controls" before it can be used for real-world inferences.
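If you want to see the expectation argument numerically, here is a quick simulation sketch (my own, with made-up parameter values): across many repeated samples the least-squares estimates average out to the population a and b, which is exactly why the fitted line carries no error term.

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, b_true, sigma, n = 2.0, 3.0, 1.5, 50    # made-up population parameters

a_hats, b_hats = [], []
for _ in range(2000):                            # many repeated samples
    x = rng.uniform(0, 10, size=n)
    e = rng.normal(0, sigma, size=n)             # e ~ N(0, sigma^2), mean 0
    y = a_true + b_true * x + e
    b_hat, a_hat = np.polyfit(x, y, deg=1)       # least-squares estimates for this sample
    a_hats.append(a_hat)
    b_hats.append(b_hat)

# On average the estimates recover the population a and b (E(a^) = a, E(b^) = b),
# so the fitted line Y^ = a^ + b^x estimates E(Y) = a + bx with no error term.
print(np.mean(a_hats), np.mean(b_hats))          # approximately 2 and 3
```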
P.P.S. I'm studying mathematics at university, hence all the detail. I'm not sure of your level, so if you need anything more explained, just message me.
