The Student Room Group

Misspecification test interpretation help!

https://imgur.com/a/d24GX2O

I am running a misspecification test in PcGive for my linear regression equation, which has 10 independent variables. However, I don't understand the difference between the RESET test and the RESET23 test. From my research I found that RESET23 is better for a larger sample because it considers the squares and cubes... My sample is 120, so this doesn't seem large. Do I just select the RESET test result and ignore the results for RESET23? I also don't understand what F(2,114) means. 2 is the significance (alpha) value, but shouldn't the second value be n-k-1? So 120-10-1=109. Why 114?

Also what is the difference between hetero and hetero-x?

Thanks
Original post by coconut64
https://imgur.com/a/d24GX2O

I am running a misspecification test in PcGive for my linear regression equation, which has 10 independent variables. However, I don't understand the difference between the RESET test and the RESET23 test.


The Ramsey RESET test looks to see whether a model with linear and non-linear terms in the explanatory variables fits better than a model with just the linear terms. In the implementation in PcGive, the RESET test appears to add quadratic terms in the fitted values of the linear regression to the linear terms, and the RESET23 test adds both quadratic and cubic terms in the fitted values. (So this is slightly different to the classical Ramsey test, which uses powers of the explanatory variables rather than of the fitted values.)
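If it helps to see the mechanics, here is a minimal sketch in Python with statsmodels rather than PcGive (made-up data and variable names, and PcGive's exact implementation may differ); it builds the RESET and RESET23 auxiliary regressions by hand from the fitted values:

    import numpy as np
    import statsmodels.api as sm

    # Made-up data: 120 observations, 4 explanatory variables.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 4))
    y = X @ np.array([1.0, -0.5, 0.3, 0.2]) + rng.normal(size=120)

    X1 = sm.add_constant(X)
    linear = sm.OLS(y, X1).fit()          # the purely linear model
    fitted = linear.fittedvalues

    # RESET: add the square of the fitted values to the linear terms.
    reset = sm.OLS(y, np.column_stack([X1, fitted**2])).fit()
    print("RESET:  ", reset.compare_f_test(linear))    # (F statistic, p-value, df difference = 1)

    # RESET23: add the square and the cube of the fitted values.
    reset23 = sm.OLS(y, np.column_stack([X1, fitted**2, fitted**3])).fit()
    print("RESET23:", reset23.compare_f_test(linear))  # (F statistic, p-value, df difference = 2)

In each case the second degrees-of-freedom number reported alongside the F statistic is the residual degrees of freedom of the larger, augmented model.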


From my research I found that RESET23 is better for a larger sample because it considers the squares and cubes... My sample is 120, so this doesn't seem large. Do I just select the RESET test result and ignore the results for RESET23?


Your sample size is actually quite small for a linear regression with ten explanatory variables. There are various rules of thumb, but I prefer that there should be at least 15 observations per explanatory variable to avoid overfitting. You've actually got fewer than this - but enough to satisfy those whose rule of thumb starts at 10 observations per explanatory variable!
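(For your numbers that is 120 / 10 = 12 observations per explanatory variable: above the "10 per variable" threshold, but short of the "15 per variable" one, which would need at least 150 observations.)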


I also don't understand what F(2,114) means. 2 is the significance (alpha) value, but shouldn't the second value be n-k-1? So 120-10-1=109. Why 114?


This is where I can't see how your description of 10 explanatory variables fits with the printed output. "Number of parameters" appears to be equal to 4, and an F-test F(2, 114) corresponds to the difference in the number of parameters between the non-linear and the linear model being 2, and the number of parameters in the non-linear model being 120 - 114 = 6. It looks like you have 4 parameters in the linear model, and the quadratic and cubic terms add another 2 to make 6.
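Spelled out (assuming the constant is counted among the 4 parameters):

    n = 120 observations
    linear model: 4 parameters, so 120 - 4 = 116 residual degrees of freedom
    RESET23 model: 4 + 2 (fitted^2 and fitted^3) = 6 parameters, so 120 - 6 = 114 residual degrees of freedom
    F-test: numerator df = 6 - 4 = 2, denominator df = 114, hence F(2, 114)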


Also what is the difference between hetero and hetero-x?
Thanks


Two different versions of the White test for heteroscedasticity. The "X" version appears to include interaction terms when there is more than one explanatory variable.
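As a rough illustration in Python/statsmodels (not PcGive, and PcGive's exact construction of Hetero and Hetero-X may differ): the plain version regresses the squared residuals on the regressors and their squares, while the "X" version also includes the cross-products, which is what statsmodels' het_white does.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_white

    # Made-up data and a fitted linear regression.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 4))
    y = X @ np.array([1.0, -0.5, 0.3, 0.2]) + rng.normal(size=120)
    X1 = sm.add_constant(X)
    resid = sm.OLS(y, X1).fit().resid

    # "Hetero"-style: squared residuals on the regressors and their squares, no cross terms.
    aux = sm.OLS(resid**2, np.column_stack([X1, X**2])).fit()
    lm_no_cross = aux.nobs * aux.rsquared   # LM form of the statistic; 8 non-constant terms here
    print("without cross terms:", lm_no_cross)

    # "Hetero-X"-style: White's test with squares and cross-products included.
    lm, lm_pval, f_stat, f_pval = het_white(resid, X1)
    print("with cross terms:   ", lm, lm_pval)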
Reply 2


Thank you so much for explaining this, it actually helps me a lot!

This is what I am confused about now; I haven't studied stats for that long, so I am still getting used to running tests on PcGive.
https://imgur.com/a/1jHQW0U => This is also the result I got.
And when I scroll down the results page, I get the picture which shows only 4 independent variables, like you mentioned. I don't know if this is because 6 of the coefficients are shown to be insignificant and that's why they were removed.

This also raises another problem for me, because there are two White test values: two values of 'Hetero'. Which value should be considered to detect heteroskedasticity in the entire functional form then?

Many thanks
Original post by coconut64

This is what I am confused about now; I haven't studied stats for that long, so I am still getting used to running tests on PcGive.
https://imgur.com/a/1jHQW0U => This is also the result I got.
And when I scroll down the results page, I get the picture which shows only 4 independent variables, like you mentioned. I don't know if this is because 6 of the coefficients are shown to be insignificant and that's why they were removed.


I'm not familiar with this software - but taking a good look at the output, what it appears to be doing is a tree-based search through the possible models that you can make out of subsets of the original 10 explanatory variables. It's not simply removing the non-significant explanatory variables from the original regression, but finding the "best" model by repeated model fitting. (A bit of an aside, but this sort of procedure is rather frowned on in statistical circles these days; but this is econometrics, so anything goes!)
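Just to illustrate what "repeated model fitting" means, here is a toy backward-elimination loop in Python (this is emphatically not what PcGive's automatic model selection actually does; its tree search explores many subsets rather than a single path, and the function and data below are made up):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def backward_eliminate(y, X, alpha=0.05):
        """Toy backward elimination: repeatedly refit, dropping the least
        significant variable, until everything left has p-value < alpha.
        X is a pandas DataFrame of candidate explanatory variables."""
        keep = list(X.columns)
        while keep:
            res = sm.OLS(y, sm.add_constant(X[keep])).fit()
            pvals = res.pvalues.drop("const")
            worst = pvals.idxmax()            # least significant remaining variable
            if pvals[worst] < alpha:
                return res, keep              # everything left is "significant"
            keep.remove(worst)                # drop it and refit the smaller model
        return None, []

    # Made-up example: 10 candidate regressors, only 4 of which actually matter.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(120, 10)), columns=[f"x{i}" for i in range(10)])
    y = 1.0*X["x0"] - 0.5*X["x1"] + 0.3*X["x2"] + 0.2*X["x3"] + rng.normal(size=120)
    final, kept = backward_eliminate(y, X)
    print(kept)

The search in the printout explores many branches of the model tree rather than this single path, but the repeated refit-and-assess flavour is the same.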


This also raises another problem for me, because there are two White test values: two values of 'Hetero'. Which value should be considered to detect heteroskedasticity in the entire functional form then?


There are many ways of testing for heteroskedasticity, each of them looking at slightly different facets of the problem. A good general rule here is that if any one of the tests detects it, then there is a problem.
Reply 4
Oh okay. But if I absolutely have to pick a value from the heteroskedasticity test, should the hetero value for the 10 independent variables be chosen, rather than the hetero value that is derived from focusing on the sub-sets? I am interested in this because I am testing whether my functional form overall is appropriate.
Original post by coconut64
Oh okay. But if I absolutely have to pick a value from the heteroskedasticity test, should the hetero value for the 10 independent variables be chosen, rather than the hetero value that is derived from focusing on the sub-sets? I am interested in this because I am testing whether my functional form overall is appropriate.


One always reports diagnostics on the final model; so here, the four variable model.

It's clear that this model is badly mis-specified, so your next step should be to look at the various diagnostic residual plots to see what should be done.
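For example (a Python/matplotlib sketch with made-up data, just to show the kind of plots meant; PcGive has its own graphics options for this):

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    # Made-up data standing in for your regression; res is a fitted OLS results object.
    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(120, 4)))
    y = X @ np.array([0.0, 1.0, -0.5, 0.3, 0.2]) + rng.normal(size=120)
    res = sm.OLS(y, X).fit()

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))

    # Residuals against fitted values: curvature suggests a wrong functional form,
    # a funnel shape suggests heteroskedasticity.
    axes[0].scatter(res.fittedvalues, res.resid, s=10)
    axes[0].axhline(0, color="grey")
    axes[0].set_xlabel("fitted values")
    axes[0].set_ylabel("residuals")

    # Normal Q-Q plot of the residuals: non-normality or outliers show up here.
    sm.qqplot(res.resid, line="45", fit=True, ax=axes[1])

    plt.tight_layout()
    plt.show()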
Reply 6
'One always reports diagnostics on the final model' - I have never read this in my statistics book before. Where can I find more information about this?

Please could I ask one more question since you have suggested this approach.

I have run tests for a different functional form this time, and looked at the results for the final model like you suggested. => https://imgur.com/a/d24GX2O
However, the degrees of freedom for the White test change as the functional form changes. This means I will need different critical values for each of the functional form tests I do. But shouldn't the critical value for the White test be the same even though I test different functional forms?
If I consider the first model, with all 10 independent variables included, the degrees of freedom for the White test don't change at all; so why should the results in the final test be considered?

Thanks a lot, I apologise for the endless questions...
Original post by coconut64
'One always reports diagnostics on the final model' - I have never read this in my statistics book before. Where can I find more information about this?


A statistical model is often (usually?) used to make inference about a particular scientific question. Therefore one of the fundamental requirements in writing up a modelling exercise is to assure the reader that the statistical model used will deliver valid inference. As the "final model" we're talking about here is the one used for inference, it is this that you have to report diagnostic information on. Or more particularly, you have to report that the final model diagnostics are all passed; if you have a model that fails diagnostic tests, then you can't make valid inference from it.

The model that you have arrived at here so far clearly has severe problems with mis-specification of some sort. Therefore you couldn't present it as a final model, and shouldn't use it for inference.


Please could I ask one more question since you have suggested this approach.

I have run tests for a different functional form this time, and looked at the results for the final model like you suggested. => https://imgur.com/a/d24GX2O


I'm confused; this looks like the same printout as before. In what way have you changed the functional form?


However, the degrees of freedom for the White test change as the functional form changes.


Yes, you'd expect this. The White test regresses the squared residuals from the regression under consideration against the original explanatory variables and their quadratic and cross terms. If you change the number of explanatory variables in the original regression (by changing the functional form), then you'll change the number of variables that go into the White test.
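As a rough count (the exact bookkeeping may differ slightly, for example over the constant or any dummy variables): with k explanatory variables the full White auxiliary regression has k linear terms, k squares and k(k-1)/2 cross-products. So with k = 4 that is 4 + 4 + 6 = 14 terms, while with k = 10 it would be 10 + 10 + 45 = 65 terms. That is why the degrees of freedom of the test move around when the set of regressors changes, and why the cross-product version gets unwieldy when there are many regressors.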


This means I will need different critical values for each of the functional form tests I do. But shouldn't the critical value for the White test be the same even though I test different functional forms?


If you change the functional form between regressions, you're looking at different models; why should they share critical regions?


If I consider the first model, with all 10 independent variables included, the degrees of freedom for the White test don't change at all; so why should the results in the final test be considered?


I'm not sure what this means; could you elaborate?
Reply 8


Thank you for your help, this really clarifies things!
