The Student Room Group

How do I find statistical significance with two sets of data (regression analysis)?


Original post by sleepysnooze

okay I'm sorry if I've gone off from my original idea here but now I am wanting to test if a change of an institution will cause significant changes in the results of elections. I have 4 results sets: (1= the first country and 2 = the second) 1a) before the institutional change, and 1b) after the institutional change. and I have 2a) a similar system without institutional change, and 2b) after 1's institutional change and still with no institutional change (so I'm comparing how two similar systems can compare to each other when one gets a change of its electoral institution, basically). what should I do to prove that the institution was the determining factor of the change of results? I have found that there was a change for 1 but no change for 2 in terms of the average results over the time periods (the "effective number of political parties" increased by 1.5~, for instance) but how do I show that it wasn't a coincidence?


So if I have understood you correctly, you have a change of institution in country 1, but no change of institution in country 2, and you want to test whether the change in institution in country 1 has an effect in either/both of country 1 and country 2. Is that right?

The most straightforward way to do this is to create one data set with an extra column (with values 0 or 1) that specifies whether the observation was taken before (0) or after (1) the change of institution in country 1. Let's call it "change".

You then do a regression with margin of victory as outcome and with a full factorial interaction between "voter turnout" and "change" - that is, you have "voter turnout" and "change" and "voter turnout" times "change" in the regression equation.
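To make the interaction model concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the turnout figures are made up, the outcome is generated from known coefficients with no noise so the fit can be checked, and `fit_ols` is a hypothetical helper that solves the normal equations directly. It is not a substitute for proper regression software, which would also report standard errors and p-values.

```python
# Hypothetical data: turnout (%) for ten elections, five before (change = 0)
# and five after (change = 1) the institutional change.
turnout = [55, 60, 65, 70, 75, 58, 62, 68, 72, 78]
change = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Made-up outcome built from known coefficients (no noise), so we can check
# the fit: margin = 10 + 0.5*turnout + 2*change - 0.3*turnout*change.
margin = [10 + 0.5 * t + 2 * c - 0.3 * t * c for t, c in zip(turnout, change)]

# Design matrix columns: intercept, turnout, change, turnout x change.
rows = [[1.0, t, c, t * c] for t, c in zip(turnout, change)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row.
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [v - factor * p for v, p in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


coef = fit_ols(rows, margin)
names = ["intercept", "turnout", "change", "turnout x change"]
for name, b in zip(names, coef):
    print(f"{name:>18}: {b: .4f}")
```

The coefficient on "turnout x change" is the difference between the after-change slope and the before-change slope. With real (noisy) data, the question is whether that coefficient is large relative to its standard error.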

Now this is getting pretty technical, and there really are a number of things about the data that need checking in order to get correct answers - I teach this sort of stuff to non-statistician graduate students and they often struggle with it!

If you're going to attempt to do an analysis this sophisticated, I strongly urge you to get local help from a statistician!


I'm sorry, you sound extremely knowledgeable about statistical analysis and this would help me beyond words if you could guide me.


A pleasure - this is what I do for a living.
Original post by sleepysnooze
also:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.022903566
R Square            0.000524573
Adjusted R Square   -0.001017827
Standard Error      7.590635054
Observations        650

ANOVA
             df    SS            MS         F            Significance F
Regression   1     19.59590527   19.59591   0.340101939  0.559973287
Residual     648   37336.29585   57.61774
Total        649   37355.89176

              Coefficients   Standard Error  t Stat    P-value      Lower 95%     Upper 95%    Lower 95.0%  Upper 95.0%
Intercept     51.76244313    3.355788785     15.42482  5.93377E-46  45.17291013   58.35197612  45.17291     58.35198
X Variable 1  -2.954555179   5.066260901     -0.58318  0.559973287  -12.90282532  6.993714963  -12.9028     6.993715
(woops, I thought it would appear as a table)

I just inserted my X data and Y data (X being majority of a candidate and Y being the turnout in an election) - if the confidence level is 95% (I don't even know what this implies, being a stats novice) and the significance F is 0.55997~ does that mean that the relationship between the two variables is significant? my guess is yes but this is quite important work I'm doing here so if you could explain it that would be absolutely fantastic


It's a bit difficult to be sure I'm picking out the right bits (perhaps post a screenshot next time?) but it looks to me that you have an R-squared of 0.000524573 (which, by taking the square root, gives us a correlation coefficient of about 0.023 in magnitude, which is tiny) and a p-value of about 0.56 - which is above the 0.05 threshold, so the result would be taken as "not statistically significant". In other words, these data give no evidence of a linear relation between turnout and margin of victory.
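As a rough cross-check of those figures, the sketch below recomputes them from the numbers in the posted output. The p-value uses a normal approximation to the t-distribution, which is my assumption here (reasonable with 648 residual degrees of freedom, but not what Excel itself computes), and the variable names are purely illustrative.

```python
import math

# Values copied from the posted Excel regression output.
r_squared = 0.000524573
slope = -2.954555179      # coefficient on "X Variable 1"
slope_se = 5.066260901    # its standard error

# Magnitude of the correlation coefficient ("Multiple R" in the output).
r = math.sqrt(r_squared)

# t statistic for the null hypothesis that the slope is zero.
t_stat = slope / slope_se

# With a single predictor, the ANOVA F statistic is the square of t.
f_stat = t_stat ** 2

# Two-sided p-value, approximating the t-distribution by a standard normal
# (an assumption; fine for large samples, not for small ones).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"r = {r:.4f}, t = {t_stat:.4f}, F = {f_stat:.4f}, p ~ {p_approx:.3f}")
```

This also shows why "Significance F" and the slope's "P-value" are the same number in the output: with one predictor they test the same hypothesis.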

BUT - you must plot this stuff, as regression (and use of correlation coefficients) makes some assumptions that we need to check before we can declare this analysis valid.
Original post by sleepysnooze
Hey guys, I am having trouble understanding how to find the coefficient(?) for statistical significance between two variables. I am trying to judge whether in elections the turnout rate positively correlates with the closeness of victory within it (just the example of the current data I have right now). I have been going through videos where they talk about things like "t tests" and "alpha levels"(?). what I understand is that you need to test two variables together and then you will get a coefficient. but I do not understand how to get that coefficient (i.e. in excel?). from that coefficient, how do I then work out statistical significance? in the data that I've read before they talk about "significant at the _% level" - how do I know whether something would be significant at a certain percentage? is this something I have to input before the calculations? I'm very sorry guys, I am no mathematician - I am in desperate need of help here


So on one axis you have turnout and on the other you have a measure of how "close" the election was? And you want to use linear regression I assume? Of the form $y = \alpha + \beta x + \epsilon$? (Leaving out the index $i$.)

If so you can estimate beta with OLS, ordinary least squares. The estimator is given by $\hat{\beta} = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}$, while alpha is found by $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.

You can then proceed to test the hypothesis that beta is 0. Your test statistic will then be $\frac{\hat{\beta}}{SE(\hat{\beta})}$, where SE is the standard error (google it, too lazy to type out the formula in latex), which under normality and the null hypothesis will be t-distributed with $n - 2$ degrees of freedom. You can then look up values of the t-distribution in tables and see whether the null hypothesis is rejected or not.
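The formulas above can be worked through by hand. Below is a small sketch in plain Python on made-up numbers (the `x` and `y` values are purely hypothetical). For the standard error it uses the usual simple-regression formula $SE(\hat{\beta}) = \sqrt{s^2 / \sum_i (x_i - \bar{x})^2}$ with $s^2 = \sum_i e_i^2 / (n - 2)$, where $e_i$ are the residuals.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products; the 1/(n-1) factors in Cov and Var
# cancel in the ratio, so they are omitted.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = s_xy / s_xx                  # Cov(x, y) / Var(x)
alpha_hat = y_bar - beta_hat * x_bar    # intercept

# Residual variance with n - 2 degrees of freedom, then SE of the slope.
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)
se_beta = math.sqrt(s2 / s_xx)

# Test statistic for H0: beta = 0; compare |t| with a t-table at n - 2 df.
t_stat = beta_hat / se_beta

print(f"beta = {beta_hat:.3f}, alpha = {alpha_hat:.3f}, "
      f"SE = {se_beta:.4f}, t = {t_stat:.2f}")
```

In Excel, the regression tool reports these same quantities as "Coefficients", "Standard Error" and "t Stat".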
(edited 7 years ago)
Original post by Gregorius
So if I have understood you correctly, you have a change of institution in country 1, but no change of institution in country 2, and you want to test whether the change in institution in country 1 has an effect in either/both of country 1 and country 2. Is that right?

yes but it's got nothing to do with voter turnout or margin of victory now - my variable here is "effective number of parties", or possibly "the index of disproportionality". I have sets of results regarding either of those, but I don't know how to indicate that the change in the institution (i.e. I have 7 data sets for before and 7 for after the institution-change country, and 7 each before and after in the country that had no institution change) was the reason for the change in the results (and looking from my results, there *was* a noticeable change - but I don't know how to prove the significance of the change). so it's not the turnout or margin of victory. if you could explain what to do regarding this change of data that I now have that would be fantastic - you have been amazingly helpful too in the previous responses. from what I gather, I can plot a scatter graph to see initially if the correlation is close to 1 or not (1 = a perfect correlation, right?) or I can do regression and then see if the coefficient is low (i.e. below 0.05) right?


I must admit, I'm now not sure what you're trying to do from your description. I am fairly sure that you need to use regression with a suitable interaction term - but I'll say this again - you're now in territory where it's very easy to go wrong and you could really do with some local statistical assistance. The devil is in the detail!


sorry I'll explain it a bit better:
my test here is to determine whether or not a change, specifically, to an electoral system will cause either a) an increase in the criterion of "effective number of (political) parties", or b) an increase in disproportionality. in order to test this, I have 4 sets of data, using 2 nations. the first nation engaged in this electoral reform and I have the data of 1) the elections in that country without electoral reform, and 2) the elections in that country *with* electoral reform. then, to compare, I have a very similar country that did not engage in electoral reform, and I have its data for the same time period. so I have the before and after data for this other country to contrast with the before and after data of the nation that engaged in electoral reform. because they both used the same electoral system before, but now one uses another electoral system, I am predicting that the change of electoral institution is the reason for the increase in the number of political parties and the decrease in disproportionality. I am simply wanting to show that the change was significant and not simply based on coincidence.

but if you don't know what I'm getting at, don't worry - you've proven very helpful already
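For what it's worth, the two-country before/after design described above is what is usually called a difference-in-differences comparison: the change in the reform country minus the change over the same period in the comparison country. In a regression of the outcome on country, period and their product, this is the coefficient on the interaction term. Here is a rough sketch in plain Python. All the numbers are invented for illustration (7 elections per group, as in the description), and the standard error assumes the observations are independent, which successive elections in one country may well not be, so treat it as a starting point rather than a finished analysis.

```python
import math
import statistics

# Hypothetical "effective number of parties" values, 7 elections per cell.
reform_before = [2.1, 2.3, 2.2, 2.4, 2.0, 2.2, 2.2]
reform_after = [3.6, 3.8, 3.7, 3.9, 3.5, 3.7, 3.7]
control_before = [2.2, 2.4, 2.3, 2.1, 2.3, 2.2, 2.3]
control_after = [2.3, 2.2, 2.4, 2.3, 2.2, 2.3, 2.4]

# Change within each country over the same period.
reform_change = statistics.mean(reform_after) - statistics.mean(reform_before)
control_change = statistics.mean(control_after) - statistics.mean(control_before)

# Difference-in-differences: the change in the reform country beyond
# whatever change also happened in the comparison country.
did = reform_change - control_change

# Rough standard error, ASSUMING independent observations in all four cells
# (sum of the variances of the four cell means).
cells = [reform_before, reform_after, control_before, control_after]
se = math.sqrt(sum(statistics.variance(c) / len(c) for c in cells))

print(f"reform change  = {reform_change:.3f}")
print(f"control change = {control_change:.3f}")
print(f"diff-in-diff   = {did:.3f} (SE ~ {se:.3f}, ratio ~ {did / se:.1f})")
```

A large ratio of the estimate to its standard error is what "not a coincidence" would look like under these assumptions. Whether the assumptions hold for real election data is exactly the detail a local statistician should check.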
