# How do I find statistical significance with two sets of data (regression analysis)?

Hey guys, I am having trouble understanding how to find the coefficient(?) for statistical significance between two variables. I am trying to judge whether in elections the turnout rate positively correlates with the closeness of victory within it (just the example of the current data I have right now). I have been going through videos where they talk about things like "t tests" and "alpha levels"(?). What I understand is that you need to test two variables together and then you will get a coefficient, but I do not understand how to get that coefficient (e.g. in Excel?). From that coefficient, how do I then work out statistical significance? In the data that I've read before they talk about "significant at the _% level" - how do I know whether something would be significant at a certain percentage? Is this something I have to input before the calculations? I'm very sorry guys, I am no mathematician - I am in desperate need of help here.

> (Original post by **rayestar**) shudn't it be 'onegai?'

please help me with my maths sempai

I thought kudasai meant please give, so please give me your maths help

> (Original post by **sleepysnooze**) please help me with my maths sempai
>
> I thought kudasai meant please give, so please give me your maths help

when u ask for help for maths, its: 'suugaku wo tesudatte kudasai'

but when u say 'please' alone it's 'onegai'

sorry i can't help u there cos we haven't covered that in my skl lol

i knw most of S1 cos i did level 3 statistical methods, but regression wasn't part of that spec so yeahh

have u tried examsolutions?

I'm not too sure, but try Wiki: https://en.m.wikipedia.org/wiki/Spea...on_coefficient

> (Original post by **sleepysnooze**) Hey guys, I am having trouble understanding how to find the coefficient(?) for statistical significance between two variables. I am trying to judge whether in elections the turnout rate positively correlates with the closeness of victory within it (just the example of the current data I have right now). I have been going through videos where they talk about things like "t tests" and "alpha levels"(?). What I understand is that you need to test two variables together and then you will get a coefficient, but I do not understand how to get that coefficient (e.g. in Excel?). From that coefficient, how do I then work out statistical significance? In the data that I've read before they talk about "significant at the _% level" - how do I know whether something would be significant at a certain percentage? Is this something I have to input before the calculations? I'm very sorry guys, I am no mathematician - I am in desperate need of help here.

So first off, what is the form of the data that you have? Do you have a number of observations in pairs where the first of the pair is percentage turnout and the second is margin of victory?

If you've got something like this then the first thing to do is to plot the data on a graph - turnout on the x-axis and margin of victory on the y-axis. What you do next depends on the shape of the plot that you get: the tools that you apply depend on certain assumptions, and you may have to transform your data in some way to make it obey these assumptions. Post a plot of the data here if you want some advice.

But once this is done, you are probably going to do something using either (a) a correlation coefficient or (b) linear regression. With appropriate software you will automatically get some measure of statistical significance. What software do you have available, or are you doing this "by hand"?

> (Original post by **Gregorius**) So first off, what is the form of the data that you have? Do you have a number of observations in pairs where the first of the pair is percentage turnout and the second is margin of victory?
>
> If you've got something like this then the first thing to do is to plot the data on a graph - turnout on the x-axis and margin of victory on the y-axis. What you do next depends on the shape of the plot that you get: the tools that you apply depend on certain assumptions, and you may have to transform your data in some way to make it obey these assumptions. Post a plot of the data here if you want some advice.
>
> But once this is done, you are probably going to do something using either (a) a correlation coefficient or (b) linear regression. With appropriate software you will automatically get some measure of statistical significance. What software do you have available, or are you doing this "by hand"?

is there free software out there I can download? what do you recommend?

> (Original post by **sleepysnooze**) is there free software out there I can download? what do you recommend?

R is free. https://cran.r-project.org/

It is very complex software; however, your task can be accomplished pretty quickly.

> (Original post by **Gregorius**) R is free. https://cran.r-project.org/
>
> It is very complex software; however, your task can be accomplished pretty quickly.

how complex? too complex for a person who basically knows nothing about this level of maths?

> (Original post by **sleepysnooze**) how complex? too complex for a person who basically knows nothing about this level of maths?

It's the programme that professional statisticians use (among others) and it has a steep learning curve. That said, what you're trying to do is fairly elementary and not difficult to do in R.

But let me turn the question around: what software/calculator/computer do you have available to do this? Are you used to using spreadsheets, for example?

> (Original post by **Gregorius**) It's the programme that professional statisticians use (among others) and it has a steep learning curve. That said, what you're trying to do is fairly elementary and not difficult to do in R.
>
> But let me turn the question around: what software/calculator/computer do you have available to do this? Are you used to using spreadsheets, for example?

so if I have two sets of data that I am trying to find the statistical significance for (via one variable on the other) how do I find that value in the R program?

> (Original post by **sleepysnooze**) so if I have two sets of data that I am trying to find the statistical significance for (via one variable on the other) how do I find that value in the R program?

If you set up your data in a data frame (which we'll call "dfr") with column names "turnout" and "margin" then plotting the data is as simple as

with(dfr, plot(turnout, margin))

If you get something nice and attractive for the plot then you would issue the incantation

with(dfr, cor.test(turnout, margin))

to calculate the correlation coefficient (and there are options for the different types of correlation coefficients)

or you might fit a linear regression via

mod <- lm(margin ~ turnout, data=dfr)

summary(mod)
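Putting those commands together, a minimal sketch in base R might look like the following. The numbers are invented purely for illustration (they are not real election results), and the names `dfr`, `turnout` and `margin` simply follow the post above. Plot the data first, as described there, before trusting either figure.

```r
# Made-up illustrative data: NOT real election results.
dfr <- data.frame(
  turnout = c(55.2, 61.8, 58.4, 70.1, 66.3, 49.9, 73.5, 64.0),
  margin  = c(22.1, 15.4, 18.9,  6.2, 10.8, 27.5,  4.1, 12.3)
)

# Pearson correlation together with its significance test.
ct <- with(dfr, cor.test(turnout, margin))
r_value <- unname(ct$estimate)  # the correlation coefficient r
p_value <- ct$p.value           # p-value for H0: no correlation

# Simple linear regression of margin on turnout.
mod <- lm(margin ~ turnout, data = dfr)
slope_p <- summary(mod)$coefficients["turnout", "Pr(>|t|)"]

# With a single predictor, the regression slope and the correlation
# test give the same p-value, and R squared is r squared.
c(r = r_value, p_cor = p_value, p_slope = slope_p)
```

So for two variables it does not much matter which route you take: the correlation test and the one-variable regression answer the same significance question.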

> (Original post by **Gregorius**) If you set up your data in a data frame (which we'll call "dfr") with column names "turnout" and "margin" then plotting the data is as simple as
>
> with(dfr, plot(turnout, margin))
>
> If you get something nice and attractive for the plot then you would issue the incantation
>
> with(dfr, cor.test(turnout, margin))
>
> to calculate the correlation coefficient (and there are options for the different types of correlation coefficients)
>
> or you might fit a linear regression via
>
> mod <- lm(margin ~ turnout, data=dfr)
>
> summary(mod)

I'm really sorry that made very little sense to me - I've never used that program before so you'll have to explain it more simply

if I have, in excel for instance, two rows of results that I am wanting to compare, how would I paste those results into R and then set up these tests? you said "dfr" but is that me literally pasting all my data without anything separating them from each other (i.e. turnout from margin)?

or when you said "with(dfr, plot(turnout, margin))", does "margin" simply mean all those results, and then "turnout" meaning the turnout results? so basically margin and turnout are actually (in your formula) meant to mean that I am just inputting numbers? could you please give me an example or something? I'm so sorry
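One common way to get Excel data into R is to save the sheet as a CSV file and read it with `read.csv()`. The sketch below is only an illustration: it writes a tiny stand-in CSV to a temporary file so that it runs on its own, and the file, its contents and the column headers are all hypothetical. It assumes the data is laid out in two columns with a header row, one row per election.

```r
# Stand-in for a file exported from Excel as CSV.
# The values are invented; with real data you would skip this step
# and give read.csv() the path to your own file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("turnout,margin",
             "55.2,22.1",
             "61.8,15.4",
             "70.1,6.2",
             "49.9,27.5"), csv_path)

# read.csv() turns the header row into column names.
dfr <- read.csv(csv_path)

str(dfr)      # lists the columns and their types
dfr$turnout   # "turnout" is just the name of that column of numbers
```

In other words, `turnout` and `margin` are not numbers you type into the formula; they are the names of the two columns inside `dfr`, and `with(dfr, ...)` tells R to look the names up there.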

> (Original post by **sleepysnooze**) I'm really sorry that made very little sense to me - I've never used that program before so you'll have to explain it more simply
>
> if I have, in excel for instance, two rows of results that I am wanting to compare, how would I paste those results into R and then set up these tests? you said "dfr" but is that me literally pasting all my data without anything separating them from each other (i.e. turnout from margin)?
>
> or when you said "with(dfr, plot(turnout, margin))", does "margin" simply mean all those results, and then "turnout" meaning the turnout results? so basically margin and turnout are actually (in your formula) meant to mean that I am just inputting numbers? could you please give me an example or something? I'm so sorry

No worries - but if you have Excel and are used to using it, you could do the analysis there rather than having to faff around with a new (and scary!) piece of software.

So in Excel, arrange your data in two columns, and then make sure that you have the Analysis ToolPak add-in installed in your copy of Excel. (Google how to do this if it is not.)

Then click on the "Data Analysis" button on the "Data" tab and choose "Regression" as the analysis to do. You'll be prompted for the range of Y values (which should be the margin of victory numbers) and X values (the turnout numbers).

In the results, the value of "R Square" is the square of the correlation coefficient (this holds because there is only one X variable) and the thing labelled "Significance F" is your p-value.

Do also plot the values you have using a scatter plot to make sure that there is a roughly linear relationship between the two variables.

If you want to attach your data here, I'm quite happy to have a look at it and guide you through the analysis.
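For reference, these are the standard relationships behind those Excel figures when there is a single X variable. The symbols are introduced here for illustration and are not Excel labels: $r$ is the Pearson correlation coefficient and $n$ is the number of observations.

$$
R^2 = r^2, \qquad t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad F = t^2
$$

"Significance F" is the p-value of $F$ on $(1,\, n-2)$ degrees of freedom, which equals the two-sided p-value of $t$ on $n-2$ degrees of freedom. That is why it matches the "P-value" that Excel prints for the X variable.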

> (Original post by **Gregorius**) No worries - but if you have Excel and are used to using it, you could do the analysis there rather than having to faff around with a new (and scary!) piece of software.
>
> So in Excel, arrange your data in two columns, and then make sure that you have the Data Analysis toolpak installed in your copy of Excel. (Google how to do this if it is not.)
>
> Then click on the "data analysis" button on the "Data" tab and choose "regression" as the analysis to do. You'll be prompted for the range of Y values (which should be the margin of victory numbers) and X values (the turnout numbers).
>
> In the results, the value of "R squared" is the square of the correlation coefficient and the thing labelled "Significance F" is your p-value.
>
> Do also plot the values you have using a scatter plot to make sure that there is a roughly linear relationship between the two variables.
>
> If you want to attach your data here, I'm quite happy to have a look at it and guide you through the analysis.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.022903566 |
| R Square | 0.000524573 |
| Adjusted R Square | -0.001017827 |
| Standard Error | 7.590635054 |
| Observations | 650 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 19.59590527 | 19.59591 | 0.340101939 | 0.559973287 |
| Residual | 648 | 37336.29585 | 57.61774 | | |
| Total | 649 | 37355.89176 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 51.76244313 | 3.355788785 | 15.42482 | 5.93377E-46 | 45.17291013 | 58.35197612 | 45.17291 | 58.35198 |
| X Variable 1 | -2.954555179 | 5.066260901 | -0.58318 | 0.559973287 | -12.90282532 | 6.993714963 | -12.9028 | 6.993715 |

(woops, I thought it would appear as a table)

I just inserted my X data and Y data (X being majority of a candidate and Y being the turnout in an election) - if the confidence level is 95% (I don't even know what this implies, being a stats novice) and the Significance F is 0.55997~ does that mean that the relationship between the two variables is significant? my guess is yes but this is quite important work I'm doing here so if you could explain it that would be absolutely fantastic
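As a hedged check on how to read that output, the sketch below recomputes the p-value in R from the figures posted above and compares it with the conventional 5% level. The variable names are made up for this sketch; the only inputs are the "t Stat", "F" and residual "df" values copied from the SUMMARY OUTPUT.

```r
# Figures copied from the SUMMARY OUTPUT above.
t_stat <- -0.58318     # "t Stat" for X Variable 1
f_stat <- 0.340101939  # "F" in the ANOVA table
df_res <- 648          # residual degrees of freedom

# Two-sided p-value from the t statistic.
p_from_t <- 2 * pt(-abs(t_stat), df = df_res)

# The same p-value obtained from the F statistic (1 and 648 df).
p_from_f <- pf(f_stat, df1 = 1, df2 = df_res, lower.tail = FALSE)

alpha <- 0.05          # 5% significance level, i.e. 95% confidence
c(p_from_t = p_from_t, p_from_f = p_from_f)
p_from_f < alpha       # FALSE here
```

Because the p-value (about 0.56) is far above 0.05, this output does not show a statistically significant relationship at the 5% level. A result is called "significant at the 5% level" only when the p-value falls below 0.05.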

> (Original post by **Big Weiner**) Would it help just to find the covariance between the two factors?

> (Original post by **sleepysnooze**) what's a p-value? I've heard of that but there's never been an explanation for what it represents

So in your case, you would frame the null hypothesis "there is no correlation between voter turnout and margin of victory".

A p-value expresses how consistent your data is with the null hypothesis. You calculate something from the observed data (called a test statistic - in your case a correlation coefficient) and work out the probability of getting a value of the test statistic at least as large as the one you observed, under the assumption that the null hypothesis is true.

So for your case, say you observe a value of 0.56 for the correlation coefficient between voter turnout and margin of victory, and that your software says the p-value for this is 0.035. Then what this is saying is that, assuming there really is no correlation, the probability of obtaining a correlation of 0.56 or higher purely by chance is 0.035.

A very small p-value (conventionally 0.05 or less) is taken to indicate evidence against the null hypothesis - that is, you can conclude that there is a correlation between turnout and margin of victory.
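The "purely by chance" idea can be made concrete by shuffling. The sketch below is a permutation test, which is a different technique from the t-based calculation that `cor.test` and Excel use, so treat it as an illustration of what a p-value means rather than a replacement. The data are invented for the example.

```r
set.seed(1)  # make the shuffles reproducible

# Invented data for illustration only.
turnout <- c(55.2, 61.8, 58.4, 70.1, 66.3, 49.9, 73.5, 64.0)
margin  <- c(22.1, 15.4, 18.9,  6.2, 10.8, 27.5,  4.1, 12.3)

r_obs <- cor(turnout, margin)

# Under the null hypothesis the pairing is arbitrary, so shuffling
# one column shows which correlations turn up by chance alone.
r_null <- replicate(5000, cor(turnout, sample(margin)))

# Two-sided p-value: share of shuffles at least as extreme as observed.
p_perm <- mean(abs(r_null) >= abs(r_obs))
p_perm
```

If almost none of the shuffled correlations are as strong as the observed one, the p-value is small and the data look inconsistent with "no correlation".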

how do I set up the scatter graph? is it via regression or something totally different? I'm sorry - I'm probably asking a very stupid question right now - I'm just extremely lost and wanting to know this for certain
