
How do I find statistical significance with two sets of data (regression analysis)?

    • Thread Starter
    Hey guys, I am having trouble understanding how to find the coefficient(?) for statistical significance between two variables. I am trying to judge whether, in elections, the turnout rate positively correlates with the closeness of the victory (just an example from the data I have right now). I have been going through videos where they talk about things like "t tests" and "alpha levels"(?). What I understand is that you need to test two variables together and then you will get a coefficient, but I do not understand how to get that coefficient (e.g. in Excel?). From that coefficient, how do I then work out statistical significance? In what I've read before they talk about "significant at the _% level" - how do I know whether something would be significant at a certain percentage? Is this something I have to input before the calculations? I'm very sorry guys, I am no mathematician - I am in desperate need of help here
    • Thread Starter
    kudasai?
    (Original post by sleepysnooze)
    kudasai?
    shudn't it be 'onegai?'
    • Thread Starter
    (Original post by rayestar)
    shudn't it be 'onegai?'
    please help me with my maths sempai

    I thought kudasai meant please give, so please give me your maths help
    (Original post by sleepysnooze)
    please help me with my maths sempai

    I thought kudasai meant please give, so please give me your maths help

    ohh i like that pic

    when u ask for help for maths, its: 'suugaku wo tesudatte kudasai'
    but when u say 'please' alone it's 'onegai'

    sorry i can't help u there cos we haven't covered that in my skl lol
    i knw most of S1 cos i did level 3 statistical methods, but regression wasn't part of that spec so yeahh
    have u tried examsolutions?
    • Political Ambassador
    I'm not too sure, but try Wiki: https://en.m.wikipedia.org/wiki/Spea...on_coefficient
    (Original post by sleepysnooze)
    Hey guys, I am having trouble understanding how to find the coefficient(?) for statistical significance between two variables. I am trying to judge whether, in elections, the turnout rate positively correlates with the closeness of the victory (just an example from the data I have right now). I have been going through videos where they talk about things like "t tests" and "alpha levels"(?). What I understand is that you need to test two variables together and then you will get a coefficient, but I do not understand how to get that coefficient (e.g. in Excel?). From that coefficient, how do I then work out statistical significance? In what I've read before they talk about "significant at the _% level" - how do I know whether something would be significant at a certain percentage? Is this something I have to input before the calculations? I'm very sorry guys, I am no mathematician - I am in desperate need of help here
    So first off, what is the form of the data that you have? Do you have a number of observations in pairs where the first of the pair is percentage turnout and the second is margin of victory?

    If you've got something like this then the first thing to do is to plot the data on a graph - turnout on the x-axis and margin of victory on the y-axis. What you do next depends on the shape of the plot that you get - the tools that you apply next depend on certain assumptions, and you may have to transform your data in some way in order to make it obey these assumptions. Post a plot of the data here if you want some advice.

    But once this is done, you are probably going to do something using either (a) a correlation coefficient or (b) linear regression. With appropriate software you will automatically get some measure of statistical significance. What software do you have available, or are you doing this "by hand"?
    • Thread Starter
    (Original post by Gregorius)
    So first off, what is the form of the data that you have? Do you have a number of observations in pairs where the first of the pair is percentage turnout and the second is margin of victory?
    yes

    If you've got something like this then the first thing to do is to plot the data on a graph - turnout on the x-axis and margin of victory on the y-axis. What you do next depends on the shape of the plot that you get - the tools that you apply next depend on certain assumptions, and you may have to transform your data in some way in order to make it obey these assumptions. Post a plot of the data here if you want some advice.
    I don't have data yet though, I'll have to get back to you on this

    But once this is done, you are probably going to do something using either (a) a correlation coefficient or (b) linear regression. With appropriate software you will automatically get some measure of statistical significance. What software do you have available, or are you doing this "by hand"?
    is there free software out there I can download? what do you recommend?
    (Original post by sleepysnooze)
    is there free software out there I can download? what do you recommend?
    R is free. https://cran.r-project.org/

    It is very complex software; however, your task can be accomplished pretty quickly.
    • Thread Starter
    (Original post by Gregorius)
    R is free. https://cran.r-project.org/

    It is very complex software; however, your task can be accomplished pretty quickly.
    how complex? too complex for a person who basically knows nothing about this level of maths?
    (Original post by sleepysnooze)
    how complex? too complex for a person who basically knows nothing about this level of maths?
    It's the programme that professional statisticians use (among others) and it has a steep learning curve. That said, what you're trying to do is fairly elementary and not difficult to do in R.

    But let me turn the question around: what software/calculator/computer do you have available to do this? Are you used to using spreadsheets for example?
    • Thread Starter
    (Original post by Gregorius)
    It's the programme that professional statisticians use (among others) and it has a steep learning curve. That said, what you're trying to do is fairly elementary and not difficult to do in R.

    But let me turn the question around: what software/calculator/computer do you have available to do this? Are you used to using spreadsheets for example?
    so if I have two sets of data that I am trying to find the statistical significance for (via one variable on the other) how do I find that value in the R program?
    (Original post by sleepysnooze)
    so if I have two sets of data that I am trying to find the statistical significance for (via one variable on the other) how do I find that value in the R program?
    If you set up your data in a dataframe (which we'll call "dfr") with column names "turnout" and "margin", then plotting the data is as simple as

    with(dfr, plot(turnout, margin))

    If you get something nice and attractive for the plot then you would issue the incantation

    with(dfr, cor.test(turnout, margin))

    to calculate the correlation coefficient (and there are options for the different types of correlation coefficients)

    or you might fit a linear regression via

    mod <- lm(margin ~ turnout, data=dfr)
    summary(mod)
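
    To make the steps above concrete, here is a minimal sketch with made-up numbers. The six turnout and margin values are invented purely for illustration, and "elections.csv" is a hypothetical file name for a sheet saved from Excel as CSV. Each column of the data frame holds one variable, so "turnout" and "margin" are just the names of the two columns of numbers.

```r
# Made-up example values, one pair per election (illustration only).
turnout <- c(55.2, 61.8, 48.9, 70.3, 66.1, 58.4)   # turnout, in percent
margin  <- c(12.5,  8.1, 20.3,  3.2,  6.7, 10.9)   # margin of victory, in points

# A data frame is a table: one named column per variable, one row per election.
dfr <- data.frame(turnout = turnout, margin = margin)

# With real data you might instead save the Excel sheet as CSV and read it in
# ("elections.csv" is a hypothetical file with "turnout" and "margin" headers):
# dfr <- read.csv("elections.csv")

# Correlation coefficient together with its p-value.
ct <- with(dfr, cor.test(turnout, margin))
print(ct$estimate)   # the correlation coefficient
print(ct$p.value)    # p-value for the null hypothesis "no correlation"

# Linear regression of margin on turnout.
mod <- lm(margin ~ turnout, data = dfr)
print(summary(mod))
```

    By default cor.test() reports the Pearson correlation; the method argument switches to "spearman" or "kendall" if the plot suggests the relationship is not linear.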
    • Thread Starter
    (Original post by Gregorius)
    If you set up your data in a dataframe (which we'll call "dfr") with column names "turnout" and "margin", then plotting the data is as simple as

    with(dfr, plot(turnout, margin))

    If you get something nice and attractive for the plot then you would issue the incantation

    with(dfr, cor.test(turnout, margin))

    to calculate the correlation coefficient (and there are options for the different types of correlation coefficients)

    or you might fit a linear regression via

    mod <- lm(margin ~ turnout, data=dfr)
    summary(mod)
    I'm really sorry that made very little sense to me - I've never used that program before so you'll have to explain it more simply
    if I have, in excel for instance, two rows of results that I am wanting to compare, how would I paste those results into R and then set up these tests? you said "dfr" but is that me literally pasting all my data without anything separating them from each other (i.e. turnout from margin)?
    or when you said "dfr, plot(turnout, margin)", does "margin" simply mean all those results, and then "turnout" mean the turnout results? so basically margin and turnout are actually (in your formula) meant to mean that I am just inputting numbers? could you please give me an example or something? I'm so sorry
    (Original post by sleepysnooze)
    I'm really sorry that made very little sense to me - I've never used that program before so you'll have to explain it more simply
    if I have, in excel for instance, two rows of results that I am wanting to compare, how would I paste those results into R and then set up these tests? you said "dfr" but is that me literally pasting all my data without anything separating them from each other (i.e. turnout from margin)?
    or when you said "dfr, plot(turnout, margin)", does "margin" simply mean all those results, and then "turnout" mean the turnout results? so basically margin and turnout are actually (in your formula) meant to mean that I am just inputting numbers? could you please give me an example or something? I'm so sorry
    No worries - but if you have excel and are used to using it, you could do the analysis there rather than having to faff around with a new (and scary!) piece of software.

    So in Excel, arrange your data in two columns, and then make sure that you have the Analysis ToolPak add-in installed in your copy of Excel. (Google how to do this if it is not.)

    Then click on the "data analysis" button on the "Data" tab and choose "regression" as the analysis to do. You'll be prompted for the range of Y values (which should be the margin of victory numbers) and X values (the turnout numbers).

    In the results, the value of "R squared" is the square of the correlation coefficient and the thing labelled "Significance F" is your p-value.
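
    If you ever want to convince yourself of those two statements, they can be checked in R. This is only a sketch on simulated numbers (the data and the names x, y, p_f and p_slope are made up for the illustration). It shows that, with a single X variable, "R Square" equals the squared correlation coefficient and "Significance F" equals the p-value of the slope.

```r
set.seed(1)                                # make the simulated numbers reproducible
x <- runif(50, min = 40, max = 80)         # simulated "turnout" values
y <- 30 - 0.3 * x + rnorm(50, sd = 5)      # simulated "margin" values

mod <- lm(y ~ x)
s   <- summary(mod)

r_squared <- s$r.squared                   # what Excel labels "R Square"
r         <- cor(x, y)                     # the correlation coefficient

# What Excel labels "Significance F": upper tail of the F distribution.
f   <- s$fstatistic                        # named vector: value, numdf, dendf
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# p-value of the t test on the slope, taken from the coefficient table.
p_slope <- coef(s)["x", "Pr(>|t|)"]

print(c(r_squared = r_squared, r_times_r = r^2))
print(c(significance_F = p_f, slope_p_value = p_slope))
```

    The two pairs of numbers agree only because there is one X variable; with several predictors "Significance F" tests all of them together.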

    Do also plot the values you have using a scatter plot to make sure that there is a roughly linear relationship between the two variables.

    If you want to attach your data here, I'm quite happy to have a look at it and guide you through the analysis.
    • Thread Starter
    (Original post by Gregorius)
    No worries - but if you have excel and are used to using it, you could do the analysis there rather than having to faff around with a new (and scary!) piece of software.

    So in Excel, arrange your data in two columns, and then make sure that you have the Analysis ToolPak add-in installed in your copy of Excel. (Google how to do this if it is not.)
    okay, I've just got the toolpak

    Then click on the "data analysis" button on the "Data" tab and choose "regression" as the analysis to do. You'll be prompted for the range of Y values (which should be the margin of victory numbers) and X values (the turnout numbers).

    In the results, the value of "R squared" is the square of the correlation coefficient and the thing labelled "Significance F" is your p-value.
    what's a p-value? I've heard of that but there's never been an explanation for what it represents

    Do also plot the values you have using a scatter plot to make sure that there is a roughly linear relationship between the two variables.
    how do I set up the scatter graph? is it via regression or something totally different? I'm sorry - I'm probably asking a very stupid question right now - I'm just extremely lost and wanting to know this for certain

    If you want to attach your data here, I'm quite happy to have a look at it and guide you through the analysis.
    okay I'm sorry if I've gone off from my original idea here but now I am wanting to test if a change of an institution will cause significant changes in the results of elections. I have 4 result sets (1 = the first country and 2 = the second): 1a) country 1 before its institutional change, and 1b) country 1 after the institutional change; and 2a) a similar system over the earlier period, and 2b) that same system over the period after 1's institutional change, still with no institutional change of its own (so basically I'm comparing two similar systems when one of them gets a change of its electoral institution). what should I do to show that the institution was the determining factor in the change of results? I have found that there was a change for 1 but no change for 2 in terms of the average results over the time periods (the "effective number of political parties" increased by 1.5~, for instance) but how do I show that it wasn't a coincidence? I'm sorry, you sound extremely knowledgeable about statistical analysis and this would help me beyond words if you could guide me
    • Thread Starter
    (Original post by Gregorius)
    No worries - but if you have excel and are used to using it, you could do the analysis there rather than having to faff around with a new (and scary!) piece of software.

    So in Excel, arrange your data in two columns, and then make sure that you have the Analysis ToolPak add-in installed in your copy of Excel. (Google how to do this if it is not.)

    Then click on the "data analysis" button on the "Data" tab and choose "regression" as the analysis to do. You'll be prompted for the range of Y values (which should be the margin of victory numbers) and X values (the turnout numbers).

    In the results, the value of "R squared" is the square of the correlation coefficient and the thing labelled "Significance F" is your p-value.

    Do also plot the values you have using a scatter plot to make sure that there is a roughly linear relationship between the two variables.

    If you want to attach your data here, I'm quite happy to have a look at it and guide you through the analysis.
    also:

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R          0.022903566
    R Square            0.000524573
    Adjusted R Square   -0.001017827
    Standard Error      7.590635054
    Observations        650

    ANOVA
                 df    SS            MS         F             Significance F
    Regression   1     19.59590527   19.59591   0.340101939   0.559973287
    Residual     648   37336.29585   57.61774
    Total        649   37355.89176

                   Coefficients   Standard Error   t Stat     P-value       Lower 95%      Upper 95%     Lower 95.0%   Upper 95.0%
    Intercept      51.76244313    3.355788785      15.42482   5.93377E-46   45.17291013    58.35197612   45.17291      58.35198
    X Variable 1   -2.954555179   5.066260901      -0.58318   0.559973287   -12.90282532   6.993714963   -12.9028      6.993715
    (woops, I thought it would appear as a table)

    I just inserted my X data and Y data (X being the majority of a candidate and Y being the turnout in an election) - if the confidence level is 95% (I don't even know what this implies, being a stats novice) and the Significance F is 0.55997~, does that mean that the relationship between the two variables is significant? my guess is yes but this is quite important work I'm doing here so if you could explain it that would be absolutely fantastic
    Offline

    0
    ReputationRep:
    Would it help just to find the covariance between the two factors?
    (Original post by Big Weiner)
    Would it help just to find the covariance between the two factors?
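    That's closely related to what we're doing in linear regression: the fitted slope is the covariance of the two variables divided by the variance of the x variable, and the correlation coefficient is the covariance divided by the product of the two standard deviations. The covariance on its own depends on the units of measurement, so it is harder to interpret.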
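
    The link between covariance, regression and correlation can be checked numerically. A small sketch in R, using invented values (x and y below are made up for illustration only): the regression slope equals cov(x, y) / var(x), and the correlation equals cov(x, y) / (sd(x) * sd(y)).

```r
# Made-up values for illustration only.
x <- c(55.2, 61.8, 48.9, 70.3, 66.1, 58.4)
y <- c(12.5,  8.1, 20.3,  3.2,  6.7, 10.9)

# Slope of the fitted line, computed two ways.
slope_from_cov <- cov(x, y) / var(x)
slope_from_lm  <- unname(coef(lm(y ~ x))["x"])

# Correlation coefficient, computed two ways.
r_from_cov <- cov(x, y) / (sd(x) * sd(y))
r_direct   <- cor(x, y)

print(c(slope_from_cov = slope_from_cov, slope_from_lm = slope_from_lm))
print(c(r_from_cov = r_from_cov, r_direct = r_direct))
```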
    (Original post by sleepysnooze)
    what's a p-value? I've heard of that but there's never been an explanation for what it represents
    The classical way of doing a statistical test - to see if something is a "statistically significant" result is to put forward a "null hypothesis" - usually a hypothesis that denies the connection that you're looking for - and then to see whether the data that you have is consistent with that null hypothesis.

    So in your case, you would frame the null hypothesis "there is no correlation between voter turnout and margin of victory".

    A p-value expresses how consistent your data is with the null hypothesis. You calculate something from the observed data (called a test statistic - in your case a correlation coefficient) and then work out the probability of getting a value of the test statistic at least as extreme as the one you observed, under the assumption that the null hypothesis is true.

    So for your case, say you observe a value of 0.56 for the correlation coefficient between voter turnout and margin of victory, and that your software says the p-value for this is 0.035. Then what this is saying is that, if there really were no correlation, the probability of obtaining a correlation of 0.56 or higher purely by chance would be 0.035.
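
    The "purely by chance" idea can be simulated directly. The sketch below is a permutation test on simulated data, which is a swapped-in technique for illustration and not what Excel or cor.test() computes internally. It shuffles one variable many times so that any real link is destroyed, and counts how often the shuffled correlation is at least as far from zero as the observed one. All the numbers and names are made up, and the count is two-sided (both directions), which is the default that cor.test() reports.

```r
set.seed(42)                                     # reproducible simulation
n       <- 30
turnout <- runif(n, min = 40, max = 80)          # simulated turnout values
margin  <- 25 - 0.2 * turnout + rnorm(n, sd = 6) # simulated margins

r_obs <- cor(turnout, margin)                    # observed test statistic

# Shuffling margin breaks any link with turnout, so the null hypothesis
# ("no correlation") is true by construction for every shuffled data set.
n_shuffles <- 5000
r_null <- replicate(n_shuffles, cor(turnout, sample(margin)))

# Two-sided p-value: share of shuffles whose correlation is at least as
# far from zero as the one actually observed.
p_perm <- mean(abs(r_null) >= abs(r_obs))

print(r_obs)
print(p_perm)
print(cor.test(turnout, margin)$p.value)         # formula-based p-value, for comparison
```

    The shuffled and formula-based p-values should be close but will not match exactly, since one is a simulation and the other relies on distributional assumptions.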

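    A small p-value (conventionally 0.05 or less, which is what "significant at the 5% level" means) is taken as evidence against the null hypothesis - that is, you can conclude that there is a correlation between turnout and margin of victory. The threshold (the "alpha level") is something you choose before doing the calculation, not something the calculation gives you.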
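
    Applied to the Excel output posted earlier in the thread, the comparison might look like the sketch below. The F statistic, degrees of freedom and Multiple R are copied from that output; the 0.05 threshold is the conventional choice and is an assumption here, not something the output dictates.

```r
# Values copied from the Excel regression output posted earlier in the thread.
f_stat        <- 0.340101939
df_regression <- 1
df_residual   <- 648
multiple_r    <- 0.022903566

# "Significance F" is the upper-tail probability of the F distribution.
p_value <- pf(f_stat, df_regression, df_residual, lower.tail = FALSE)

alpha <- 0.05                      # conventional threshold, chosen beforehand
significant <- p_value < alpha

print(p_value)                     # should reproduce Excel's "Significance F"
print(multiple_r^2)                # should reproduce Excel's "R Square"

if (significant) {
  print("Evidence against the null hypothesis at the 5% level")
} else {
  print("No evidence against the null hypothesis at the 5% level")
}
```

    On the usual reading, a p-value of about 0.56 is far above 0.05, so that data set gives no evidence of a linear relationship between the two variables; it is not statistically significant at the 5% level.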

    how do I set up the scatter graph? is it via regression or something totally different? I'm sorry - I'm probably asking a very stupid question right now - I'm just extremely lost and wanting to know this for certain
    In Excel, choose the "Insert" tab and then choose "charts". I suggest you use a "scatter" chart - which on my installation is the first chart type at the top left of the selection.
 
 
 
Updated: November 12, 2016