coconut64
#1 (thread starter, 3 years ago)
I have constructed a scatter plot for a simple linear regression of two variables. I believe I have spotted two outliers, so I removed them to see how much the correlation coefficient would be affected, but it turns out that the change is minimal (a change of only 0.1). Is it mathematically correct to suggest that the effect of the outliers is insignificant?

Thanks
ghostwalker
#2 (Study Helper, 3 years ago)
(Original post by coconut64)
I have constructed a scatter plot for a simple linear regression of two variables. I believe I have spotted two outliers, so I removed them to see how much the correlation coefficient would be affected, but it turns out that the change is minimal (a change of only 0.1). Is it mathematically correct to suggest that the effect of the outliers is insignificant?

Thanks
Given that the correlation coefficient lies in the range -1 to 1, a change of 0.1 is hardly minimal. What were the actual values of the c.c.?
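
For reference, the range quoted here follows from the usual definition of the sample (Pearson) correlation coefficient. The notation below (n paired observations x_i, y_i with means \bar{x}, \bar{y}) is assumed for illustration and does not come from the thread; the bound is a consequence of the Cauchy-Schwarz inequality.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Any computed value outside this interval therefore signals an arithmetic or transcription error.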
coconut64
#3 (thread starter, 3 years ago)
(Original post by ghostwalker)
Given that the correlation coefficient lies in the range -1 to 1, a change of 0.1 is hardly minimal. What were the actual values of the c.c.?

The correlation coefficient I worked out is 1.23 with the outliers; this shows a weak positive relationship. Excluding the outliers gives me a new c.c. value of 1.10. This still shows a weak positive relationship. Surely this change is too small to be considered significant? Thanks
ghostwalker
#4 (Study Helper, 3 years ago)
(Original post by coconut64)
The correlation coefficient I worked out is 1.23 with the outliers; this shows a weak positive relationship. Excluding the outliers gives me a new c.c. value of 1.10. This still shows a weak positive relationship. Surely this change is too small to be considered significant? Thanks
How can you get a correlation coefficient greater than 1? Puzzled!
coconut64
#5 (thread starter, 3 years ago)
(Original post by ghostwalker)
How can you get a correlation coefficient greater than 1? Puzzled!
Oh yes, that's a terrible mistake. Sorry. I meant it decreases from 0.23 to 0.10. This is still showing a weak positive relationship.
ghostwalker
#6 (Study Helper, 3 years ago)
(Original post by coconut64)
Oh yes, that's a terrible mistake. Sorry. I meant it decreases from 0.23 to 0.10. This is still showing a weak positive relationship.
Well, the change from 0.23 to 0.10 is a drop of over 50% (0.13/0.23 is about 57%), so it's hardly insignificant.

And "weakly positive" has changed to "virtually non-existent".
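
The arithmetic behind that comparison can be checked with a short sketch. The two coefficients are the values quoted in the thread; the helper name `relative_change` is made up for illustration. The squared values are included because, in simple linear regression, r squared is the share of variance in one variable explained by the other.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old

r_with = 0.23      # correlation with the two suspected outliers (from the thread)
r_without = 0.10   # correlation after removing them (from the thread)

change = relative_change(r_with, r_without)
print(f"absolute change: {r_without - r_with:+.2f}")
print(f"relative change: {change:+.1%}")

# Share of variance explained (r squared) before and after removal.
print(f"r^2 with outliers:    {r_with ** 2:.4f}")
print(f"r^2 without outliers: {r_without ** 2:.4f}")
```

On these numbers the coefficient falls by more than half, and the variance explained goes from roughly 5% to 1%.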
Gregorius
#7 (3 years ago)
(Original post by coconut64)
I have constructed a scatter plot for a simple linear regression of two variables. I believe I have spotted two outliers, so I removed them to see how much the correlation coefficient would be affected, but it turns out that the change is minimal (a change of only 0.1). Is it mathematically correct to suggest that the effect of the outliers is insignificant?
I'd be a little bit careful about simply removing outliers, and I'd judge the effect of removing outliers by the amount of change in the regression coefficient rather than in the correlation coefficient.
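
A minimal sketch of that comparison, assuming NumPy is available. The data are synthetic (a fixed random seed and two hand-placed outliers), not the thread starter's data, and `slope_and_r` is a hypothetical helper name.

```python
import numpy as np

def slope_and_r(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    slope, _intercept = np.polyfit(x, y, 1)   # degree-1 fit: [slope, intercept]
    r = np.corrcoef(x, y)[0, 1]
    return slope, r

# Made-up data: a linear trend with slope 0.5 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=30)

# Two hypothetical outliers placed far below the trend at large x.
x_all = np.append(x, [9.5, 9.8])
y_all = np.append(y, [-6.0, -7.0])

slope_all, r_all = slope_and_r(x_all, y_all)
slope_clean, r_clean = slope_and_r(x, y)

print(f"with outliers:    slope = {slope_all:.3f}, r = {r_all:.3f}")
print(f"without outliers: slope = {slope_clean:.3f}, r = {r_clean:.3f}")
```

Comparing the two slopes shows how much the fitted line itself moves, which is the quantity the regression is actually estimating.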

When you do a linear regression (or calculate a correlation coefficient) you are making some assumptions about the data - in particular you are assuming that the residuals from the regression (the difference between the fitted and observed values) are normally distributed. So when you find outliers in a regression, there are a number of possibilities:

(i) The outliers are mistakes of some sort - like transcription errors, for example.
(ii) The probability model you're using is inappropriate - the residuals are not normally distributed.
(iii) The observations are genuine, and you've just by chance observed a couple of unusual extreme values.

So when you get outliers, pretty much the standard procedure is that you should go back to your data source and check whether the observations are faulty in some way. If they are not, you should not discard them, but rather set about thinking about how to cope with (ii) and (iii) above.

There are a couple of measures that are useful in checking for outliers: leverage and Cook's distance. The first measures how much influence an individual observation might have on the regression fit; the second measures how much influence it actually has. Most (decent) software for doing regression has some form of these built in.
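
As a rough illustration of those two measures, here is a sketch that computes them directly with NumPy for a one-predictor regression, using the standard textbook formulas (leverage as the diagonal of the hat matrix, Cook's distance from the residuals and leverages). The function name `leverage_and_cooks` and the data are invented for the example; in practice a regression package's built-in diagnostics would normally be used instead.

```python
import numpy as np

def leverage_and_cooks(x, y):
    """Leverage and Cook's distance for the regression y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage: h_i = x_i^T (X^T X)^{-1} x_i, the diagonal of the hat matrix.
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    s2 = resid @ resid / (n - p)                 # residual variance estimate
    # Cook's distance: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    cooks = resid**2 / (p * s2) * h / (1.0 - h) ** 2
    return h, cooks

# Made-up data: roughly y = 1 + 2x, with the last point far off the line.
x = np.arange(1, 11)
y = np.array([3.1, 4.8, 7.2, 9.0, 10.9, 13.1, 15.0, 16.8, 19.2, 5.0])

h, cooks = leverage_and_cooks(x, y)
for i, (hi, di) in enumerate(zip(h, cooks)):
    print(f"obs {i}: leverage = {hi:.3f}, Cook's distance = {di:.3f}")
```

Leverage depends only on the x values, so points at the ends of the x range score highest whether or not they fit the line; Cook's distance also uses the residual, so it singles out the last observation here.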