anon2332
Badges: 0
#1
Thread starter 11 years ago
Evening everyone

I know outliers are often not considered when drawing the slope of a line, but if they were to be considered, how would this affect the analysis? For example, if we were measuring speed in a 30 mph zone and a car was travelling at over 50 mph, would this cause the graph to increase, decrease or stay the same?

Would it increase purely because of such a big difference in speed?

Following on, how would this affect the correlation? Does it become stronger, weaker or stay the same?

I would say it increases as well, only because the outlier is

I don't really know a full valid reason, so if anyone could help it would be appreciated.

Thanks
screenager2004
Badges: 14
#2
11 years ago
It depends how you are analysing the data. For example, the mode is not sensitive to outliers at all; the mean is sensitive depending on the size of the sample (bigger sample = less sensitive); the median is barely affected by outliers; and the midrange is extremely sensitive to them.

Generally, including very high outliers will make the average higher (and extremely low outliers will make it lower). In your car example, including that high figure would raise the overall average.
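
A quick way to check these sensitivities is to compute each statistic with and without one extreme value. The sketch below uses made-up speeds for a 30 mph zone (they are an assumption for illustration, not data from this thread), and `midrange` and `summarise` are hypothetical helper names.

```python
from statistics import mean, median, mode

# Hypothetical speeds (mph) in a 30 mph zone; not data from the thread.
speeds = [28, 29, 30, 30, 30, 31, 32]
with_outlier = speeds + [55]


def midrange(values):
    """Midpoint between the smallest and the largest value."""
    return (min(values) + max(values)) / 2


def summarise(values):
    """Collect the four summary statistics discussed above."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "midrange": midrange(values),
    }


before = summarise(speeds)
after = summarise(with_outlier)

# Print each statistic without and with the 55 mph reading.
for name in before:
    print(f"{name:9s} {before[name]:6.2f} -> {after[name]:6.2f}")
```

On this made-up sample the single 55 mph reading moves the mean from 30 to 33.125 and the midrange from 30 to 41.5, while the median and the mode stay at 30.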

Outliers also weaken correlation. How much depends on the number of data points: if the sample is very small, the correlation weakens a lot; if the sample is huge, a single outlier will affect it less.

Remember, weak correlation means there is a lot of deviation within a data set, and an outlier is, by definition, a big deviation!

You can give a full valid reason using Spearman's rank correlation coefficient, which will show how the inclusion of the outlier affects the correlation on a scale from -1 to 1.
anon2332
Badges: 0
#3
Thread starter 11 years ago
Thanks for the reply.

I understand what you are saying about the outliers, but I'm a little confused about the correlation. An example I have looked at has a density of 68.6 cars. So, with this meaning there are a lot of cars, does this weaken the correlation?

Thanks
screenager2004
Badges: 14
#4
11 years ago
The number of cars doesn't affect the correlation directly; 100 cars can be just as strongly correlated as 10 cars or 1,000,000 cars.

Correlation is about the relationship between all the cars and the average: how much deviation there is (how far each car is from the average). Are all the cars pretty close to the average, or do they all vary randomly and wildly? How greatly do they differ from the mean?

A weak correlation means that the cars are all going at completely different speeds; there's no real relationship between all the data.

A strong correlation means that all the cars are pretty much hovering around the mean; maybe they are all going at between 29.5 and 30.2 mph. There is a strong relationship between all the cars and the average, and there is very little deviation from the mean.

So this outlier, this car speeding along at 90 mph, would weaken the correlation because it is a huge deviation: it's nowhere near the mean, so that nice, neat trend of cars all going near the average is weakened.

There are two ways you can see correlation really happening: either by plotting all the data on a graph and inspecting how the dots all hover around the same area together, or by working the correlation out mathematically with the correlation coefficient.

(I am really explaining this terribly.)
anon2332
Badges: 0
#5
Thread starter 11 years ago
Thanks, I think it's becoming clearer.

So for this question:

Traffic planners investigated the relationship between traffic density (number of cars per mile) and the average speed of the traffic on a moderately large city thoroughfare. The data were collected at the same location at 10 different times over a span of 3 months. They found a mean traffic density of 68.6 cars per mile (cpm) with standard deviation 27.07 cpm. Overall, the cars' average speed was 26.38 mph, with standard deviation 9.68 mph. These researchers found the regression line for their data to be:

Speed = 50.55 - 0.352 × Density

The data initially included the point density = 125 cpm, speed = 55 mph. This point was considered an outlier and was not included in the analysis. Will the slope increase, decrease or remain the same if we redo the analysis and include this point?
Provide full reasoning.


The slope will increase because the outlier is some way above the average speed. When the outlier was not included, the average speed was 26.38 mph, but as this car was travelling at 55 mph (28.62 mph above the average speed), it will increase. If the outlier car was travelling at 35 mph, it would stay the same.

Will the correlation become stronger, weaker or stay the same if we redo the analysis and include the point (125, 55)? Provide full reasoning.

The correlation will become weaker as this outlier is travelling at a completely different speed to the average speed, so there is no relationship to the data.
screenager2004
Badges: 14
#6
11 years ago
That sounds like you've got the gist of it (but I am not a formal maths student, so I will bump this thread in the hope that someone more qualified can give an opinion).

I am not sure what they mean by "full reasoning"; do they want you to provide your calculations? For example, don't just say "it was above the average". Demonstrate that 125 cpm is 2.08 standard deviations above the mean (generally we hold anything over 2 standard deviations from the mean to be an outlier) and that 55 mph is 2.96 standard deviations from the mean (which is quite a bit!). So yes, you're certainly right in saying that including such an abnormally high value would drag up the average quite a bit!

And such huge standard deviations from the mean would definitely weaken the correlation.
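
The two standard-deviation figures can be reproduced from the summary numbers in the question. This is just the arithmetic written out; `z_score` is a hypothetical helper name.

```python
# Summary figures quoted in the question.
mean_density, sd_density = 68.6, 27.07   # cars per mile
mean_speed, sd_speed = 26.38, 9.68       # mph


def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd


z_density = z_score(125, mean_density, sd_density)
z_speed = z_score(55, mean_speed, sd_speed)

print(f"density: {z_density:.2f} SDs, speed: {z_speed:.2f} SDs")
```

Rounded to two decimal places this gives 2.08 for the density and 2.96 for the speed.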
DFranklin
Badges: 18
#7
11 years ago
I'm not particularly qualified in stats, but I think some of the reasoning here is fundamentally flawed.

Imagine the set of points (0,0), ..., (10, 10). Obviously they lie perfectly on the line y = x. The average value for y is 5.
You get a new point (0, 17). Obviously this is going to push up the average value of y (to 6, as it happens). But what happens to the slope of the best fit line? Draw a sketch, and you'll see the slope actually decreases (the line is going to have to move towards the new point, and that will push the slope down).
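
For anyone who would rather check this numerically than by sketch, here is a minimal least-squares fit of those points with and without (0, 17). `least_squares` is a hypothetical helper written out by hand, not a library call.

```python
def least_squares(points):
    """Slope and intercept of the ordinary least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


on_line = [(i, i) for i in range(11)]      # (0, 0), ..., (10, 10)
with_new_point = on_line + [(0, 17)]

slope_before, intercept_before = least_squares(on_line)
slope_after, intercept_after = least_squares(with_new_point)

print(f"slope before: {slope_before:.3f}, after: {slope_after:.3f}")
```

The mean of y goes up from 5 to 6, but the fitted slope drops from 1 to 12/29 (about 0.41), because the new point sits high at the left-hand end of the data.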

If I were the OP, I'd draw a sketch representing the regression line and data (obviously it can only be a sketch - you don't have the actual data). Then draw on the 'outlier' point, and see how it will affect the line.
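
A numerical stand-in for that sketch is possible from the summary figures alone, under some assumptions that the question does not confirm: that the quoted means, standard deviations and regression line describe 10 points with the outlier already removed, and that the standard deviations use the n - 1 divisor. Treat the output as a rough guide only.

```python
from math import sqrt

# Figures quoted in the question (ASSUMED to describe the 10 points
# that remain after the outlier was removed).
n = 10
mean_x, sd_x = 68.6, 27.07    # density, cars per mile
mean_y, sd_y = 26.38, 9.68    # speed, mph
slope = -0.352
intercept = 50.55
x0, y0 = 125, 55              # the excluded point

# Where the fitted line says the speed "should" be at that density.
predicted = intercept + slope * x0

# Rebuild the sums of squares (ASSUMES the SDs use the n - 1 divisor).
sxx = (n - 1) * sd_x ** 2
syy = (n - 1) * sd_y ** 2
sxy = slope * sxx

# Standard update of the sums when one point joins a sample of size n.
w = n / (n + 1)
sxx_new = sxx + w * (x0 - mean_x) ** 2
syy_new = syy + w * (y0 - mean_y) ** 2
sxy_new = sxy + w * (x0 - mean_x) * (y0 - mean_y)

r_old = sxy / sqrt(sxx * syy)
slope_new = sxy_new / sxx_new
r_new = sxy_new / sqrt(sxx_new * syy_new)

print(f"predicted speed at 125 cpm: {predicted:.2f} mph (observed {y0})")
print(f"slope: {slope:.3f} -> {slope_new:.3f}")
print(f"r:     {r_old:.3f} -> {r_new:.3f}")
```

The line predicts about 6.55 mph at 125 cpm, so the point lies far above the line at the high-density end. Under the stated assumptions, the slope moves from -0.352 to roughly -0.09 and r moves from about -0.98 to about -0.22.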
anon2332
Badges: 0
#8
Thread starter 11 years ago
Oh right, I see what you mean now.

So the slope for the first question will actually decrease because, as the outlier is further away from the other points, it decreases to include that point?

Does my answer to the second question look alright?

Thanks
DFranklin
Badges: 18
#9
11 years ago
I don't understand how you've reached that conclusion from what I said.

Again, for the 2nd question, you need to consider a sketch.

[For example, imagine you have 11 data points, this time (0, 1) (1,0) (2,3) (3,2) (4,5) (5,4) (6,7) (7,6) (8,9) (9,8) (10,10). The data roughly speaking follows the line y = x, with a bit of deviation. Now you get an outlier (100,100). Note that it fits the "y=x" equation perfectly. I haven't done the actual calculation, but I'm pretty sure you'll find the correlation actually goes up, even though y=100 is a mile away from the average for the rest of the data.]
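
The calculation left undone in that example is short enough to run. The sketch below computes the Pearson correlation by hand for the 11 points, then again with (100, 100) added; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sqrt(sxx * syy)


points = [(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 4),
          (6, 7), (7, 6), (8, 9), (9, 8), (10, 10)]

r_before = pearson_r(points)
r_after = pearson_r(points + [(100, 100)])

print(f"r before: {r_before:.4f}, after: {r_after:.4f}")
```

The correlation rises from about 0.9545 to about 0.9994, so an extreme point that agrees with the trend strengthens the correlation rather than weakening it.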
mamaH
Badges: 0
#10
4 years ago
Hi, I was wondering how one would interpret correlation results when there is significant but weak correlation between variables.
Gregorius
Badges: 14
#11
4 years ago
(Original post by mamaH)
Hi, I was wondering how one would interpret correlation results when there is significant but weak correlation between variables.
I'll illustrate with the example of continuous variables and the Pearson correlation coefficient. Similar will obtain for other types of variable and other measures of correlation.

If two variables are (approximately) linearly related then the correlation between them measures the degree to which the values of one of the variables can predict the values of the other. If the correlation between them is high then knowledge of the value of one of them will predict the value of the other with a good degree of precision. If the correlation is low, then the prediction will have low precision.

The correlation will be "statistically significant" if it is unlikely to have arisen simply by chance (and you will have probably set a value of 0.05, or similar, as a threshold for something happening by chance).

So, in summary a low, statistically significant, correlation suggests a real relationship between the variables, but one that has little predictive power.
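
To put rough numbers on "significant but weak", the sketch below takes a hypothetical correlation of 0.1 from a hypothetical sample of 1000 (neither value comes from this thread). It uses a normal approximation to the usual t statistic, which is only reasonable for large samples; `approx_p_value` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for 'no correlation', using a normal
    approximation to the t statistic (reasonable only for large n)."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))


r, n = 0.1, 1000              # hypothetical values, not from the thread
p = approx_p_value(r, n)
share_explained = r ** 2      # share of variance explained by a linear fit

print(f"p is about {p:.4f}; variance explained: {share_explained:.0%}")
```

Here p comes out well below 0.05, yet the linear relationship accounts for only about 1% of the variance, which is the "real but not very predictive" situation described above.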

BTW: would be a good idea in future to start a new thread rather than tacking on to an old one!
TeeEm
Badges: 19
#12
4 years ago
should the title read outliers?
Gregorius
Badges: 14
#13
4 years ago
(Original post by TeeEm)
should the title read outliers?
Yes; but "outliners" is one of those terms that really should exist as it's nice and pretty. Maybe it should be the set of extreme values in a sample as they "outline" the sample; or the sample points that define the convex hull for multi-dimensional data...