(Original post by When you see it...)
What if you didn't mention the skew, but said that it was grouped data and each group had 30 or more cars in it, therefore no obvious outliers, therefore mean?
Can someone confirm that this is wrong and explain why? I don't see why it is getting negs, it seems reasonable to me (although maybe you should use median just in case if you are uncertain about outliers...).
Thank you! YAAAAY! I got the histogram question right, that was very tricky. I think they may lower the grade boundaries slightly because that question was so hard and it was worth something like 14 marks.
I didn't sit the paper and I'm not a teacher but on Q5 e) I would have said the median is best because of the skew.
But couldn't it be argued that the 30 cars in the 10-15mph range (just 6.6% of the sample) are all outliers because no data was registered in the 15-20mph category? An outlier being a part situated away from a main or related body?
It then brings into question the reliability of this data. Would these slow cars have travelled faster if they hadn't been stuck behind a farm tractor or steamroller?
If we then ignore this slow car data as being unreliable, the positive skew then gets even bigger (mean 30.89mph versus median 28.75mph) giving even more justification to use the median.
Although normally it is the outliers that cause the skew and when you remove them the skewness is reduced.
But can you really say that the average motorist breaks the speed limit if allowed (mean 30.89mph) when clearly more than half do not? I don't think so, so for this question I don't think you can ever say the mean can be used because there are no outliers.
I'm also not certain you can argue the median can be used because there are outliers. So not only was this a nasty question it was also a nasty set of data for the students to interpret. I hope the examiners go easy on the marking.
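For anyone who wants to see the mean-versus-median effect in numbers, here is a rough sketch of how both are estimated from grouped data: midpoints for the mean, linear interpolation inside the median class for the median. The frequency table is made up for illustration and is not the table from the paper; the function names are just my own.

```python
# Hypothetical grouped speed data (mph): (lower bound, upper bound, frequency).
# These frequencies are invented for illustration; they are NOT the exam's table.
groups = [
    (20, 25, 40),
    (25, 30, 100),
    (30, 35, 60),
    (35, 45, 30),
    (45, 60, 10),
]


def grouped_mean(groups):
    """Estimate the mean by treating every value as its class midpoint."""
    total = sum(f for _, _, f in groups)
    return sum((lo + hi) / 2 * f for lo, hi, f in groups) / total


def grouped_median(groups):
    """Estimate the median by linear interpolation within the median class."""
    total = sum(f for _, _, f in groups)
    target = total / 2
    cumulative = 0
    for lo, hi, f in groups:
        if cumulative + f >= target:
            # Fraction of the way through this class needed to reach the target.
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("no data")


mean = grouped_mean(groups)
median = grouped_median(groups)
print(f"mean = {mean:.2f} mph, median = {median:.2f} mph")
# prints: mean = 30.52 mph, median = 29.00 mph
```

With this made-up table the long right-hand tail drags the mean above 30mph while the median stays below it, which is the same pattern as the figures quoted above.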
How many marks do you think I'd get for the last two parts of the question involving the normal distribution function if, instead of using the small table of "probability greater than" values, I took an average of two of the values from the larger table?
I know they would usually allow some method marks or something.
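On the numbers alone (this says nothing about how the mark scheme treats it), here is a quick sketch comparing a value interpolated from the larger table with the exact one. The probability 0.05 is only an assumed example, since the actual value from the question isn't given here, and the table entries are generated with Python's `statistics.NormalDist` and rounded to 4 d.p. to mimic a printed table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Assumed example: find z with P(Z > z) = 0.05, i.e. Phi(z) = 0.95.
p = 0.95

# The two entries of the larger table either side of 0.95, rounded to 4 d.p.
z_low, z_high = 1.64, 1.65
phi_low = round(Z.cdf(z_low), 4)    # 0.9495
phi_high = round(Z.cdf(z_high), 4)  # 0.9505

# Linear interpolation between the two table entries.
z_interp = z_low + (p - phi_low) / (phi_high - phi_low) * (z_high - z_low)

# The value the small "probability greater than" table is built from.
z_exact = Z.inv_cdf(p)

print(f"interpolated z = {z_interp:.4f}, exact z = {z_exact:.4f}")
# prints: interpolated z = 1.6450, exact z = 1.6449
```

In this example the two approaches differ only in the fourth decimal place, so the final answer would usually round to the same thing.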