For my Bio coursework I have done a study into population levels of different species of periwinkles in different areas on a beach. I did ten quadrats for each area that I studied and recorded the number of periwinkles in each quadrat, which combined to give a total for each area.
My teacher has told me to go through all the results I got for each individual quadrat and determine which are statistical anomalies based on whether they differ by more than 2.5% either side of the mean result.
I found 76 of one species of periwinkle in one area from ten quadrats, so my average result is 7.6, and 2.5% of 7.6 is 0.19. This makes ALL of my results anomalous, as a result would have to be between 7.41 and 7.79, which is clearly impossible for a whole-number count.
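To make the arithmetic above concrete, here is a rough Python sketch. The individual quadrat counts are NOT the real data (only the total of 76 over ten quadrats is given in the post); they are made-up numbers chosen to add up to 76, purely for illustration.

```python
from statistics import mean

# Hypothetical quadrat counts (invented for illustration; they sum to 76).
counts = [2, 5, 9, 12, 4, 7, 10, 8, 6, 13]

avg = mean(counts)        # 76 / 10 = 7.6
margin = 0.025 * avg      # 2.5% of the mean = 0.19
low, high = avg - margin, avg + margin

# Counts are whole numbers, so none can land between 7.41 and 7.79.
inside = [c for c in counts if low <= c <= high]

print(f"mean = {avg}, allowed band = {low:.2f} to {high:.2f}")
print(f"counts inside the band: {inside}")
```

Whatever whole-number counts you plug in, the band from 7.41 to 7.79 contains no integer, so the rule as stated flags every quadrat.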
Not answering your question, but we had a good laugh at people deleting 'anomalous results' earlier today - what bad practice to teach
That is pretty bad teaching - you need to leave some anomalous results (or make some up) so you can discuss them and get the extra marks. You can't do enough discussion with perfect results.
You could try doing a standard deviation test and see what that value is as a percentage of your mean?? That might work??
Just a suggestion =)
Standard deviation as a percentage of mean? What in the world does that tell you?
Standard deviation is a good call though - just compare standard deviations across different results. If claiming a result is anomalous and then getting rid of that result from your calculations reduces the standard deviation significantly, then it's likely to be anomalous. High deviations = bad. Low deviations = good.
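A quick sketch of that comparison in Python, in case it helps. The counts are hypothetical (invented so that they total 76, with one clumped quadrat), and `sd_with_and_without` is just a helper name made up for this example.

```python
from statistics import stdev

def sd_with_and_without(values, suspect):
    """Return (SD of all values, SD with one occurrence of `suspect` removed)."""
    remaining = list(values)
    remaining.remove(suspect)   # drops only the first matching value
    return stdev(values), stdev(remaining)

# Hypothetical quadrat counts (not the real data): nine small counts and one clump.
counts = [3, 4, 5, 4, 6, 5, 4, 3, 5, 37]

sd_all, sd_dropped = sd_with_and_without(counts, 37)
print(f"SD with every quadrat:  {sd_all:.2f}")
print(f"SD without the 37:      {sd_dropped:.2f}")
```

A big drop in the standard deviation only tells you the point is unusual compared with the rest. It does not by itself tell you whether the point is a mistake or a real clump of periwinkles.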
How many were in the other samples you took? If it was a lot fewer, then the 76 is an anomaly and your data would be skewed, in which case you should do a different statistical test. You should talk to your teacher about it though; I don't know what would gain you marks and what wouldn't.
I meant more generally - take this example. The species may well live in colonies - if you eliminate this 70 species 'outlier' then you are not gaining a true idea of species distribution at all - you're assuming it is a constant distribution when in fact it is not!
I'd also apply it to scientific studies in general though - if you get such a 'different' result you repeat the measurement to ensure method is correct, and then include it anyway. You can't just eliminate it on statistical determinants alone - if it is part of your data it is part of your data! You need a justification (e.g. you are studying normal people and you get an anomalous result due to someone with pathology being tested) to remove them from any trends, otherwise you are just altering your results so you get a good correlation - bad practice.
ah yes that it could be - i'd imagine you'd still be eliminating the 70 reading though which is still just silly.
Yes it is silly: at degree level and above, you don't remove data for falling outside the 2.5% unless the data is normally distributed.
So for instance, in my third-year research project, even if a result is outside the 2.5% boundaries, if I can't give a reason for the unexpected outcome I have to include it but say that the data isn't normally distributed. He wouldn't remove the result in a real experiment because he doesn't have a reason, and he could just say the organisms were not distributed normally. He would have to do a stat test, which would confirm the data is not normally distributed.
At A-level you don't bother with distribution statistics because the stats involved aren't covered in A-level maths, never mind biology, and the software to do it for you costs too much to buy.
OP, include the point if you think it should be there and mention to your teacher that you did; also, in your conclusion, state that you left it in because it was clear the data was not normally distributed. You won't be able to do t-tests though, because those are for normally distributed data; use the Mann-Whitney test.
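If it helps, here is a minimal hand-rolled sketch of the Mann-Whitney U statistic in plain Python, using the usual rank-sum formula with tied values sharing their average rank. The two sets of counts are hypothetical (not the OP's data), and `mann_whitney_u` is just a name chosen for this example.

```python
def mann_whitney_u(a, b):
    """Return the smaller U statistic for two independent samples."""
    combined = sorted(a + b)

    # Give each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j

    rank_sum_a = sum(ranks[x] for x in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

# Hypothetical counts for two areas of the beach (invented for illustration).
area_1 = [3, 4, 5, 4, 6, 5, 4, 3, 5, 37]
area_2 = [9, 12, 8, 15, 11, 10, 14, 9, 13, 12]

print("U =", mann_whitney_u(area_1, area_2))
```

You then compare U against the critical value from a Mann-Whitney table for your two sample sizes; a U at or below the critical value counts as a significant difference. Because the test works on ranks, the one very large count has far less pull than it would in a t-test.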
Agreed, but why teach these statistical methods at all? Seems pretty stupid to teach students to remove results that don't fit with the trend with no other justification.
So, what should I be doing instead??
If you calculate the SD, any data point that is more than 2 standard deviations from the mean is anomalous.
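A rough sketch of that 2 SD rule in Python. The counts are hypothetical (made up so that they total 76), and `flag_outliers` is a helper name invented for this example; it uses the sample standard deviation.

```python
from statistics import mean, stdev

def flag_outliers(values, n_sd=2):
    """Return the values lying more than n_sd sample SDs from the mean."""
    avg = mean(values)
    sd = stdev(values)
    return [v for v in values if abs(v - avg) > n_sd * sd]

# Hypothetical quadrat counts (not the real data).
counts = [3, 4, 5, 4, 6, 5, 4, 3, 5, 37]

print("flagged:", flag_outliers(counts))
```

Being flagged only means the value is worth a second look. As others in the thread have said, it is not a reason on its own to delete the result.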