# Calculating averages with or without anomalies?

In class, we did an experiment about osmosis.

For one set of data, using a concentration of 0.50 M, we calculated the following percentage changes:

-7.55%
-42.26%
+41.61%

When calculating an average for the other sets of data, we removed the anomalies. However, in this case we have two anomalies (only -7.55% would follow the general trend).

What should I do!?

Should I calculate the average using all three results, just the two negative results, or use -7.55% on its own?
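
To make the three options concrete, here is a minimal Python sketch that computes each candidate average from the three values quoted above. The variable names are just for illustration, and the code does not say which option is the right one to report.

```python
from statistics import mean

# Percentage changes recorded at 0.50 M (the three values from the post)
changes = [-7.55, -42.26, 41.61]

# Option 1: average of all three repeats
mean_all = mean(changes)

# Option 2: average of the two negative results only
negatives = [c for c in changes if c < 0]
mean_negatives = mean(negatives)

# Option 3: the single value that follows the general trend
single_value = -7.55

print(f"All three results:    {mean_all:.2f}%")
print(f"Two negative results: {mean_negatives:.2f}%")
print(f"-7.55% on its own:    {single_value:.2f}%")
```

The three answers are far apart, which is why the choice needs a justification rather than a preference.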

Thanks

I've got no idea what you're talking about. Can you explain the experiment, where your 3 figures come from, what the "other sets of data" are?

You can't just throw away data because they're unusual. If the result suggests that something went wrong with the experiment that might be justified.

We are looking at how a more concentrated solution affects the rate of osmosis after fifteen minutes. We used different concentrations:

0.10 M
0.25 M
0.50 M
0.75 M
1.00 M

For each of the concentrations we have calculated an average. To calculate the best estimate of the true value, you must remove outliers to ensure the data isn't affected too greatly. Later, we explain in detail where these errors were, why they might have occurred and improvements to be made, to ensure these problems don't happen again.

The question I am asking is what to do with 0.50 M, where the three results don't fit the rest of the data, apart from the percentage change of -7.55%. As we increase the concentration (lowering the water potential), the mass of the potato decreases, so the overall percentage change should be negative and should keep decreasing.

Any ideas?
Original post by ganners28
To calculate the best estimate of the true value, you must remove outliers to ensure the data isn't affected too greatly

No.

You CANNOT remove data just because it is not what you expect. That would be lying.

If you think there was a mistake in these data - e.g. the value cannot be obtained from the experiment, or you spilled your coffee on the cells or whatever - then you can justify removing the data point.

The reason you sometimes hear talk about "removing outliers" is in case a given effect such as a correlation is driven solely by the outlier. If you remove it, re-run the analysis and find the effect disappears, then you should be cautious about your correlation - it may not be reliable.

But I repeat, you cannot remove any data just because you don't like the look of it.
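
The "remove it and re-run the analysis" check can be sketched in a few lines of Python. This is only an illustration applied to the mean of the three 0.50 M values rather than to a correlation, and `leave_one_out_means` is a made-up helper name, not a standard function. It is a sensitivity check, not a licence to discard anything.

```python
from statistics import mean

def leave_one_out_means(values):
    """Return (left_out_value, mean_of_the_rest) for each value in turn."""
    results = []
    for i, left_out in enumerate(values):
        rest = values[:i] + values[i + 1:]  # every value except the i-th
        results.append((left_out, mean(rest)))
    return results

# Percentage changes at 0.50 M, taken from the first post
changes = [-7.55, -42.26, 41.61]
loo = leave_one_out_means(changes)

print(f"Mean of all values: {mean(changes):.2f}%")
for left_out, rest_mean in loo:
    print(f"Without {left_out:+.2f}%: mean = {rest_mean:.2f}%")
```

If the average swings this much depending on which single point is left out, the set is too noisy to support a firm conclusion either way.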

Okay, that contradicts what I have been told by some teachers but I would agree with you - and in this case, it seems the most logical.

Thanks

Yeah it really worries me when my students tell me they removed some data because it was an outlier. I query them and they say "that's what I was taught to do at A level"!

You can certainly speculate why you got varied results. It may suggest a problem with the experiment at that concentration.
Plot the points with error bars - that way you can see that this point is unreliable, so it won't affect the overall trend too much.
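
As a rough sketch of where an error bar could come from, the Python below computes the mean, sample standard deviation and standard error for the three 0.50 M values from the first post. Using the standard error as the bar length is an assumption here; your course may ask for the range or the standard deviation instead.

```python
from math import sqrt
from statistics import mean, stdev

# Percentage changes at 0.50 M, taken from the first post
changes = [-7.55, -42.26, 41.61]

avg = mean(changes)                   # centre of the plotted point
sd = stdev(changes)                   # sample standard deviation (n - 1)
sem = sd / sqrt(len(changes))         # standard error of the mean

print(f"mean = {avg:.2f}%, sd = {sd:.2f}, standard error = {sem:.2f}")
```

With only three repeats the bar is very wide compared with the mean itself, which is exactly what flags this concentration as unreliable on the graph.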

I will! Thanks
Bit late here, but I would assume that there must be a fault with the experiment and do it again. Try with more data samples too, and make sure all of the control variables are kept constant.