
# Data distribution representation

1. Hi, I'm not really sure how to word this, so sorry if it's quite hard to understand. I don't understand when you should use the standard deviation, mean or median to represent the distribution of data. Thank you in advance
2. (Original post by Meggy moo 1)
This is a difficult question both to ask and to answer precisely!

The general idea of these so-called "summary statistics" is to give a simple characterization of a data distribution using as few numbers as possible. So both the mean and the median give some idea of the location of the "centre" of the data and both the standard deviation and inter-quartile range give some idea of how spread out the data is around the centre.
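To make the two pairs of summaries concrete, here is a minimal sketch using Python's standard `statistics` module. The data values and the helper name `summarize` are made up for illustration, and the quartiles use the module's default ("exclusive") method, which is only one of several common conventions.

```python
import statistics

def summarize(data):
    """Return both pairs of summary statistics for a sample."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (the median), Q3
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),      # centre, summary 1
        "sd": statistics.stdev(data),       # spread, summary 1 (sample SD)
        "median": statistics.median(data),  # centre, summary 2
        "iqr": q3 - q1,                     # spread, summary 2
    }

# Made-up sample, purely for illustration
sample = [2, 4, 4, 5, 6, 7, 9, 11]
result = summarize(sample)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For a roughly symmetric sample like this one, the mean and median land close together, so either pair tells much the same story.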

So why would we favour one simplified representation (median + inter-quartile range) over another (mean + standard deviation)?

The answer comes down to what the data looks like in the first place. Let's go through a few possibilities.

If your data looks like it has been drawn from a normal distribution, then we know that the mean and the standard deviation together exactly specify that distribution and so the sample mean and standard deviation would provide a good summary of that data sample.

What about if your data looks as if it has come from a Poisson distribution? We know then that the mean characterizes the distribution precisely (with the standard deviation equal to the square root of the mean). You only really then need one number to characterize your data.
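The Poisson claim can be checked numerically. The sketch below computes the mean and standard deviation directly from the Poisson probabilities, assuming an arbitrary rate of 4 and truncating the sum at 60 terms (the remaining probability is negligible for this rate). The function name `poisson_pmf` is just a label chosen for this example.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0       # assumed rate, chosen only for illustration
ks = range(60)  # truncation point; the tail beyond this is negligible here

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
sd = math.sqrt(variance)

print(f"mean = {mean:.6f}, sd = {sd:.6f}, sqrt(mean) = {math.sqrt(mean):.6f}")
```

The standard deviation comes out equal to the square root of the mean (up to rounding), which is why the single number is enough.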

On the other hand, what if you have no clue where your data comes from (in terms of probability distributions) and it looks horrible? It might not be symmetric about the mean (i.e. it is skew); it might have extreme outliers; it might have very fat tails (i.e. a lot of the data in the extremes of the distribution). It's then that you have to think carefully about how to provide a compact data summary!

If outliers are not a problem, then you might consider using the mean together with the standard deviation and the skewness and the kurtosis to characterize the distribution - but notice that your summary is becoming much more complex!
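As a rough sketch of what those two extra numbers are, the functions below compute the simple moment-based skewness and excess kurtosis (kurtosis minus 3, so that a normal distribution scores 0). Statistical packages often apply small-sample corrections that this version leaves out, and the function names and data are made up for the example.

```python
import statistics

def skewness(data):
    """Moment-based skewness: the average cubed standardized deviation."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)  # population SD, no small-sample correction
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3 (a normal distribution gives 0)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 4 for x in data) / len(data) - 3

symmetric = [1, 2, 3, 4, 5]            # made-up symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]     # made-up sample with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates fatter tails than a normal distribution would have.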

If outliers are a problem, then notice that they can have a big effect on the mean and the standard deviation; you have to ask yourself whether the mean and the standard deviation are giving a good representation of the majority of the data. If not, then notice that the median and quartiles are not hugely affected by the values of outliers; they may give a much better data summary in this case. Equally, you may wish to use another quantile (such as a decile) to communicate what is going on in the tails of the data whilst being protected from the effect of extreme outliers.
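A small experiment shows the effect. The sketch below takes a made-up sample, adds one extreme value, and compares how far each summary moves. The helper name `summary` and the numbers are assumptions for illustration only.

```python
import statistics

def summary(data):
    """Mean/SD and median/IQR for the same sample."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
    }

clean = [10, 12, 13, 14, 15, 16, 18]   # made-up sample
with_outlier = clean + [200]           # same sample plus one extreme value

before = summary(clean)
after = summary(with_outlier)

for name in before:
    print(f"{name:>6}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

One extra point drags the mean and the standard deviation a long way, while the median and the inter-quartile range barely move.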

Best of all, of course, is a graphical summary of the data such as a box and whisker plot, or a rug plot, or a histogram.
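Even without a plotting library you can get a quick picture. Here is a minimal sketch of a text histogram for whole-number data. The function name `text_histogram`, the bin width and the data are all made up for the example, and a real plotting package would do a much nicer job.

```python
def text_histogram(data, bin_width):
    """Return one line per bin, with a '#' for every value in that bin.

    Assumes integer data and an integer bin_width. Empty bins are kept
    so that gaps in the data stay visible.
    """
    counts = {}
    for x in data:
        start = (x // bin_width) * bin_width  # left edge of the bin holding x
        counts[start] = counts.get(start, 0) + 1

    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3}| " + "#" * counts.get(start, 0))
    return lines

# Made-up right-skewed sample with one distant value
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14, 27]
lines = text_histogram(values, 5)
print("\n".join(lines))
```

The long run of empty bins before the last value makes the outlier obvious in a way that no single summary number does.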
3. (Original post by Gregorius)
Oh wow thank you, I kind of understand

Updated: October 26, 2015