    • Thread Starter
    Hi, I'm not really sure how to word this, so sorry if it's hard to understand. When should you use the standard deviation, the mean or the median to represent the distribution of data? Thank you in advance.
    (Original post by Meggy moo 1)
    This is a difficult question both to ask and to answer precisely!

    The general idea of these so-called "summary statistics" is to give a simple characterization of a data distribution using as few numbers as possible. So both the mean and the median give some idea of the location of the "centre" of the data and both the standard deviation and inter-quartile range give some idea of how spread out the data is around the centre.
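
As a rough illustration of these four summaries, here is a minimal sketch using Python's standard `statistics` module. The `data` values are invented for the example, and the quartiles use the module's default ("exclusive") method; other quartile conventions give slightly different answers.

```python
import statistics

# Hypothetical sample, invented purely for illustration
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 15]

# Two measures of the "centre"
mean = statistics.mean(data)
median = statistics.median(data)

# Two measures of spread around the centre
sd = statistics.stdev(data)                    # sample standard deviation (n - 1 divisor)
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1                                  # inter-quartile range

print(f"mean={mean}, sd={sd:.2f}")
print(f"median={median}, IQR={iqr}")
```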

    So why would we favour one simplified representation (median + inter-quartile range) over another (mean + standard deviation)?

    The answer comes down to what the data looks like in the first place. Let's go through a few possibilities.

    If your data looks like it has been drawn from a normal distribution, then we know that the mean and the standard deviation together exactly specify that distribution and so the sample mean and standard deviation would provide a good summary of that data sample.
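
To see this in action, the sketch below simulates a normal sample and checks that the sample mean and standard deviation recover the parameters used to generate it. The true mean of 50, true standard deviation of 10, sample size and seed are all assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed parameters for the illustration: true mean 50, true SD 10
sample = [rng.gauss(50, 10) for _ in range(5000)]

m = statistics.mean(sample)
s = statistics.stdev(sample)

# For normal data, roughly 68% of values lie within one SD of the mean
within_one_sd = sum(abs(x - m) <= s for x in sample) / len(sample)

print(f"sample mean={m:.2f}, sample SD={s:.2f}")
print(f"fraction within one SD of the mean: {within_one_sd:.3f}")
```

Because two numbers pin down a normal distribution, facts like the fraction within one standard deviation follow from the mean and standard deviation alone.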

    What about if your data looks as if it has come from a Poisson distribution? We know then that the mean characterizes the distribution precisely (with the standard deviation equal to the square root of the mean). You only really then need one number to characterize your data.
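
The square-root relationship can be checked numerically from the Poisson probability mass function. This is only a sketch: the rate `lam = 4.0` is an arbitrary choice, and the sums are truncated at 100 terms, which leaves a negligible tail for a rate this small.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0          # assumed rate, chosen for illustration
ks = range(100)    # truncation point; the tail beyond this is negligible here

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
sd = math.sqrt(variance)

print(f"mean={mean:.6f}, sd={sd:.6f}, sqrt(mean)={math.sqrt(mean):.6f}")
```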

    On the other hand, what if you have no clue where your data comes from (in terms of probability distributions) and it looks horrible? It might not be symmetric about the mean (i.e. it is skew); it might have extreme outliers; it might have very fat tails (i.e. a lot of the data in the extremes of the distribution). It's then that you have to think carefully about how to provide a compact data summary!

    If outliers are not a problem, then you might consider using the mean together with the standard deviation and the skewness and the kurtosis to characterize the distribution - but notice that your summary is becoming much more complex!
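
For concreteness, here is one way the two extra numbers might be computed. This sketch uses the simple moment-based (population) definitions; statistical packages often apply small-sample corrections, so their values may differ slightly. The two data sets are invented for the example.

```python
import statistics

def skewness(xs):
    """Moment-based skewness: the mean cubed z-score (0 for symmetric data)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)  # population SD, to match the moment definition
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def excess_kurtosis(xs):
    """Mean fourth-power z-score minus 3, so a normal distribution scores 0."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 20]   # hypothetical data with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```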

    If outliers are a problem, then notice that they can have a big effect on the mean and the standard deviation; you have to ask yourself whether the mean and the standard deviation are giving a good representation of the majority of the data. If not, then notice that the median and quartiles are not hugely affected by the values of outliers; they may give a much better data summary in this case. Equally, you may wish to use another quantile (such as a decile) to communicate what is going on in the tails of the data whilst being protected from the effect of extreme outliers.
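
A small experiment makes the point. In the sketch below, the `clean` values are made up, and the outlier imitates a recording error where 29 was typed as 290. The quartiles again use the `statistics` module's default method.

```python
import statistics

clean = [21, 22, 23, 24, 25, 26, 27, 28, 29]   # hypothetical measurements
with_outlier = clean[:-1] + [290]              # same data, last value mistyped

def summarise(xs):
    """Return both kinds of summary so they can be compared side by side."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": statistics.median(xs),
        "iqr": q3 - q1,
    }

before = summarise(clean)
after = summarise(with_outlier)

for key in before:
    print(f"{key:>6}: {before[key]:8.2f} -> {after[key]:8.2f}")
```

The mean and standard deviation move a long way in response to a single bad value, while the median and inter-quartile range stay where they were.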

    Best of all, of course, is a graphical summary of the data such as a box-and-whisker plot, a rug plot, or a histogram.
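
Even without a plotting library, a crude text histogram can show the shape of the data. This is only a sketch for whole-number data: `text_histogram` is a hypothetical helper written for this example, and the data and bin width are invented.

```python
from collections import Counter

def text_histogram(xs, bin_width):
    """Return one text row per bin, including empty bins, for integer data."""
    counts = Counter(x // bin_width for x in xs)
    rows = []
    for b in range(min(counts), max(counts) + 1):
        lo = b * bin_width
        hi = lo + bin_width - 1
        rows.append(f"{lo:>3}-{hi:<3} | {'#' * counts[b]}")
    return rows

# Hypothetical skewed data with one extreme value
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 14, 30]

rows = text_histogram(data, 5)
print("\n".join(rows))
```

The long run of empty bins before the final value is exactly the kind of feature that a pair of summary numbers would hide.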
    • Thread Starter
    (Original post by Gregorius)
    Oh wow thank you, I kind of understand
 
 
 
Updated: October 26, 2015