
Help understanding boxplots and outliers (on SPSS)

    • Thread Starter
    Offline

    19
    ReputationRep:
    I made two boxplots on SPSS for length vs sex.
    For males, I have 32 samples, and the lengths range from 3cm to 20cm, but on the boxplot it's showing 2 outliers that are above 30cm (the units on the axis only go up to 20cm, and there's 2 outliers above 30cm with a circle next to one of them).

    Could someone explain how to find outliers on a boxplot and if this sounds right or if I've made a mistake somewhere? And when writing about the number of samples included in my boxplot, would I include the outliers and say all 32 points were used to make the boxplot or just 30?
    Thanks
    Offline

    18
    ReputationRep:
    SPSS uses the following definition to detect outliers on boxplots (http://www.unige.ch/ses/sococ/cl/sps.../outliers.html). However, the rule flags points fairly readily, so if you have a big enough sample it will detect some outliers by chance. We really cannot tell you what to do as we don't know why you're making a box plot, what it's meant to show, etc.
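For anyone who wants to see the usual boxplot rule in numbers, here is a rough Python sketch. It assumes the common convention that points more than 1.5 box lengths (IQRs) beyond the box edge are "outliers" (drawn as circles) and points more than 3 box lengths away are "extremes" (drawn as asterisks). The function names and the data are made up for illustration, and SPSS's own quartile calculation may differ slightly from the one used here.

```python
from statistics import quantiles

def boxplot_fences(values, k=1.5):
    """Return (lower, upper) fences lying k box lengths (IQRs) beyond the quartiles."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile methods can give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def classify(values):
    """Split flagged points into (outliers, extremes) using 1.5 and 3 box lengths."""
    inner_lo, inner_hi = boxplot_fences(values, 1.5)
    outer_lo, outer_hi = boxplot_fences(values, 3.0)
    outliers = [v for v in values
                if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi]
    extremes = [v for v in values if v < outer_lo or v > outer_hi]
    return outliers, extremes

# Hypothetical lengths in cm, not the thread starter's data.
lengths = [3, 5, 6, 7, 8, 8, 9, 10, 11, 12, 20, 40]
outliers, extremes = classify(lengths)
print(outliers, extremes)
```

Note that the flagged points are always actual values from the data set; the fences are calculated, but the outliers themselves are not.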
    • Thread Starter
    Offline

    19
    ReputationRep:
    So are the outliers supposed to be values from within my data range or are they calculated differently? I'm confused because I don't understand where the outlier values are coming from really - are they numbers from your data samples or is there a specific way to calculate them? Sorry I'm just having trouble understanding some of the basic maths here.
    Offline

    18
    ReputationRep:
    So from Tabachnick & Fidell (2007, p. 73):

    There are four reasons for the presence of an outlier. First is incorrect data entry. Cases that are extreme should be checked carefully to see that data are correctly entered. Second is failure to specify missing-value codes in computer syntax so that missing-value indicators are read as real data. Third is that the outlier is not a member of the population from which you intended to sample. If the case should not have been sampled, it is deleted once it is detected. Fourth is that the case is from the intended population but the distribution for the variable in the population has more extreme values than a normal distribution. In this event, the researcher retains the case but considers changing the value on the variable(s) so that the case no longer has as much impact. Although errors in data entry and missing values specification are easily found and remedied, deciding between alternatives three and four, between deletion and retention with alteration, is difficult.
    So there isn't really a standard definition of an outlier, nor a standard way of detecting them. It's quite usual to remove scores whose z value is beyond a certain cutoff, but this isn't a good rule to always apply! There's lots of information about it on Google anyhow...
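As a rough illustration of the z value approach and one reason it isn't always a good rule, here is a small Python sketch. The cutoff of 3 is just a commonly quoted choice, and the function names and data are made up. With the sample standard deviation, the largest possible |z| in a sample of n cases is (n - 1) / sqrt(n), so in a sample of 10 nothing can ever exceed a cutoff of 3, however extreme it looks.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise each value using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def flag_by_z(values, cutoff=3.0):
    """Return the values whose absolute z score exceeds the cutoff."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]

# Hypothetical data: 500 is obviously odd, but n = 10 caps |z| at about 2.85.
scores = [5, 6, 7, 8, 9, 10, 11, 12, 13, 500]
print(flag_by_z(scores))        # cutoff 3: nothing is flagged
print(flag_by_z(scores, 2.5))   # a lower cutoff does flag 500
```

The extreme value inflates the mean and standard deviation it is being compared against, which is why a fixed cutoff can miss it in small samples.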
    • Thread Starter
    Offline

    19
    ReputationRep:
    Thanks for this reply. I'm reading Discovering Stats on SPSS by Andy Field now. I think it's the 4th Edition, but you were right, it is really good at breaking this stuff down.
    • Thread Starter
    Offline

    19
    ReputationRep:
    Finally worked out where I was going wrong. The numbers it was showing on the boxplot weren't the actual outlier values; they were the case numbers. So I was supposed to go back to my data set and check what value was in row (case) 32, and that value is the outlier.
    Discovering Stats using SPSS is proving to be very helpful - I had been Googling this issue for over a week with no luck until I started using this book. Would definitely recommend!
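In case it helps anyone with the same confusion, here is a small Python sketch of the difference between a case number and an outlier value. It assumes the 1.5 box length rule and treats the case number as the 1-based row in the data set; the function name and the 32 lengths are made up, not my real data.

```python
from statistics import quantiles

def outlier_cases(values, k=1.5):
    """Map case number (1-based row) to value for points beyond the k * IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {case: v
            for case, v in enumerate(values, start=1)
            if v < lower or v > upper}

# Hypothetical sample of 32 lengths in cm; the last two rows are unusually long.
lengths = [3, 4, 5, 6, 7, 8] * 5 + [19, 20]
print(outlier_cases(lengths))
```

Here the labels next to the points would be 31 and 32 (the rows), while the values plotted on the axis are 19 and 20, so a label "above 30" can appear even though no length is that large.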
 
 
 