    • Thread Starter
    I'm currently working on a research project for uni that involves a small amount of statistical analysis. To avoid being too technical, I've renamed the variables A and B.

    I need to investigate the effect of variable A on variable B.
    Variable A is categorical and ordinal, with 6 categories.
    Variable B is numerical and discrete, with 3 intervals (0, 1 and 2).

    I have 5 years' worth of data on variable B, so I'm going to compare variable A to the average of variable B for each year. However, do I use the median or the mean?

    Variable B is heavily skewed for each year, with 85% of the data being 0. This means that the median average is ALWAYS 0. Is the mean appropriate to use in this situation, even though the data isn't normally distributed?

    Thanks in advance!
    (Original post by Willisme1)
    I'm currently working on a research project for uni that involves a small amount of statistical analysis. To avoid being too technical, I've renamed the variables A and B.

    I need to investigate the effect of variable A on variable B.
    Variable A is categorical and ordinal, with 6 categories.
    Variable B is numerical and discrete, with 3 intervals (0, 1 and 2).
    Do you mean that B is numerical with 3 possible values (0, 1 and 2)?

    I have 5 years' worth of data on variable B,
    How many individual observations do you have per year?


    so I'm going to compare variable A to the average of variable B for each year. However, do I use the median or the mean?
    It's not clear to me why you would use a summary statistic at this point. That may depend on the research question (which it would be helpful for you to state), but something like ordinal logistic regression, treating the outcome B as ordinal, suggests itself to me.

    Variable B is heavily skewed for each year, with 85% of the data being 0. This means that the median average is ALWAYS 0. Is the mean appropriate to use in this situation, even though the data isn't normally distributed?
    That depends strongly on the type of analysis you go for. If you choose ordinal logistic regression, the problem goes away. If 85% of the outcome is zero, it would be very helpful to understand the meaning of the difference between the values 1 and 2 of B. Would an analysis that compares zero with non-zero be more helpful, for instance?
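
    To make the median-versus-mean point concrete, here is a minimal sketch in Python. The data are invented purely for illustration (two hypothetical categories of A, "A1" and "A2", each with at least 80% zeros), and the helper name `summarise` is my own. It shows that the median is stuck at 0 in both groups, while the mean and the share of non-zero values can still differ.

```python
from statistics import mean, median

# Hypothetical data (not from the thread): values of B (0, 1 or 2)
# for two made-up categories of A, with most observations equal to 0.
b_by_a = {
    "A1": [0] * 17 + [1] * 2 + [2] * 1,  # 85% zeros
    "A2": [0] * 16 + [1] * 2 + [2] * 2,  # 80% zeros
}

def summarise(values):
    """Return the median, the mean and the share of non-zero observations."""
    return {
        "median": median(values),
        "mean": mean(values),
        "prop_nonzero": sum(v > 0 for v in values) / len(values),
    }

summaries = {a: summarise(vals) for a, vals in b_by_a.items()}
for a, s in summaries.items():
    print(a, s)
```

    The share of non-zero values is the quantity a zero versus non-zero comparison would work with; whether that or the mean is more useful depends on what the difference between 1 and 2 means for the research question.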
    • Thread Starter
    Thanks for the reply, I did some thinking about how I'm tackling this and I'm very confused by what I should be doing.

    I'm investigating the effect of parity (litter number) on the number of piglets born alive per litter in breeding sows. One of the things I've suggested is that parity might correlate with the number of stillbirths per litter.

    Parity is categorical but as there are 8 categories, I have chosen to treat it as continuous (as recommended by my lecturers).
    Stillbirths per litter is numerical and discrete, with values of either 0, 1 or 2.

    However, the distribution of stillbirths per litter is strongly skewed to one side as previously mentioned. If I just treat the 5 years as one block of data, and run a Kruskal-Wallis test, I get a significant result (P < 0.0001) but I'm not sure whether I've done that right.
    (Original post by Willisme1)
    Thanks for the reply, I did some thinking about how I'm tackling this and I'm very confused by what I should be doing.

    I'm investigating the effect of parity (litter number) on the number of piglets born alive per litter in breeding sows. One of the things I've suggested is that parity might correlate with the number of stillbirths per litter.

    Parity is categorical but as there are 8 categories, I have chosen to treat it as continuous (as recommended by my lecturers).
    Stillbirths per litter is numerical and discrete, with values of either 0, 1 or 2.

    However, the distribution of stillbirths per litter is strongly skewed to one side as previously mentioned. If I just treat the 5 years as one block of data, and run a Kruskal-Wallis test, I get a significant result (P < 0.0001) but I'm not sure whether I've done that right.

    There are a number of approaches to data like this - and the choice of approach depends on how much data you have and whether you feel confident in using the methods!

    1) The Kruskal-Wallis test is a good start. It's a rank-based, non-parametric test (it does not assume a particular distribution for the outcome) that tells you whether the distribution of the outcome (number of stillbirths in this case) varies by the value of the independent variable. It's nice and simple to use but suffers from the weakness that it's a portmanteau (omnibus) test: it doesn't tell you how the number of stillbirths varies with parity, just that it does. If you have enough data, you could simply apply this by year.

    2) If you want to see how the number of stillbirths varies by parity then you might consider Poisson regression. Either take the parity as a numerical variable (which will assume the effect is linear, on the log scale with the usual log link) or as categorical, if you have enough data, to see exactly what is happening in each category. Again, if you have enough data, you could add year number as an independent variable. The downside of using this approach is that it is parametric and you should check the diagnostics of the regression to make sure that assumptions are being met.

    3) If you assume that both number of stillbirths and parity are ordinal variables (seems reasonable to me) then you could fit some sort of ordinal logistic regression. This makes fewer distributional assumptions than Poisson regression, but implies a particular model for how stillbirths increase by parity.
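
    As an illustration of approach 1, here is a rough sketch of the Kruskal-Wallis H statistic in plain Python, including the correction for ties, which matters here because the outcome takes only the values 0, 1 and 2. The function name `kruskal_wallis_h` and the stillbirth counts are made up for the example; they are not your data. In practice a library routine such as `scipy.stats.kruskal` would do this and report the p-value directly.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Return (H, df): the tie-corrected Kruskal-Wallis statistic and k - 1."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Give every distinct value the average rank of its block of ties.
    midrank = {}
    start = 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = start + (t - 1) / 2  # mean of ranks start..start+t-1
        start += t

    # Uncorrected H from the rank sum of each group.
    h = 12 / (n * (n + 1)) * sum(
        sum(midrank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    c = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if c == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / c, len(groups) - 1

# Hypothetical stillbirths per litter for three parity groups (invented data).
example_groups = [
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 2, 0, 1, 0, 0, 2],
]
h_stat, df = kruskal_wallis_h(example_groups)
print(f"H = {h_stat:.3f} on {df} degrees of freedom")
```

    Under the null hypothesis H is compared with a chi-squared distribution on k - 1 degrees of freedom, where k is the number of parity groups. A large H only says that the groups differ somewhere, which is the portmanteau limitation mentioned in 1).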
 
 
 
Updated: April 23, 2016
The Student Room, Get Revising and Marked by Teachers are trading names of The Student Room Group Ltd.

Register Number: 04666380 (England and Wales), VAT No. 806 8067 22 Registered Office: International House, Queens Road, Brighton, BN1 3XE
