# Is the mean appropriate in this situation?

I'm currently working on a research project for uni that involves a small amount of statistical analysis. To avoid being too technical, I've renamed the variables A and B.

I need to investigate the effect of variable A on variable B.

Variable A is categorical and ordinal, with 6 categories.

Variable B is numerical and discrete, with 3 intervals (0, 1 and 2).

I have 5 years' worth of data on variable B, so I'm going to compare variable A to the average of variable B for each year. However, do I use the median or the mean?

Variable B is heavily skewed for each year, with 85% of the data being 0. This means that the median average is ALWAYS 0. Is the mean appropriate to use in this situation, even though the data isn't normally distributed?

Thanks in advance!
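To make the question concrete, here is a minimal Python sketch using the standard `statistics` module and made-up values of variable B (hypothetical, not the poster's data). In both years 85% of the values are 0, so the median is 0 in both, while the mean still changes when the mix of 1s and 2s changes.

```python
from statistics import mean, median

# Hypothetical values of variable B for two made-up years (NOT real data):
# 85 zeros in each year, but a different mix of 1s and 2s
years = {
    "year 1": [0] * 85 + [1] * 10 + [2] * 5,
    "year 2": [0] * 85 + [1] * 5 + [2] * 10,
}

for year, values in years.items():
    # With more than half the values equal to 0, the median is always 0
    print(f"{year}: median = {median(values)}, mean = {mean(values):.2f}")
```

Both years print a median of 0.0, while the means are 0.20 and 0.25.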

#2

(Original post by **Willisme1**)

I'm currently working on a research project for uni that involves a small amount of statistical analysis. To avoid being too technical, I've renamed the variables A and B.

I need to investigate the effect of variable A on variable B.

Variable A is categorical and ordinal, with 6 categories.

Variable B is numerical and discrete, with 3 intervals (0, 1 and 2).

Do you mean **values** (0, 1 and 2)?

I have 5 years' worth of data on variable B, so I'm going to compare variable A to the average of variable B for each year. However, do I use the median or the mean?

Variable B is heavily skewed for each year, with 85% of the data being 0. This means that the median average is ALWAYS 0. Is the mean appropriate to use in this situation, even though the data isn't normally distributed?

It depends on the **meaning** of the difference between the values 1 and 2 of B. Would an analysis that compares zero with non-zero be more helpful, for instance?
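Comparing zero with non-zero amounts to collapsing B into a yes/no outcome and looking at the proportion of non-zero values in each category of A. Below is a minimal standard-library sketch with a handful of hypothetical (A, B) pairs invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical records (NOT real data): (category of A, value of B) pairs
records = [(1, 0), (1, 0), (1, 1), (2, 0), (2, 2), (2, 0), (3, 1), (3, 2), (3, 0)]

# Collapse B into "zero" vs "non-zero" and count per category of A
totals = defaultdict(int)
nonzero = defaultdict(int)
for a, b in records:
    totals[a] += 1
    if b > 0:
        nonzero[a] += 1

for a in sorted(totals):
    share = nonzero[a] / totals[a]
    print(f"A={a}: {nonzero[a]}/{totals[a]} non-zero ({share:.0%})")
```

With these made-up pairs, categories 1 and 2 each have 1 of 3 values non-zero (33%), and category 3 has 2 of 3 (67%).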


Thanks for the reply, I did some thinking about how I'm tackling this and I'm very confused by what I should be doing.

I'm investigating the effect of parity (litter number) on the number of piglets born alive per litter in breeding sows. One of the things I've suggested is that parity might correlate with the number of stillbirths per litter.

Parity is categorical but as there are 8 categories, I have chosen to treat it as continuous (as recommended by my lecturers).

Stillbirths per litter is numerical and discrete, with values of either 0, 1 or 2.

However, the distribution of stillbirths per litter is strongly skewed to one side as previously mentioned. If I just treat the 5 years as one block of data, and run a Kruskal-Wallis test, I get a significant result (P < 0.0001) but I'm not sure whether I've done that right.

#4

(Original post by **Willisme1**)

Thanks for the reply, I did some thinking about how I'm tackling this and I'm very confused by what I should be doing.

I'm investigating the effect of parity (litter number) on the number of piglets born alive per litter in breeding sows. One of the things I've suggested is that parity might correlate with the number of stillbirths per litter.

Parity is categorical but as there are 8 categories, I have chosen to treat it as continuous (as recommended by my lecturers).

Stillbirths per litter is numerical and discrete, with values of either 0, 1 or 2.

However, the distribution of stillbirths per litter is strongly skewed to one side as previously mentioned. If I just treat the 5 years as one block of data, and run a Kruskal-Wallis test, I get a significant result (P < 0.0001) but I'm not sure whether I've done that right.

There are a number of approaches to data like this - and the choice of approach depends on how much data you have and whether you feel confident in using the methods!

1) The Kruskal-Wallis test is a good start. It's a non-parametric test (it doesn't assume any particular distribution for the outcome) that tells you whether the distribution of the outcome (number of stillbirths in this case) varies with the value of the independent variable. It's nice and simple to use, but it has the weakness of being a portmanteau test: it doesn't tell you **how** the number of stillbirths varies with parity, just that it does. If you have enough data, you could simply apply it to each year separately.

2) If you want to see how the number of stillbirths varies with parity, you might consider Poisson regression. Either treat parity as a numerical variable (which assumes a linear effect on the log of the expected count) or, if you have enough data, as categorical, to see exactly what is happening in each category. Again, if you have enough data, you could add year as an independent variable. The downside of this approach is that it is parametric, so you should check the regression diagnostics to make sure its assumptions are being met.

3) If you assume that both number of stillbirths and parity are ordinal variables (seems reasonable to me) then you could fit some sort of ordinal logistic regression. This makes fewer distributional assumptions than Poisson regression, but implies a particular model for how stillbirths increase by parity.
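Option 1 can be sketched in Python, assuming SciPy is installed. `scipy.stats.kruskal` takes one sample per group; here that is one list of stillbirth counts per parity. The records and the helper name `kw_by_parity` are hypothetical. The sketch runs the test once on all years pooled and then once per year. SciPy's version corrects the statistic for ties, which matters when most litters have 0 stillbirths.

```python
from collections import defaultdict

from scipy.stats import kruskal

# Hypothetical records (NOT real data): (year, parity, stillbirths in the litter)
records = [
    (1, 1, 0), (1, 1, 0), (1, 2, 0), (1, 2, 1), (1, 3, 1), (1, 3, 2),
    (2, 1, 0), (2, 1, 1), (2, 2, 0), (2, 2, 0), (2, 3, 2), (2, 3, 1),
]


def kw_by_parity(rows):
    """Hypothetical helper: Kruskal-Wallis test of stillbirths across parity groups."""
    groups = defaultdict(list)
    for _year, parity, stillbirths in rows:
        groups[parity].append(stillbirths)
    # One sample per parity group; SciPy applies a tie correction to H
    return kruskal(*groups.values())


# All years treated as one block
print("pooled:", kw_by_parity(records))

# One test per year (only meaningful with many litters per parity in each year)
for year in sorted({r[0] for r in records}):
    print(f"year {year}:", kw_by_parity([r for r in records if r[0] == year]))
```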
