Which statistical test should I choose?

Hey, I'd really appreciate some help please on which test/tests I should use for my dissertation. I'm horrendous at maths and statistics and am really struggling to choose which is best.

In short, my dissertation is focused on observing cats with and without enrichment, and assessing whether the enrichment helps to reduce stress. All the cats are observed twice (with and without enrichment) rather than using two separate groups.

At 5-minute intervals, I recorded the behaviour the cat was performing based on a pre-written list, and each behaviour is placed into one of two categories (stressed or relaxed). At the same time, I recorded how stressed the cat appeared on a scale of 1-7. In the enrichment condition, I also recorded when and for how long the cat was interacting with it.

I was advised to carry out chi-squared - but I'm not sure if it fits the data? I've also considered a paired t-test, Wilcoxon or Kruskal-Wallis.

Thank you!
10 months ago
Really depends on the distributions of the data
https://www.nki.nl/media/837516/m343.pdf
is a decent overview.
10 months ago
The paper that @mqb2766 links to has some very good advice; but yours is a tricky little problem, so let’s walk you through it.

First let me check that I have understood exactly what you’re doing. You have a group of cats, and:

(i) You measure the behaviour of each cat both before and after some sort of “treatment”; you actually take several “measurements” (your pre-written list), each of which has a binary stressed/not-stressed outcome, before and after this “treatment”.

(ii) For each cat, pre- and post-treatment, you make a single assessment of “how stressed” they appear on a scale of 1-7.

(iii) For each cat, after they have received the “treatment”, you record how long they engaged with the treatment.

There are two tricky aspects to this situation: first the data are paired, and therefore not independent of each other; second, there are multiple measurements on each cat (the list) that may or may not be independent of each other. So a question immediately back to you: do you group all the different measurements on the list to come out with a single “stressed/not-stressed” measure for each cat? Or do you consider each of the behaviours separately?

(i) So for the first item on my list above, if you simply had a single stressed/not-stressed measure on each cat before and after (or during) treatment, then you have paired binary data. The appropriate test for this is the McNemar test (which, as it turns out, is a variety of chi-squared test). If you do combine all the behaviours into a single stressed/not-stressed measure, you can stop there for this analysis.
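
To make the McNemar idea concrete, here is a minimal sketch in plain Python of the exact (binomial) version of the test, which is the usual choice when the number of cats is small. The data are made up for ten hypothetical cats, and the function name `mcnemar_exact` and the coding 1 = stressed, 0 = relaxed are my own assumptions, not anything from your dissertation. Only the cats whose status changes between the two conditions carry information, which is what the code counts.

```python
from math import comb

def mcnemar_exact(before, after):
    """Exact two-sided McNemar test for paired binary data.

    before, after: lists of 0/1 (assumed coding: 1 = stressed), one entry per cat.
    Returns (b, c, p) where b and c are the two discordant counts.
    """
    # Only pairs that change category contribute to the test.
    b = sum(1 for x, y in zip(before, after) if x == 1 and y == 0)  # stressed -> relaxed
    c = sum(1 for x, y in zip(before, after) if x == 0 and y == 1)  # relaxed -> stressed
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    # Under the null, each discordant pair is equally likely to go either way,
    # so the smaller count follows Binomial(n, 0.5). Double the tail for two sides.
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return b, c, p

# Made-up illustration data, NOT real observations.
without_enrichment = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
with_enrichment    = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

b, c, p = mcnemar_exact(without_enrichment, with_enrichment)
print(f"stressed->relaxed: {b}, relaxed->stressed: {c}, p = {p:.4f}")
```

With these invented numbers, five cats go from stressed to relaxed and one goes the other way, so the test is really asking whether a 5 versus 1 split is more lopsided than a coin flip would produce.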

If you consider each of the behaviours separately, then you could (a) do a McNemar test on each of the behaviours separately and adjust the p-values you get for multiple testing (you would do this if you wanted to identify particular aspects of behaviour separately), or (b) consider the different behaviours to be strata, and apply the Cochran-Mantel-Haenszel test.
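
For option (a), one common way to adjust for multiple testing is the Holm method, which is a less conservative relative of the Bonferroni correction. The sketch below is plain Python; the function name `holm_adjust`, the behaviour names and the raw p-values are all invented for illustration, and Holm is my suggestion rather than the only acceptable adjustment.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment for a list of raw p-values.

    Returns the adjusted p-values in the same order as the input.
    """
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must never decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values from one McNemar test per behaviour.
raw = {"hiding": 0.010, "pacing": 0.040, "grooming": 0.030}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))

for behaviour, p_adj in adjusted.items():
    print(f"{behaviour}: raw p = {raw[behaviour]:.3f}, Holm-adjusted p = {p_adj:.3f}")
```

You would then compare the adjusted p-values, not the raw ones, against your significance level.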

(ii) Here you have a before/after design with an ordinal outcome. You should be safe using the Wilcoxon signed rank test here.
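
As an illustration of what the Wilcoxon signed rank test does with 1-7 scores, here is a minimal plain-Python sketch. It assumes one score per cat per condition, drops cats whose score did not change, and uses the normal approximation with a tie correction and no continuity correction. The function name and the scores are made up. In practice you would more likely call a library routine such as `scipy.stats.wilcoxon`, which can give slightly different p-values depending on its settings.

```python
from math import sqrt, erfc

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation, tie-corrected).

    x, y: paired scores, one pair per cat. Zero differences are dropped.
    Returns (w_plus, w_minus, p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0  # no cat changed score

    # Rank the absolute differences, giving tied values their average rank.
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs_sorted[j] == abs_sorted[i]:
            j += 1
        t = j - i                                  # size of this tie group
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))                     # two-sided normal tail area
    return w_plus, w_minus, p

# Made-up stress scores (1-7) for eight hypothetical cats.
without_enrichment = [5, 6, 4, 5, 7, 3, 5, 6]
with_enrichment    = [3, 4, 4, 2, 5, 4, 3, 5]

w_plus, w_minus, p = wilcoxon_signed_rank(without_enrichment, with_enrichment)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.4f}")
```

A large W+ relative to W- here means that most cats scored lower (less stressed) with enrichment than without it.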

(iii) I presume the question here would be along the lines of “is the length of engagement with the treatment associated with stressed/not-stressed outcome”. Here you can use logistic regression: the outcome is whether or not they were stressed after treatment, with two covariates, the length of time of engagement and whether or not they were stressed before treatment (the latter so that you’re adjusting for their baseline behaviour). You have the same question here as for (i): are you treating the different behaviours separately or are you lumping them together in one measure of stress? If the former, your options here are more limited: fit the model separately for each behaviour and adjust for multiple testing.
