The Student Room Group

Chi Square / Standardized Residuals - Help

Hi all, just wondering if anyone could help me..
Summary;
Columns - 9 insect orders, Rows - 4 sampling techniques
Chi square test - there was a significant difference...

I'm looking to do an additional test which helps to specify where the difference actually lies, and which results have little impact, e.g. a post hoc test...

I understand that standardized residuals tell you how strong the difference between the expected/actual counts is, and that if 95% of values are within -3 to +3 then you have normally distributed data. But would you be able to use these values to suggest where the variation lies, e.g. values outside -3 to +3 are insignificant?

Not too sure if I've got totally the wrong end of the stick, but any help would be great...


Josh
Original post by Josh989
Hi all, just wondering if anyone could help me..
Summary;
Columns - 9 insect orders, Rows - 4 sampling techniques
Chi square test - there was a significant difference...

I'm looking to do an additional test which helps to specify where the difference actually lies, and which results have little impact, e.g. a post hoc test...


The simple way to approach this is to recall that the standardized residuals are asymptotically normally distributed with mean zero and variance one. So if the cell sizes are reasonably large, you can just see where your residuals lie on a standard normal distribution, and you can get a p-value (if you wish) using the z-test. Informally, if a standardized residual is larger than about 1.96 in absolute value, then it will give you a "significant" two-sided p-value at the 5% level when considered as an individual statistical test.
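To make that concrete, here is a rough sketch of the calculation using only the Python standard library. It assumes the residual meant here is Agresti's standardized (adjusted) residual, i.e. (observed - expected) / sqrt(expected * (1 - row proportion) * (1 - column proportion)). The counts in the example table are made up purely for illustration, and the function names are my own.

```python
import math

def standardized_residuals(table):
    """Return (expected, residuals) for a contingency table given as a list of rows.

    Residuals are the adjusted (standardized) ones: the raw difference divided
    by its estimated standard error under independence.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected, resid = [], []
    for i, row in enumerate(table):
        e_row, r_row = [], []
        for j, obs in enumerate(row):
            e = row_tot[i] * col_tot[j] / n
            se = math.sqrt(e * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            e_row.append(e)
            r_row.append((obs - e) / se)
        expected.append(e_row)
        resid.append(r_row)
    return expected, resid

def two_sided_p(z):
    """Two-sided p-value for z under the standard normal: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical 2 x 3 table of counts (not the poster's data).
counts = [[12, 30, 8],
          [25, 14, 11]]
_, z = standardized_residuals(counts)
for i, row in enumerate(z):
    for j, value in enumerate(row):
        print(f"cell ({i}, {j}): z = {value:+.2f}, p = {two_sided_p(value):.4f}")
```

These p-values are per-cell and unadjusted, so they only make sense as individual tests.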

However, with a 4 x 9 table, you will be inspecting 36 cells for small p-values, and it would be advisable to correct for that multiple comparison. There are a number of ways of doing that; I generally recommend the Benjamini-Hochberg procedure, which controls the false discovery rate in multiple testing. If you are using statistical software, this adjustment should be available to you somewhere!
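If your software doesn't have it, the Benjamini-Hochberg step-up rule is short enough to write by hand. The sketch below is a plain-Python version: sort the m p-values, find the largest rank k with p(k) <= (k / m) * q, and flag the k smallest p-values. The function name, the choice q = 0.05, and the example p-values are all just for illustration.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans, True where the test is flagged at FDR level q."""
    m = len(pvals)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up rule: largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    # Everything ranked at or below k_max is flagged, even if it missed
    # its own threshold along the way.
    flagged = [False] * m
    for idx in order[:k_max]:
        flagged[idx] = True
    return flagged

# Made-up p-values; for a 4 x 9 table there would be 36 of them.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example))
```

For a 4 x 9 table you would pass in the 36 per-cell p-values, flattened into one list.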

I understand that standardized residuals tell you how strong the difference between the expected/actual counts is, and that if 95% of values are within -3 to +3 then you have normally distributed data


These residuals do tell you which cells are contributing most to the value of chi-squared. However, finding that 95% of values lie within -3 to +3 most certainly does not tell you that you have normally distributed data. (For a standard normal, about 95% of values lie within ±1.96 and about 99.7% within ±3.) The normality result that you are using here is that as the cell sizes grow larger, the distribution of the standardized residuals gets closer and closer to that of the standard normal.

But would you be able to use these values to suggest where the variation lies, e.g. values outside -3 to +3 are insignificant?


Agresti's "Categorical Data Analysis" (which is the standard handbook for this sort of thing) suggests: "A standardized residual that exceeds about 2 or 3 in absolute value indicates a lack of fit of the null hypothesis in that cell. Larger values are more relevant when the number of degrees of freedom is larger, as it becomes more likely that at least one such residual is large simply by chance".

Using the BH procedure I suggested above (if you want to be formal about it) will effectively give you the proper threshold of interest. The cells whose residuals exceed this threshold in absolute value are the ones of interest.

Standard Disclaimer: I haven't asked you about the nature of your data, or how it came to be...but you do need to meet certain requirements (for example, all cell sizes sufficiently large, no structural zeros...) for this stuff to work...
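As a quick sanity check on the cell-size requirement, you could list the cells with small expected counts. The cutoff of 5 used below is only the common rule of thumb, not something specific to your data, and the function name and example table are made up. Note that this cannot detect structural zeros; those depend on how the data were collected.

```python
def small_expected_cells(table, minimum=5.0):
    """Return (row, column, expected count) for cells whose expected count is below `minimum`."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    flagged = []
    for i in range(len(row_tot)):
        for j in range(len(col_tot)):
            e = row_tot[i] * col_tot[j] / n
            if e < minimum:
                flagged.append((i, j, e))
    return flagged

# Hypothetical counts with one sparse cell.
print(small_expected_cells([[1, 9], [9, 81]]))
```

If many cells are flagged, the normal approximation for those residuals is doubtful, and combining sparse categories may be worth considering.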
