Finding significant co-occurrent pairs

1. Finding significant co-occurrent pairs
Hello.

I am doing a biological study for my master's dissertation and have been identifying species from an archaeological site.
I am trying to do a statistical test which will be able to identify any groups of taxa that occur together.
I have 16 different samples and 45 different taxa. In some samples a given taxon doesn't occur at all.

Apparently I can use Spearman's rank correlation to identify any groupings within the site across the different samples, such as species A occurring with species B, etc.

Any ideas or help would be fantastic. I have no idea what I am doing at the moment.
2. Re: Finding significant co-occurrent pairs
I have thought about using Simpson's Index to study the diversity in each sample across the site. I assume this would work well...
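For reference, a minimal sketch of how Simpson's index could be computed for one sample from abundance counts. It assumes the common finite-sample form D = sum(n(n-1)) / (N(N-1)) and reports 1 - D; other variants of the index exist, and the counts below are made-up numbers, not real data.

```python
def simpson_diversity(counts):
    """Return Simpson's index of diversity, 1 - D, for one sample.

    counts: abundance of each taxon in the sample (zeros are allowed).
    D = sum(n * (n - 1)) / (N * (N - 1)) is the chance that two individuals
    drawn without replacement belong to the same taxon.
    """
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two individuals in the sample")
    d = sum(n * (n - 1) for n in counts) / (total * (total - 1))
    return 1 - d

# Hypothetical abundances (illustration only).
even_sample = [5, 5, 5, 5]        # four taxa, equally abundant
dominated_sample = [17, 1, 1, 1]  # four taxa, one dominant

even_score = simpson_diversity(even_sample)
dominated_score = simpson_diversity(dominated_sample)
print(even_score, dominated_score)
```

With this form, a sample spread evenly over its taxa scores higher than one dominated by a single taxon.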

I now need to test the correlations between certain species. I assume that a sample indicating wet conditions and a sample indicating dry conditions would have a low correlation, but that two samples both indicating wet conditions would have a higher one. I computed Jaccard similarity across the samples, but the values were extremely low because of the rare species.
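For reference, a minimal sketch of the Jaccard index computed between pairs of taxa rather than between samples, using presence/absence data. The data and function names are hypothetical, and dropping taxa found in fewer than `min_samples` samples is only one possible way of handling rare species; the threshold is arbitrary.

```python
from itertools import combinations

# Hypothetical presence/absence data: sample name -> set of taxa found in it.
samples = {
    "S1": {"A", "F", "G", "H"},
    "S2": {"A", "B", "D", "F"},
    "S3": {"A", "F", "G"},
    "S4": {"B", "D"},
}

def taxon_occurrences(samples):
    """Map each taxon to the set of samples in which it is present."""
    occ = {}
    for name, taxa in samples.items():
        for taxon in taxa:
            occ.setdefault(taxon, set()).add(name)
    return occ

def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_jaccard(samples, min_samples=2):
    """Jaccard index for every pair of taxa present in at least min_samples samples."""
    occ = {t: s for t, s in taxon_occurrences(samples).items()
           if len(s) >= min_samples}
    return {
        (t1, t2): jaccard(occ[t1], occ[t2])
        for t1, t2 in combinations(sorted(occ), 2)
    }

scores = pairwise_jaccard(samples)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs with an index near 1 are taxa that are almost always found in the same samples; whether such a value is statistically significant with only 16 samples is a separate question.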
3. Re: Finding significant co-occurrent pairs
Let me repeat that back to make sure I've got this right.

There are 45 different taxa in total. Whenever you take a sample, you find that some subset of these taxa is present. There are 16 samples in total. So your data looks something like this: Sample1 = {A, F, G, H}, Sample2 = {A, B, D, F}, etc., where A, B, C, ... are the names of the taxa.

Is that right?

First question: is the presence of a certain taxon a binary event (i.e. it is either present or not present), or do you also have a mark, such as the number or intensity of the taxon present, that you are interested in?

Second question: what is the actual problem you are trying to solve? Are you trying to find some kind of predictive model where you can talk about (e.g.) the probability of taxon A being present given that taxa B, C, D are present? Or are you just trying to find general groups that occur together without needing to specify a full statistical/predictive model?

If you don't need a statistical/predictive model then this may be what you want: http://en.wikipedia.org/wiki/Association_rule_learning
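The basic quantities behind association rules can be computed by hand. A minimal sketch, assuming presence/absence data stored as one set of taxa per sample: the first two samples are the ones from the example above, the other two are made up, and the function names are illustrative rather than taken from any library.

```python
def support(samples, itemset):
    """Fraction of samples that contain every taxon in itemset."""
    itemset = set(itemset)
    return sum(itemset <= s for s in samples) / len(samples)

def rule_metrics(samples, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    s_both = support(samples, set(antecedent) | set(consequent))
    s_ante = support(samples, antecedent)
    s_cons = support(samples, consequent)
    # confidence: of the samples containing the antecedent, the share
    # that also contain the consequent
    confidence = s_both / s_ante if s_ante else 0.0
    # lift: confidence relative to how common the consequent is overall
    lift = confidence / s_cons if s_cons else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}

samples = [
    {"A", "F", "G", "H"},  # Sample1 from the example above
    {"A", "B", "D", "F"},  # Sample2 from the example above
    {"A", "F", "G"},       # made up
    {"B", "D"},            # made up
]

m = rule_metrics(samples, {"A"}, {"F"})
print(m)
```

A lift above 1 means the two taxa turn up together more often than would be expected if they occurred independently; with few samples these numbers are noisy, so treat them as descriptive rather than as a significance test.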
Last edited by poohat; 01-08-2012 at 03:00.
4. Re: Finding significant co-occurrent pairs
For what it's worth, my initial feeling is that you are going to have to think about how the samples were obtained and whether you can treat them as being structurally similar in some sense.

Suppose there are two species A and B, and two different countries X and Y. In country X both species are very common and you will always find both together. However only species A lives in country Y, and you will never find them together in that country. If your samples were drawn uniformly at random from both countries, then you should find that (asymptotically) the probability of the two species occurring together is 0.5, since you will have roughly half your samples from country X where they are always together, and roughly half from country Y where they are never together.

But now suppose your sampling procedure is biased so that most of your samples come from country X (perhaps archaeologists are more likely to dig in this country). In this case you will find the probability of the species occurring together is substantially greater than 0.5. Similarly, if your sampling procedure is biased so that most samples come from country Y, you will get lower than 0.5. So the probability you estimate depends on the sampling mechanism, which can introduce bias.
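A quick simulation makes the point concrete. This is only a sketch of the two-country toy example, with a hypothetical parameter `frac_from_x` for the share of samples drawn from country X:

```python
import random

def cooccurrence_rate(frac_from_x, n_samples=10_000, seed=0):
    """Estimate how often A and B are found together in a sample.

    frac_from_x: probability that a sample is taken in country X.
    In country X both species are always present; in country Y only A is.
    """
    rng = random.Random(seed)
    together = 0
    for _ in range(n_samples):
        in_x = rng.random() < frac_from_x
        taxa = {"A", "B"} if in_x else {"A"}
        together += {"A", "B"} <= taxa  # True counts as 1
    return together / n_samples

balanced = cooccurrence_rate(0.5)     # samples split evenly between X and Y
biased_to_x = cooccurrence_rate(0.9)  # most digs happen in country X
biased_to_y = cooccurrence_rate(0.1)  # most digs happen in country Y
print(balanced, biased_to_x, biased_to_y)
```

The estimated co-occurrence rate simply tracks the share of samples taken in country X, even though the behaviour of the two species is the same in all three runs.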

Ideally you want to be able to say that all the archaeological sites are in some sense identical, so that you can treat your samples as being independent draws from the same population. In this case there are no problems and the analysis will be easy. But if you have certain known structural features which make some sites different from others (such as being in different countries, different types of habitat, etc.) then it becomes more difficult, and to do a full analysis you might need to start introducing these variables into your model.
Last edited by poohat; 01-08-2012 at 03:03.
5. Re: Finding significant co-occurrent pairs
"" There are 16 samples in total. So your data looks something like this: Sample1 = {A, F, G, H}, Sample2 = {A, B, D, F}, etc, where A,B,C,... are the names of the taxa. ""

Yes this is correct. I have abundance and also presence/absence data. I think it would be useful to use presence/absence rather than abundance.

I am reconstructing the environment using the species from the different samples. I am trying to show that certain features at different positions across the site show the same group of taxa.

Thank you for your reply. I'm going to look into association rules now.
6. Re: Finding significant co-occurrent pairs
I understand the idea of association rules, but I'm not sure how to calculate them.
