The Student Room Group

Best algorithm

I have a graph with two sets of p-values.
The significance level is p=0.05.
The regions of interest are where ONE value is significant and the other is not.
(the red region = excluded region).

I want to find an algorithm or equation that will give me the point on my graph that is the most significant p-value on one axis and the least significant on the other, for all the data points.

I have tried measuring the distance to the origin, BUT this method does not work when one value is p=0.049 and the other is p=0.999.

(The pair p=0.049, p=0.999 is clearly less ideal than p=0.025, p=0.7, but using distance to the origin, where the origin is p=0.05, p=0.05, it appears to be better.)
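The failure of the distance-to-origin idea can be sketched in a few lines of Python (all values here are the hypothetical pairs from above):

```python
import math

# Hypothetical (p_A, p_B) pairs: we want p_A small (significant)
# and p_B large (not significant).
pairs = [(0.049, 0.999), (0.025, 0.7)]

def dist_from_threshold(p_a, p_b, alpha=0.05):
    """Euclidean distance from the (alpha, alpha) 'origin'."""
    return math.hypot(p_a - alpha, p_b - alpha)

# The clearly worse pair (0.049, 0.999) gets the LARGER distance,
# so maximising this distance picks the wrong point.
for p_a, p_b in pairs:
    print((p_a, p_b), round(dist_from_threshold(p_a, p_b), 3))
```

The distance is dominated by the p_B coordinate, so a barely significant p_A = 0.049 is hardly penalised at all.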

Graph is attached.
Thanks
Original post by jsmith6131

I want to find an algorithm or equation that will give me the point on my graph that is the most significant p-value on one axis and the least significant on the other for all the data points


As stated, for general sets of pairs of p-values, this problem doesn't have a solution. That is, there need not be such a point. The set {(0.01, 0.99), (0.001, 0.8)} is a counterexample: the second point has the "most significant" first p-value, the first point has the "least significant" second p-value; there is no point for which both conditions are simultaneously true.

Perhaps you meant something different?
Original post by Gregorius
As stated, for general sets of pairs of p-values, this problem doesn't have a solution. That is, there need not be such a point. The set {(0.01, 0.99), (0.001, 0.8)} is a counterexample: the second point has the "most significant" first p-value, the first point has the "least significant" second p-value; there is no point for which both conditions are simultaneously true.

Perhaps you meant something different?


I want to find the solution that gives the best combination of high p-value for one and small p-value for the other.
Does that make it clearer?
Original post by jsmith6131
I want to find the solution that gives the best combination of high p-value for one and small p-value for the other.
Does that make it clearer?


The problem is that "best combination" is not defined. Presumably what is considered to be so will come from the context - care to elaborate?
Original post by Gregorius
The problem is that "best combination" is not defined. Presumably what is considered to be so will come from the context - care to elaborate?


Well, I am measuring how a number of different parameters change in two population groups. I am trying to work out which parameters best describe the difference between the two populations.

I have p-values for each parameter in the two populations.

I want to find the parameters where the p-values are most significant in one group and least significant in the other, at the same time.

Does that make sense?
Original post by jsmith6131
Well I am measuring how a number of different parameters change in two population groups. I am trying to work out which parameters best describe the difference in the two populations.

I have p-values for each parameter in the two populations

I want to find the parameters where the p-values are most significant in one group and least significant in the other, at the same time.

Does that make sense?


So is the situation something like the following?

You have two populations X and Y from which you have drawn random samples A and B respectively. A set of n parameters is measured in both A and B; some sort of intervention is applied to both A and B, and the n parameters are measured again in A and B. This results in measurements $x_{1,1}^A, x_{2,1}^A, \cdots, x_{n,1}^A$ and $x_{1,2}^A, x_{2,2}^A, \cdots, x_{n,2}^A$ made in A before and after the intervention, and $x_{1,1}^B, x_{2,1}^B, \cdots, x_{n,1}^B$ and $x_{1,2}^B, x_{2,2}^B, \cdots, x_{n,2}^B$ made in B before and after the intervention. The effect of the intervention is measured in A and B separately by calculating p-values for the differences $x_{i,2}^A - x_{i,1}^A$ and $x_{i,2}^B - x_{i,1}^B$. This gives you p-values $p_i^A$ and $p_i^B$. You now wish to find which of the parameters $x_1, x_2, \cdots, x_n$ is "most significantly" changed in A whilst being "least significantly" changed in B, and vice versa.

Am I close?
Original post by Gregorius
So is the situation something like the following?

You have two populations X and Y from which you have drawn random samples A and B respectively. A set of n parameters is measured in both A and B; some sort of intervention is applied to both A and B, and the n parameters are measured again in A and B. This results in measurements $x_{1,1}^A, x_{2,1}^A, \cdots, x_{n,1}^A$ and $x_{1,2}^A, x_{2,2}^A, \cdots, x_{n,2}^A$ made in A before and after the intervention, and $x_{1,1}^B, x_{2,1}^B, \cdots, x_{n,1}^B$ and $x_{1,2}^B, x_{2,2}^B, \cdots, x_{n,2}^B$ made in B before and after the intervention. The effect of the intervention is measured in A and B separately by calculating p-values for the differences $x_{i,2}^A - x_{i,1}^A$ and $x_{i,2}^B - x_{i,1}^B$. This gives you p-values $p_i^A$ and $p_i^B$. You now wish to find which of the parameters $x_1, x_2, \cdots, x_n$ is "most significantly" changed in A whilst being "least significantly" changed in B, and vice versa.

Am I close?


Thank you.
Yes, that's basically what is going on.
I tried to find a statistical test that would perform this analysis, but I don't think there is one that compares a list of n unrelated parameters in populations A and B.
Original post by jsmith6131
Thank you.
Yes, that's basically what is going on.
I tried to find a statistical test that would perform this analysis, but I don't think there is one that compares a list of n unrelated parameters in populations A and B.


OK, I'll try and get back to you this evening after I've had a think.
Original post by jsmith6131
Thank you.
Yes, that's basically what is going on.
I tried to find a statistical test that would perform this analysis, but I don't think there is one that compares a list of n unrelated parameters in populations A and B.


Yes, this is a tricky one. The first thing that comes to my mind is some sort of MANOVA or MANCOVA, but I can't immediately see how to bend it into the shape that you want. I'll carry on thinking along these lines, but for the moment...

Perhaps a reasonable approach is to go back, not to p-values, but to normalized effect sizes. There is a one-to-one correspondence between them, but the scale of effect size would seem to make more sense than the non-linear transform that you go through to get a p-value.

So, consider one of the parameters $x_i$ that you're interested in and consider its values before and after the intervention in samples A and B. In each of these samples calculate the effect size (after − before) and divide it by its standard deviation to get $\theta_i^A$ and $\theta_i^B$. Then to get a comparison between $\theta_i^A$ and $\theta_i^B$, either subtract one from the other, or take a ratio, or something like that. Then order the differences (or the ratios).

This approach has the advantage of simplicity, but it has the disadvantage that there is no obvious analytic way of telling whether one parameter really differentiates the effect in A and B more than another does. It can be done using a computationally intensive technique called the bootstrap, but this would require a bit of non-trivial statistical programming.
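A rough sketch of this effect-size-plus-bootstrap idea, using made-up paired data for a single parameter (the data and sample sizes are entirely hypothetical): compute the standardized effect size in each sample, take their difference, and bootstrap that difference to get a percentile interval.

```python
import random
import statistics

random.seed(1)

# Hypothetical before/after measurements of ONE parameter in samples A and B.
before_A = [random.gauss(10, 1) for _ in range(30)]
after_A = [b + random.gauss(2.0, 1) for b in before_A]   # clear shift in A
before_B = [random.gauss(10, 1) for _ in range(30)]
after_B = [b + random.gauss(0.0, 1) for b in before_B]   # little shift in B

def std_effect(before, after):
    """Standardized effect size: mean(after - before) / sd(after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

theta_A = std_effect(before_A, after_A)
theta_B = std_effect(before_B, after_B)
contrast = theta_A - theta_B  # rank the parameters by this contrast

def bootstrap_ci(n_boot=2000):
    """Percentile bootstrap interval for theta_A - theta_B."""
    stats = []
    n, m = len(before_A), len(before_B)
    for _ in range(n_boot):
        idx_A = [random.randrange(n) for _ in range(n)]
        idx_B = [random.randrange(m) for _ in range(m)]
        t_A = std_effect([before_A[i] for i in idx_A],
                         [after_A[i] for i in idx_A])
        t_B = std_effect([before_B[i] for i in idx_B],
                         [after_B[i] for i in idx_B])
        stats.append(t_A - t_B)
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]

low, high = bootstrap_ci()
```

In practice you would compute `contrast` for each of the n parameters and rank them; the bootstrap interval gives a rough sense of whether one parameter's contrast is genuinely larger than another's.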
Original post by Gregorius
Yes, this is a tricky one. The first thing that comes to my mind is some sort of MANOVA or MANCOVA, but I can't immediately see how to bend it into the shape that you want. I'll carry on thinking along these lines, but for the moment...

Perhaps a reasonable approach is to go back, not to p-values, but to normalized effect sizes. There is a one-to-one correspondence between them, but the scale of effect size would seem to make more sense than the non-linear transform that you go through to get a p-value.

So, consider one of the parameters $x_i$ that you're interested in and consider its values before and after the intervention in samples A and B. In each of these samples calculate the effect size (after − before) and divide it by its standard deviation to get $\theta_i^A$ and $\theta_i^B$. Then to get a comparison between $\theta_i^A$ and $\theta_i^B$, either subtract one from the other, or take a ratio, or something like that. Then order the differences (or the ratios).

This approach has the advantage of simplicity, but it has the disadvantage that there is no obvious analytic way of telling whether one parameter really differentiates the effect in A and B more than another does. It can be done using a computationally intensive technique called the bootstrap, but this would require a bit of non-trivial statistical programming.


Thank you for this. As statistical significance is important, though, I don't think I can use this method.

I have tried measuring a number of different ratios between the p-values (e.g. p(A)/p(B), (p(A)-p(B))/(p(A)+p(B)), ...), and thankfully my dataset is small enough that I can tell my approaches are obviously wrong.

If you think there is no "best" solution, then I suppose I can present the "best" of the possible solutions I have come up with, as at least I will have something!

But thanks for taking the time to help me with my issue. I really appreciate it.
Original post by Gregorius
Yes, this is a tricky one. The first thing that comes to my mind is some sort of MANOVA or MANCOVA, but I can't immediately see how to bend it into the shape that you want. I'll carry on thinking along these lines, but for the moment...

Perhaps a reasonable approach is to go back, not to p-values, but to normalized effect sizes. There is a one-to-one correspondence between them, but the scale of effect size would seem to make more sense than the non-linear transform that you go through to get a p-value.

So, consider one of the parameters $x_i$ that you're interested in and consider its values before and after the intervention in samples A and B. In each of these samples calculate the effect size (after − before) and divide it by its standard deviation to get $\theta_i^A$ and $\theta_i^B$. Then to get a comparison between $\theta_i^A$ and $\theta_i^B$, either subtract one from the other, or take a ratio, or something like that. Then order the differences (or the ratios).

This approach has the advantage of simplicity, but it has the disadvantage that there is no obvious analytic way of telling whether one parameter really differentiates the effect in A and B more than another does. It can be done using a computationally intensive technique called the bootstrap, but this would require a bit of non-trivial statistical programming.



I came up with a method actually and was wondering if you agree.
Let's assume A = p<0.05 ALWAYS
and B = p>0.05 ALWAYS

If we say
Score = (1/A) * B
that seems to produce a very good overall score.

Do you agree?
Thanks
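For what it's worth, here is a tiny sketch of that scoring rule on made-up (p_A, p_B) pairs (all parameter names and values are hypothetical). Since Score = (1/A) * B is just B/A, parameters that are very significant in one group and very non-significant in the other rise to the top.

```python
# Hypothetical (p_A, p_B) per parameter, with p_A < 0.05 and p_B > 0.05
# as assumed above.
pvals = {
    "param1": (0.001, 0.90),
    "param2": (0.049, 0.06),
    "param3": (0.010, 0.50),
}

def score(p_a, p_b):
    """Score = (1/p_A) * p_B, i.e. the ratio p_B / p_A."""
    return (1.0 / p_a) * p_b

# Rank parameters from best to worst by this score.
ranked = sorted(pvals, key=lambda name: score(*pvals[name]), reverse=True)
# ranked -> ['param1', 'param3', 'param2']
```

Note the score blows up as p_A approaches 0, so it heavily rewards significance in group A relative to non-significance in group B.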
Original post by jsmith6131
I came up with a method actually and was wondering if you agree.
Let's assume A = p<0.05 ALWAYS
and B = p>0.05 ALWAYS

If we say
Score = (1/A) * B
that seems to produce a very good overall score.

Do you agree?
Thanks


I think that this corresponds to the suggestion I made about taking the ratio of the effect sizes - but doing it on the scale of the p-values.
Original post by Gregorius
I think that this corresponds to the suggestion I made about taking the ratio of the effect sizes - but doing it on the scale of the p-values.


Yes, that is how I came up with the idea :smile: thanks.

But I felt the method had to be done on the p-values, not the original values, to ensure only parameters that demonstrated a significant change in one of the two populations (A or B) were included. If I took values that significantly changed in both populations, then that parameter would not actually be capable of distinguishing the two groups.

But thanks for confirming :smile:
