# Statistics/competition question

This is a real-life problem I was thinking about, and I was wondering what type of statistics can be used to answer it.

There are 100 athletes in a race and 10 of them are taking a drug which gives them a 10% advantage compared to all the other athletes. To model this, assume that apart from the 10% advantage, all the athletes are identical.

What is the probability that the athlete that wins the race took the drug?

I haven’t got a solution to the problem yet, but I’d first define p as the probability that each athlete who has not taken the drug wins.

Use this, and the fact that the drug gives a competitor a 10% advantage to work out an expression in terms of p for the probability that each athlete who has had the drug has of winning.

You should also know what the sum of all the probabilities will equal.

How might you use this to find a value for p?

Edited for clarity because I realise how badly I worded it initially

Ah yes that's a good way to think about it.

So 90p + 10(1.1p) = 101p = 1, which gives p = 1/101, and then the probability that a given athlete who takes the drug wins the race is 1.1p = 11/1010.

So that makes the probability that one of the 10 drug-taking athletes wins the race 11/101? That is about 10.9% rather than the 10% they would have without the drug, so in relative terms there's around an 8.9% higher chance that one of these 10 athletes wins the race.

Is that all correct or have I made a mistake somewhere?
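
The arithmetic above can be checked with exact fractions. This is only a minimal sketch of the "10% higher win probability" reading used so far; the function name `win_probabilities` and its parameters are made up for illustration.

```python
from fractions import Fraction


def win_probabilities(n_total=100, n_drugged=10, advantage=Fraction(1, 10)):
    """Return (p_clean, p_drugged, p_any_drugged) as exact fractions.

    Assumes each drugged athlete's win probability is (1 + advantage)
    times that of a clean athlete, and that all probabilities sum to 1.
    """
    n_clean = n_total - n_drugged
    boost = 1 + advantage
    # n_clean * p + n_drugged * boost * p = 1, solved for p
    p_clean = 1 / (n_clean + n_drugged * boost)
    p_drugged = boost * p_clean
    return p_clean, p_drugged, n_drugged * p_drugged


p_clean, p_drugged, p_group = win_probabilities()
print(p_clean, p_drugged, p_group)  # 1/101 11/1010 11/101

# Relative increase over the 10/100 chance the group has without the drug
relative_gain = p_group / Fraction(10, 100) - 1
print(float(relative_gain))  # roughly 0.089
```

Using `Fraction` rather than floats keeps the results in the same form as the hand calculation, so they can be compared term by term.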

I do believe we agree on our final answers.

I’d perhaps consult the likes of @mqb2766 on this as he is much better with anything mathematical than me.

Thanks! I also wanted to think about what would happen for a sample of 100 athletes from a normal distribution where 10 of the athletes have a 10% advantage over the mean. I'll get my pen and paper out later and let you know if I need help.

"A 10% advantage" is very vague. In subsequent discussion you've taken it to mean "a 10% increase in the probability they win", but I don't think that's how most people would interpret it.

I'd say a more standard interpretation would mean they run (or cycle, etc) 10% faster. In which case if *any* athlete within 10% of the fastest (undrugged) times takes drugs, they will win. So the probability is 1 - p(all 10 drugged athletes were (when not drugged) outside 10% of the fastest natural time).

For most high-level races, the gap between the median runner and the fastest is less than 10% and so this probability is going to be over 99.9%.
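
One way to see where a figure like 99.9% comes from is sketched below. It rests on assumptions that are not in the post itself: the 10 drug takers are a random pick from the field, and anyone whose natural ability is at or above the median is within 10% of the fastest natural time, so would win once drugged. The function names are hypothetical.

```python
from math import comb


def p_drugged_winner_independent(n_drugged=10, p_fast_enough=0.5):
    """1 - P(every drugged athlete is naturally too slow).

    Assumes each drugged athlete independently has probability
    p_fast_enough of being naturally within 10% of the fastest time.
    """
    return 1 - (1 - p_fast_enough) ** n_drugged


def p_drugged_winner_finite_field(n_total=100, n_drugged=10, n_fast_enough=50):
    """Same idea, but the drug takers are drawn without replacement
    from a field in which n_fast_enough athletes are fast enough."""
    n_too_slow = n_total - n_fast_enough
    return 1 - comb(n_too_slow, n_drugged) / comb(n_total, n_drugged)


print(p_drugged_winner_independent())   # 1 - 0.5**10
print(p_drugged_winner_finite_field())  # slightly higher still
```

Both are lower bounds under these assumptions, since a drugged athlete a little below the median might still be within 10% of the fastest.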

That's interesting. I was originally thinking about men and women where there is around a 10% "performance gap" between elite athletes but of course that doesn't mean a 10% higher probability of winning a race.

If you take 10 of the best male sprinters and they compete against 90 of the best female sprinters then one of the males will obviously win. But if you take 10 random men and 90 random women then this gets more complicated and it may be harder to work out the chance of the men winning. In my head I'm picturing two intersecting normal distribution curves.

What do you think your distribution(s) represent? The probability of a person in group A (man/drugged/...) running a given time, or ...? If so, it's going to be a bit of a faff: take the product of n independent samples, find the distribution associated with the maximum value, then compare the max values in the two different groups to determine the probability of a person from group A winning (compared to B). Should be doable but not trivial.
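
A rough sketch of that calculation follows, under assumptions that go beyond the thread: each athlete's race-day speed is an independent draw from a normal distribution for their group, and the highest speed wins. The probability that the best of group A beats the best of group B is then the integral of (density of A's maximum) × (probability all of B are slower). The function name, the trapezoid integration and the example numbers are illustrative only.

```python
from statistics import NormalDist


def prob_group_a_wins(dist_a, dist_b, n_a=10, n_b=90, steps=20000):
    """P(max of n_a draws from dist_a > max of n_b draws from dist_b).

    Integrates n_a * f_A(s) * F_A(s)**(n_a - 1) * F_B(s)**n_b over s
    with the trapezoid rule, covering 8 standard deviations either side.
    """
    lo = min(dist_a.mean - 8 * dist_a.stdev, dist_b.mean - 8 * dist_b.stdev)
    hi = max(dist_a.mean + 8 * dist_a.stdev, dist_b.mean + 8 * dist_b.stdev)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        s = lo + i * h
        # Density of the fastest speed in group A
        density_max_a = n_a * dist_a.pdf(s) * dist_a.cdf(s) ** (n_a - 1)
        # Probability that every member of group B is slower than s
        all_b_slower = dist_b.cdf(s) ** n_b
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * density_max_a * all_b_slower
    return total * h


# Sanity check: identical groups should give 10/100
same = NormalDist(mu=8.0, sigma=0.5)  # speeds in m/s, made-up numbers
print(prob_group_a_wins(same, same))

# Group A shifted up by 10% of the mean (one possible reading of "10% advantage")
faster = NormalDist(mu=8.8, sigma=0.5)
print(prob_group_a_wins(faster, same))
```

With identical distributions the result should come out at n_a / (n_a + n_b), which is a handy check that the integration is set up correctly before trying different means and spreads.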

Along the same lines as DFranklin, an individual's time would have a fairly narrow distribution, as decent runners are fairly consistent. Representing a group of runners as a single distribution, from which you'd imagine taking an iid sample and then forming the joint and modelling the max, would probably lose a lot. So if you know (are given) who's in the race, you're really looking at a few narrow distributions and working out the probability of one winning, a bit like Pogačar and Vingegaard in the Tour de France at the moment.

Sometimes it's worth setting up a problem, even if you know it's not going to be "perfect", just to understand the limitations though. So as above, maybe be clear about what you mean by the 10% advantage (or think about a couple of ways) and post that.