# further maths statistics

#1
I don't get why the df is 2. I think pooling the last two columns makes it 4 columns, but the answer book says v-2.

I am not sure why it's -2.
1 month ago
#3
You lose two degrees of freedom because the totals are the same and the mean has been calculated.
It does help to upload the full question so we know the context is a chi square test.
Last edited by mqb2766; 1 month ago
#4
I can't see that the means have been calculated. Where? Here are both parts.
#5
..
1 month ago
#6
"she calculates the actual proportion"

I had to search for the other part to understand the OP.
#7
So why 2 df? Why 4-2? Where's the mean? Thank you. I appreciate your time.
Last edited by sonal7; 1 month ago
1 month ago
#8
I don't understand? You lose one dof because the totals are equal, and the other dof because the means are also the same: the binomial p used to generate the expected frequencies has been calculated from the observed frequencies. It's similar to the (normal) unbiased variance calculation, where you divide by n-1 rather than n because the mean is estimated from the data, hence you lose one dof.
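The n-1 analogy can be sketched in a few lines. This is only an illustration on a made-up sample (the numbers are hypothetical and have nothing to do with the question): once the mean is computed from the data, the deviations are forced to sum to zero, so only n-1 of them are free.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 10]
n = len(data)

# The mean is estimated from the same data we then measure spread around.
mean = sum(data) / n
deviations = [x - mean for x in data]

# The deviations always sum to zero, so the last one is fixed by the others.
print(sum(deviations))

sum_of_squares = sum(d * d for d in deviations)
biased = sum_of_squares / n          # divides by n
unbiased = sum_of_squares / (n - 1)  # divides by n - 1 (one dof lost to the mean)

print(biased, unbiased, statistics.variance(data))
```

`statistics.variance` is the standard-library sample variance, which also divides by n-1.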
Last edited by mqb2766; 1 month ago
#9
Thank you. I am a bit confused about part b). How did she get the Ei (expected frequencies) if not from a binomial?
1 month ago
#10
I've not gone through the question carefully, but in part a) the expected frequencies are generated from a binomial with p=0.05. It's not dependent on the observed data, so you lose one dof when comparing the frequencies as the totals are the same, but the means (p) will be independent. As there seem to be 150, the expected value would be ~0.75, and that seems about right from the expected frequencies.

In b) the expected frequencies are again generated from a binomial, but the p is calculated from the observed data, so the expected value is about 1.1. So there is a closer match with the observed data, and hence you lose one (extra) degree of freedom when comparing because the p is calculated. It's not independent.

Edit - For both question parts, why not generate the expected frequencies yourself to understand them?
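A minimal sketch of how the expected frequencies could be generated, under assumptions that are not taken from the question: `n_trials = 15` is a guess (picked because 15 × 0.05 gives the ~0.75 mean mentioned above), and the `observed` counts are made up. The helper names `binomial_pmf` and `expected_frequencies` are hypothetical too.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def expected_frequencies(total, n, p, cells=4):
    """Expected counts for X = 0, 1, ..., with the final cell pooled as a tail."""
    probs = [binomial_pmf(k, n, p) for k in range(cells - 1)]
    probs.append(1 - sum(probs))  # pooled tail, so the probabilities sum to 1
    return [total * q for q in probs]

# ASSUMPTIONS (not from the question): 15 trials per sample and these
# made-up observed counts for X = 0, 1, 2 and "3 or more".
n_trials = 15
observed = [50, 55, 30, 15]
total = sum(observed)  # 150 samples

# Part a) style: p is fixed in advance, independent of the observed data.
expected_fixed = expected_frequencies(total, n_trials, 0.05)

# Part b) style: p is estimated from the observed mean (mean = n * p).
# The pooled cell is treated as exactly 3 here, which is a simplification.
observed_mean = sum(x * o for x, o in enumerate(observed)) / total
p_hat = observed_mean / n_trials
expected_estimated = expected_frequencies(total, n_trials, p_hat)

print([round(e, 2) for e in expected_fixed])
print([round(e, 2) for e in expected_estimated])
```

Both expected rows sum to the same total as the observed row, but only the second one also uses a p taken from the observed data.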
Last edited by mqb2766; 1 month ago
#11
I don't get this. Why would knowing the value of p reduce your dof? I need a tutor.
1 month ago
#12
Because the 4 cells in the observed and expected frequencies are not independent.
They are similar because
* They both sum to 150
* They both have the same mean (p)
So really, there are only two degrees of freedom difference between the four cells.
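Written out, the two shared properties are the constraints below. This is a sketch in notation assumed here rather than taken from the question: cell $i$ corresponds to the value $x_i$, $n$ is the number of binomial trials, $\hat{p}$ is the p estimated from the observed data, and the mean is matched before any pooling of cells.

```latex
\sum_i E_i = \sum_i O_i = 150,
\qquad
\frac{1}{150}\sum_i x_i E_i = \frac{1}{150}\sum_i x_i O_i = n\hat{p}
```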

I'll try and dig out a tutorial, but what does your textbook say?
#13
Why is the mean p? I thought the mean is np. As n is the same, then the mean must be the same. Thanks for trying to help. How does the mean help you work out the missing values? Do you mean that once you have worked out two values in the E row, you can work out the mean (like reverse engineering)?
Last edited by sonal7; 1 month ago
1 month ago
#14
Sure, the mean is np. n is the same, so saying p is the same in both distributions or the mean is the same is equivalent.

OK, let's try and give an illustrative example. I was looking for a good tutorial but have been out. Will do so later.

Let's say there are 4 cells each for the observed and expected and you're using a chi-squared test to compare them. This is obviously equivalent to this problem. When the numbers in each cell are arbitrary counts and you compare using chi-squared, there are 4 degrees of freedom of difference between the observed and expected, as the expected numbers can be anything. You have 4 arbitrary numbers/cells, so 4 dof.

Now let's assume both are frequencies so they sum to 100 (or whatever). Then for both the observed/expected, the last cell is
100-sum of previous 3 cells
So the last cell is not independent and is determined by the other 3 cells, so there are 4-1=3 dof in the chi-squared test. Obviously we're assuming the sum of the previous 3 cells is <= 100. We'd expect the observed / expected to be a closer match because of the sum to 100 constraint, so the reduced dof compensates for this.

In practice, we don't set data like this, rather simply require the cells to sum to 100, but there are still 3 dof. In part a) where the observations are tested against the expected generated by a (normalized) binomial with an arbitrary p, we'll have 3 dof. In a sense, the missing dof is shared out across each cell.

Now for part b), the expected data is again generated by a (normalized) binomial (n,p) so the data sums to 100, but the p is set from the observed data. Again we'd expect this to be a closer match to the observed data as both the observed and expected data sum to 100 and have the same mean (or p). This matching of the observed and expected data removes an extra dof from the chi square test.

OK, so why is it one extra parameter when you match the p? Imagine you have two cells so something like
O: 40 60
E: ? ?
If we require the Expected to sum to 100, we'd have something like:
O: 40 60
E: 30 100-30
Now matching the mean (or p=0.4) with a binomial, we'd have an exact match
O: 40 60
E: 40 60
Originally there were 2 dof between the observed/expected. Frequency sum means there is really just one difference (repeated in the 2nd column) so dof=1. Matching p means that the rows are identical and there are 2-2=0 dof difference. There is no difference between the two rows simply because we are using frequency data with a known/estimated p.
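The two-cell example above can be checked numerically. This is a small sketch using the 40/60 and 30/70 rows from the example; the `chi_squared` helper is a hypothetical name for the usual sum of (O-E)^2/E.

```python
def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 60]
total = sum(observed)

# Expected row that only shares the total with the observed row.
expected_total_only = [30, total - 30]

# Expected row whose p is also estimated from the observed data.
p_hat = observed[0] / total
expected_matched = [total * p_hat, total * (1 - p_hat)]

stat_total_only = chi_squared(observed, expected_total_only)
stat_matched = chi_squared(observed, expected_matched)

print(stat_total_only)  # positive: one genuine difference between the rows
print(stat_matched)     # zero up to rounding: nothing left to compare
```

With only the total shared, the statistic is positive. Once p is matched as well, the rows coincide and the statistic carries no information, which is what 2-2=0 dof means here.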

With three cells you have 3 dof. Using frequency means you have 3-1=2 dof. Matching the mean or p, means you can match two of the columns but not the third, hence 3-2=1 dof. Again, this 1 dof is shared out across the 3 cells, rather than having two columns matching and the 3rd being independent.

Note in part a) you're testing the observations against a 5% binomial distribution. In part b) you're simply testing to see whether the data follows a binomial distribution, where the p is estimated from observed data. They're different tests. Obviously, the second one must give a better (not worse) match to the observed data, and reducing the dof by 1 compensates for this in the test.
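The counting in both parts can be summarised in one formula. The symbols are notation assumed here: $k$ is the number of cells after pooling and $m$ is the number of parameters estimated from the observed data.

```latex
\nu = k - 1 - m
\qquad
\text{a) } \nu = 4 - 1 - 0 = 3,
\qquad
\text{b) } \nu = 4 - 1 - 1 = 2
```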

This guy does something similar
Last edited by mqb2766; 1 month ago
#15
I will look into this later, as there are loads of other things I need to learn. I think this is making sense. I will reply later, maybe even in 2 weeks.