# further maths statistics


I don't get why the df is 2. I think pooling the last two columns of the table makes it 4 columns, but the answer book says ν − 2.

I am not sure why it's −2.


(Original post by **mqb2766**) You lose two degrees of freedom because the totals are the same and the mean has been calculated.

It does help to upload the full question so we know the context is a chi-squared test.


(Original post by **sonal7**) I can't see that means have been calculated. Where? Here are both parts.


(Original post by **mqb2766**) "she calculates the actual proportion"

I had to search for the other part to understand the OP.


(Original post by **sonal7**) So why 2 df? Why 4 − 2? Where's the mean? Thank you, I appreciate your time.


(Original post by **mqb2766**) I'm not sure what you don't understand. You lose one dof because the totals are equal, and the other because the means are also the same: the binomial p used to generate the expected frequencies has been calculated from the observed frequencies. It's similar to the (normal) unbiased variance calculation, where you divide by n − 1 rather than n because the mean is estimated from the data, so you lose one dof.
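A minimal Python sketch of the variance analogy, using a made-up sample: the deviations from the sample mean always sum to zero, so once n − 1 of them are known the last one is fixed. That single constraint is why `statistics.variance` divides by n − 1 while `statistics.pvariance` divides by n.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0]

mean = statistics.fmean(data)
deviations = [x - mean for x in data]

# Estimating the mean from the data imposes one constraint: the deviations sum to zero.
print("sum of deviations:", sum(deviations))

# So the last deviation is determined by the first n - 1 of them.
last_from_others = -sum(deviations[:-1])
print("last deviation:", deviations[-1], "recovered from the others:", last_from_others)

# Sample variance divides by n - 1 (one dof used up by the mean); population variance by n.
print("variance (n - 1):", statistics.variance(data))
print("pvariance (n):", statistics.pvariance(data))
```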


(Original post by **sonal7**) Thank you. I am a bit confused about part b): how did she get the Eᵢ (expected frequencies) if not from a binomial?


(Original post by **mqb2766**) I've not gone through the question carefully, but in part a) the expected frequencies are generated from a binomial with p = 0.05. This doesn't depend on the observed data, so you lose one dof when comparing the frequencies because the totals are the same, but the means (p) are independent. As there seem to be 150, the expected value would be ~0.75, and that seems about right from the expected frequencies.

In b) the expected frequencies are again generated from a binomial, but p is calculated from the observed data, so the expected value is about 1.1. So there is a closer match with the observed data, and hence you lose one (extra) degree of freedom when comparing, because p is calculated from the data. It's not independent.

Edit: for both question parts, why not generate the expected frequencies yourself to understand them?
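Following that suggestion, here is a minimal Python sketch for generating the expected frequencies. The question itself isn't shown in the thread, so the set-up is assumed: 150 observations in total, a binomial with n = 15 trials (the value that makes np = 0.05 × 15 = 0.75 match the ~0.75 above), and the tail pooled from x = 3 upwards so that there are 4 cells. `binomial_expected` is a hypothetical helper, not a library function.

```python
from math import comb


def binomial_expected(total: int, n: int, p: float, pool_from: int) -> list[float]:
    """Expected frequencies for X ~ B(n, p) over `total` observations.

    Cells are x = 0, 1, ..., pool_from - 1, plus one pooled cell for x >= pool_from.
    """
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(pool_from)]
    probs.append(1 - sum(probs))  # pooled tail cell: P(X >= pool_from)
    return [total * q for q in probs]


# Assumed set-up (the original question isn't shown): 150 observations, n = 15 trials.
TOTAL, N_TRIALS, POOL_FROM = 150, 15, 3

# Part a): p fixed in advance at 0.05, independent of the observed data.
expected_a = binomial_expected(TOTAL, N_TRIALS, 0.05, POOL_FROM)

# Part b): p estimated from the data; a sample mean of about 1.1 gives p = mean / n.
p_hat = 1.1 / N_TRIALS
expected_b = binomial_expected(TOTAL, N_TRIALS, p_hat, POOL_FROM)

print([round(e, 2) for e in expected_a])
print([round(e, 2) for e in expected_b])
```

Both rows total 150 by construction, because the pooled cell absorbs whatever probability is left over; only the second row depends on the observed data.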


(Original post by **sonal7**) I don't get this. Why would knowing the p value reduce your dof? I need a tutor.


(Original post by **mqb2766**) Because the 4 cells of the observed and expected frequencies are not independent.

They are similar because

* They both sum to 150

* They both have the same mean (p)

So really, there are only two degrees of freedom of difference between the four cells.

I'll try and dig out a tutorial, but what does your textbook say?
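The counting rule described in this thread fits in a small Python sketch (the function name is made up): start from the number of cells after pooling, subtract one because the observed and expected totals agree, and subtract one more for each parameter estimated from the observed data.

```python
def goodness_of_fit_df(cells: int, estimated_params: int) -> int:
    """Degrees of freedom for a chi-squared goodness-of-fit test.

    One dof is lost because observed and expected share the same total,
    and one more for every parameter (here, the binomial p) estimated from the data.
    """
    df = cells - 1 - estimated_params
    if df < 1:
        raise ValueError("too few cells for this many constraints")
    return df


# Part a): 4 cells after pooling, p = 0.05 given, nothing estimated.
print(goodness_of_fit_df(4, 0))
# Part b): 4 cells after pooling, p estimated from the observed mean (nu - 2 with nu = 4).
print(goodness_of_fit_df(4, 1))
```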


(Original post by **sonal7**) Why is the mean p? I thought the mean is np; as n is the same, the mean must be the same. Thanks for trying to help. How does the mean help you work out the missing values? You mean once you have worked out two values in the E row, you can work out the mean (like reverse engineering…

OK, let's try an illustrative example. I was looking for a good tutorial but have been out; I will do so later.

Let's say there are 4 cells each for the observed and expected, and you're using a chi-squared test to compare them, which is equivalent to this problem. When the numbers in the cells are arbitrary counts, there are 4 degrees of freedom of difference between the observed and expected, as the expected numbers can be anything: 4 arbitrary numbers/cells, so 4 dof.

Now let's assume both are frequencies, so they sum to 100 (or whatever the total is). Then for both the observed and expected, the last cell is

100 − (sum of the previous 3 cells)

So the last cell is not independent; it is determined by the other 3 cells, so there are 4 − 1 = 3 dof in the chi-squared test. (We're assuming the sum of the previous 3 cells is ≤ 100.) We'd expect the observed and expected to be a closer match because of the sum-to-100 constraint, and the reduced dof compensates for this.
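A small Python sketch of the sum constraint, with made-up frequencies that both total 100: the O − E differences must sum to zero, so the fourth difference is fixed by the other three, leaving 4 − 1 = 3 free.

```python
# Hypothetical observed and expected frequencies, both totalling 100.
observed = [25, 30, 20, 25]
expected = [20, 35, 25, 20]
assert sum(observed) == sum(expected) == 100

diffs = [o - e for o, e in zip(observed, expected)]
print("O - E:", diffs, "sum:", sum(diffs))

# The last difference is recovered from the first three, so it carries no new information.
print("last difference from the others:", -sum(diffs[:-1]))
```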

In practice we don't set the data up like this; we simply require the cells to sum to 100, but there are still 3 dof. In part a), where the observations are tested against expected frequencies generated by a (normalized) binomial with a p fixed in advance, we'll have 3 dof. In a sense, the missing dof is shared out across all the cells.

Now for part b), the expected data is again generated by a (normalized) binomial(n, p), so it sums to 100, but p is set from the observed data. Again we'd expect this to be a closer match to the observed data, as both the observed and expected data sum to 100 and have the same mean (or p). This matching of the observed and expected data removes an extra dof from the chi-squared test.

OK, so why is it exactly one extra dof when you match p? Imagine you have two cells, something like:

O: 40 60

E: ? ?

If we only require the expected frequencies to sum to 100, we'd have something like:

O: 40 60

E: 30 100-30

Now, matching the mean (p = 0.4) with a binomial, we'd have an exact match:

O: 40 60

E: 40 60

Originally there were 2 dof between the observed and expected. The frequency-sum constraint means there is really just one difference (repeated, with opposite sign, in the 2nd column), so dof = 1. Matching p means the rows are identical and there are 2 − 2 = 0 dof of difference. There is no difference between the two rows simply because we are using frequency data with p estimated from the same data.
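The two-cell example can be checked in Python. Treating the first cell as "success" so that p = 40/100 = 0.4 (an assumption about how the cells are labelled), the total-only expected row leaves one difference repeated with opposite sign, while the fitted row reproduces the observed data exactly and the chi-squared statistic is 0. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

observed = [40, 60]
total = sum(observed)


def chi_squared(obs, exp):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


# Only the total is matched: E = 30, 100 - 30.
expected_total_only = [30, total - 30]
diffs = [o - e for o, e in zip(observed, expected_total_only)]
print(diffs)  # one difference, repeated with opposite sign
print(chi_squared(observed, expected_total_only))

# p estimated from the observed data (first cell treated as "success").
p_hat = Fraction(observed[0], total)
expected_fitted = [total * p_hat, total * (1 - p_hat)]
print(chi_squared(observed, expected_fitted))  # prints 0
```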

With three cells you have 3 dof. Using frequencies means you have 3 − 1 = 2 dof. Matching the mean (or p) means you can match two of the columns but not the third, hence 3 − 2 = 1 dof. Again, this 1 dof is shared out across the 3 cells, rather than having two columns matching and the 3rd being independent.
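For three cells, here is a Python sketch with made-up counts for x = 0, 1, 2 successes out of n = 2 trials (an assumption chosen so there are exactly three cells). Fitting p by matching the mean puts two linear constraints on the three O − E differences: they sum to zero, and their x-weighted sum is zero. Only one difference is free, and it shows up spread across all three cells rather than two cells matching exactly.

```python
from fractions import Fraction
from math import comb

# Hypothetical observed counts for x = 0, 1, 2 successes out of n = 2 trials.
observed = [30, 50, 20]
n_trials = 2
total = sum(observed)

# Estimate p by matching the mean: sample mean = n * p.
mean = Fraction(sum(x * o for x, o in enumerate(observed)), total)
p_hat = mean / n_trials

expected = [
    total * comb(n_trials, x) * p_hat**x * (1 - p_hat) ** (n_trials - x)
    for x in range(n_trials + 1)
]
diffs = [o - e for o, e in zip(observed, expected)]

print("E:", [float(e) for e in expected])
print("O - E:", [float(d) for d in diffs])
# Constraint 1 (same total): the differences sum to zero.
print("sum:", sum(diffs))
# Constraint 2 (same mean): the x-weighted differences also sum to zero.
print("weighted sum:", sum(x * d for x, d in enumerate(diffs)))
```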

Note that in part a) you're testing the observations against a binomial distribution with p = 0.05 (5%). In part b) you're simply testing whether the data follows a binomial distribution, with p estimated from the observed data. They're different tests. The second should give a better (not worse) match to the observed data, and reducing the dof by 1 compensates for this in the test.
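To see how the lost dof compensates in the test itself, here is a small Python sketch using the standard closed forms of the chi-squared upper tail for 1 to 3 dof (stdlib only, no SciPy). At the 5% level the critical value is 7.815 with 3 dof (part a) but 5.991 with 2 dof (part b), so the fitted model, whose statistic tends to be smaller, is judged against a lower critical value. The statistic 6.5 below is a hypothetical value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for chi-squared with df = 1, 2 or 3 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1, 2, 3")


# The usual 5% critical values: each upper-tail probability should be about 0.05.
print(round(chi2_sf(7.815, 3), 3))  # part a): 4 cells, p given
print(round(chi2_sf(5.991, 2), 3))  # part b): 4 cells, p estimated

# The same statistic can be significant with 2 dof but not with 3.
stat = 6.5  # hypothetical value
print(chi2_sf(stat, 3), chi2_sf(stat, 2))
```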

This guy does something similar

https://www.youtube.com/watch?v=O7wy...l=jbstatistics

Last edited by mqb2766; 1 month ago


I will look into it later, as there is loads of other stuff I need to learn. I think this is making sense. I will reply later, maybe even in 2 weeks.
