# Statistics - Hypothesis testing

#1
doing this question from AQA Statistics 2 January 2007 paper:

i'm confused as to why they've used s as an estimate of σ instead of just estimating σ from the sample? in my book it says to use the sample to estimate σ if n>= 30 and if not then use s as an estimate of σ but in this question it uses s even though n = 100? it makes a difference as the value is so close to the z critical value and tbh idk why they've done it in the first place?

Last edited by Odium; 4 months ago
0
4 months ago
#2
Isn't s the sample standard deviation?
0
#3
(Original post by yudothis)
Isn't s the sample standard deviation?
ik but my book says estimate σ from sample if n is at least 30 but here they've calculated s instead so im a bit confused lol
0
4 months ago
#4
(Original post by Odium)
i'm confused as to why they've used s as an estimate of σ instead of just estimating σ from the sample? in my book it says to use the sample to estimate σ if n>= 30 and if not then use s as an estimate of σ but in this question it uses s even though n = 100? it makes a difference as the value is so close to the z critical value and tbh idk why they've done it in the first place?
What do you mean by the bold above? What would be your σ?

Also are you studying for new A Level maths and are you reading a textbook designed for the new spec?
0
4 months ago
#5
(Original post by Odium)
ik but my book says estimate σ from sample if n is at least 30 but here they've calculated s instead so im a bit confused lol
So if you estimate sigma from the sample, that means you use the sample standard deviation. They have done what they said to do. Look at the formula, they use the sample mean, they divide by n-1 -> they calculated the sample standard deviation. So this: "my book it says to use the sample to estimate σ if n>= 30" is what they did.

Now what they mean by "if n<30 use s" makes no sense to me.
0
#6
(Original post by Notnek)
What do you mean by the bold above? What would be your σ?

Also are you studying for new A Level maths and are you reading a textbook designed for the new spec?
(Original post by yudothis)
So if you estimate sigma from the sample, that means you use the sample standard deviation. They have done what they said to do. Look at the formula, they use the sample mean, they divide by n-1 -> they calculated the sample standard deviation. So this: "my book it says to use the sample to estimate σ if n>= 30" is what they did.

Now what they mean by "if n<30 use s" makes no sense to me.
i'm resitting from last year but i've been using a few different resources (some of which are new spec) but this bit is from the aqa book, but i think i've misunderstood what's going on

from this flowchart i just assumed that i'd work out an estimate for sample standard deviation by using the normal way (dividing by n instead of n-1) since it says 'use the sample to estimate sd' instead of what it says for if n<30, which says use s as an estimate of sd

edit: at the start of the chapter it says using n-1 or n shouldn't make too much of a difference if the sample is large but it's correct to use n-1... so do i always use n-1 when estimating sd?
Last edited by Odium; 4 months ago
0
4 months ago
#7
(Original post by Odium)
i'm resitting from last year but i've been using a few different resources (some of which are new spec) but this bit is from the aqa book, but i think i've misunderstood what's going on

from this flowchart i just assumed that i'd work out an estimate for sample standard deviation by using the normal way (dividing by n instead of n-1) since it says 'use the sample to estimate sd' instead of what it says for if n<30, which says use s as an estimate of sd
I don't know much about old spec stats and so these notes are confusing to me. I don't know what the difference is between "Use the sample to estimate sigma" and "Use s as an estimate of sigma". Are you saying that you just divide by n instead of n-1?

Tagging Gregorius who knows a lot more than me and should hopefully be able to help.

By the way, I don't recommend using new spec resources since stats has changed loads and you may get confused (like I am when looking at old spec stats).
Last edited by Notnek; 4 months ago
0
#8
(Original post by Notnek)
I don't know much about old spec stats and so these notes are confusing to me. I don't know what the difference is between "Use the sample to estimate sigma" and "Use s as an estimate of sigma". Are you saying that you just divide by n instead of n-1?

Tagging Gregorius who knows a lot more than me and should hopefully be able to help.

By the way, I don't recommend using new spec resources since stats has changed loads and you may get confused (like I am when looking at old spec stats).
yeah i've been dividing by n instead of n-1 but that's incorrect right? it seems like they mean the same thing i think??? (calculate s) but i just mistakenly thought it didn't because they used different wording

okay!! i didn't know there was a huge difference (as i haven't really looked at the new specification) but i'll stick to old spec stuff! thanks
0
4 months ago
#9
(Original post by Notnek)
Tagging Gregorius who knows a lot more than me and should hopefully be able to help.
Ha! This train-wreck of a question illustrates why the way statistics is taught at school level drives me round the bend! OK, let’s try and dissect it.

The null hypothesis behind this question is that population scores are normally distributed with mean 85.9 and unknown variance. You need to estimate the variance from a sample of size 100. The standard way of doing this is to take (1/99) * 15,321. This is what’s known as an “unbiased estimator” of the population variance. That is, if you took repeated samples of size 100 from the population and calculated this estimate over and over again, the mean of these estimates would converge to the true population variance. So, strictly speaking, your estimate of the population variance should “divide by (n – 1)”.
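
To make the arithmetic concrete, here is a minimal Python sketch of the two competing estimates. It assumes that 15,321 is the sum of squared deviations from the sample mean (which is how the figure is used above) and that n = 100; the variable names are only for illustration.

```python
import math

n = 100                # sample size from the question
sum_sq_dev = 15321     # assumed: sum of squared deviations from the sample mean

# Unbiased estimate of the population variance: divide by (n - 1)
var_unbiased = sum_sq_dev / (n - 1)
# "Divide by n" version, which is always slightly smaller
var_divide_by_n = sum_sq_dev / n

s_unbiased = math.sqrt(var_unbiased)
s_divide_by_n = math.sqrt(var_divide_by_n)

print(f"divide by n-1: variance = {var_unbiased:.2f}, sd = {s_unbiased:.3f}")
print(f"divide by n:   variance = {var_divide_by_n:.2f}, sd = {s_divide_by_n:.3f}")
```

The two standard deviations differ by a factor of sqrt(100/99), roughly half a percent, which is small but can matter when the test statistic sits right next to the critical value.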

Then, as the population variance is unknown and estimated from a sample, the question should be answered with a t-test. Using a z-test assumes that the population variance is known.

However, when n is large, there are two effects that simplify the situation as far as calculation is concerned. First, the results of dividing the sum of squares by n and by (n-1) are very close. Second, as n gets larger and larger, the t-distribution gets closer and closer to the normal distribution.
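
Both effects can be checked numerically. The sketch below uses only the Python standard library, so the t critical values are approximated by simulation rather than looked up in tables; the function names, the number of repetitions and the seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def sd_ratio(n):
    """Ratio of the 'divide by n-1' sd estimate to the 'divide by n' one."""
    return math.sqrt(n / (n - 1))

def simulated_t_critical(n, reps=5000, upper_tail=0.025, seed=1):
    """Monte Carlo estimate of the upper critical value of the one-sample
    t statistic for samples of size n drawn from a standard normal."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        s = math.sqrt(ss / (n - 1))          # sample sd, dividing by n - 1
        t_values.append(mean / (s / math.sqrt(n)))
    t_values.sort()
    return t_values[int((1 - upper_tail) * reps)]

z_crit = NormalDist().inv_cdf(0.975)         # two-tailed 5% z critical value
results = {n: simulated_t_critical(n) for n in (5, 30, 100)}

for n, t_crit in results.items():
    print(f"n = {n:3d}: sd ratio = {sd_ratio(n):.4f}, "
          f"simulated t critical = {t_crit:.2f}, z critical = {z_crit:.2f}")
```

The simulated critical value should come out well above the z value for n = 5 and close to it for n = 100, while the sd ratio shrinks towards 1.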

The textbook quoted expresses the view that if n >= 30 then, practically speaking, you can simplify matters by “dividing by n” rather than (n-1) in estimating the population variance, and you can use a z-test rather than a t-test. This is good practical advice.

However, what do you do when the difference in approaches matters? I’m afraid this is a weakness of the hypothesis testing approach to statistical analysis; if you have a hard decision boundary (“testing at the 5% level”) you are bound to get these situations where approximating one way will give you one answer and approximating another way will reverse the decision!
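
Here is a small sketch of how such a reversal can happen. The null mean 85.9, n = 100 and the sum of squares 15,321 are taken from the discussion above; the sample mean of 88.33 is made up purely so that the result lands near the boundary, and the two-tailed z-test at the 5% level is also an assumption rather than something taken from the paper.

```python
import math
from statistics import NormalDist

mu0 = 85.9            # mean under the null hypothesis
n = 100
sum_sq_dev = 15321    # assumed sum of squared deviations from the sample mean
sample_mean = 88.33   # hypothetical value, chosen only for illustration

z_crit = NormalDist().inv_cdf(0.975)   # two-tailed 5% critical value

def z_statistic(divisor):
    """z statistic using sum_sq_dev / divisor as the variance estimate."""
    sd = math.sqrt(sum_sq_dev / divisor)
    return (sample_mean - mu0) / (sd / math.sqrt(n))

z_n = z_statistic(n)
z_n_minus_1 = z_statistic(n - 1)

for label, z in (("divide by n", z_n), ("divide by n-1", z_n_minus_1)):
    decision = "reject H0" if abs(z) > z_crit else "do not reject H0"
    print(f"{label}: z = {z:.3f}, critical value = {z_crit:.3f}, {decision}")
```

With these numbers the "divide by n" version just rejects and the "divide by n-1" version just fails to, even though the two statistics differ by only about 0.01.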

What to do? In real life, we tend to “soften” the decision that comes out of hypothesis testing, and use phrases like “clear evidence for” or “weak evidence for” an effect. What should OP do? You’ll have to be guided by your teacher; the book you have there is good practical advice, and the answer scheme quoted accepts one approximation (use a z-test rather than a t-test) but rejects the other!
3
#10
(Original post by Gregorius)
thanks for the thorough explanation, i really appreciate it! unfortunately i don't have a teacher so i'm not quite sure what to do. to be fair, this exam question was from one of the early papers on this spec so i assume they figure out their preferred way and i'll find out what they do in the newer papers once i get around to them, and i'll just follow what they do there. i understand what's going on now though so thanks
0