26134 Business Statistics
Tutorial 11: Hypothesis Testing
Introduction: the key concepts in this tutorial are:
1. The difference between one-tailed and two-tailed tests
2. The steps in hypothesis testing
3. Choosing between the z-test and the t-test
Please fill in the student feedback survey at [email protected] at the start of the
class. I will leave the room and give you 5 mins to fill it out.
Final Exam Content
• Threshold 5: Normal Distribution; Sampling and Sampling Distribution
• Threshold 6: Confidence Interval; Hypothesis Testing
In statistics we usually want to analyse a population, but collecting data for the whole population is often impractical, expensive or impossible. That is why we collect a sample from the population (sampling) and use the sample statistics to make inferences about the population parameters (inference), with a stated level of confidence (confidence level).
Statistical inference is the process of drawing conclusions about the entire population based on information in a sample, by:
• constructing confidence intervals on population parameters, or
• setting up a hypothesis test on a population parameter.
[Figure: a population of size N and a sample of size n drawn from it]
A population is a collection of all possible individuals, objects, or measurements of
interest. A sample is a subset of the population of interest.
General: Hypothesis Testing
• We use hypothesis testing to draw conclusions about population parameters by analysing sample statistics, because in practice we usually only have information about the sample.
• In statistics, a hypothesis is a statement about a population parameter.
1. Formulate the hypotheses:
The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. It always contains the equality and is the opposite of the competing claim in the alternative hypothesis. The null hypothesis must specify that the population parameter is equal to a single value (definition from textbook page 459).
The alternative hypothesis, denoted Ha, is the competing claim: what we are trying to establish, the claim we seek evidence for (e.g. Ha: population parameter ≠, < or > the hypothesised null value).
2. Determine the level of significance α: the probability of rejecting H0 when H0 is actually true (a Type I error). It equals 1 − confidence level.
3. Determine the test statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Determine the critical value: the critical value separates the rejection region from the non-rejection region.
5. Make a decision rule and draw a conclusion: compare the test statistic with the critical value and decide whether to reject or not reject H0. If the test statistic falls in the rejection region, we reject H0 and conclude that there is sufficient evidence to support the alternative hypothesis at the α level of significance. If the test statistic falls in the non-rejection region, we do not reject H0 and conclude that there is not enough evidence to support the alternative hypothesis at the α level of significance. State the conclusion in the context of the problem.
STEP 1: Hypothesis Testing - Formulate the null and alternative hypotheses
• H0: population parameter = null parameter
  Ha: population parameter ≠ null parameter (two-tailed)
or
• H0: population parameter ≥ null parameter
  Ha: population parameter < null parameter (one-tailed, left-tailed)
or
• H0: population parameter ≤ null parameter
  Ha: population parameter > null parameter (one-tailed, right-tailed)
STEP 2: Hypothesis Testing - Determine α (the level of significance)
α = 1 − confidence level; for example, a 95% confidence level corresponds to α = 0.05.
STEP 3: Hypothesis Testing - Determine the standardised test statistic
The test statistic is a measure of compatibility between the statement in the null hypothesis and the data obtained.
If the population standard deviation σ is given, we use the z-test statistic.
If the population standard deviation is not given, we use the sample standard deviation s and the t-test statistic, with n − 1 degrees of freedom.
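As a sketch of Step 3, the two standardised test statistics for a population mean can be written as small Python functions. This assumes the usual one-sample formulas z = (x̄ − μ0)/(σ/√n) and t = (x̄ − μ0)/(s/√n); the function names and the sample numbers are hypothetical, chosen only for illustration.

```python
import math


def z_test_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x_bar - mu0) / (sigma / sqrt(n)); use when the population sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def t_test_statistic(x_bar: float, mu0: float, s: float, n: int) -> tuple:
    """t = (x_bar - mu0) / (s / sqrt(n)); use when sigma is unknown.

    Returns the statistic together with its degrees of freedom, df = n - 1.
    """
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical data, for illustration only.
print(z_test_statistic(x_bar=4200, mu0=4000, sigma=400, n=16))  # 2.0
print(t_test_statistic(x_bar=4100, mu0=4000, s=500, n=25))      # (1.0, 24)
```

Returning df alongside t is a design choice: the degrees of freedom are needed anyway when the critical value is looked up in the t table.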
STEP 4: Hypothesis Testing - Determine the critical value
• The critical value separates the rejection region from the non-rejection region.
[Figure: distribution showing the confidence level (non-rejection region) and the rejection region(s)]
• Note: if we use the t-test statistic, we find the critical value in the t distribution table.
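Instead of reading the z table, the z critical values can be computed with Python's standard-library `statistics.NormalDist`. This is a minimal sketch: the function name `z_critical` and the `tail` labels are hypothetical; the function returns the positive critical value, and the sign for a left-tailed test is applied in the decision rule.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # Z ~ N(0, 1)


def z_critical(alpha: float, tail: str) -> float:
    """Positive critical value for a z-test.

    One-tailed tests ("left" or "right") use z_alpha; a two-tailed test
    ("two") splits alpha between both tails and uses z_(alpha/2).
    """
    if tail in ("left", "right"):
        return STANDARD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":
        return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right' or 'two'")


for a in (0.10, 0.05, 0.01):
    print(a, round(z_critical(a, "right"), 3), round(z_critical(a, "two"), 3))
```

For α = 0.05 this gives 1.645 (one-tailed) and 1.96 (two-tailed), matching the values in the standard normal table.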
STEP 5: Hypothesis Testing - Write the decision rule and draw a conclusion
For a z-test statistic:
• Left-tailed test (Ha: μ < μ0): reject H0 if ztest < −zα
• Right-tailed test (Ha: μ > μ0): reject H0 if ztest > zα
• Two-tailed test (Ha: μ ≠ μ0): reject H0 if |ztest| > zα/2
For a t-test statistic:
• Left-tailed test (Ha: μ < μ0): reject H0 if ttest < −tα,df=n−1
• Right-tailed test (Ha: μ > μ0): reject H0 if ttest > tα,df=n−1
• Two-tailed test (Ha: μ ≠ μ0): reject H0 if |ttest| > tα/2,df=n−1
CONCLUSION: If the test statistic falls in the rejection region, we reject H0 and say that at the 5% level of significance there is sufficient evidence to conclude….
If the test statistic falls in the non-rejection region, we do not reject H0 and say that at the 5% level of significance there is not enough evidence to conclude….
State the conclusion in the context of the problem.
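The decision rules in Step 5 have the same shape for z and t, so one small function can apply them. This is a sketch under the assumption that `critical` is passed as the positive table value (zα or tα for one-tailed tests, zα/2 or tα/2 for two-tailed tests); the function name `decide` is hypothetical.

```python
def decide(test_stat: float, critical: float, tail: str) -> str:
    """Apply the Step 5 decision rule for a left-, right- or two-tailed test."""
    if tail == "left":
        reject = test_stat < -critical
    elif tail == "right":
        reject = test_stat > critical
    elif tail == "two":
        reject = abs(test_stat) > critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return "reject H0" if reject else "do not reject H0"


print(decide(1.2668, 1.7959, "right"))  # do not reject H0 (battery example, n = 12)
print(decide(-2.10, 1.96, "two"))       # reject H0 (hypothetical values)
```

The function only produces the decision; the conclusion still has to be written in the context of the problem.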
μ = average life of an LED lamp
1) H0: μ ≤ 3000
   Ha: μ > 3000 (right-tailed test)
2) α = 0.05
3) Because the sample size is 20, which is less than 30, we cannot rely on the CLT, so we need to assume that the lamp lifetimes are normally distributed.
4) Critical value: zα = 1.645
μ = average life of a battery
1) H0: μ ≤ 4000
   Ha: μ > 4000 (right-tailed test)
Because the sample size is 12, which is less than 30, we need to assume that the battery lifetimes are normally distributed.
tcritical = tα=0.05, df=n−1=11 = 1.7959
Because tstat = 1.2668 < 1.7959 = tcrit, we do not reject H0. Therefore, at the 5% level of significance, there is not enough evidence to conclude that the average life of the battery exceeds 4000 hours.
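The t critical value used in this example can be reproduced without a t table. The sketch below sticks to the Python standard library by assumption: it integrates the Student t density with Simpson's rule and inverts the CDF by bisection. This is a numerical stand-in for a statistics package (for example SciPy's t distribution), not the method used in the tutorial; the function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df (e.g. df = 499).
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (symmetry)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha: float, df: int) -> float:
    """Upper-tail critical value t_alpha with P(T > t_alpha) = alpha, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # upper tail still too large: critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


t_crit = t_critical(alpha=0.05, df=12 - 1)
t_stat = 1.2668  # test statistic from the example
print(round(t_crit, 4))  # 1.7959
print("reject H0" if t_stat > t_crit else "do not reject H0")
```

With df = 499 the same function returns roughly 1.648, which is why a large-sample t critical value is close to z0.05 = 1.645.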
μ = average life of a battery
H0: μ ≤ 4000
Ha: μ > 4000 (right-tailed test)
Because the sample size is 500, which is more than 30, the CLT (Central Limit Theorem) applies, so the sampling distribution of the sample mean is approximately normal.
Because tstat = 8.177 > 1.645 = tcrit, we reject H0. (With df = 499 the exact t critical value is about 1.648, very close to z0.05 = 1.645.) Therefore, at the 5% level of significance, there is sufficient evidence to conclude that the average life of the battery exceeds 4000 hours.
The question asks us to choose the significance level that gives the lowest probability of making a Type I error. A Type I error is rejecting the null hypothesis when it is actually true, and the probability of making one equals the significance level α. Recall that the null hypothesis in Activity 2 was H0: μ ≤ 4000.
The smaller the significance level, the lower the probability of making a Type I error. At a significance level of 0.1 there is a 10% chance of making a Type I error, whereas at a significance level of 0.01 there is only a 1% chance. So we select the lowest significance level given, which is 0.01.
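A small simulation makes the link between α and the Type I error rate concrete. This sketch assumes the null value from Activity 2 (μ = 4000) is exactly true, and uses a hypothetical population standard deviation σ = 500, a hypothetical sample size n = 30 and a right-tailed z-test; it counts how often H0 is wrongly rejected.

```python
import math
import random
from statistics import NormalDist, fmean

MU0 = 4000      # null value from Activity 2; here H0 is true (mu = MU0)
SIGMA = 500     # hypothetical population standard deviation
N = 30          # hypothetical sample size
TRIALS = 10_000


def type_i_error_rate(alpha: float, seed: int = 1) -> float:
    """Fraction of right-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(TRIALS):
        sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
        z = (fmean(sample) - MU0) / (SIGMA / math.sqrt(N))
        if z > z_crit:
            rejections += 1  # a Type I error: rejecting a true H0
    return rejections / TRIALS


for alpha in (0.10, 0.05, 0.01):
    print(alpha, type_i_error_rate(alpha))
```

Each estimated rate should land close to its α (only approximately, because of simulation noise), so choosing α = 0.01 gives the smallest chance of a Type I error.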
REVISION
THRESHOLD 5: Normal Distribution
• A random variable X assigns a unique numerical value to every outcome of an experiment.
• If X follows a normal distribution, we write X ~ N(μ, σ).
• To find probabilities under a normal distribution, X must be converted to a random variable Z that follows the standard normal distribution, Z ~ N(μ = 0, σ = 1). Standardising the distribution lets us find the probabilities using the standard normal table.
• To convert X to Z, we calculate the z-score: z = (x − μ)/σ
• The sampling distribution of the sample mean X̄ is also normal (exactly when the population is normal, approximately by the CLT otherwise), denoted X̄ ~ N(μX̄ = μ, σX̄), where σX̄ = σ/√n is the standard error.
• To convert X̄ to Z, we calculate the z-score: z = (x̄ − μ)/σX̄
• If n/N > 0.05, the finite population correction factor must be applied to the standard error: σX̄ = (σ/√n) · √((N − n)/(N − 1))
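The standardising formulas above translate directly into code. This sketch assumes the 0.05 threshold for n/N stated above and uses hypothetical numbers (μ = 100, σ = 15, n = 25, N = 200); the function names are illustrative only.

```python
import math


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardise a single observation: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_error(sigma: float, n: int, N=None) -> float:
    """sigma / sqrt(n), times the finite population correction
    sqrt((N - n) / (N - 1)) when N is known and n / N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


def z_score_of_mean(x_bar: float, mu: float, sigma: float, n: int, N=None) -> float:
    """Standardise a sample mean: z = (x_bar - mu) / sigma_xbar."""
    return (x_bar - mu) / standard_error(sigma, n, N)


# Hypothetical values, for illustration only.
print(z_score(130, mu=100, sigma=15))                 # 2.0
print(standard_error(15, n=25))                       # 3.0
print(z_score_of_mean(106, mu=100, sigma=15, n=25))   # 2.0
print(standard_error(15, n=25, N=200))                # smaller: n/N = 0.125 > 0.05
```

When the correction applies, the standard error shrinks, because sampling a noticeable fraction of a finite population leaves less room for the sample mean to vary.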
Calculating probabilities using the normal distribution: the complement rule, symmetry rule and interval rule
• Complement Rule P(Z>z)=1-P(Z<z)
• Symmetry Rule P(Z<-z)=P(Z>z)
• Interval Rule P(-z<Z<z)=P(Z<z)-P(Z<-z)
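The three rules can be checked numerically with Python's `statistics.NormalDist`, whose `cdf` plays the role of the standard normal table P(Z < z). The value z = 1.5 is an arbitrary choice for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, Z ~ N(0, 1)
z = 1.5           # arbitrary value, for illustration

complement = 1 - Z.cdf(z)          # Complement rule: P(Z > z) = 1 - P(Z < z)
symmetry = Z.cdf(-z)               # Symmetry rule: P(Z < -z) = P(Z > z)
interval = Z.cdf(z) - Z.cdf(-z)    # Interval rule: P(-z < Z < z)

print(round(complement, 4), round(symmetry, 4), round(interval, 4))  # 0.0668 0.0668 0.8664
```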
THRESHOLD 5: Sampling Distribution of the Sample Mean
• We can conclude that the sampling distribution of the sample mean is (approximately) normally distributed when:
• the original (population) distribution, from which the sample was selected, is normally distributed (regardless of sample size); OR
• a sufficiently large sample is taken, that is, the sample size is greater than or equal to 30 (regardless of the population distribution). This second case is the Central Limit Theorem (CLT).
• Note that only ONE of these conditions needs to be satisfied for this conclusion to be reached.
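The second condition (n ≥ 30) can be illustrated by simulation. This sketch assumes a strongly skewed exponential population with mean 1 and standard deviation 1 (a hypothetical choice), draws 5000 samples of size n = 30, and checks that the sample means centre on μ, have spread close to σ/√n, and fall within ±1.96 standard errors about 95% of the time, as a normal distribution would.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(42)
POP_MEAN = 1.0   # exponential population with rate 1: mean 1 ...
POP_SD = 1.0     # ... and standard deviation 1 (strongly right-skewed)
n = 30           # "sufficiently large" by the n >= 30 rule
REPS = 5000

means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(REPS)]
se = POP_SD / math.sqrt(n)  # theoretical standard error sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - POP_MEAN) < 1.96 * se for m in means) / REPS

print(round(fmean(means), 3), round(pstdev(means), 3), round(se, 3), round(within, 3))
```

The printed values are only approximately 1, 0.183, 0.183 and 0.95; the exact numbers depend on the random draws.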
Finding Probabilities of the Mean
THRESHOLD 6: Confidence Intervals
Mean (σ known): x̄ ± zα/2 · σ/√n
Mean (σ unknown): x̄ ± tα/2,df=n−1 · s/√n
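A confidence interval for a mean when σ is known, x̄ ± zα/2 · σ/√n, can be sketched with `statistics.NormalDist`. The function name and the data (x̄ = 4100, σ = 400, n = 64) are hypothetical; for σ unknown, zα/2 would be replaced by tα/2 with n − 1 degrees of freedom and σ by s.

```python
import math
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple:
    """x_bar +/- z_(alpha/2) * sigma / sqrt(n), for a known population sigma."""
    alpha = 1 - confidence
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_(alpha/2), about 1.96 for 95%
    margin = z_half * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_interval(x_bar=4100, sigma=400, n=64)  # hypothetical data
print(round(low, 1), round(high, 1))  # 4002.0 4198.0
```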
THRESHOLD 6: Hypothesis Testing
1. Formulate the hypotheses:
H0: population parameter = null parameter
Ha: population parameter ≠ null parameter (two-tailed)
or
H0: population parameter ≥ null parameter
Ha: population parameter < null parameter (one-tailed, left-tailed)
or
H0: population parameter ≤ null parameter
Ha: population parameter > null parameter (one-tailed, right-tailed)
2. Determine the level of significance α and check the assumptions: if the sample size is less than 30, we need to assume that the population is normally distributed; if the sample size is 30 or more, the CLT applies and the sampling distribution of the sample mean is approximately normal.
3. Determine the test statistic:
z = (x̄ − μ0)/(σ/√n) if σ is known, or
t = (x̄ − μ0)/(s/√n), with df = n − 1, if σ is unknown
4. Determine the critical value and compare the test statistic with it. It really helps to sketch the distribution and shade the rejection region.
5. Make a decision rule and draw a conclusion in the context of the problem.