Chapter Thirteen Part I
Hypothesis Testing:
Basic Concepts and Tests of
Association,
Chi-Square Tests
Basic concepts - Example
• GEICO feels that consumers are bored with the gecko ad campaign (mean liking = 2 on a scale from 1 (strongly dislike) to 5 (strongly like)).
• GEICO wants to verify this feeling, so they take a sample and measure liking levels. The mean in the sample is 4.
• Should GEICO conclude that their feeling is wrong, or that the sample mean is a function of chance?
Hypothesis Testing: Basic Concepts
• Hypothesis: An assumption made about a population parameter (not a sample statistic)
– E.g. mean attitudes are 2 measured on a 1 – 5 scale
• Purpose of Hypothesis Testing: To make a
judgment about the difference between the
sample statistic and the population parameter
• The mechanism adopted to make this objective
judgment is the core of hypothesis testing
Hypothesis testing: Logic
• Is the sample statistic a function of chance or
luck rather than an accurate representation of
the population parameter?
• Example:
– Hypothesized mean attitudes are 2 (on a 1 – 5 scale)
– Observed mean sample attitudes are 4 (on a 1 – 5
scale)
– Is the difference between the two a chance event or
are we really wrong about our hypothesis?
– This is statistically evaluated.
Problem Definition
• Clearly state the null and alternative hypotheses
• Choose the relevant test and the appropriate probability distribution
• Determine the significance level
• Determine the degrees of freedom
• Decide if one- or two-tailed test
• Choose the critical value
• Compute the relevant test statistic
• Compare the test statistic and the critical value
• Does the test statistic fall in the critical region? Yes: reject null. No: do not reject null.
1. Formulate Null & Alternative hypotheses
• Null hypothesis (Ho): the hypothesis of no difference
– between the population parameter and the sample statistic
– OR of no relationship between two sample statistics
– a mirror image of the alternative (research) hypothesis
• Alternative hypothesis (Ha or H1): the hypothesis of differences or relationships
• Example
– Ho: Mean population attitudes = 2
– Ha: Mean population attitudes ≠ 2
2. Choose appropriate test and probability
distribution
• Depends on whether we are
– Comparing means (Z distribution if population
standard deviation is known; t distribution if population
standard deviation is not known)
– Comparing frequencies (chi-square distribution)
3. Determine significance level
• The level (alpha) at which we want to make a judgment about the population parameter (the null hypothesis), i.e. the risk we accept of rejecting a null hypothesis that is actually true
• Conventionally 10%, 5% or 1% in the social sciences (corresponding to 90%, 95% and 99% confidence levels)
• Determines where the critical value of the test statistic lies
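As a quick illustration of how the significance level fixes the critical value, the sketch below assumes a two-tailed z test (population standard deviation known) and uses Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper name `two_tailed_critical_z` is hypothetical, chosen only for this example.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Critical |z| for a two-tailed z test: alpha/2 of the area in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    confidence = 1 - alpha  # confidence level = 1 - significance level
    print(f"alpha = {alpha:.2f}, confidence = {confidence:.0%}, "
          f"critical |z| = {two_tailed_critical_z(alpha):.3f}")
```

The smaller the significance level, the further out in the tails the critical value sits, so stronger evidence is needed to reject Ho.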
4. Determine degrees of freedom
• The number of independent (unconstrained) pieces of information available to calculate a sample statistic
• E.g. for x̄, d.f. = n; for s, d.f. = n-1, since 1 d.f. is lost to the restriction that the mean must be calculated first in order to calculate the standard deviation
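A minimal sketch of why s has n-1 degrees of freedom: deviations from the sample mean always sum to zero, so once the mean is known the last deviation is fixed by the others. The ratings are hypothetical; `statistics.variance` divides by n-1 and `statistics.pvariance` divides by n.

```python
from statistics import mean, pvariance, variance

# Hypothetical ratings on a 1-5 scale (illustration only).
ratings = [3, 5, 4, 2, 1]
x_bar = mean(ratings)

# Deviations from the sample mean sum to zero, so only n - 1 of them are free:
# the last one can be recovered from the others.
deviations = [x - x_bar for x in ratings]
last_from_others = -sum(deviations[:-1])

n = len(ratings)
ss = sum(d * d for d in deviations)      # sum of squared deviations
print(sum(deviations), last_from_others == deviations[-1])
print(ss / (n - 1), variance(ratings))   # sample variance: n - 1 d.f.
print(ss / n, pvariance(ratings))        # dividing by n instead
```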
5. Decide if it is a one / two tailed test
• One Tailed test: If the Research Hypothesis is
expressed directionally:
– E.g. Head-On wants to test if consumers dislike their ad campaign (mean liking < 3 on a scale from 1 (strongly dislike) to 5 (strongly like)).
– Ho: Population mean attitudes are greater than or
equal to 3.0
– Ha: Population mean attitudes are less than 3.0
• To confirm H1, look in the tail in the direction of the research hypothesis (here, the lower tail)
5. Decide if it is a one / two tailed test
• Two Tailed test: If the Research Hypothesis is
expressed without direction
– E.g. Head-On wants to test if consumers feel differently about their ad campaign than they did a year ago (mean liking a year ago = 4.5 on a scale from 1 (strongly dislike) to 5 (strongly like)).
– Ho: Population mean attitudes = 4.5
– Ha: Population mean attitudes are not equal to 4.5
• To confirm H1, look in the tails on both sides of the distribution
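The sketch below contrasts one- and two-tailed rejection regions at the same alpha. It assumes a z test; the function `reject_h0` and the observed z of -1.80 are hypothetical illustrations, not values from the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def reject_h0(z_obs: float, alpha: float, tail: str) -> bool:
    """Return True if z_obs falls in the rejection region.

    tail = "lower": Ha says the mean is below the hypothesized value (one-tailed)
    tail = "upper": Ha says the mean is above it (one-tailed)
    tail = "two":   Ha says the mean is different (two-tailed)
    """
    if tail == "lower":
        return z_obs < Z.inv_cdf(alpha)                 # all of alpha in the left tail
    if tail == "upper":
        return z_obs > Z.inv_cdf(1 - alpha)             # all of alpha in the right tail
    if tail == "two":
        return abs(z_obs) > Z.inv_cdf(1 - alpha / 2)    # alpha/2 in each tail
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

# Hypothetical observed z of -1.80 at alpha = .05
print(reject_h0(-1.80, 0.05, "lower"))  # one-tailed cut-off is about -1.645
print(reject_h0(-1.80, 0.05, "two"))    # two-tailed cut-offs are about +/-1.96
```

The same observed value can fall in the critical region of a directional test but not of a non-directional one, because the one-tailed test puts all of alpha in a single tail.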
6. Find the critical test statistic
• Critical z value requires knowledge of level of
significance
• Critical t value requires knowledge of level of
significance and degrees of freedom
• Critical chi-square requires knowledge of level of
significance and degrees of freedom
7. Criteria for rejecting / not rejecting H0
• Compute observed test statistic
• Compare critical test statistic with observed test
statistic
– If the absolute value of observed test statistic is
greater than the critical test statistic, reject Ho
– If the absolute value of observed test statistic is
smaller than the critical test statistic then Ho cannot
be rejected.
• Regions of rejection / acceptance
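A minimal sketch of step 7 for a one-sample z test, assuming the population standard deviation is known. The hypothesized mean (2) and sample mean (4) come from the GEICO example; sigma = 1.5 and n = 25 are assumed values that the example does not give, so the resulting z is illustrative only. The helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Observed z statistic; assumes the population standard deviation is known."""
    return (x_bar - mu0) / (sigma / sqrt(n))

def decide(observed: float, critical: float) -> str:
    """Two-tailed rule: compare |observed| with the critical value."""
    if abs(observed) > critical:
        return "reject H0"
    return "do not reject H0"

# Hypothetical inputs: H0 mean = 2 and sample mean = 4 (GEICO example);
# sigma = 1.5 and n = 25 are assumptions for illustration.
z_obs = one_sample_z(x_bar=4.0, mu0=2.0, sigma=1.5, n=25)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96 for alpha = .05
print(round(z_obs, 2), decide(z_obs, z_crit))
```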
Type 1 and Type 2 errors
Data analysis conclusion | Null hypothesis in population is True | Null hypothesis in population is False
Reject null hypothesis | Type 1 error (Prob: alpha, the significance level) | Correct decision (power of the test)
Do not reject null hypothesis | Correct decision (confidence level) | Type 2 error (Prob: beta, the weakness of the test)
Type 1 and Type 2 errors
• The lower the confidence level, the greater the risk of rejecting a true H0 – Type 1 error (alpha)
– I.e. if you reduce the confidence level from 95% to 90%, you are more likely to declare that the effect observed in the sample actually exists in the population.
– If the effect does not in fact exist in the population, you have increased the risk of committing a Type 1 error.
• Therefore, in a Type 1 error you declare an effect that does not exist
Type 1 and Type 2 errors
• The higher the confidence level, the greater the risk of failing to reject a false H0 – Type 2 error (beta)
– I.e. if you increase the confidence level from 95% to 99%, you are more likely to miss an effect that actually exists in the population.
– The power of the test to detect the effect is reduced
– Power is the probability of rejecting a false H0: power = 1 – beta
• Therefore, in a Type 2 error you miss an effect that exists
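To see the alpha/beta trade-off numerically, the sketch below computes the power of a two-tailed z test when the true mean lies a hypothetical 2 standard errors away from the Ho value (an assumed effect size, not a figure from the slides). The helper name `power_two_tailed_z` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_tailed_z(alpha: float, shift: float) -> float:
    """Power of a two-tailed z test when the true mean is `shift` standard
    errors from the H0 value. Under that alternative the z statistic is
    Normal(shift, 1), and H0 is rejected if it lands in either tail."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

shift = 2.0  # hypothetical true effect, in standard-error units
for alpha in (0.10, 0.05, 0.01):
    power = power_two_tailed_z(alpha, shift)
    beta = 1 - power
    print(f"confidence {1 - alpha:.0%}: alpha = {alpha:.2f}, "
          f"beta = {beta:.3f}, power = {power:.3f}")
```

Moving from 90% to 95% to 99% confidence lowers alpha but raises beta, so power falls. With no true effect (shift = 0) the rejection rate equals alpha, which is exactly the Type 1 error rate.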
Hypothesis Testing
Tests in this class
• Frequency distributions: χ²
• Means (one): z (if σ is known); t (if σ is unknown)
• Means (two): t
• Means (more than two): ANOVA
Chi-Square as a test of independence
• Statistical independence: two variables are independent if knowing the value of one tells us nothing about the outcome of the other
• E.g. school affiliation (nominally scaled) does not influence the decision to eat at the student union
• Expected value: the average value in a cell if the sampling procedure were repeated many times
• Observed value: the value in the cell in one sampling procedure
• Used only with nominal / categorical variables
Chi-square Step-by-Step
1) Formulate Hypotheses
Chi-Square As a Test of Independence
Null Hypothesis Ho
• Two (nominally scaled) variables are statistically
independent
• There is no relationship between school affiliation
and decision to eat at the student union
Alternative Hypothesis Ha
• The two variables are not independent
• School affiliation does influence the decision to
eat at the student union
Chi-square As a Test of Independence
(Contd.)
Chi-square Distribution
• A continuous probability distribution used to test hypotheses about categorical (frequency) data
• Total area under the curve is 1.0
• Each number of degrees of freedom defines a different chi-square distribution
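As a numerical check of these properties, the sketch below integrates the chi-square density with a simple midpoint rule. It relies on the standard density formula and on the known fact that a chi-square distribution with df degrees of freedom has mean df; the helper names are hypothetical, and df = 1 is left out because its density is unbounded at 0.

```python
from math import exp, lgamma, log

def chi2_pdf(x: float, df: int) -> float:
    """Chi-square density f(x) = x^(k-1) e^(-x/2) / (2^k Gamma(k)), k = df/2, x > 0."""
    k = df / 2
    # Computed in log form to avoid overflow in 2**k * Gamma(k) for larger df.
    return exp((k - 1) * log(x) - x / 2 - k * log(2) - lgamma(k))

def area_and_mean(df: int, upper: float = 100.0, steps: int = 100_000):
    """Midpoint-rule approximations of the integrals of f(x) and x*f(x) over (0, upper)."""
    h = upper / steps
    area = first_moment = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        fx = chi2_pdf(x, df)
        area += fx * h
        first_moment += x * fx * h
    return area, first_moment

results = {df: area_and_mean(df) for df in (2, 4, 10)}
for df, (area, m) in results.items():
    print(f"df = {df:2d}: area = {area:.4f}, mean = {m:.3f}")
```

Every curve encloses an area of about 1.0, but the centre of the distribution moves right as df grows, which is why the critical value depends on the degrees of freedom.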
The chi-square distribution
[Figure: chi-square density f(χ²) plotted against χ² for df = 4, α = .05]
Chi-square Step-by-Step
1) Formulate Hypotheses
2) Calculate row and column totals
3) Calculate row and column proportions
4) Calculate expected frequencies (Ei)
5) Calculate χ² statistic
Chi-square Statistic (χ²)
• Measures the difference between the actual number observed in cell i (Oi) and the number expected (Ei) under independence, i.e. if the null hypothesis were true:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where the sum runs over all r × c cells, with (r-1)*(c-1) degrees of freedom
r = number of rows, c = number of columns
• Expected frequency in each cell: Ei = pc * pr * n
where pc and pr are the column and row proportions of the two variables and n is the total number of observations
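A minimal sketch of steps 2 to 5, assuming the observed counts are given as a list of rows. It applies Ei = pr × pc × n, which simplifies to row total × column total / n, to the student-union counts from this chapter's example (rows = schools A to E, columns = eat / don't eat). The helper names are hypothetical.

```python
def expected_frequencies(observed):
    """E_ij = (row total / n) * (column total / n) * n = row total * column total / n."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)**2 / E over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Observed counts: schools A-E (rows) x (eat at SU, don't eat) (columns).
observed = [[10, 8], [20, 16], [45, 18], [16, 6], [9, 2]]
expected = expected_frequencies(observed)
r, c = len(observed), len(observed[0])

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_square_statistic(observed, expected), 2), "with", (r - 1) * (c - 1), "d.f.")
```

Because this uses unrounded proportions, the expected counts for schools D and E (and so the statistic, about 5.14) differ slightly from the rounded figures in the worked example; the degrees of freedom are the same.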
Chi-square Step-by-Step
1) Formulate Hypotheses
2) Calculate row and column totals
3) Calculate row and column proportions
4) Calculate expected frequencies (Ei)
5) Calculate χ² statistic
6) Calculate degrees of freedom
Chi-square As a Test of Independence
(Contd.)
Degrees of Freedom
v = (r - 1) * (c - 1)
r = number of rows in the contingency table
c = number of columns in the contingency table
Chi-square Step-by-Step
1) Formulate Hypotheses
2) Calculate row and column totals
3) Calculate row and column proportions
4) Calculate expected frequencies (Ei)
5) Calculate χ² statistic
6) Calculate degrees of freedom
7) Obtain Critical Value from table
The chi-square distribution
[Figure: chi-square density f(χ²) for df = 4; the critical value 9.49 cuts off the upper 5% of the area under the curve (α = .05)]
• Ex: Significance level = .05
Degrees of freedom = 4
Critical value of χ² = 9.49
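The table value can also be reproduced in code. Python's standard library has no chi-square functions, so this sketch computes the CDF from the power series of the regularized lower incomplete gamma function and finds the critical value by bisection. Both helper names are hypothetical; a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)` should give the same number.

```python
from math import exp, lgamma, log

def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable with df degrees of freedom,
    computed as the regularized lower incomplete gamma P(df/2, x/2) via its series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * exp(a * log(z) - z - lgamma(a))

def chi2_critical(alpha: float, df: int) -> float:
    """Value of chi-square that leaves an upper-tail area of alpha (bisection search)."""
    lo, hi = 0.0, 500.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            lo = mid   # more than alpha still lies above mid: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(0.05, 4), 2))
```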
Chi-square Step-by-Step
1) Formulate Hypotheses
2) Calculate row and column totals
3) Calculate row and column proportions
4) Calculate expected frequencies (Ei)
5) Calculate χ² statistic
6) Calculate degrees of freedom
7) Obtain Critical Value from table
8) Make a decision regarding the null hypothesis
Example of Chi-square as a Test of Independence
School | Eat at SU (Y) | Don’t eat (N)
A | 10 | 8
B | 20 | 16
C | 45 | 18
D | 16 | 6
E | 9 | 2
Each number is an observed value; each school × eat / don’t-eat combination is a ‘cell’.
Chi-square example
School | Eat at SU | Don’t eat | Total | Pr
A | O1 = 10, E1 = 12 | O2 = 8, E2 = 6 | 18 | 0.12
B | O3 = 20, E3 = 24 | O4 = 16, E4 = 12 | 36 | 0.24
C | O5 = 45, E5 = 42 | O6 = 18, E6 = 21 | 63 | 0.42
D | O7 = 16, E7 = 15 | O8 = 6, E8 = 7 | 22 | 0.15
E | O9 = 9, E9 = 7 | O10 = 2, E10 = 4 | 11 | 0.07
Total | 100 | 50 | 150 | 1.00
Pc | 0.67 | 0.33 | 1.00 |
E.g. E3 = Pr × Pc × n = (36/150) × 0.67 × 150 = 0.24 × 0.67 × 150 ≈ 24
Chi-square example
• Observed chi-square = [(10 – 12)² / 12] + [(8 – 6)² / 6] + [(20 – 24)² / 24] + … + [(2 – 4)² / 4] = 5.42 (computed with the rounded expected frequencies shown in the table)
• d.f. = (r-1)(c-1) = (5-1)(2-1) = 4
• Critical chi-square at the 5% level of significance with 4 degrees of freedom = 9.49
• Since observed chi-square < critical chi-square (5.42 < 9.49), H0 cannot be rejected
• Hence the decision to eat / not eat at the student union is statistically independent of school affiliation. In other words, the data show no relationship between the decision to eat at the SU and the school students are in.
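The sketch below checks the arithmetic above: it recomputes the statistic with the rounded Ei printed in the table and with unrounded Ei (row total × column total / n), then applies the 9.49 critical value. The cell order O1 to O10 follows the table; the helper name `chi_square` is hypothetical.

```python
# O1..O10 and the rounded E1..E10 exactly as printed in the example table.
observed = [10, 8, 20, 16, 45, 18, 16, 6, 9, 2]
rounded_expected = [12, 6, 24, 12, 42, 21, 15, 7, 7, 4]

# Unrounded expected frequencies (row total * column total / n), same cell order.
row_totals = [18, 36, 63, 22, 11]   # schools A-E
col_totals = [100, 50]              # eat at SU, don't eat
n = 150
exact_expected = [r * c / n for r in row_totals for c in col_totals]

def chi_square(obs, expected):
    """Sum of (O - E)**2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, expected))

CRITICAL_VALUE = 9.49  # 5% level, 4 d.f., as given in the text

for label, expected in (("rounded Ei", rounded_expected), ("exact Ei", exact_expected)):
    stat = chi_square(observed, expected)
    decision = "reject H0" if stat > CRITICAL_VALUE else "do not reject H0"
    print(f"{label}: chi-square = {stat:.2f} -> {decision}")
```

The rounded expected frequencies reproduce the 5.42 above, while unrounded ones give about 5.14; both are well below 9.49, so rounding does not change the decision.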
The decision rule when testing hypotheses by means of the chi-square distribution is:
• If χ² ≤ CVχ², do not reject H0
• If χ² > CVχ², reject H0
Thus, for 4 df and α = .05:
• If χ² ≤ 9.49, do not reject H0
• If χ² > 9.49, reject H0