χ2 Goodness of Fit Test

χ2 "chi-square" probability distribution
[Figure: χ2 probability distribution curve; horizontal axis starts at 0; mean µ marked]
continuous probability distribution
shape is skewed to the right
variable values on horizontal axis are ≥ 0
area under the curve represents probability
extends out to infinity along positive horizontal axis
shape depends on "degrees of freedom" (d.f.)
 when d.f. = 2 only, curve is an exponential distribution
 for higher degrees of freedom, the peak shifts to the right
 when d.f. > 90, curve begins to look like normal distribution
(even then χ2 is still always somewhat skewed right)
mean is located a little to the right of the peak
Mean and Standard Deviation are determined by the degrees of freedom
mean = d.f.
standard deviation = √(2 (d.f.))
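The properties listed above, and the right-tail area that the later tests rely on, can be sketched in Python. This is a minimal sketch, not the calculator's own method: the function names `chi2_mean_sd` and `chi2_right_tail` are made up for illustration, and the tail formula used here only covers whole-number degrees of freedom.

```python
import math

def chi2_mean_sd(df):
    """Return (mean, standard deviation) of a chi-square distribution."""
    return df, math.sqrt(2 * df)

def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square curve (whole-number df >= 1)."""
    if x <= 0:
        return 1.0  # the whole distribution lies at or above 0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * (1 + h + h^2/2! + ... up to the power df/2 - 1)
        term = math.exp(-h)
        total = term
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return total
    # Odd df: start from erfc(sqrt(h)), the df = 1 case, then add one term per extra 2 df
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total

for df in (6, 10):
    mean, sd = chi2_mean_sd(df)
    print(f"df = {df}: mean = {mean}, standard deviation = {sd:.3f}")

# With df = 2 the curve is exponential, so the right tail equals exp(-x/2).
print(chi2_right_tail(3.0, 2), math.exp(-1.5))
```

The call `chi2_right_tail(x, df)` plays the role of χ2cdf(x, 10^99, df) on the TI-83, 84.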
Finding probabilities for the χ2 "chi-square" probability distribution
TI-83, 84: χ2cdf(lower bound, upper bound, degrees of freedom)
We will be using the right tail for hypothesis tests for Goodness of Fit or for
Independence
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
[Figures: χ2 with 6 degrees of freedom; χ2 with 10 degrees of freedom]
PRACTICE: Sketch the shape yourself. Make sure it is skewed to the right.
χ2 "chi-square" probability distribution is used for:
Test of Goodness of Fit:
Hypothesis: Population fits an assumed distribution (theory)
Sample data is collected from a population
Hypothesis test is performed to see if the sample data supports the
theory that the population fits this assumed distribution, or not.
Test of Independence:
Hypothesis: Two qualitative variables are independent of each other
Sample data is collected from a population
Hypothesis test is performed to see if the sample data supports
the theory that these two variables are independent or not
Hypothesis Test (and confidence intervals) for an unknown
population standard deviation.
How we use the Chi-square distribution to make a decision in
the Test of Independence and the Test of Goodness of Fit:
χ2 Goodness of Fit Test
Null hypothesis states a probability distribution that we think the population follows.
We look at sample data to decide if the population follows that distribution or not.
We compare our observed data to what the data would be expected
to look like if the null hypothesis is true.
Test Statistic:
The test statistic measures the differences between Observed Data and Expected Data
• If the differences are large, the Observed and Expected Data are different,
so the observed data contradicts the null hypothesis
• If the differences are small, the Observed and Expected Data are similar,
so the observed data does NOT contradict the null hypothesis
The test statistic in this test follows a χ2 "chi-square" probability distribution.
CASE 1: Observed and Expected are very different
→ Differences between observed and expected are large
→ Test Statistic is large
→ Test Statistic is far out in the right tail of the χ2 probability distribution
→ Area to the right of the test statistic is small
CASE 2: Observed and Expected are similar
→ Differences between observed and expected are small
→ Test Statistic is small
→ Test Statistic is NOT far out in the right tail of the χ2 probability distribution
→ Area to the right of the test statistic is large
Writing the Hypothesis Test for a Goodness Of Fit Test
• Write the hypotheses
Null Hypothesis: Ho: The data fit the expected (hypothesized) distribution
(Describe the expected hypothesized distribution either in a
sentence, by using its appropriate name, by showing a formula or
referring to a nearby visible table or list)
Alternate Hypothesis:
Ha: The data do not fit the expected (hypothesized) distribution
• Decide upon α
• Collect observed data. Calculate expected data.
• Find the test statistic and p-value and draw the graph
Test statistic = Σ (Obs − E)² / E, summed over all cells in the table
Find the p-value
χ2 distribution, right tailed test
df = # of CELLS – 1
χ2cdf(test statistic, 10^99, df)
Draw the graph; shade the p-value; label axis, test statistic & p-value
• Compare p-value to the significance level α and make a decision.
• Write a conclusion in the context of the problem.
Calculator Instructions for test statistic
STAT EDIT:
Enter Observed Data frequencies in list L1
Enter Expected Theoretical frequencies in list L2
(Do NOT put totals in either list)
Arrow up to the very top (title line) in list L3 and input the
instruction (L1 − L2)² / L2
STAT CALC:
1 variable stats L3.
ΣX is the total of all items in list L3
ΣX = Σ (Obs − E)² / E is the test statistic
The test statistic is a number measuring the size of the differences between the
distributions
χ2 Goodness of Fit Test
Hypothesis Test for Goodness of Fit Example 1
(Facebook and Twitter data are based on the Report on Social Network Demographics in 2012, which uses data from
Google's Doubleclick Ad Planner: http://royal.pingdom.com/2012/08/21/report-social-network-demographics-in-2012/)
The table below shows the age distribution of Facebook users.
Age Distribution of Facebook users in 2012:
Example: 9% of all Facebook users are age 18-24
Age Range   0-17   18-24   25-34   35-44   45-54   55-64   65+
Percent     7%     9%      19%     20%     33%     9%      3%
Suppose a random sample of 175 Twitter users has the age distribution below:
Age Distribution for a sample of 175 Twitter Users:
Example: 25 people in the sample of 175 Twitter users are age 18-24
Age Range   0-17   18-24   25-34   35-44   45-54   55-64   65+
Number      14     25      39      41      40      11      5
We want to know if the age distribution of Twitter users fits the same age
distribution as Facebook users.
Null Hypothesis Ho: _____________________________________________
_________________________________________________________________
Alternate Hypothesis HA: ________________________________________
_________________________________________________________________
α = __________
What would the sample data look like if it exactly followed the expected
distribution (null hypothesis)?
Age Range   L1 Observed Data   L2 Expected Data   L3 (Observed – Expected)² / Expected
0-17
18-24
25-34
35-44
45-54
55-64
65+
Total =     Total =            Test Statistic = Sum = ______
Calculator Instructions for finding the test statistic:
STAT EDIT:
Enter Observed Data frequencies in list L1
Enter Expected Theoretical frequencies in list L2
(Do NOT put totals in either list)
Arrow up to the very top (title line) in list L3 and input
(L1 − L2)² / L2 ; then press Enter
STAT CALC:
1 variable stats L3.
ΣX is the total of all items in list L3
ΣX = Σ (Obs − E)² / E is the test statistic
The test statistic is a number that measures the size of the differences
between the distributions
Degrees of freedom = number of CELLS – 1 = ____ – 1 = _____
Probability Distribution to use to calculate the p-value: _____ with df = _____
p-value = P(χ2 > ____) = χ2cdf(________, ________, ______) = _______
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision: ______________________________________________
Reason for Decision:
Conclusion:
At a 5% level of significance:
The sample data provide sufficient evidence to conclude that the age
distribution of Twitter users is different from that of Facebook users.
The age distribution of Twitter users does not fit the same age
distribution as Facebook users.
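The L1, L2, L3 calculator steps for Example 1 can be mirrored in a short Python sketch. The list names copy the calculator lists; the p-value line uses a closed form that is valid only because df = 6 is even, and is an assumption standing in for χ2cdf rather than the calculator's algorithm.

```python
import math

# Facebook age percentages (hypothesized distribution) and the Twitter sample counts
percent = [7, 9, 19, 20, 33, 9, 3]
L1 = [14, 25, 39, 41, 40, 11, 5]          # observed frequencies
n = sum(L1)                               # sample size, 175

L2 = [n * p / 100 for p in percent]       # expected frequencies if Ho is true
L3 = [(o - e) ** 2 / e for o, e in zip(L1, L2)]

test_statistic = sum(L3)                  # what "ΣX" reports for list L3
df = len(L1) - 1                          # number of cells minus 1

# Right-tail area for df = 6 only: exp(-h) * (1 + h + h^2 / 2), with h = statistic / 2
h = test_statistic / 2
p_value = math.exp(-h) * (1 + h + h ** 2 / 2)

print("expected:", [round(e, 2) for e in L2])
print("test statistic:", round(test_statistic, 4), "df:", df)
print("p-value:", round(p_value, 4))
```

A p-value below α = 0.05 leads to rejecting Ho, which agrees with the conclusion written for this example.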
χ2 Goodness of Fit Test
Draw, shade and label the graph
Hypothesis Test for Goodness of Fit Example 2
At Cathy's Campus Coffee Cart, Cathy has a theory about what her
customers order. She thinks that the distribution of her customers' orders is:
Hypothesized Distribution
Order     large caffeinated coffee   small caffeinated coffee   decaf coffee   tea   iced drinks   juice
Percent   30%                        20%                        10%            10%   20%           10%
Cathy wants to determine whether this is really the true distribution of
orders for all her customers. Cathy tests her theory by selecting a sample of
200 customers and finds that the orders are:
Observed Data from sample
Order
Order
large
caffeinated
coffee
Number 70
33
small
caffeinated
coffee
35
Degrees of freedom = number of CELLS – 1 = ____ – 1 = _____
Probability Distribution to use to calculate the p-value: _____ with df = _____
p-value = P(χ2 > ____) = (χ2cdf (________, ________, ______) =_______
TI –83, 84:χ2 cdf (lower bound, 10^99, degrees of freedom)
decaf
coffee
tea
iced
drinks
Decision: ______________________________________________
10
37
15
Reason for Decision:
Null Hypothesis Ho: ______________________________________________
_______________________________________________________________________
_______________________________________________________________________
Alternate Hypothesis HA: ________________________________________
_______________________________________________________________________
_______________________________________________________________________
α = __________
What would the sample data look like if it exactly followed the expected
distribution (null hypothesis)?
L1
Observed
Data
L2
Expected
Data
L3
(Observed – Expected)2
Expected
large coffee
small coffee
decaf
tea
iced drinks
juice
Total =
Conclusion:
At a 5% level of significance, the sample data provide evidence
to conclude that the true distribution of purchases of all
Cathy's customers does NOT fit the expected distribution of
30% large caffeinated coffee; 20% small caffeinated coffee;
10% decaf coffee; 10% tea; 20% iced drinks; 10% juice
Calculator Instructions for finding the test statistic:
STAT EDIT:
Enter Observed Data frequencies in list L1
Enter Expected Theoretical frequencies in list L2
(Do NOT put totals in either list)
Arrow up to the very top (title line) in list L3 and input
(L1 − L2)² / L2 ; then press Enter
STAT CALC:
1 variable stats L3.
ΣX is the total of all items in list L3
ΣX = Σ (Obs − E)² / E is the test statistic
The test statistic is a number that measures the size of the differences
between the distributions
Total =
Test Statistic = Sum = ______
χ2 Goodness of Fit Test
Hypothesis Test for Goodness of Fit Example 3
Is the random number generator in the TI-84 calculator uniformly distributed when
simulating rolling a 6-sided die?
I used my TI-84 calculator to generate a list of 500 random integers between 1 and 6; this
simulates rolling a 6-sided die 500 times. (That is the largest sample my calculator memory
would hold on the day I did this). If the random number generator is not biased, we would
expect the outcomes to be uniformly distributed over the integers 1,2,3,4,5,6 with some
small amount of variation due to randomness.
In the chapter for Central Limit Theorem we had each student in the class use their
calculators to roll a die 100 times. Based on my feelings (not knowledge) about the
numbers that students get, I have suspected that the random number generator in the TI
calculator might be biased and might not be uniformly distributed.
Here is the sample my calculator generated to simulate rolls of the die:
Number Showing on Die   1    2    3    4    5     6
Number of Rolls         78   84   89   81   100   68
Based on the sample data, are random numbers simulating dice rolls on the TI-84 calculator
uniformly distributed?
Null Hypothesis Ho: ______________________________________________
_________________________________________________________________
Alternate Hypothesis HA: ________________________________________
_________________________________________________________________
α = __________
What would the sample data look like if it exactly followed the expected
distribution (null hypothesis)?
Outcome of Rolling Die   L1 Observed Data   L2 Expected Data   L3 (Observed – Expected)² / Expected
1
2
3
4
5
6
Total =                  Total =            Test Statistic = Sum = ______
Calculator Instructions for finding the test statistic:
STAT EDIT:
Enter Observed Data frequencies in list L1
Enter Expected Theoretical frequencies in list L2
(Do NOT put totals in either list)
Arrow up to the very top (title line) in list L3 and input
(L1 − L2)² / L2 ; then press Enter
STAT CALC:
1 variable stats L3.
ΣX is the total of all items in list L3
ΣX = Σ (Obs − E)² / E is the test statistic
The test statistic is a number that measures the size of the differences
between the distributions
Degrees of freedom = number of CELLS – 1 = ____ – 1 = _____
Probability Distribution to use to calculate the p-value: _____ with df = _____
p-value = P(χ2 > ____) = χ2cdf(________, ________, ______) = _______
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision: ______________________________________________
Reason for Decision:
Conclusion:
At a 5% level of significance, ______________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
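When the hypothesized distribution is uniform, as in Example 3, every expected count is the sample size divided by the number of cells. A minimal Python sketch of that case, using the die-roll counts from the example (the variable names are illustrative only):

```python
# Observed counts for die faces 1 through 6, from the 500 simulated rolls
observed = [78, 84, 89, 81, 100, 68]

n = sum(observed)              # total number of rolls
cells = len(observed)          # number of cells (die faces)

# Under a uniform null hypothesis every face has the same expected count
expected = [n / cells] * cells

# One (Obs - E)^2 / E term per cell, then add them up
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
test_statistic = sum(terms)
df = cells - 1

print("expected count per face:", round(expected[0], 2))
print("test statistic:", round(test_statistic, 3), "with df =", df)
```

The same pattern applies to any uniform goodness-of-fit question; only the list of observed counts changes.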
χ2 Goodness of Fit Test
Hypothesis Test for Goodness of Fit Example 4
Are births uniformly distributed by day of the week?
The Health Commissioner in River County wants to know whether births in
the county are uniformly distributed by day of week.
A sample of 434 randomly selected births at hospitals in the county yields the
following data.
Day of the Week              Sun.   Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.
Number of Births in sample   41     64     71      72     71       69     46
Based on information in the National Vital Statistics Report January 7, 2009
http://www.cdc.gov/nchs/data/nvsr/nvsr57/nvsr57_07.pdf
Null Hypothesis Ho: ______________________________________________
_________________________________________________________________
Alternate Hypothesis HA: ________________________________________
_________________________________________________________________
α = __________
What would the sample data look like if it exactly followed the expected
distribution (null hypothesis)?
Day of Week   L1 Observed Data   L2 Expected Data   L3 (Observed – Expected)² / Expected
Total =       Total =            Test Statistic = Sum = ______
Calculator Instructions for finding the test statistic:
STAT EDIT:
Enter Observed Data frequencies in list L1
Enter Expected Theoretical frequencies in list L2
(Do NOT put totals in either list)
Arrow up to the very top (title line) in list L3 and input
(L1 − L2)² / L2 ; then press Enter
STAT CALC:
1 variable stats L3.
ΣX is the total of all items in list L3
ΣX = Σ (Obs − E)² / E is the test statistic
The test statistic is a number that measures the size of the differences
between the distributions
Degrees of freedom = number of CELLS – 1 = ____ – 1 = _____
Probability Distribution to use to calculate the p-value: _____ with df = _____
p-value = P(χ2 > ____) = χ2cdf(________, ________, ______) = _______
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision: ______________________________________________
Reason for Decision:
Conclusion:
At a 5% level of significance, ______________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
χ2 Hypothesis Test for Independence
Re-Exploring Independence in a Contingency Table:
EXAMPLE 5: Lina is a statistics student doing a project to compare ice
cream flavor preferences at 3 ice cream stores in different cities. She
wants to determine if customer preferences are related to store location or
if they are independent. She will select a sample of customers, and
categorize each customer by store location and flavor preference.
Assume that there are only 3 flavors
A:
                Fremont (F)   Gilroy (G)   Hayward (H)   Total
Chocolate (C)   25            25           50            100
Vanilla (V)     15            15           30            60
Mint Chip (M)   10            10           20            40
Total           50            50           100           200
Are flavor preference and store location INDEPENDENT?
B:
                Fremont (F)   Gilroy (G)   Hayward (H)   Total
Chocolate (C)   20            10           70            100
Vanilla (V)     20            10           20            50
Mint Chip (M)   10            30           10            50
Total           50            50           100           200
Are flavor preference and store location INDEPENDENT?
C:
                Fremont (F)   Gilroy (G)   Hayward (H)   Total
Chocolate (C)   27            24           49            100
Vanilla (V)     13            15           32            60
Mint Chip (M)   10            11           19            40
Total           50            50           100           200
Are flavor preference and store location INDEPENDENT?
How far can our data vary from the independent data in part A and still be
considered "similar"? How far must our data vary from the independent data
in part A in order to be considered "different"?
We use a hypothesis Test for Independence to help us decide
whether two qualitative variables are independent or not.
Null hypothesis assumes that the variables ARE independent.
Alternate hypothesis challenges us to show that the variables are NOT
independent.
We calculate what the data is expected to look like if the variables are
independent as in Ho.
We compare the observed data to the expected data.
Is the "observed data" sufficiently different from "expected" to
decide that Ho is not true?
OR is the "observed data" reasonably close to "expected"?
Writing the Hypothesis Test for Independence
• Write the hypotheses
Null Hypothesis: The variables (DESCRIBE THEM) are independent
Alternate Hypothesis: The variables (DESCRIBE THEM) are NOT independent
• Decide upon significance level α
• Collect observed data. Calculate expected data.
• Find the test statistic and p-value:
Test statistic = Σ (Obs − E)² / E, summed over all cells in table
Find the p-value
χ2 distribution, right tailed test
df = (# of Rows – 1)(# of Columns – 1)
χ2cdf(test statistic, 10^99, df)
Draw the graph; shade the p-value; label axis, test statistic & p-value
• Compare p-value to the significance level α and make a decision.
• Write a conclusion in the context of the problem.
χ2 Hypothesis Test for Independence
(in a 2-way contingency table)
EXAMPLE 6: Lina is a statistics student doing a project to compare ice
cream flavor preferences at 3 ice cream stores in different cities. She
wants to determine if customer preferences are related to store location or
if they are independent. She will select a sample of customers, and
categorize each customer by store location and flavor preference.
Are flavor preference and store location independent?
Data Set D: Lina's actual data:
OBSERVED DATA   Fremont (F)   Gilroy (G)   Hayward (H)   Total
Chocolate (C)   22            30           48            100
Vanilla (V)     10            12           38            60
Mint Chip (M)   18            8            14            40
Total           50            50           100           200
Null Hypothesis:
Ho: __________________________________________________________
____________________________________________________________
Alternate Hypothesis:
Ha: __________________________________________________________
____________________________________________________________
Compare OBSERVED Data to how the data is EXPECTED to look IF
INDEPENDENT
Calculating the EXPECTED DATA (assumes independence)
EXPECTED DATA   Fremont (F)   Gilroy (G)   Hayward (H)   Total
Chocolate (C)                                            100
Vanilla (V)                                              60
Mint Chip (M)                                            40
Total           50            50           100           200
How close is actual "observed data (Obs)" to "expected data (E)"?
Measure of the difference for each cell inside the table:
(observed data value − expected data value)² / expected data value = (Obs − E)² / E
Add them all up to get: Test Statistic = Σ (Obs − E)² / E, summed over all cells in table
Test Statistic = (22−25)²/25 + (30−25)²/25 + (48−50)²/50
+ (10−15)²/15 + (12−15)²/15 + (38−30)²/30
+ (18−10)²/10 + (8−10)²/10 + (14−20)²/20
Test Statistic = 14.44
Degrees of freedom = (number of rows – 1)(number of columns – 1)
df = (___ – 1)(____ – 1) = (___)(____) = ____
Probability Distribution to use to calculate p-value: _____ with df = _____
p-value = P(χ2 > _____) = χ2cdf(_________, _________, _____) = ____
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision:
Reason for Decision:
Conclusion:
At a 5% level of significance, the data show sufficient evidence
to conclude that customer ice cream flavor preference and store
location are NOT independent.
Flavor preference depends on store location.
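The expected counts and the test statistic for Data Set D in Example 6 can be checked with a short Python sketch. Each expected cell is (row total)(column total) / (table total). The p-value line uses a closed form that holds only for df = 4 and is an assumption standing in for χ2cdf.

```python
import math

# Observed counts: rows are flavors, columns are Fremont, Gilroy, Hayward
observed = [
    [22, 30, 48],   # Chocolate
    [10, 12, 38],   # Vanilla
    [18, 8, 14],    # Mint Chip
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count in each cell if flavor and location are independent
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Sum of (Obs - E)^2 / E over all cells inside the table
test_statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail area for df = 4 only: exp(-h) * (1 + h), with h = statistic / 2
h = test_statistic / 2
p_value = math.exp(-h) * (1 + h)

print("expected:", expected)
print("test statistic:", round(test_statistic, 2), "df:", df)
print("p-value:", round(p_value, 4))
```

The expected table matches the independent data in part A of Example 5, and the small p-value is in line with the conclusion that flavor preference and store location are not independent.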
χ2 Hypothesis Test for Independence
EXAMPLE 7:
A professor believes that giving extra time on an exam won't affect the
grades. He divided a random sample of 300 students into two groups and
gave the same test. The first group had one hour to complete the exam; the
second group had unlimited time. At a 5% level of significance, is there
sample evidence to show time on exam and grade are related?
OBSERVED DATA    Grade Earned on Exam
                 A    B    C     F    TOTAL
1 hour           23   42   65    12   142
Unlimited time   17   48   85    8    158
TOTAL            40   90   150   20   300
Null Hypothesis: Ho: Grade and Time are Independent
Alternate Hypothesis: Ha: Grade and Time are Dependent
Use the χ2 Test on your calculator to find the following:
Find the output matrix with the EXPECTED DATA and fill it in
Test Statistic = 3.92
Degrees of freedom = (number of rows – 1)(number of columns – 1)
Degrees of freedom = (2 – 1)(4 – 1) = (1)(3) = 3
Probability Distribution used to calculate p-value: χ2 with df = 3
p-value = 0.2697
p-value = P(χ2 > 3.92) = χ2cdf(3.92, 10^99, 3) = 0.2697
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision: Do NOT Reject Ho
Reason for Decision: p-value > α
0.2697 > 0.05
Conclusion: At a 5% level of significance the data do not
show sufficient evidence that time allowed for an exam and
grade earned on the exam are dependent.
We believe that time allowed for an exam and grade earned
on the exam are independent.
Notes:
The test was done by putting the observed data into a
matrix and using the χ2 Test. The calculator calculated
the expected data and stored it in another matrix. The
calculator found the test statistic as a measure of the
differences between the observed and expected data.
The calculator used the Chi-Square distribution and the
test statistic to calculate the p-value using the χ2cdf.
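The results reported for Example 7 can be reproduced with a small Python sketch. It is not the calculator's χ2 Test routine: the right-tail formula below is a closed form that holds only for df = 3, and the variable names are illustrative.

```python
import math

# Observed counts: columns are grades A, B, C, F
observed = [
    [23, 42, 65, 12],   # 1 hour
    [17, 48, 85, 8],    # unlimited time
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected cell = (row total)(column total) / (table total)
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

test_statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail area for df = 3 only, with h = statistic / 2
h = test_statistic / 2
p_value = math.erfc(math.sqrt(h)) + 2 * math.sqrt(h / math.pi) * math.exp(-h)

print("expected:", [[round(e, 2) for e in row] for row in expected])
print("test statistic:", round(test_statistic, 4), "df:", df)
print("p-value:", round(p_value, 4))
```

The p-value of 0.2697 comes from the unrounded test statistic; entering the rounded value 3.92 gives a result that differs slightly in the third decimal place.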
EXPECTED DATA    Grade Earned on Exam
                 A       B      C     F       TOTAL
1 hour           18.93   42.6   71    9.47    142
Unlimited time   21.07   47.4   79    10.53   158
TOTAL            40      90     150   20      300
SEE NEXT PAGE FOR CALCULATOR INSTRUCTIONS
FOR THE χ2 Test AND FOR FINDING THE EXPECTED DATA
FORMULAS AND CALCULATOR INSTRUCTIONS
TEST OF INDEPENDENCE
Formula for Expected Data in Cell = (Row Total)(Column Total) / (Total for Table)
How close is actual "observed data (Obs)" to "expected data (E)"?
Measure differences for each cell inside the table:
(observed data value − expected data value)² / expected data value = (Obs − E)² / E
Add them all up to get: Test Statistic = Σ (Obs − E)² / E, summed over all cells in table
Calculator Instructions for Test of Independence:
Entering the data:
2nd MATRIX EDIT; select matrix [A] and press enter.
Enter matrix size: number of rows x number of columns, pressing enter after
each is entered
NOTE: Look only at data when figuring out size; do NOT include row
or column totals in the matrix. Matrix on calculator will change shape
on the screen to match the stated size.
Enter observed data into matrix A, pressing enter after each number is entered.
Performing the hypothesis test to find the p-value and test statistic
STAT TESTS
χ2 Test
Observed: 2nd Matrix 1: [A] Enter
(This is the input to the test)
Expected: 2nd Matrix 2: [B] Enter
(This is output the calculator creates when doing the test)
OUTPUT SCREEN will show p-value and test statistic and degrees of freedom
ON HOME SCREEN: To see matrices containing observed and expected data
Observed: 2nd Matrix 1: [A] Enter
Expected: 2nd Matrix 2: [B] Enter
You may need to scroll to the right to see part of the matrix if it is too wide
to fit on your calculator screen
Type of Test: Homogeneity
Variables: One variable, across multiple populations
What is Tested? Is the variable similar across all populations? Does the variable
follow the same unknown distribution for all populations?
Type of Test: Independence
Variables: One population categorized according to two variables
What is Tested? Are the two variables independent or not independent?
The mathematics and procedures of the Test of Homogeneity and the
Test of Independence are the same. The sampling method is what
distinguishes between which type of test it is.
Example:
Selecting a sample of male students and a sample of female students and
testing to see whether the age distributions for males and females are the
same or different is a test of _________________________________
Selecting a random sample of students, categorizing each student by age
and gender, and then testing to see whether the age distribution is related to
gender or not is a test of _________________________________
Example:
We select a random sample of hospital births. We categorize each birth by
the day of week on which it occurred and by whether it was a natural
(vaginal) birth or a surgical birth (cesarean section). We test to see whether
the variables type of birth and day of week are independent.
This is a test of _____________________________________________
We select a random sample of natural births and a random sample of
surgical births at a hospital. We test to see whether the pattern of births by
day of the week are the same for the populations of natural births and
surgical births.
This is a test of _____________________________________________
Hypotheses: Test of Independence:
Ho: The variables are independent
Ha: The variables are not independent
Hypotheses: Test of Homogeneity
Ho: The distributions of the two populations are the same.
Ha: The distributions of the two populations are not the same.
χ2 Hypothesis Test for Independence
(in a 2-way contingency table)
EXAMPLE 8:
A random sample of 1343 US adults age 25 and over were surveyed and
asked about their educational status and whether they were employed or
unemployed. Are employment status and educational status independent?
SOURCE: Based on information in the Current Population Survey Feb 2013
http://www.bls.gov/webapps/legacy/cpsatab4.htm
http://data.bls.gov/pdq/SurveyOutputServlet
OBSERVED DATA   Not a high school   High School   Some College or    Bachelors Degree   TOTAL
                Graduate            Graduate      Associate Degree   or Higher
Employed        100                 333           348                476                1257
Unemployed      13                  29            25                 19                 86
TOTAL           113                 362           373                495                1343
Null Hypothesis:
Ho: __________________________________________________________
____________________________________________________________
Alternate Hypothesis:
Ha: __________________________________________________________
____________________________________________________________
Use the χ2 Test on your calculator to find the following:
Find the output matrix with the EXPECTED DATA and fill it in
EXPECTED DATA   Not a high school   High School   Some College or    Bachelors Degree   TOTAL
                Graduate            Graduate      Associate Degree   or Higher
Employed                                                                                1257
Unemployed                                                                              86
TOTAL           113                 362           373                495                1343
By hand, show the calculation that the calculator has performed to find
the expected number of
a) unemployed people with bachelor's degree or higher.
b) employed people with some college or associate degree
Test Statistic = _____________
Degrees of freedom = (number of rows – 1)(number of columns – 1)
Probability Distribution used to calculate p-value: _____ with df = _____
p-value = _______________
p-value = P(χ2 > _____) = χ2cdf(_________, _________, _____) = ____
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision:
Reason for Decision:
Conclusion:
χ2 Hypothesis Test for Independence
(in a 2-way contingency table)
EXAMPLE 9:
A store conducts a survey of a random sample of customers. The survey
asks about income level and whether the customer shops online or in the
store.
Conduct a hypothesis test to determine whether the income level
of customers and whether customers shop in the store or online are
independent.
OBSERVED DATA        Shop in the   Shop online   Shop both in store   TOTAL
                     store only    only          and online
Income < $50,000     40            22            55                   117
$50,000-$99,999      53            35            94                   182
$100,000 or higher   31            26            44                   101
TOTAL                124           83            193                  400
Null Hypothesis:
Ho: __________________________________________________________
____________________________________________________________
Alternate Hypothesis:
Ha: __________________________________________________________
____________________________________________________________
Use the χ2 Test on your calculator to find the following:
Find the output matrix with the EXPECTED DATA and fill it in
EXPECTED DATA        Shop in the   Shop online   Shop both in store   TOTAL
                     store only    only          and online
Income < $50,000                                                      117
$50,000-$99,999                                                       182
$100,000 or higher                                                    101
TOTAL                124           83            193                  400
By hand, show the calculation that the calculator has performed to find
the expected number of
a) people with income of <$50,000 who shop in the store only.
b) people with income of $100,000 + who shop online only
Test Statistic = _____________
Degrees of freedom = (number of rows – 1)(number of columns – 1)
Probability Distribution used to calculate p-value: _____ with df = _____
p-value = _______________
p-value = P(χ2 > _____) = χ2cdf(_________, _________, _____) = ____
TI-83, 84: χ2cdf(lower bound, 10^99, degrees of freedom)
Draw, shade and label the graph
Decision:
Reason for Decision:
Conclusion: