Download Chapter 26: Comparing Counts - Masin

Chapter 26: Comparing Counts
AP Statistics
Comparing Counts
• In this chapter, we will be performing
hypothesis tests on categorical data
• In previous chapters, the data were
quantitative (Normal distributions and
Student’s t-distributions)
Chi-Square Tests
We will be looking at three different types of
hypothesis tests that deal with categorical
data.
1. Chi-square Test for Goodness of Fit
2. Chi-square Test for Homogeneity
3. Chi-square Test for
Independence/Association
χ² Test for Goodness of Fit
Compares the observed sample distribution with
the population distribution. Generally, we are
testing how well the observations “fit” what we
expect.
df = n − 1, where n is the number of categories
χ² Test for Homogeneity
Compares the distribution of a categorical variable
across two or more groups. The counts are arranged
in a two-way table, also called a contingency table.
df = (r − 1)(c − 1), where r is the number of rows and
c is the number of columns
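As a rough illustration of how the degrees of freedom follow from the shape of the table, the sketch below computes df and the expected counts for a two-way table. The counts in `table` are made up for illustration, and the expected-count rule (row total × column total ÷ grand total) is the standard one, which the slides do not state explicitly.

```python
# Hypothetical two-way table of counts (made-up data, not from the slides):
# rows are groups, columns are response categories.
table = [
    [20, 30, 50],
    [30, 30, 40],
]

def degrees_of_freedom(table):
    """df = (r - 1)(c - 1) for a table with r rows and c columns."""
    r = len(table)
    c = len(table[0])
    return (r - 1) * (c - 1)

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

df = degrees_of_freedom(table)
expected = expected_counts(table)
print("df =", df)
for row in expected:
    print(row)
```

Here both groups have the same row total, so each row of expected counts is the same: the counts we would see if the groups were perfectly homogeneous.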
χ² Test for Independence/Association
Tests the null hypothesis that there is no relationship
between two categorical variables from a simple
random sample, with each individual classified
according to both of the categorical variables.
df = (r − 1)(c − 1), where r is the number of rows and
c is the number of columns
Chi-Square Model
The Chi-Square Model is a family of
distributions, like the Normal and Student’s t models.
Its shape is determined by one parameter, the
degrees of freedom (as with the t-model)
Chi-square Model
• With only a few degrees of freedom, the
model is strongly skewed right.
• As the degrees of freedom increase, the
model gets closer to being symmetric, but will
never get there.
• Will ALWAYS have a longer tail on the right.
Chi-square Model
• The mode is at df − 2
• The mean is at df
• Always “starts” at zero
• Only takes on positive values
• Only used for tests (no confidence intervals here)
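The mode and mean claims can be checked numerically. The sketch below is a minimal check using only the Python standard library: it evaluates the chi-square density on a grid and approximates the mode and the mean. The helper names `grid_mode` and `grid_mean` are hypothetical, the grid sizes are arbitrary, and the mode rule is assumed to apply for df of at least 2.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square model with df degrees of freedom, for x > 0."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

def grid_mode(df, upper=60.0, steps=6000):
    """Approximate the mode by scanning a grid of x values (hypothetical helper)."""
    xs = [upper * i / steps for i in range(1, steps + 1)]
    return max(xs, key=lambda x: chi2_pdf(x, df))

def grid_mean(df, upper=200.0, steps=20000):
    """Approximate the mean with a midpoint Riemann sum of x * f(x)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * chi2_pdf(x, df) * h
    return total

for df in (5, 11):
    print("df =", df, "mode ~", round(grid_mode(df), 2), "mean ~", round(grid_mean(df), 2))
```

For both values of df the mean sits to the right of the mode, which is another way of seeing the right skew.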
Chi-square statistic
• Functions just like a z-score or a t-score
• You will calculate a P-value based on the Chi-square statistic
• The P-value always comes from a one-tailed (upper-tail) test

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
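The statistic is a sum over all cells of the squared difference between observed and expected counts, divided by the expected count. A minimal sketch, with made-up die-roll counts that do not come from the slides:

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for illustration: 60 rolls of a die, expecting 10 per face.
observed = [8, 12, 9, 11, 13, 7]
expected = [10] * 6
stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # (4 + 4 + 1 + 1 + 9 + 9) / 10 = 2.8
```

Because every term is squared, the statistic can only grow as the observed counts move away from the expected counts, in either direction.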
Chi-square model/statistic
[Figure: chi-square model curves for df = 5 and df = 11, with the χ² statistic and the P-value marked]
Hypotheses
• Hypotheses: usually written in words
– H0: Ages are uniformly distributed in the school
– HA: Ages are not uniformly distributed in the school
• Even though the alternative appears to be two-tailed, chi-square tests are always one-sided
– Only testing if statistic is “too large”.
• Alternative has no direction. All we know is that
it doesn’t fit. Hence, the “goodness of fit”.
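Since only a statistic that is “too large” counts as evidence against the null, the P-value is the upper-tail area of the chi-square model. The sketch below is one way to compute that area with the Python standard library alone. The function name `chi2_upper_tail` is hypothetical, and the approach (a recurrence on the regularized upper incomplete gamma function) is an implementation choice made here; in practice a calculator or statistics package would normally do this step.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    on the regularized upper incomplete gamma function, with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0  # the whole model lies to the right of zero
    y = x / 2
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

# The familiar 5% critical values should give upper-tail areas close to 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815), (5, 11.070)]:
    print(df, round(chi2_upper_tail(critical, df), 3))
```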
Assumptions/Conditions
• Counted Data Assumption
– Make sure all data are “counts”, not percents or proportions.
• Independence Assumption
– Are the individuals counted in the cells sampled independently from some population?
– Randomization Condition
• Sample Size Assumption
– Expected Cell Frequency Condition: expect at least 5 individuals in each cell
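Two of these checks are mechanical and can be sketched in code: whether the data are counts, and whether every expected count is at least 5. The helper `check_conditions` below is hypothetical. Independence and randomization depend on how the data were collected, so no code can verify them.

```python
def check_conditions(observed, expected, minimum=5):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # Counted Data: every observed value should be a non-negative whole number.
    for o in observed:
        if o < 0 or o != int(o):
            problems.append(f"observed value {o} is not a count")
    # Expected Cell Frequency: every expected count should be at least `minimum`.
    for i, e in enumerate(expected):
        if e < minimum:
            problems.append(f"cell {i} has expected count {e}, below {minimum}")
    return problems

# Made-up inputs for illustration.
print(check_conditions([8, 12, 9, 11], [10, 10, 10, 10]))  # no problems
print(check_conditions([0.4, 0.6], [3, 7]))                # proportions, small cell
```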
Logic
• We assume the probability (expected) model is correct.
Our test will assess whether the observed results
(statistics) are consistent with that model.
– Ask: Are the differences between the observed and
hypothesized values just natural sampling variability or is it
something else? (significant difference?)
• To assess our model, we compute the Chi-square
statistic and find the corresponding P-value from the
chi-square model (with a certain degree of freedom).
• We interpret the P-value: if it is greater than our α,
we fail to reject the null; if it is less than our α, we
reject the null in favor of the alternative.
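The question of whether the differences are “just natural sampling variability” can also be explored by simulation. The sketch below does not use the chi-square model. Instead it draws many samples from the hypothesized model and records how often a simulated statistic is at least as large as the observed one. The data, the trial count, the seed, and α = 0.05 are all assumptions chosen for illustration.

```python
import random

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulated_p_value(observed, probabilities, trials=2000, seed=1):
    """Estimate the P-value by sampling repeatedly from the null model."""
    rng = random.Random(seed)
    n = sum(observed)
    k = len(probabilities)
    expected = [n * p for p in probabilities]
    actual = chi_square_statistic(observed, expected)
    at_least_as_large = 0
    for _ in range(trials):
        draws = rng.choices(range(k), weights=probabilities, k=n)
        counts = [0] * k
        for d in draws:
            counts[d] += 1
        if chi_square_statistic(counts, expected) >= actual:
            at_least_as_large += 1
    return at_least_as_large / trials

alpha = 0.05  # assumed significance level
# Made-up counts: 60 rolls of a die, null model says each face has probability 1/6.
p = simulated_p_value([8, 12, 9, 11, 13, 7], [1 / 6] * 6)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(p, decision)
```

With these mild departures from 10 per face, simulated samples are frequently at least as far from the model, so the estimated P-value is large and the null is not rejected.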
Goodness of Fit Example
The other day I purchased a 40-pound bag of mixed
nuts for a math department party. The bag said
that the mix consisted of, by weight, 40%
cashews, 15% Brazil nuts, 20% almonds and 25%
peanuts. When the department came over, we
randomly picked out 40 nuts to test the claim by
the company. In our sample, we found 10
cashews, 8 Brazil nuts, 10 almonds and 12
peanuts. Based on this sample, do you feel as if
the company is being misleading?
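One way to work this example is sketched below. It takes the observed counts as given (they total 40 nuts) and treats the by-weight percentages on the bag as the claimed proportions of nuts by count, which is an assumption. The P-value line uses a closed form that holds only for df = 3, and α = 0.05 is also an assumption, since the problem does not fix a significance level.

```python
import math

claimed = {"cashews": 0.40, "Brazil nuts": 0.15, "almonds": 0.20, "peanuts": 0.25}
observed = {"cashews": 10, "Brazil nuts": 8, "almonds": 10, "peanuts": 12}

n = sum(observed.values())  # total number of nuts sampled
expected = {nut: n * p for nut, p in claimed.items()}  # about 16, 6, 8, 10

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((observed[nut] - expected[nut]) ** 2 / expected[nut] for nut in claimed)
df = len(claimed) - 1  # 4 categories, so df = 3

# Upper-tail area of the chi-square model; this closed form is valid for df = 3 only.
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

alpha = 0.05  # assumed significance level
print("chi-square =", round(chi_sq, 2), "df =", df, "P-value =", round(p_value, 2))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Every expected count is at least 5, so the Expected Cell Frequency Condition is met. The statistic comes out near 3.82 with a P-value of roughly 0.28, so under these assumptions the sample does not give convincing evidence that the company's stated mix is wrong.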