Download Math 2200 Chapter 11 Power Points

CHAPTER 11 THE CHI-SQUARE DISTRIBUTION
This presentation is based on material and graphs from OpenStax and is copyrighted by OpenStax and Georgia
Highlands College.
11.1 FACTS ABOUT THE CHI-SQUARE DISTRIBUTION
For Qualitative data (hint: data in categories)
CHI-SQUARE NOTATION
The notation for the chi-square distribution is:
$\chi \sim \chi^2_{df}$
where df = degrees of freedom, which depends on how
chi-square is being used.
If you want to practice calculating chi-square probabilities,
then use df = (number of categories - 1).
CHI-SQUARE
For the chi-square distribution,
the population mean is $\mu = df$
the population standard deviation is $\sigma = \sqrt{2(df)}$
The random variable is shown as χ².
RULES FOR CHI-SQUARE DISTRIBUTION
1. The curve is nonsymmetrical and skewed to the right.
2. There is a different chi-square curve for each df.
3. The test statistic for any test is always greater than or
equal to zero.
RULES FOR CHI-SQUARE DISTRIBUTION
4. When df > 90, the chi-square curve approximates the
normal distribution.
For a chi-square where df = 1,000,
the mean is $\mu = df = 1{,}000$ and the standard deviation is
$\sigma = \sqrt{2(1{,}000)} = 44.7$.
Therefore, X ~ N(1,000, 44.7), approximately.
5. The mean, μ, is located just to the right of the
peak.
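The mean and standard deviation formulas above can be checked with a short Python sketch. The function name `chi_square_mean_sd` is only an illustrative choice and is not part of the presentation.

```python
import math


def chi_square_mean_sd(df):
    """Return (mean, standard deviation) of a chi-square distribution.

    Uses the slide formulas: mean = df, sd = sqrt(2 * df).
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return df, math.sqrt(2 * df)


# The df = 1,000 case from rule 4.
mean, sd = chi_square_mean_sd(1000)
print(mean, round(sd, 1))  # 1000 44.7
```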
GOODNESS-OF-FIT TEST
In this type of hypothesis test, you determine whether the data "fit" a particular
distribution or not.
For example, you may suspect your unknown data fit a binomial distribution. You use
a chi-square test (meaning the distribution for the hypothesis test is chi-square) to
determine if there is a fit or not.
The null and the alternative hypotheses for this test may be written in sentences or
may be stated as equations or inequalities.
CHI SQUARE GOF TEST STATISTIC
The test statistic for a goodness-of-fit test is:
$\chi^2 = \sum_{k} \frac{(O - E)^2}{E}$
where:
• O = observed values (data)
• E = expected values (from theory)
• k = the number of different data cells or categories
The observed values are the data values and the expected values are the values you
would expect to get if the null hypothesis were true.
The expected value for each cell needs to be at least five in order for you to use this
test.
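As a rough illustration of the formula and the "at least five" condition, here is a minimal Python sketch. The function name `gof_statistic` and the three-category counts are hypothetical and are not taken from the presentation.

```python
def gof_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # The slides require every expected count to be at least five.
    if any(e < 5 for e in expected):
        raise ValueError("each expected value must be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: three categories, 60 observations, equal expected counts.
print(gof_statistic([18, 22, 20], [20, 20, 20]))  # 0.4
```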
CHI-SQUARE GOF TEST
The number of degrees of freedom is
df= (number of categories – 1).
The goodness-of-fit test is almost always right-tailed. If the
observed values and the corresponding expected values are not
close to each other, then the test statistic can get very large and
will be way out in the right tail of the chi-square curve.
EXAMPLE OF CHI SQUARE GOF TEST
Employers want to know which days of the week employees are
absent in a five-day work week. Most employers would like to
believe that employees are absent equally during the week.
Suppose a random sample of 60 managers were asked on which
day of the week they had the highest number of employee
absences. The results were distributed as in Table 11.6. For the
population of employees, do the days for the highest number of
absences occur with equal frequencies during a five-day work
week? Test at a 5% significance level.
EXAMPLE OF CHI SQUARE GOF TEST
Below are the observed counts for each day:
Day                  Monday   Tuesday   Wednesday   Thursday   Friday
Number of absences   15       12        9           9          15
HYPOTHESES
The null and alternative hypotheses are:
• H0: The absent days occur with equal frequencies, that is,
they fit a uniform distribution.
• Ha: The absent days occur with unequal frequencies, that
is, they do not fit a uniform distribution.
SOLVE FOR TEST STATISTICS #1
Find expected value (find the total number of absences
and then divide by the number of categories)
Day          Observed   Expected
Monday       15         60 / 5 = 12
Tuesday      12         60 / 5 = 12
Wednesday    9          60 / 5 = 12
Thursday     9          60 / 5 = 12
Friday       15         60 / 5 = 12
SOLVE FOR TEST STATISTICS #2
Next we subtract the expected from the observed.
Day          Observed   Expected       Observed - Expected
Monday       15         60 / 5 = 12    15 - 12 = 3
Tuesday      12         60 / 5 = 12    12 - 12 = 0
Wednesday    9          60 / 5 = 12    9 - 12 = -3
Thursday     9          60 / 5 = 12    9 - 12 = -3
Friday       15         60 / 5 = 12    15 - 12 = 3
SOLVE FOR TEST STATISTICS #3
Next we square the results from (observed - expected).
Day     Obs.   Exp.   Obs - Exp      (Obs - Exp)²
Mon     15     12     15 - 12 = 3    (3)² = 9
Tues    12     12     12 - 12 = 0    (0)² = 0
Wed     9      12     9 - 12 = -3    (-3)² = 9
Thurs   9      12     9 - 12 = -3    (-3)² = 9
Fri     15     12     15 - 12 = 3    (3)² = 9
SOLVE FOR TEST STATISTICS #4
Next we divide each (Obs - Exp)² by the expected value.
Day     Obs   Exp   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
Mon     15    12    3           9              9/12 = 0.75
Tues    12    12    0           0              0/12 = 0
Wed     9     12    -3          9              9/12 = 0.75
Thurs   9     12    -3          9              9/12 = 0.75
Fri     15    12    3           9              9/12 = 0.75
SOLVE FOR TEST STATISTICS #5
To get the chi-square (χ²) test statistic, you add the (Obs - Exp)²/Exp values together:
0.75 + 0 + 0.75 + 0.75 + 0.75 = 3.00
So the χ² test statistic is 3.00.
Day     (Obs - Exp)²/Exp
Mon     9/12 = 0.75
Tues    0/12 = 0
Wed     9/12 = 0.75
Thurs   9/12 = 0.75
Fri     9/12 = 0.75
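The five steps above can be retraced in a few lines of Python. This is only a sketch using the absence counts from the example; the variable names are illustrative.

```python
days = ["Mon", "Tues", "Wed", "Thurs", "Fri"]
observed = [15, 12, 9, 9, 15]

# Step 1: equal frequencies, so expected = total / number of categories.
expected_each = sum(observed) / len(observed)  # 60 / 5 = 12

contributions = []
for day, obs in zip(days, observed):
    diff = obs - expected_each          # Step 2: Obs - Exp
    squared = diff ** 2                 # Step 3: (Obs - Exp)^2
    contrib = squared / expected_each   # Step 4: (Obs - Exp)^2 / Exp
    contributions.append(contrib)
    print(f"{day:<6}{obs:>4}{expected_each:>6.0f}{diff:>6.0f}{squared:>6.0f}{contrib:>7.2f}")

# Step 5: add the contributions.
chi_square = sum(contributions)
print(f"chi-square = {chi_square:.2f}")  # chi-square = 3.00
```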
ANSWER
χ² test statistic is 3
d.f. is 5-1 = 4
To get the p-value:
Press 2nd DISTR. Arrow down to χ²cdf. Press ENTER.
Enter (3, 10^99, 4). Rounded to four decimal places, you should see 0.5578, which is
the p-value.
Since p (0.5578) is greater than α (0.05), you decide not to reject the null
hypothesis.
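The calculator result can be cross-checked without a statistics library. The sketch below assumes an even number of degrees of freedom, for which the right-tail area has a simple closed form; it is not a general chi-square routine, and the function name `chi2_right_tail_even_df` is hypothetical.

```python
import math


def chi2_right_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


alpha = 0.05
p_value = chi2_right_tail_even_df(3.0, 4)  # same inputs as chi2cdf(3, 10^99, 4)
print(round(p_value, 4))  # 0.5578
print("reject H0" if p_value < alpha else "do not reject H0")  # do not reject H0
```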
BUT WHAT DOES IT MEAN?
If we decide not to reject the null hypothesis, that does not mean we have shown the null
hypothesis to be true. It means the sample does not give enough evidence against it.
So we need to conclude it properly.
At the 5% level of significance, there is not sufficient evidence to conclude that the
absent days do not occur with equal frequencies.
o We stated the level of significance.
o Since we failed to reject the null hypothesis, we word our conclusion in terms of the
alternative hypothesis: there is not sufficient evidence to support it.
DIFFERENT PERCENTAGES
Some chi-square problems specify a different expected percentage for each of
the categories.
To determine the expected values, first get the sample size (add all the observed
values). Then, for each category, convert that category's percentage to a decimal
and multiply the decimal by the sample size.
EX: For the last example, the sample size was 60. What if the manager believed that
40% of absences occurred on Monday?
60 × 0.40 = 24 is the expected number for Monday.
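The same conversion can be written as a one-line Python helper. The name `expected_count` is only illustrative.

```python
def expected_count(sample_size, percent):
    """Convert a percentage to a decimal and multiply by the sample size."""
    return sample_size * (percent / 100)


# The Monday case from the example: 40% of 60.
print(round(expected_count(60, 40), 2))  # 24.0
```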
BUT we can use a website to help us so we don’t have to do it by hand.
CHI-SQUARE GOODNESS OF FIT
Using Internet Website
WEBSITE
http://www.socscistatistics.com/tests/goodnessoffit/Default2.aspx
EXAMPLE: “EQUAL PROPORTIONS”
WHICH FLAVOR OF SODA IS PREFERRED?
Claim: There is no preference for flavors
Let α = 0.05
A sample of 100 people provide the data in the table below:
Cherry   Strawberry   Orange   Lime   Grape
32       28           14       10     16
STEP #1 ENTER THE QUALITATIVE CATEGORIES
From the example above:
SODA FLAVOR   LIKES
Cherry        32
Strawberry    28
Orange        16
Lime          14
Grape         10
Enter the categories. These are the qualitative categories.
STEP #2 & #3
Click Next
Select “Frequencies”
STEP #4 & #5
Enter “Observed Values” from the
given table
Calculate the expected value for each category:
Expected value = (sample size)(proportion)
E = np
For this example: sample size, n = 100
"Equal" proportions = 1/5
So, E = (100)(1/5) = 20 expected likes per soda flavor
STEP #6 & #7
Enter “Expected Value”
Select significance level given in
problem (example)
STEP #8 & #9
Click Calculate Chi^2
Results displayed on screen
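The presentation does not show the website's output, so the following Python sketch is offered only as a way to cross-check it by hand. It uses the counts as entered in Step #1 and the closed-form right-tail area for df = 4; the variable names are illustrative.

```python
import math

likes = {"Cherry": 32, "Strawberry": 28, "Orange": 16, "Lime": 14, "Grape": 10}

n = sum(likes.values())         # sample size, 100
expected = n / len(likes)       # equal proportions: 100 * (1/5) = 20
chi_square = sum((obs - expected) ** 2 / expected for obs in likes.values())

# Right-tail area for df = 5 - 1 = 4: exp(-x/2) * (1 + x/2).
half = chi_square / 2
p_value = math.exp(-half) * (1 + half)

print(f"chi-square = {chi_square:.2f}")  # chi-square = 18.00
print(f"p-value = {p_value:.4f}")        # p-value = 0.0012
print("reject H0" if p_value < 0.05 else "do not reject H0")  # reject H0
```

Under these assumptions the p-value falls below α = 0.05, which would point toward rejecting the claim of no flavor preference.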
EXAMPLE #2: “SPECIFIED PROPORTIONS”
A statistics teacher claims that, on average, 20% of her students get a grade of A, 35% get a B,
25% get a C, 10% get a D, and 10% get an F. The grades of a random sample of 100 students
were recorded. Test the claim that the grades follow the distribution claimed by the teacher. Use α =
0.05. The following table presents the results:
A    B    C    D   F
29   42   20   5   4
STEP #1 ENTER THE QUALITATIVE CATEGORIES
From the example above:
Grade   Number of Students
A       29
B       42
C       20
D       5
F       4
Enter the categories. These are the qualitative categories.
STEP #2 & #3
Click Next
Select “Frequencies”
STEP #4 & #5
Enter “Observed Values” from the
given table
Calculate the expected value for each category:
Expected value = (sample size)(proportion)
E = np
The expected value will be different for each category.
STEP #5
Grade   Total number of students   Percentage            Total number in        EXPECTED VALUE
        in SAMPLE (not observed    (stated in problem)   sample × percentage
        values)
A       100                        20%                   (100)(0.20)            20
B       100                        35%                   (100)(0.35)            35
C       100                        25%                   (100)(0.25)            25
D       100                        10%                   (100)(0.10)            10
F       100                        10%                   (100)(0.10)            10
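The expected values in this table can be generated from the claimed percentages with a short Python sketch. The dictionary name `claimed_percent` is an illustrative choice.

```python
sample_size = 100  # total number of students in the sample
claimed_percent = {"A": 20, "B": 35, "C": 25, "D": 10, "F": 10}

# The claimed percentages should cover every category exactly once.
assert sum(claimed_percent.values()) == 100

# E = n * p for each grade, with p written as percent / 100.
expected = {grade: sample_size * pct / 100 for grade, pct in claimed_percent.items()}
print(expected)  # {'A': 20.0, 'B': 35.0, 'C': 25.0, 'D': 10.0, 'F': 10.0}
```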
STEP #6 & #7
Enter “Expected Value”
Select significance level given in
problem (example)
STEP #8 & #9
Click Calculate Chi^2
Results displayed on screen
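Since the website's results are not reproduced in the presentation, the sketch below is one way to check them by hand in Python. It uses the observed grades and claimed percentages from the example and the closed-form right-tail area for df = 4; the names are illustrative.

```python
import math

observed = {"A": 29, "B": 42, "C": 20, "D": 5, "F": 4}
claimed_percent = {"A": 20, "B": 35, "C": 25, "D": 10, "F": 10}

n = sum(observed.values())  # 100 students
chi_square = 0.0
for grade, obs in observed.items():
    exp = n * claimed_percent[grade] / 100  # E = np
    chi_square += (obs - exp) ** 2 / exp

# Right-tail area for df = 5 - 1 = 4: exp(-x/2) * (1 + x/2).
half = chi_square / 2
p_value = math.exp(-half) * (1 + half)

print(f"chi-square = {chi_square:.2f}")  # chi-square = 12.55
print(f"p-value = {p_value:.4f}")        # p-value = 0.0137
print("reject H0" if p_value < 0.05 else "do not reject H0")  # reject H0
```

Under these assumptions the p-value is below α = 0.05, which would point toward rejecting the teacher's claimed grade distribution.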