Binus Repository

Course: A0392 - Statistik Ekonomi (Economic Statistics)
Year: 2010
Session 11
Goodness-of-Fit Test and Test of Independence
Outline:
• Goodness-of-fit test
• Test of equality of several proportions
• Test of independence of two qualitative factors
Multinomial Experiments and Contingency Tables
10-1 Overview
10-2 Multinomial Experiments: Goodness-of-Fit
10-3 Contingency Tables: Independence and Homogeneity
Overview
We focus on the analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells).
We use the χ² (chi-square) test statistic (Table A-4).
The goodness-of-fit test uses a one-way frequency table (a single row or column).
The contingency table uses a two-way frequency table (two or more rows and columns).
Definition
Multinomial Experiment
This is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
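The four conditions can be illustrated with a small simulation. This is a minimal sketch assuming equally likely categories, as in the last-digit example that follows; `simulate_multinomial` is a hypothetical helper, and the counts it produces are simulated, not the actual home run data.

```python
import random
from collections import Counter

def simulate_multinomial(n_trials, categories, probs, seed=None):
    """Simulate one multinomial experiment (hypothetical helper).

    A fixed number of independent trials; each outcome falls into exactly one
    category, and the category probabilities are the same on every trial.
    """
    rng = random.Random(seed)
    # choices() draws n_trials independent outcomes using the same weights each time
    outcomes = rng.choices(categories, weights=probs, k=n_trials)
    counts = Counter(outcomes)
    # Report a count for every category, including categories never observed
    return [counts.get(c, 0) for c in categories]

# 73 trials over the digits 0-9, each digit with probability 1/10
digits = list(range(10))
observed = simulate_multinomial(73, digits, [0.1] * 10, seed=1)
print(observed, sum(observed))
```

Fixing the seed makes the simulated frequency table reproducible; the counts always sum to the fixed number of trials.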
Example: Last Digit Analysis
In 2001, Barry Bonds hit 73 home runs. Table 10-2 summarizes the last digits of those home run distances. Verify that the four conditions of a multinomial experiment are satisfied.
1. The number of trials (last digits) is the fixed number 73.
2. The trials are independent, because the last digit of the length of one home run does not affect the last digit of the length of any other home run.
3. Each outcome (last digit) is classified into exactly 1 of 10 different categories: 0, 1, …, 9.
4. Finally, if we assume that the home run distances are measured, the last digits should be equally likely, so each possible digit has a probability of 1/10.
Definition
Goodness-of-fit test
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.
Goodness-of-Fit Test
Notation
O represents the observed frequency of an outcome
E represents the expected frequency of an outcome
k represents the number of different categories or outcomes
n represents the total number of trials
Expected Frequencies
If all expected frequencies are equal:
$$E = \frac{n}{k}$$
that is, the sum of all observed frequencies divided by the number of categories.
Expected Frequencies
If the expected frequencies are not all equal:
$$E = np$$
that is, each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category.
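Both rules for expected frequencies are one-line computations. In this sketch, `expected_equal` and `expected_from_probs` are hypothetical helper names, and the three-category probabilities in the usage lines are illustrative only.

```python
def expected_equal(n, k):
    """E = n / k for each of k equally likely categories."""
    return [n / k] * k

def expected_from_probs(n, probs):
    """E = n * p for each category with claimed probability p."""
    # The claimed probabilities must cover all categories, so they sum to 1
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probs]

print(expected_equal(73, 10))                       # 73 trials, 10 digits
print(expected_from_probs(100, [0.5, 0.3, 0.2]))    # illustrative probabilities
```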
Goodness-of-Fit Test in Multinomial Experiments
Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Critical Values
1. Found in Table A-4 using k – 1 degrees of freedom, where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
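A minimal sketch of the goodness-of-fit statistic and its degrees of freedom, assuming the observed and expected frequencies are listed in the same category order. The function names are hypothetical and the counts are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit_df(k):
    """Degrees of freedom for a goodness-of-fit test with k categories."""
    return k - 1

# Illustrative counts (not from the slides): 60 trials over 3 equally likely categories
obs = [25, 20, 15]
exp = [20.0, 20.0, 20.0]
stat = chi_square_statistic(obs, exp)
print(stat, goodness_of_fit_df(len(obs)))
```

Each term is non-negative, so the statistic only grows as observed and expected frequencies drift apart, which is why the test is right-tailed.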
A close agreement between observed and expected values will lead to a small value of χ² and a large P-value.
A large disagreement between observed and expected values will lead to a large value of χ² and a small P-value.
A significantly large value of χ² will cause a rejection of the null hypothesis of no difference between the observed and the expected frequencies.
[Figure 10-3: Relationships Among Components in Goodness-of-Fit Hypothesis Test]
Example: Last Digit Analysis
In 2001, Barry Bonds hit 73 home runs. Table 10-2 summarizes the last digits of those home run distances. Test the claim that the digits do not occur with the same frequency.
H0: p0 = p1 = … = p9
H1: At least one of the probabilities is different from the others.
α = 0.05
k – 1 = 9
$\chi^2_{0.05,\,9} = 16.919$
The test statistic is 2 = 251.521. Since the critical value is
16.919, we reject the null hypothesis.
There is sufficient evidence to support the claim that the last
digits do not occur with the same relative frequency.
Example: Detecting Fraud
In the Chapter Problem, it was noted that statistics can be used to detect fraud. Table 10-1 lists the percentages for leading digits. Test the claim that there is a significant discrepancy between the leading digits expected from Benford's Law and the leading digits from the 784 checks.
H0: p1 = 0.301, p2 = 0.176, p3 = 0.125, p4 = 0.097, p5 = 0.079, p6 = 0.067, p7 = 0.058, p8 = 0.051, and p9 = 0.046
H1: At least one of the proportions is different from the claimed values.
α = 0.01
k – 1 = 8
$\chi^2_{0.01,\,8} = 20.090$
The test statistic is 2 = 3650.251. Since the critical
value is 20.090, we reject the null hypothesis.
There is sufficient evidence to reject the null
hypothesis.
Definition
• Contingency Table (or two-way frequency table)
A contingency table is a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.)
Contingency tables have at least two rows and at least two columns.
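A contingency table maps naturally onto a nested list or dictionary. This sketch stores the Titanic mortality counts used in the example later in this session and computes the row, column, and grand totals; the variable names are illustrative.

```python
# Titanic mortality counts (rows: outcome, columns: group)
columns = ["Men", "Women", "Boys", "Girls"]
table = {
    "Survived": [332, 318, 29, 27],
    "Died":     [1360, 104, 35, 18],
}

row_totals = {row: sum(counts) for row, counts in table.items()}
# zip(*rows) walks the table column by column
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(columns, col_totals)))
print(grand_total)
```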
Definition
• Test of Independence
This method tests the null hypothesis that the row variable and column variable in a contingency table are not related. (The null hypothesis is the statement that the row and column variables are independent.)
Assumptions
1. The sample data are randomly selected.
2. The null hypothesis H0 is the statement that the row and column variables are independent; the alternative hypothesis H1 is the statement that the row and column variables are dependent.
3. For every cell in the contingency table, the expected frequency E is at least 5. (There is no requirement that every observed frequency must be at least 5.)
Test of Independence
Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Critical Values
1. Found in Table A-4 using degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns.
2. Tests of independence are always right-tailed.
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
where the grand total is the total number of all observed frequencies in the table.
Tests of Independence
H0: The row variable is independent of the column variable.
H1: The row variable is dependent on (related to) the column variable.
This procedure cannot be used to establish a direct cause-and-effect link between the variables in question. Dependence means only that there is a relationship between the two variables.
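Expected Frequency for Contingency Tables
$$E = n \cdot p = (\text{grand total}) \cdot \frac{\text{row total}}{\text{grand total}} \cdot \frac{\text{column total}}{\text{grand total}} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
where n is the grand total and p is the probability of a cell, assuming the row and column variables are independent.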
Observed and Expected Frequencies
| | Men | Women | Boys | Girls | Total |
|---|---|---|---|---|---|
| Survived | 332 | 318 | 29 | 27 | 706 |
| Died | 1360 | 104 | 35 | 18 | 1517 |
| Total | 1692 | 422 | 64 | 45 | 2223 |
We will use the mortality table from the Titanic to find expected frequencies. For the upper left-hand cell, we find:
$$E = \frac{(706)(1692)}{2223} = 537.360$$
Observed and Expected Frequencies
Find the expected frequency for the lower left-hand cell (Died, Men), assuming independence between the row variable and the column variable:
$$E = \frac{(1517)(1692)}{2223} = 1154.640$$
Observed and Expected Frequencies
Each cell shows the observed frequency, with the expected frequency (assuming independence) in parentheses.
| | Men | Women | Boys | Girls | Total |
|---|---|---|---|---|---|
| Survived | 332 (537.360) | 318 (134.022) | 29 (20.326) | 27 (14.291) | 706 |
| Died | 1360 (1154.640) | 104 (287.978) | 35 (43.674) | 18 (30.709) | 1517 |
| Total | 1692 | 422 | 64 | 45 | 2223 |
To interpret this result for the lower left-hand cell, we can say that although 1360 men actually died, we would have expected 1154.64 men to die if survivability were independent of whether the person is a man, woman, boy, or girl.
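The expected-frequency formula E = (row total)(column total)/(grand total) can be applied to every cell at once. A minimal sketch (the helper name `expected_frequencies` is hypothetical) that reproduces the expected values for the Titanic table:

```python
def expected_frequencies(table):
    """E = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Rows: Survived, Died; columns: Men, Women, Boys, Girls
titanic = [[332, 318, 29, 27],
           [1360, 104, 35, 18]]
E = expected_frequencies(titanic)
for row in E:
    print([round(e, 3) for e in row])
```

Every expected frequency here is at least 5, so the assumption for the chi-square test of independence is met.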
Example: Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone survived or died is independent of whether that person is a man, woman, boy, or girl.
H0: Whether a person survived is independent of whether the person is a man, woman, boy, or girl.
H1: Surviving the Titanic and being a man, woman, boy, or girl are dependent.
$$\begin{aligned}
\chi^2 ={}& \frac{(332-537.360)^2}{537.360} + \frac{(318-134.022)^2}{134.022} + \frac{(29-20.326)^2}{20.326} + \frac{(27-14.291)^2}{14.291} \\
&+ \frac{(1360-1154.640)^2}{1154.640} + \frac{(104-287.978)^2}{287.978} + \frac{(35-43.674)^2}{43.674} + \frac{(18-30.709)^2}{30.709} \\
={}& 78.481 + 252.555 + 3.702 + 11.302 + 36.525 + 117.536 + 1.723 + 5.260 \\
={}& 507.084
\end{aligned}$$
The number of degrees of freedom is (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3, and $\chi^2_{0.05,\,3} = 7.815$. Since χ² = 507.084 exceeds 7.815, we reject the null hypothesis.
Survival and being a man, woman, boy, or girl are dependent.
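The whole test can be sketched in a few lines: compute each expected frequency, sum (O – E)²/E, find (r – 1)(c – 1), and compare with the critical value 7.815 from Table A-4. The function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

titanic = [[332, 318, 29, 27],     # Survived: men, women, boys, girls
           [1360, 104, 35, 18]]    # Died
stat, df = chi_square_independence(titanic)
critical = 7.815  # chi-square critical value for alpha = 0.05, df = 3 (Table A-4)
print(round(stat, 2), df, stat > critical)
```

Because the expected frequencies are not rounded here, the statistic comes out at about 507.08 rather than the hand-computed 507.084; the decision to reject H0 is the same.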
Test Statistic
χ² = 507.084, with α = 0.05 and (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3 degrees of freedom
Critical Value
χ² = 7.815 (from Table A-4)
[Figure 10-8: Relationships Among Components in χ² Test of Independence]
Definition
• Test of Homogeneity
In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristic.
How to distinguish between a test of homogeneity and a test of independence:
Were predetermined sample sizes used for different populations (test of homogeneity), or was one big sample drawn so that both row and column totals were determined randomly (test of independence)?
Example: Using Table 10-7, with a 0.05 significance level, test the effect of pollster gender on survey responses by men.
H0: The proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.
H1: The proportions are different.
The Minitab display includes the test statistic of 2 = 6.529
and a P-value of 0.011. Using the P-value approach, we
reject the null hypothesis of equal(homogeneous)
proportions(because the P-value of 0.011 is less than
0.05.
There is sufficient evidence to reject the claim of equal
proportions.
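For a table with one degree of freedom, the P-value has a closed form: a chi-square variable with df = 1 is the square of a standard normal variable, so P(χ² > x) = erfc(√(x/2)). The sketch below assumes Table 10-7 is a 2 × 2 table (pollster gender by agree/disagree), which gives df = 1 and is consistent with the reported P-value of 0.011; `chi2_sf_df1` is a hypothetical helper name.

```python
import math

def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    With df = 1, X = Z**2 for a standard normal Z, so P(X > x) = P(|Z| > sqrt(x)),
    which equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

stat = 6.529            # test statistic reported in the Minitab display
df = (2 - 1) * (2 - 1)  # assumes Table 10-7 is a 2 x 2 table
alpha = 0.05
p_value = chi2_sf_df1(stat)
print(round(p_value, 3), p_value < alpha)
```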