Stat 31, Section 1, Last Time
• Paired Diff’s vs. Unmatched Samples
  – Compare with example
  – Showed graphic about Paired often better
• Review of Gray Level Hypo Testing
• Inference for Proportions
  – Confidence Intervals
  – Sample Size Calculation
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 536-549, 555-566, 582-611
Approximate Reading for Next Class:
Pages 582-611, 634-667
Midterm II
Coming on Tuesday, April 10
Think about:
• Sheet of Formulas
  – Again single 8 ½ x 11 sheet
  – New, since now more formulas
• Redoing HW…
• Asking about those not understood
• Midterm not cumulative
• Covered Material: HW 7 - 11
Midterm II
Extra Office Hours:
Monday, 4/9: 10:00 – 12:00, 12:30 – 3:00
Tuesday, 4/10: 8:30 – 10:00, 11:00 – 12:00
Hypo. Tests for Proportions
Case 3: Hypothesis Testing
General Setup:
$H_0: p \ \{\le, =, \ge\}\ \text{Given Value}$
$H_A: p \ \{>, \ne, <\}\ \text{Given Value}$
Hypo. Tests for Proportions
Assess strength of evidence by:
P-value = P{what saw or m.c. | B’dry} = P{observed $\hat{p}$ or m.c. | $p$ = Given Value}
Problem: sd of $\hat{p} = \sqrt{p(1-p)/n}$
Hypo. Tests for Proportions
Problem: sd of $\hat{p} = \sqrt{p(1-p)/n}$
Solution: (different from above “best guess” and “conservative”)
calculation is done based on:
$p$ = Given Value (the boundary value from $H_0$)
Hypo. Tests for Proportions
e.g. Old Text Problem 8.16
Of 500 respondents in a Christmas tree
marketing survey, 44% had no children
at home and 56% had at least one child
at home. The corresponding figures
from the most recent census are 48%
with no children, and 52% with at least
one. Test the null hypothesis that the
telephone survey has a probability of
selecting a household with no children
that is equal to the value of the last
census. Give a Z-statistic and P-value.
Hypo. Tests for Proportions
e.g. Old Text Problem 8.16
Let p = % with no child (worth writing down)
$H_0: p = 0.48$
$H_A: p \ne 0.48$
Hypo. Tests for Proportions
Observed $\hat{p} = 0.44$, from $n = 500$
P-value $= P\{\hat{p} = 0.44 \text{ or m.c.} \mid p = 0.48\}$
$= P\{|\hat{p} - p| \ge 0.04 \mid p = 0.48\}$
$= 2 \cdot P\{\hat{p} \le 0.44\}$
Hypo. Tests for Proportions
P-value $= 2 \cdot P\{\hat{p} \le 0.44\}$
= 2 * NORMDIST(0.44,0.48,sqrt(0.48*(1-0.48)/500),true)
= 0.0734
See Class Example 30, Part 3
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg30.xls
Yes-No: no strong evidence
Gray-level: somewhat strong evidence
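The NORMDIST calculation above can also be checked outside Excel. This is a minimal Python sketch, assuming the same Normal approximation as the slide; the helper name `normal_cdf` is made up for illustration and is not part of the class materials.

```python
import math

def normal_cdf(x, mean, sd):
    """Area under the Normal(mean, sd) curve to the left of x,
    playing the role of NORMDIST(x, mean, sd, true)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

p0 = 0.48      # boundary value from H0
p_hat = 0.44   # observed proportion
n = 500        # sample size

# sd of p-hat, computed at the boundary value p = p0
sd_p_hat = math.sqrt(p0 * (1 - p0) / n)

# Two-sided P-value: double the lower-tail area
p_value = 2 * normal_cdf(p_hat, p0, sd_p_hat)

print(round(p_value, 4))
```

The sd is computed from the hypothesized 0.48, not from the observed 0.44, which is the point of the "Solution" slide.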
Hypo. Tests for Proportions
Z-score version:
P-value $= P\{|\hat{p} - p| \ge 0.04\} = P\left\{ \frac{|\hat{p} - p|}{\sqrt{p(1-p)/n}} \ge \frac{0.04}{\sqrt{0.48(1-0.48)/500}} \right\}$
So Z-score is:
$\frac{0.04}{\sqrt{0.48(1-0.48)/500}} = 1.79$
Hypo. Tests for Proportions
Note also 1-sided version:
Yes-no: is strong evidence
Gray Level: stronger evidence
HW: 8.22a (0.0057), 8.23, interpret from both yes-no and gray-level viewpoints
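The Z-score version and the 1-sided P-value for the Christmas tree survey example can be sketched the same way. The slides do not state the direction of the 1-sided alternative; the sketch below assumes $H_A: p < 0.48$ (the direction the data point), and `normal_cdf` is again an illustrative helper.

```python
import math

def normal_cdf(z):
    """Standard Normal area to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p0, p_hat, n = 0.48, 0.44, 500

# Z-score: standardize p-hat using the sd at the boundary value p0.
# It is negative here because p_hat < p0; the slide reports its size.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

one_sided = normal_cdf(z)              # assumed alternative: p < 0.48
two_sided = 2 * normal_cdf(-abs(z))    # alternative: p != 0.48

print(round(abs(z), 2), round(one_sided, 4), round(two_sided, 4))
```

The 1-sided P-value is half the 2-sided one, which is why the same data give stronger evidence in the 1-sided version.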
2 Sample Proportions
In text Section 8.2
• Skip this
• Ideas are only slight variation of above
• Basically mix & match of 2 sample ideas, and proportion methods
• If you need it (later), pull out text
• Covered on exams to extent it is in HW
Chapter 9: Two-Way Tables
Main idea:
Divide up populations in two ways
– E.g. 1: Age & Sex
– E.g. 2: Education & Income
• Typical Major Question: How do divisions relate?
• Are the divisions independent?
  – Similar idea to indepe’nce in prob. Theory
  – Statistical Inference?
Two-Way Tables
Class Example 31, Textbook Example 9.18
Market Researchers know that background
music can influence mood and
purchasing behavior. A supermarket
compared three treatments: No music,
French accordion music and Italian
string music. Under each condition, the
researchers recorded the numbers of
bottles of French, Italian and other wine
purchased.
Two-Way Tables
Class Example 31, Textbook Example 9.18
Here is the two way table that summarizes
the data:
                   Wine:
Music        French   Italian   Other
None           30       11       43
French         39        1       35
Italian        30       19       35
Are the type of wine purchased, and the
background music related?
Two-Way Tables
Class Example 31: Visualization
[Bar chart: Class Example 31 - Counts; # bottles purchased (0 to 45), by music (None, French, Italian) and wine (French, Italian, Other)]
Shows how counts are broken down by:
music type
wine type
Two-Way Tables
Big Question: Is there a relationship?
[Bar chart repeated: Class Example 31 - Counts]
Note: tallest bars
French Wine ↔ French Music
Italian Wine ↔ Italian Music
Other Wine ↔ No Music
Suggests there is a relationship
Two-Way Tables
General Directions:
• Can we make this precise?
• Could it happen just by chance?
  – Really: how likely to be a chance effect?
• Or is it statistically significant?
  – I.e. music and wine purchase are related?
Two-Way Tables
Class Example 31, a look under the hood…
Excel Analysis, Part 1:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Notes:
• Read data from file
• Only appeared as column
• Had to re-arrange
• Better way to do this???
• Made graphic with chart wizard
Two-Way Tables
HW:
Make 2-way bar graphs, and discuss
relationships between the divisions, for
the data in:
9.1 (younger people tend to be better educated)
9.9
9.11
(you try these…)
Two-Way Tables
An alternate view:
Replace counts by proportions (or %-ages)
Class Example 31 (Wine & Music), Part 2
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Advantage:
May be more interpretable
Drawback:
No real difference (just rescaled)
Two-Way Tables
Testing for independence:
What is it?
From probability theory:
P{A | B} = P{A}
i.e. Chances of A, when B is known, are
same as when B is unknown
Table version of this idea?
Independence in 2-Way Tables
Recall: P{A | B} = P{A}
Counts - proportions analog of these?
• Analog of P{A}?
  – Proportions of factor A, “not knowing B”
  – Called “marginal proportions”
• Analog of P{A|B}???
Independence in 2-Way Tables
Marginal proportions (or counts):
• Sums along rows
• Sums along columns
• Useful to write at margins of table
• Hence name marginal
• Number of independent interest
• Also nice to put total at bottom
Independence in 2-Way Tables
Marginal Counts:
Class Example 31 (Wine & Music), Part 3
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Marginals are of independent interest:
• Other wines sold best (French second)
• Italian music sold most wine (tied with no music, 84 bottles each)…
• But don’t tell whole story
  – E.g. Can’t see same music & wine is best…
  – Full table tells more than marginals
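The marginal counts for the wine and music table can be computed directly from the counts given in Class Example 31. A small Python sketch (the variable names are illustrative; the class itself does this in Excel):

```python
# Rows: music (None, French, Italian); columns: wine (French, Italian, Other)
music = ["None", "French", "Italian"]
wine = ["French", "Italian", "Other"]
counts = [
    [30, 11, 43],   # no music
    [39, 1, 35],    # French music
    [30, 19, 35],   # Italian music
]

music_totals = [sum(row) for row in counts]        # sums along rows
wine_totals = [sum(col) for col in zip(*counts)]   # sums along columns
grand_total = sum(music_totals)                    # total for the bottom corner

for name, total in zip(music, music_totals):
    print(f"{name} music: {total}")
for name, total in zip(wine, wine_totals):
    print(f"{name} wine: {total}")
print("Total:", grand_total)
```

The marginals alone cannot show that French wine sells best under French music; that pattern only appears in the full table.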
Independence in 2-Way Tables
Recall definition of independence:
P{A | B} = P{A}
Counts analog of P{A|B}???
Recall: $P\{A\} = P\{A \mid B\} = \frac{P\{A \,\&\, B\}}{P\{B\}}$
So equivalent condition is:
$P\{A\} P\{B\} = P\{A \,\&\, B\}$
Independence in 2-Way Tables
Counts analog of P{A|B}???
Equivalent condition for independence is:
$P\{A \,\&\, B\} = P\{A\} \times P\{B\}$
So for counts, look for:
Table Prop’n = Row Marg’l Prop’n x Col’n Marg’l Prop’n
i.e. Entry = Product of Marginals
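The "Entry = Product of Marginals" idea can be tried on the wine and music counts from Class Example 31. The sketch below builds the table of proportions that independence would give and prints it next to the observed proportions; the variable names are illustrative only.

```python
# Rows: music (None, French, Italian); columns: wine (French, Italian, Other)
counts = [
    [30, 11, 43],
    [39, 1, 35],
    [30, 19, 35],
]
n = sum(sum(row) for row in counts)

row_prop = [sum(row) / n for row in counts]        # music marginal proportions
col_prop = [sum(col) / n for col in zip(*counts)]  # wine marginal proportions

# Independent model: each entry is the product of its two marginals
indep = [[r * c for c in col_prop] for r in row_prop]

# Observed table proportions, for comparison
observed = [[x / n for x in row] for row in counts]

for obs_row, ind_row in zip(observed, indep):
    print([round(x, 3) for x in obs_row], [round(x, 3) for x in ind_row])
```

Where the two tables differ most (for example Italian wine under French music) is where the data depart most from independence.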
Independence in 2-Way Tables
Visualize Product of Marginals for:
Class Example 31 (Wine & Music), Part 4
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
Shows same structure as marginals
But not match between music & wine
Good null hypothesis
[Bar chart: Class Example 31 - Independent Model; vertical axis 0 to 0.18, by music (None, French, Italian) and wine (French, Italian, Other)]
Independence in 2-Way Tables
• Independent model appears different
• But is it really different?
• Or could difference be simply explained by natural sampling variation?
• Check for statistical significance…
Independence in 2-Way Tables
Approach:
• Measure “distance between tables”
  – Use Chi Square Statistic
  – Has known probability distribution when table is independent
• Assess significance using P-value
  – Set up as: H0: Indep.   HA: Dependent
  – P-value = P{what saw or m.c. | Indep.}
Independence in 2-Way Tables
Chi-square statistic:
• Based on:
  Observed Counts (raw data), $\mathrm{Obs}_i$
  Expected Counts (under indep.), $\mathrm{Exp}_i$
• $X^2 = \sum_{\text{cells } i} \frac{(\mathrm{Obs}_i - \mathrm{Exp}_i)^2}{\mathrm{Exp}_i}$
Notes:
– Small for only random variation
– Large for significant departure from indep.
Independence in 2-Way Tables
Chi-square statistic calculation:
$X^2 = \sum_{\text{cells } i} \frac{(\mathrm{Obs}_i - \mathrm{Exp}_i)^2}{\mathrm{Exp}_i}$
Class example 31, Part 5:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
– Calculate term by term
– Then sum
– Is $X^2 = 18.3$ “big” or “small”?
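The term-by-term calculation can be sketched in Python for the wine and music counts. Expected counts under independence are taken as (row total × column total) / grand total, which is the count version of "Entry = Product of Marginals"; the variable names are illustrative.

```python
# Rows: music (None, French, Italian); columns: wine (French, Italian, Other)
counts = [
    [30, 11, 43],
    [39, 1, 35],
    [30, 19, 35],
]
row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]
n = sum(row_tot)

# Expected counts under independence
expected = [[r * c / n for c in col_tot] for r in row_tot]

# One (Obs - Exp)^2 / Exp term per cell, then sum
terms = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(counts, expected)
]
x2 = sum(sum(row) for row in terms)

print(round(x2, 1))
```

Printing `terms` shows which cells contribute most; here the Italian wine cells under French and Italian music dominate the sum.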
Independence in 2-Way Tables
H0 distribution of the $X^2$ statistic:
“Chi Squared” (another Greek letter, $\chi^2$)
Parameter: “degrees of freedom”
(similar to T distribution)
Excel Computation:
– CHIDIST (given cutoff, find area = prob.)
– CHIINV (given prob = area, find cutoff)
Independence in 2-Way Tables
Explore the $\chi^2$ distribution:
Applet from Webster West (U. So. Carolina)
http://www.stat.sc.edu/~west/applets/chisqdemo.html
• Right Skewed Distribution
• Nearly Gaussian for more d.f.
Independence in 2-Way Tables
For test of independence, use:
degrees of freedom = (#rows – 1) x (#cols – 1)
E.g. Wine and Music:
d.f. = (3 – 1) x (3 – 1) = 4
Independence in 2-Way Tables
E.g. Wine and Music:
P-value = P{Observed $X^2$ or m.c. | Indep.}
= P{$X^2$ = 18.3 or m.c. | Indep.}
= P{$X^2 \ge 18.3$ | d.f. = 4}
= 0.0011
Also see Class Example 31, Part 5
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg31.xls
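The upper-tail area that Excel's CHIDIST returns can be reproduced for this example without a statistics library. The sketch below uses a closed form for the chi-square tail that holds only when the degrees of freedom are even (as with d.f. = 4 here); the function name `chi2_upper_tail_even_df` is made up for illustration. For general d.f., a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P{chi-square with df degrees of freedom >= x}.
    Closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # builds (x/2)^k / k! step by step
        total += term
    return math.exp(-half) * total

rows, cols = 3, 3
df = (rows - 1) * (cols - 1)          # degrees of freedom for the 3 x 3 table
p_value = chi2_upper_tail_even_df(18.3, df)

print(df, round(p_value, 4))
```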
Independence in 2-Way Tables
E.g. Wine and Music:
P-value = 0.001
Yes-No: Very strong evidence against independence, conclude music has a statistically significant effect
Gray-Level: Also very strong evidence
Independence in 2-Way Tables
Excel shortcut:
CHITEST
• Avoids the (obs-exp)^2 / exp calculation
• Automatically computes d.f.
• Returns P-value