Bivariate Analyses
Bivariate Procedures I: Overview
• Chi-square test
• T-test
• Correlation
Chi-Square Test
• Relationships between nominal variables
• Types:
  • 2x2 chi-square
    • Gender by Political Party
  • 2x3 chi-square
    • Gender by Dosage (High vs. Medium vs. Low)
Starting Point: The Crosstab Table
• Example:

                  Gender (IV)
Party (DV)      Males    Females
Democrat            1         20
Republican         10          2
Total              11         22
Column Percentages

                  Gender (IV)
Party (DV)      Males    Females
Democrat           9%        91%
Republican        91%         9%
Total            100%       100%
Row Percentages

                  Gender (IV)
Party (DV)      Males    Females    Total
Democrat           5%        95%     100%
Republican        83%        17%     100%
Full Crosstab Table

Party (DV)                 Males    Females    Total
Democrat      Count            1         20       21
              Row %           5%        95%     100%
              Column %        9%        91%      64%
Republican    Count           10          2       12
              Row %          83%        17%     100%
              Column %       91%         9%      36%
Total         Count           11         22       33
              Row %          33%        67%     100%
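The row and column percentages in these tables can be recomputed from the raw counts. The sketch below is only an illustration: the dictionary layout and the function names are assumptions made for this example, not part of the original slides.

```python
# Observed counts from the Gender-by-Party example.
# Rows are the DV (party); columns are the IV (gender).
observed = {
    "Democrat": {"Males": 1, "Females": 20},
    "Republican": {"Males": 10, "Females": 2},
}

def column_percentages(table):
    """Percentage of each column (gender) that falls in each row (party)."""
    columns = next(iter(table.values())).keys()
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100 * n / col_totals[c] for c, n in row.items()}
        for r, row in table.items()
    }

def row_percentages(table):
    """Percentage of each row (party) that falls in each column (gender)."""
    return {
        r: {c: 100 * n / sum(row.values()) for c, n in row.items()}
        for r, row in table.items()
    }

col_pct = column_percentages(observed)
row_pct = row_percentages(observed)
print(round(col_pct["Republican"]["Males"]))  # share of males who are Republican
print(round(row_pct["Republican"]["Males"]))  # share of Republicans who are male
```

Column percentages answer "what share of each gender belongs to each party", which is usually the comparison of interest when gender is the independent variable.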
Research Question and Hypothesis
• Research question:
  • Is gender related to party affiliation?
• Hypothesis:
  • Men are more likely than women to be Republicans
• Null hypothesis:
  • There is no relationship between gender and party
Testing the Hypothesis
• Eyeballing the table:
  • There seems to be a relationship
• Is it significant?
  • Or could it be just a chance finding?
• Logic:
  • Is the finding different enough from what the null hypothesis predicts?
  • Chi-square answers this question
• What factors would it take into account?
Factors Taken into Consideration
• Factors:
  • 1. Magnitude of the difference
  • 2. Sample size
• Biased coin example
  • Magnitude of difference:
    • 60% heads vs. 99% heads
  • Sample size:
    • 10 flips vs. 100 flips vs. 1 million flips
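The coin example can be made concrete with a binomial calculation. This is a minimal sketch, assuming a fair coin under the null hypothesis; the function name `prob_at_least` is invented for this illustration. It asks how likely a result of at least 60% heads is by chance alone at two sample sizes.

```python
from math import comb

def prob_at_least(heads, flips, p=0.5):
    """P(X >= heads) for a binomial with `flips` trials and heads-probability p."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

p_10 = prob_at_least(6, 10)     # at least 60% heads in 10 flips
p_100 = prob_at_least(60, 100)  # at least 60% heads in 100 flips
print(f"{p_10:.3f}", f"{p_100:.3f}")
```

The same 60% result is unremarkable in 10 flips but falls below the .05 threshold in 100 flips, which is why both magnitude and sample size matter. The exact sum is slow for a million flips; a normal approximation would be the usual choice there.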
Chi-square
• Chi-square starts with the frequencies:
  • Compare observed frequencies with the frequencies we expect under the null hypothesis
What Would the Frequencies Be if There Was No Relationship?

                Males    Females    Total
Democrat            ?          ?       21
Republican          ?          ?       12
Total              11         22       33
Expected Frequencies (Null)

                Males    Females    Total
Democrat            7         14       21
Republican          4          8       12
Total              11         22       33
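Comparing the Observed and Expected Cell Frequencies
• Formula:
  $$\chi^2 = \sum \frac{(O - E)^2}{E}$$
  where O is the observed frequency and E is the expected frequency in each cell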
Calculating the Expected Frequency
• Simple formula for expected cell frequencies:
  • (Row total x Column total) / Total N
  • Male Democrats: 21 x 11 / 33 = 7
  • Female Democrats: 21 x 22 / 33 = 14
  • Male Republicans: 12 x 11 / 33 = 4
  • Female Republicans: 12 x 22 / 33 = 8
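The expected-frequency rule can be applied to every cell at once. The following is a small sketch using the observed counts from the Gender-by-Party example; the function name and the list-of-lists layout are assumptions made for illustration.

```python
observed = [
    [1, 20],   # Democrat:   males, females
    [10, 2],   # Republican: males, females
]

def expected_frequencies(table):
    """Expected count per cell: row total x column total / total N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
print(expected)
```

Only the row and column totals enter the calculation, which is why the expected table describes what "no relationship" would look like for these particular marginals.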
Observed and Expected Cell Frequencies
(each cell shows observed / expected)

                Males      Females    Total
Democrat        1 / 7      20 / 14       21
Republican     10 / 4       2 / 8        12
Total              11           22       33
Plugging into the Formula

Cell      O - E             Square    Square / E
A          1 - 7  = -6          36    36 / 7  = 5.1
B         20 - 14 =  6          36    36 / 14 = 2.6
C         10 - 4  =  6          36    36 / 4  = 9
D          2 - 8  = -6          36    36 / 8  = 4.5
                                      Sum     = 21.2
Chi-square = 21.2
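The hand calculation above can be checked in a few lines. This is a sketch only; `chi_square` is a name chosen for this example, and the observed and expected counts are the ones from the Gender-by-Party tables.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[1, 20], [10, 2]]
expected = [[7, 14], [4, 8]]
chi2 = chi_square(observed, expected)
print(round(chi2, 1))
```

Dividing each squared difference by E means that the same gap of 6 counts for more in a cell where few cases were expected (Cell C) than in one where many were (Cell B).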
Is the chi-square significant?
• Significance of the chi-square:
  • Greater differences between observed and expected frequencies lead to a bigger chi-square
  • How big does it have to be for significance?
  • Depends on the “degrees of freedom”
  • Formula for degrees of freedom: (Rows – 1) x (Columns – 1)
Chi-square Degrees of Freedom
• 2x2 chi-square: df = 1
• 3x3 = ?
• 4x3 = ?
Chi-square Critical Values

df    P = 0.05    P = 0.01    P = 0.001
 1        3.84        6.64        10.83
 2        5.99        9.21        13.82
 3        7.82       11.35        16.27
 4        9.49       13.28        18.47
 5       11.07       15.09        20.52
 6       12.59       16.81        22.46
 7       14.07       18.48        24.32
 8       15.51       20.09        26.13
 9       16.92       21.67        27.88
10       18.31       23.21        29.59

* If chi-square is greater than the critical value, the relationship is significant
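The lookup step can be sketched as follows. The critical values are copied from the first four rows of the table above; the names `CRITICAL`, `degrees_of_freedom`, and `strictest_level` are assumptions made for this illustration.

```python
# Critical values copied from the table above, keyed by degrees of freedom.
# Each tuple holds the cutoffs for p = 0.05, 0.01, and 0.001.
CRITICAL = {
    1: (3.84, 6.64, 10.83),
    2: (5.99, 9.21, 13.82),
    3: (7.82, 11.35, 16.27),
    4: (9.49, 13.28, 18.47),
}
LEVELS = (0.05, 0.01, 0.001)

def degrees_of_freedom(rows, columns):
    """(Rows - 1) x (Columns - 1)."""
    return (rows - 1) * (columns - 1)

def strictest_level(chi2, df):
    """Smallest tabled p level whose critical value chi2 exceeds, or None."""
    passed = [p for p, cutoff in zip(LEVELS, CRITICAL[df]) if chi2 > cutoff]
    return min(passed) if passed else None

df = degrees_of_freedom(2, 2)
level = strictest_level(21.2, df)
print(df, level)
```

For the Gender-by-Party example, a chi-square of 21.2 with 1 degree of freedom exceeds even the 10.83 cutoff, so the relationship is significant beyond the .001 level.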
Chi-Square Computer Printout
CONSERV * SEX OF RESPONDENT Crosstabulation

                                          SEX OF RESPONDENT
CONSERV                                    1.00      2.00     Total
.00      Count                             1274      1583      2857
         % within CONSERV                 44.6%     55.4%    100.0%
         % within SEX OF RESPONDENT       86.4%     84.4%     85.3%
1.00     Count                              201       292       493
         % within CONSERV                 40.8%     59.2%    100.0%
         % within SEX OF RESPONDENT       13.6%     15.6%     14.7%
Total    Count                             1475      1875      3350
         % within CONSERV                 44.0%     56.0%    100.0%
         % within SEX OF RESPONDENT      100.0%    100.0%    100.0%
Chi-Square Computer Printout
Chi-Square Tests

                                            Asymp. Sig.   Exact Sig.   Exact Sig.
                                Value   df   (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              2.492b   1      .114
Continuity Correction(a)        2.339    1      .126
Likelihood Ratio                2.504    1      .114
Fisher's Exact Test                                           .116         .063
Linear-by-Linear Association    2.491    1      .115
N of Valid Cases                3350

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 217.07.
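The first two rows of the printout can be reproduced from the crosstab counts. The sketch below assumes that the "Continuity Correction" row is the standard Yates correction, which subtracts 0.5 from each absolute difference before squaring; the function name is invented for this example.

```python
def pearson_and_yates(table):
    """Return (Pearson chi-square, continuity-corrected chi-square) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = yates = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            pearson += (o - e) ** 2 / e
            yates += (abs(o - e) - 0.5) ** 2 / e
    return pearson, yates

counts = [
    [1274, 1583],  # CONSERV = .00:  sex 1.00, sex 2.00
    [201, 292],    # CONSERV = 1.00: sex 1.00, sex 2.00
]
pearson, yates = pearson_and_yates(counts)
print(round(pearson, 3), round(yates, 3))
```

Both values fall below the 3.84 critical value for 1 degree of freedom, which agrees with the printed significance levels of .114 and .126: the relationship is not significant at the .05 level.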
Multiple Chi-square
• Exactly the same procedure as the two-variable chi-square (X2)
• Used for more than two variables
• E.g., 2 x 2 x 2 X2
  • Gender x Hair color x Eye color
Multiple chi-square example
Agree with "police should use any force necessary" * Believe in god * SEX OF RESPONDENT Crosstabulation
("Agree" below abbreviates: Agree with "police should use any force necessary")

SEX OF                                              Believe in god
RESPONDENT   Agree                                   .00      1.00     Total
1.00         .00     Count                            76       548       624
                     % within Agree                12.2%     87.8%    100.0%
                     % within Believe in god       53.5%     42.2%     43.3%
             1.00    Count                            66       750       816
                     % within Agree                 8.1%     91.9%    100.0%
                     % within Believe in god       46.5%     57.8%     56.7%
             Total   Count                           142      1298      1440
                     % within Agree                 9.9%     90.1%    100.0%
                     % within Believe in god      100.0%    100.0%    100.0%
2.00         .00     Count                            35       877       912
                     % within Agree                 3.8%     96.2%    100.0%
                     % within Believe in god       48.6%     49.4%     49.4%
             1.00    Count                            37       899       936
                     % within Agree                 4.0%     96.0%    100.0%
                     % within Believe in god       51.4%     50.6%     50.6%
             Total   Count                            72      1776      1848
                     % within Agree                 3.9%     96.1%    100.0%
                     % within Believe in god      100.0%    100.0%    100.0%
Multiple chi-square example
Chi-Square Tests

SEX OF                                                   Asymp. Sig.   Exact Sig.   Exact Sig.
RESPONDENT                                   Value   df   (2-sided)     (2-sided)    (1-sided)
1.00         Pearson Chi-Square              6.659b   1      .010
             Continuity Correction(a)        6.206    1      .013
             Likelihood Ratio                6.593    1      .010
             Fisher's Exact Test                                           .012         .007
             Linear-by-Linear Association    6.654    1      .010
             N of Valid Cases                1440
2.00         Pearson Chi-Square              .016c    1      .898
             Continuity Correction(a)        .000     1      .994
             Likelihood Ratio                .016     1      .898
             Fisher's Exact Test                                           .905         .497
             Linear-by-Linear Association    .016     1      .898
             N of Valid Cases                1848

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 61.53.
c. 0 cells (.0%) have expected count less than 5. The minimum expected count is 35.53.
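In this printout the three-variable analysis amounts to a separate 2x2 chi-square within each category of the third variable. The following sketch recomputes the two Pearson values from the crosstab counts; the function name and the `layers` dictionary are assumptions made for illustration.

```python
def pearson_chi_square(table):
    """Pearson chi-square for one table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total

# Counts from the printout: rows are agree (.00, 1.00),
# columns are believe in god (.00, 1.00), one table per sex code.
layers = {
    "1.00": [[76, 548], [66, 750]],
    "2.00": [[35, 877], [37, 899]],
}
results = {sex: pearson_chi_square(table) for sex, table in layers.items()}
for sex, chi2 in results.items():
    verdict = "significant at .05" if chi2 > 3.84 else "not significant"
    print(sex, round(chi2, 3), verdict)
```

Running the test within each layer shows that the relationship between the two attitudes holds for respondents coded 1.00 but not for those coded 2.00, a pattern that a single pooled table would hide.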
The T-test
• Groups T-test
  • Comparing the means of two nominal groups
    • E.g., Gender and IQ
    • E.g., Experimental vs. Control group
• Pairs T-test
  • Comparing the means of two variables
  • Comparing the mean of a variable at two points in time
Logic of the T-test
• A T-test considers three things:
  • 1. The group means
  • 2. The dispersion of individual scores around the mean for each group (sd)
  • 3. The size of the groups
Difference in the Means
• The farther apart the means are:
  • The more confident we are that the two group means are different
  • The distance between the means goes in the numerator of the t-test formula
Why Dispersion Matters
[Figure: two pairs of group distributions, one with small variances and one with large variances]
Size of the Groups
• Larger groups mean that we are more confident in the group means
• IQ example:
  • Women: mean = 103
  • Men: mean = 97
  • If our sample was 5 men and 5 women, we are not that confident
  • If our sample was 5 million men and 5 million women, we are much more confident
The four t-test formulae
• 1. Matched samples with unequal variances
• 2. Matched samples with equal variances
• 3. Independent samples with unequal variances
• 4. Independent samples with equal variances
All four formulae have the same numerator
• X1 - X2 (group one mean - group two mean)
• What differentiates the four formulae is their denominator
  • The denominator is the “standard error of the difference of the means”
  • Each formula has a different standard error
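Independent samples with unequal variances formula
• Standard error formula (denominator):
  $$SE_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$$
  where s_1 and s_2 are the group standard deviations and N_1 and N_2 are the group sizes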
T-test Value
Look up the T-value in a T-table (use the absolute value)
First determine the degrees of freedom
ex. df = (N1 - 1) + (N2 - 1)
    40 + 30 = 70
For 70 df at the .05 level (one-tailed), the critical value is 1.67
ex. 5.91 > 1.67: Reject the null (means are different)
Groups t-test printout example
Group Statistics

                        SEX OF                                          Std. Error
                        RESPONDENT      N       Mean    Std. Deviation     Mean
MEN ARE BETTER          1.00          1461    3.1485       1.50678        .03942
LEADERS THAN WOMEN      2.00          1856    2.1999       1.34842        .03130

Independent Samples Test (MEN ARE BETTER LEADERS THAN WOMEN)

Levene's Test for Equality of Variances: F = 27.870, Sig. = .000

t-test for Equality of Means
                                                                                      95% Confidence Interval
                                                     Sig.        Mean     Std. Error     of the Difference
                                  t         df    (2-tailed)  Difference  Difference     Lower       Upper
Equal variances assumed        19.096     3315       .000       .9486       .04968      .85124     1.04604
Equal variances not assumed    18.846   2956.316     .000       .9486       .05034      .84994     1.04733
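Both rows of the printout can be recovered from the group statistics alone. This is a minimal sketch, assuming the usual pooled-variance formula for the "equal variances assumed" row and the Welch formula (with its approximate degrees of freedom) for the "not assumed" row; the function name is invented for this example.

```python
from math import sqrt

def t_from_summary(m1, s1, n1, m2, s2, n2, equal_variances):
    """Return (t, df) for two independent groups from means, SDs, and sizes."""
    if equal_variances:
        # Pooled variance weights each group's variance by its df.
        df = (n1 - 1) + (n2 - 1)
        pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
        se = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Unequal variances: each variance is divided by its own group size.
        v1, v2 = s1**2 / n1, s2**2 / n2
        se = sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df

# Mean, SD, N for sex 1.00, then for sex 2.00, from Group Statistics.
stats = (3.1485, 1.50678, 1461, 2.1999, 1.34842, 1856)
t_pooled, df_pooled = t_from_summary(*stats, equal_variances=True)
t_welch, df_welch = t_from_summary(*stats, equal_variances=False)
print(round(t_pooled, 2), df_pooled)
print(round(t_welch, 2), round(df_welch, 1))
```

Because Levene's test is significant here (Sig. = .000), the "equal variances not assumed" row is the one usually reported, although with groups this large the two rows lead to the same conclusion.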
Pairs t-test example
Paired Samples Statistics

                                                                           Std. Error
                                             Mean      N    Std. Deviation    Mean
Pair 1   IN FAVOR OF LEGALIZING
         SAME SEX MARRIAGE                  2.3909   3305      1.74692       .03039
         IN FAVOR OF DEATH PENALTY          4.4617   3305      1.67497       .02914

Paired Samples Test

Pair 1: IN FAVOR OF LEGALIZING SAME SEX MARRIAGE - IN FAVOR OF DEATH PENALTY

Paired Differences
  Mean                                          -2.0708
  Std. Deviation                                2.39422
  Std. Error Mean                                .04165
  95% Confidence Interval of the Difference     Lower: -2.1525    Upper: -1.9891
t                                               -49.723
df                                              3304
Sig. (2-tailed)                                 .000
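The paired t value follows directly from the mean and standard deviation of the differences. A short sketch using the figures from the printout (the function name `paired_t` is an assumption for this example):

```python
from math import sqrt

def paired_t(mean_diff, sd_diff, n):
    """t = mean difference / (SD of differences / sqrt(n)); df = n - 1."""
    se = sd_diff / sqrt(n)
    return mean_diff / se, n - 1

t, df = paired_t(mean_diff=-2.0708, sd_diff=2.39422, n=3305)
print(round(t, 2), df)
```

The test works on one difference score per respondent, so the relevant dispersion is the standard deviation of those differences rather than the two separate standard deviations.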
Pearson Correlation Coefficient (r)
• Characteristics of correlational relationships:
  • 1. Strength
  • 2. Significance
  • 3. Directionality
  • 4. Curvilinearity
Strength of Correlation:
• Strong, weak, and non-relationships
• The nature of such relationships can be observed in scatter diagrams
• Scatter diagram
  • One variable on the x-axis and the other on the y-axis of a graph
  • Plot each case according to its x and y values
Scatterplot: Strong relationship
[Figure: Book Reading (y-axis) by Years of Education (x-axis)]
Scatterplot: Weak relationship
[Figure: Income (y-axis) by Years of Education (x-axis)]
Scatterplot: No relationship
[Figure: Sports Interest (y-axis) by Years of Education (x-axis)]
Strength increases…
• As the points more closely conform to a straight line
• Drawing the best-fitting line between the points:
  • “the regression line”
• Minimizes the distance of the points from the line:
  • “least squares”
  • Minimizing the squared deviations from the line
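A least-squares line can be computed with the standard closed-form slope and intercept. The sketch below is illustrative only: the data points are hypothetical and the function name is an assumption made for this example.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing squared vertical deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: years of education and books read per year.
education = [8, 10, 12, 14, 16]
books = [2, 5, 6, 9, 13]
slope, intercept = least_squares_line(education, books)
print(round(slope, 2), round(intercept, 2))
```

The slope says how much the y variable changes for each additional unit of x; how tightly the points cluster around this line is what the correlation coefficient measures.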
Significance of the relationship
• Whether we are confident that an observed relationship is “real” or due to chance
• What is the likelihood of getting results like this if the null hypothesis were true?
  • Compare observed results to those expected under the null
  • If there is less than a 5% chance, reject the null hypothesis
Directionality of the relationship
• A correlational relationship can be positive or negative
• Positive relationship
  • High scores on variable X are associated with high scores on variable Y
• Negative relationship
  • High scores on variable X are associated with low scores on variable Y
Positive relationship example
[Figure: Book Reading (y-axis) by Years of Education (x-axis)]
Negative relationship example
[Figure: Racial Prejudice (y-axis) by Years of Education (x-axis)]
Curvilinear relationships
• Positive and negative relationships are “straight-line” or “linear” relationships
• Relationships can also be strong and curvilinear
  • Points conform to a curved line
Curvilinear relationship example
[Figure: Family Size (y-axis) by SES (x-axis)]
Curvilinear relationships
• Linear statistics (e.g., correlation coefficient, regression) can mask a significant curvilinear relationship
  • The correlation coefficient could indicate no relationship even when a strong curved one exists
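The masking effect is easy to demonstrate. In the sketch below, the data are hypothetical and form a perfect inverted U, so y is completely determined by x, yet the correlation coefficient comes out at zero. The helper `pearson_r` is written out here using the standard formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical inverted-U data: y rises and then falls as x increases.
x = [1, 2, 3, 4, 5, 6, 7]
y = [-(v - 4) ** 2 for v in x]  # perfectly determined by x, but not linear
r = pearson_r(x, y)
print(round(abs(r), 3))
```

The rising half and the falling half of the curve cancel each other out, which is why a scatterplot should be inspected before trusting a near-zero r.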
Pearson Correlation Coefficient
• Correlation coefficient
  • Numerical expression of:
    • Strength and direction of a straight-line relationship
  • Varies between –1 and 1
Correlation coefficient outcomes
 -1    is a perfect negative relationship
 -.7   is a strong negative relationship
 -.4   is a moderate negative relationship
 -.1   is a weak negative relationship
  0    is no relationship
  .1   is a weak positive relationship
  .4   is a moderate positive relationship
  .7   is a strong positive relationship
  1    is a perfect positive relationship
Pearson’s r (correlation coefficient)
• Used for interval or ratio variables
• Reflects the extent to which cases have similar z-scores on variables X and Y
  • Positive relationship: z-scores have the same sign
  • Negative relationship: z-scores have the opposite sign
Positive relationship z-scores

Person       Xz        Yz
A           1.06      1.11
B            .56       .65
C            .03      -.01
D           -.42      -.55
E          -1.23     -1.09

Negative relationship z-scores

Person       Xz        Yz
A           1.06     -1.22
B            .56      -.51
C            .03      -.06
D           -.42       .66
E          -1.23      1.33
Conceptual formula for Pearson’s r
• Multiply each case’s z-scores on X and Y
• Sum the products
• Divide by N
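The three steps translate directly into code. This is a sketch with hypothetical raw scores; it standardizes with the population standard deviation (dividing by N) so that the "divide by N" step yields r exactly. The function names are assumptions made for this example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize using the population SD, which divides by N."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def pearson_r_from_z(xs, ys):
    zx, zy = z_scores(xs), z_scores(ys)
    products = [a * b for a, b in zip(zx, zy)]  # multiply each case's z-scores
    return sum(products) / len(products)        # sum the products, divide by N

# Hypothetical raw scores for five people.
x = [1, 2, 3, 4, 5]
y_same = [2, 4, 6, 8, 10]      # rises with x
y_opposite = [10, 8, 6, 4, 2]  # falls as x rises
print(round(pearson_r_from_z(x, y_same), 2))
print(round(pearson_r_from_z(x, y_opposite), 2))
```

When the z-scores share a sign, the products are positive and r is positive; when the signs are opposite, the products and r are negative.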
Significance of Pearson’s r
• Pearson’s r tells us the strength and direction
• Significance is determined by converting the r to a t ratio and looking it up in a t table
• Null: r = .00
• How different is what we observe from the null?
• Is the probability less than .05?
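The conversion from r to a t ratio uses the standard formula t = r x sqrt(N - 2) / sqrt(1 - r squared), with N - 2 degrees of freedom. The slides do not spell this formula out, so the sketch below is an illustration with hypothetical inputs.

```python
from math import sqrt

def r_to_t(r, n):
    """t ratio for testing r against a null of zero; df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r**2), n - 2

# Hypothetical inputs: a correlation of .40 observed in 30 cases.
t, df = r_to_t(0.40, 30)
print(round(t, 2), df)
```

The same r produces a larger t as N grows, so a modest correlation can be highly significant in a large sample.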
Computer Printout
Correlations

Variables:
(1) FAVOR LEGALIZED ABORTION
(2) SHOULD LIVE TOGETHER BEFORE MARRIAGE
(3) PUBLIC HIGH SCH SHOULD DISTRIBUTE CONDOM
(4) IN FAVOR OF LEGALIZING SAME SEX MARRIAGE

                                (1)        (2)        (3)        (4)
(1)   Pearson Correlation        1        .363**     .410**     .399**
      Sig. (1-tailed)            .        .000       .000       .000
      N                        3323       3295       3294       3297
(2)   Pearson Correlation      .363**      1         .461**     .366**
      Sig. (1-tailed)          .000        .         .000       .000
      N                        3295       3318       3303       3291
(3)   Pearson Correlation      .410**     .461**      1         .428**
      Sig. (1-tailed)          .000       .000        .         .000
      N                        3294       3303       3315       3290
(4)   Pearson Correlation      .399**     .366**     .428**      1
      Sig. (1-tailed)          .000       .000       .000        .
      N                        3297       3291       3290       3317

**. Correlation is significant at the 0.01 level (1-tailed).