Bivariate Analyses

Bivariate Procedures I: Overview
  Chi-square test
  T-test
  Correlation

Chi-Square Test
  Relationships between nominal variables
  Types:
    2x2 chi-square: Gender by Political Party
    2x3 chi-square: Gender by Dosage (High vs. Medium vs. Low)

Starting Point: The Crosstab Table
  Example:
                       Gender (IV)
  Party (DV)           Males    Females
  Democrat             1        20
  Republican           10       2
  Total                11       22

Column Percentages
                       Gender (IV)
  Party (DV)           Males    Females
  Democrat             9%       91%
  Republican           91%      9%
  Total                100%     100%

Row Percentages
                       Gender (IV)
  Party (DV)           Males    Females    Total
  Democrat             5%       95%        100%
  Republican           83%      17%        100%

Full Crosstab Table
                           Males       Females     Total
  Democrat     Count       1           20          21 (64%)
               Row %       5%          95%
               Column %    9%          91%
  Republican   Count       10          2           12 (36%)
               Row %       83%         17%
               Column %    91%         9%
  Total        Count       11 (33%)    22 (67%)    33 (100%)

Research Question and Hypothesis
  Research question: Is gender related to party affiliation?
  Hypothesis: Men are more likely than women to be Republicans
  Null hypothesis: There is no relation between gender and party

Testing the Hypothesis
  Eyeballing the table: there seems to be a relationship
  Is it significant? Or could it be just a chance finding?
  Logic: Is the finding different enough from the null?
  Chi-square answers this question
  What factors would it take into account?

Factors Taken into Consideration
  Factors:
  1. Magnitude of the difference
  2. Sample size
  Biased coin example
    Magnitude of difference: 60% heads vs. 99% heads
    Sample size: 10 flips vs. 100 flips vs. 1 million flips

Chi-square
  Chi-square starts with the frequencies:
  Compare observed frequencies with the frequencies we expect under the null hypothesis

What would the frequencies be if there were no relationship?
                 Males    Females    Total
  Democrat                           21
  Republican                         12
  Total          11       22         33

Expected Frequencies (Null)
                 Males    Females    Total
  Democrat       7        14         21
  Republican     4        8          12
  Total          11       22         33

Comparing the Observed and Expected Cell Frequencies
  Formula: chi-square = sum over all cells of (O - E)^2 / E
  where O is the observed frequency and E is the expected frequency of a cell

Calculating the Expected Frequency
  Simple formula for expected cell frequencies:
  Row total x column total / Total N
    21 x 11 / 33 = 7
    21 x 22 / 33 = 14
    12 x 11 / 33 = 4
    12 x 22 / 33 = 8

Observed and Expected Cell Frequencies (expected in parentheses)
                 Males     Females    Total
  Democrat       1 (7)     20 (14)    21
  Republican     10 (4)    2 (8)      12
  Total          11        22         33

Plugging into the Formula
            O - E           Square    Square/E
  Cell A    1 - 7 = -6      36        36/7 = 5.1
  Cell B    20 - 14 = 6     36        36/14 = 2.6
  Cell C    10 - 4 = 6      36        36/4 = 9
  Cell D    2 - 8 = -6      36        36/8 = 4.5
                                      Sum = 21.2
  Chi-square = 21.2

Is the chi-square significant?
  Significance of the chi-square:
  Greater differences between observed and expected frequencies lead to a bigger chi-square
  How big does it have to be for significance?
  Depends on the "degrees of freedom"
  Formula for degrees of freedom: (Rows - 1) x (Columns - 1)

Chi-square Degrees of Freedom
  2 x 2 chi-square = 1
  3 x 3 = ?
  4 x 3 = ?
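The hand calculation above can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are chosen here for illustration, and the observed counts are the gender-by-party example from the slides.

```python
def expected_frequencies(observed):
    """Expected cell counts under the null: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(rows - 1) x (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Rows: Democrat, Republican; columns: Males, Females (example from the slides).
observed = [[1, 20], [10, 2]]

print(expected_frequencies(observed))   # [[7.0, 14.0], [4.0, 8.0]]
print(round(chi_square(observed), 1))   # 21.2
print(degrees_of_freedom(observed))     # 1
```

The same three functions work unchanged for larger tables, for example a 2x3 gender-by-dosage table, because they only rely on row and column totals.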
Chi-square Critical Values
  df    P = 0.05    P = 0.01    P = 0.001
  1     3.84        6.64        10.83
  2     5.99        9.21        13.82
  3     7.82        11.35       16.27
  4     9.49        13.28       18.47
  5     11.07       15.09       20.52
  6     12.59       16.81       22.46
  7     14.07       18.48       24.32
  8     15.51       20.09       26.13
  9     16.92       21.67       27.88
  10    18.31       23.21       29.59
  * If chi-square is greater than the critical value, the relationship is significant

Chi-Square Computer Printout
  CONSERV * SEX OF RESPONDENT Crosstabulation

                                             SEX OF RESPONDENT
  CONSERV                                    1.00      2.00      Total
  .00      Count                             1274      1583      2857
           % within CONSERV                  44.6%     55.4%     100.0%
           % within SEX OF RESPONDENT        86.4%     84.4%     85.3%
  1.00     Count                             201       292       493
           % within CONSERV                  40.8%     59.2%     100.0%
           % within SEX OF RESPONDENT        13.6%     15.6%     14.7%
  Total    Count                             1475      1875      3350
           % within CONSERV                  44.0%     56.0%     100.0%
           % within SEX OF RESPONDENT        100.0%    100.0%    100.0%

Chi-Square Computer Printout
  Chi-Square Tests

                                   Value       df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                                     (2-sided)      (2-sided)     (1-sided)
  Pearson Chi-Square               2.492(b)    1     .114
  Continuity Correction(a)         2.339       1     .126
  Likelihood Ratio                 2.504       1     .114
  Fisher's Exact Test                                               .116          .063
  Linear-by-Linear Association     2.491       1     .115
  N of Valid Cases                 3350

  a. Computed only for a 2x2 table
  b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 217.07.
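The critical-value table above can be applied mechanically. The sketch below copies the table into a dictionary and checks two values that appear in the slides: the hand-computed chi-square of 21.2 and the printout's Pearson chi-square of 2.492, both with 1 degree of freedom. The function names are illustrative assumptions, not part of any statistics package.

```python
# Critical values copied from the table above: df -> {significance level: value}.
CRITICAL_VALUES = {
    1: {0.05: 3.84, 0.01: 6.64, 0.001: 10.83},
    2: {0.05: 5.99, 0.01: 9.21, 0.001: 13.82},
    3: {0.05: 7.82, 0.01: 11.35, 0.001: 16.27},
    4: {0.05: 9.49, 0.01: 13.28, 0.001: 18.47},
    5: {0.05: 11.07, 0.01: 15.09, 0.001: 20.52},
    6: {0.05: 12.59, 0.01: 16.81, 0.001: 22.46},
    7: {0.05: 14.07, 0.01: 18.48, 0.001: 24.32},
    8: {0.05: 15.51, 0.01: 20.09, 0.001: 26.13},
    9: {0.05: 16.92, 0.01: 21.67, 0.001: 27.88},
    10: {0.05: 18.31, 0.01: 23.21, 0.001: 29.59},
}


def is_significant(chi_square, df, level=0.05):
    """True if chi-square exceeds the tabled critical value for this df and level."""
    return chi_square > CRITICAL_VALUES[df][level]


def strictest_level_passed(chi_square, df):
    """Smallest tabled level at which the result is significant, or None."""
    passed = [lvl for lvl in CRITICAL_VALUES[df] if is_significant(chi_square, df, lvl)]
    return min(passed) if passed else None


print(strictest_level_passed(21.2, 1))    # 0.001
print(strictest_level_passed(2.492, 1))   # None
```

The second result agrees with the printout, where the Pearson chi-square has a significance of .114, above the .05 cutoff.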
Multiple Chi-square
  Exact same procedure as the two-variable chi-square
  Used for more than 2 variables
  E.g., 2 x 2 x 2 chi-square: Gender x Hair color x Eye color

Multiple chi-square example
  Agree with "police should use any force necessary" * Believe in god * SEX OF RESPONDENT Crosstabulation
  ("Agree" below abbreviates: Agree with "police should use any force necessary")

  SEX OF RESPONDENT = 1.00
                                               Believe in god
  Agree                                        .00       1.00      Total
  .00      Count                               76        548       624
           % within Agree                      12.2%     87.8%     100.0%
           % within Believe in god             53.5%     42.2%     43.3%
  1.00     Count                               66        750       816
           % within Agree                      8.1%      91.9%     100.0%
           % within Believe in god             46.5%     57.8%     56.7%
  Total    Count                               142       1298      1440
           % within Agree                      9.9%      90.1%     100.0%
           % within Believe in god             100.0%    100.0%    100.0%

  SEX OF RESPONDENT = 2.00
                                               Believe in god
  Agree                                        .00       1.00      Total
  .00      Count                               35        877       912
           % within Agree                      3.8%      96.2%     100.0%
           % within Believe in god             48.6%     49.4%     49.4%
  1.00     Count                               37        899       936
           % within Agree                      4.0%      96.0%     100.0%
           % within Believe in god             51.4%     50.6%     50.6%
  Total    Count                               72        1776      1848
           % within Agree                      3.9%      96.1%     100.0%
           % within Believe in god             100.0%    100.0%    100.0%

Multiple chi-square example
  Chi-Square Tests

  SEX OF RESPONDENT = 1.00
                                   Value       df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                                     (2-sided)      (2-sided)     (1-sided)
  Pearson Chi-Square               6.659(b)    1     .010
  Continuity Correction(a)         6.206       1     .013
  Likelihood Ratio                 6.593       1     .010
  Fisher's Exact Test                                               .012          .007
  Linear-by-Linear Association     6.654       1     .010
  N of Valid Cases                 1440

  SEX OF RESPONDENT = 2.00
                                   Value       df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                                     (2-sided)      (2-sided)     (1-sided)
  Pearson Chi-Square               .016(c)     1     .898
  Continuity Correction(a)         .000        1     .994
  Likelihood Ratio                 .016        1     .898
  Fisher's Exact Test                                               .905          .497
  Linear-by-Linear Association     .016        1     .898
  N of Valid Cases                 1848

  a. Computed only for a 2x2 table
  b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 61.53.
  c.
0 cells (.0%) have expected count less than 5. The minimum expected count is 35.53.

The T-test
  Groups T-test
    Comparing the means of two nominal groups
    E.g., Gender and IQ
    E.g., Experimental vs. Control group
  Pairs T-test
    Comparing the means of two variables
    Comparing the mean of a variable at two points in time

Logic of the T-test
  A T-test considers three things:
  1. The group means
  2. The dispersion of individual scores around the mean for each group (sd)
  3. The size of the groups

Difference in the Means
  The farther apart the means are, the more confident we are that the two group means are different
  The distance between the means goes in the numerator of the t-test formula

Why Dispersion Matters
  [Figure panels: Small variances; Large variances]

Size of the Groups
  Larger groups mean that we are more confident in the group means
  IQ example:
    Women: mean = 103
    Men: mean = 97
  If our sample was 5 men and 5 women, we are not that confident
  If our sample was 5 million men and 5 million women, we are much more confident

The four t-test formulae
  1. Matched samples with unequal variances
  2. Matched samples with equal variances
  3. Independent samples with unequal variances
  4. Independent samples with equal variances

  All four formulae have the same numerator:
    X1 - X2 (group one mean - group two mean)
  What differentiates the four formulae is their denominator
    The denominator is the "standard error of the difference of the means"
    Each formula has a different standard error

Independent samples with unequal variances formula
  t = (X1 - X2) / SE
  Standard error formula (denominator):
  SE = square root of (s1^2 / N1 + s2^2 / N2)
  where s1 and s2 are the group standard deviations and N1 and N2 are the group sizes

T-test Value
  Look up the T-value in a T-table (use the absolute value)
  First determine the degrees of freedom
    e.g., df = (N1 - 1) + (N2 - 1) = 40 + 30 = 70
  For 70 df, the critical value at the .05 level (one-tailed) = 1.67
    e.g., 5.91 > 1.67: Reject the null (means are different)

Groups t-test printout example
  Group Statistics: MEN ARE BETTER LEADERS THAN WOMEN

  SEX OF RESPONDENT    N       Mean      Std. Deviation    Std. Error Mean
  1.00                 1461    3.1485    1.50678           .03942
  2.00                 1856    2.1999    1.34842           .03130

  Independent Samples Test: MEN ARE BETTER LEADERS THAN WOMEN

  Levene's Test for Equality of Variances: F = 27.870, Sig. = .000

  t-test for Equality of Means
                                  t         df          Sig.          Mean          Std. Error    95% Confidence Interval
                                                        (2-tailed)    Difference    Difference    of the Difference
                                                                                                  Lower      Upper
  Equal variances assumed         19.096    3315        .000          .9486         .04968        .85124     1.04604
  Equal variances not assumed     18.846    2956.316    .000          .9486         .05034        .84994     1.04733

Pairs t-test example
  Paired Samples Statistics

  Pair 1                                        Mean      N       Std. Deviation    Std. Error Mean
  IN FAVOR OF LEGALIZING SAME SEX MARRIAGE      2.3909    3305    1.74692           .03039
  IN FAVOR OF DEATH PENALTY                     4.4617    3305    1.67497           .02914

  Paired Samples Test
  Pair 1: IN FAVOR OF LEGALIZING SAME SEX MARRIAGE - IN FAVOR OF DEATH PENALTY

  Paired Differences
    Mean                                          -2.0708
    Std. Deviation                                2.39422
    Std. Error Mean                               .04165
    95% Confidence Interval of the Difference     Lower -2.1525, Upper -1.9891
  t                                               -49.723
  df                                              3304
  Sig. (2-tailed)                                 .000

Pearson Correlation Coefficient (r)
  Characteristics of correlational relationships:
  1. Strength
  2. Significance
  3. Directionality
  4.
Curvilinearity

Strength of Correlation
  Strong, weak and non-relationships
  The nature of such relations can be observed in scatter diagrams
  Scatter diagram
    One variable on the x-axis and the other on the y-axis of a graph
    Plot each case according to its x and y values

Scatterplot: Strong relationship
  [Scatterplot: y-axis Book reading, x-axis Years of education]

Scatterplot: Weak relationship
  [Scatterplot: y-axis Income, x-axis Years of education]

Scatterplot: No relationship
  [Scatterplot: y-axis Sports interest, x-axis Years of education]

Strength increases...
  As the points more closely conform to a straight line
  Drawing the best fitting line between the points: "the regression line"
  Minimizes the distance of the points from the line: "least squares"

Minimizing the deviations from the line

Significance of the relationship
  Whether we are confident that an observed relationship is "real" or due to chance
  What is the likelihood of getting results like this if the null hypothesis were true?
  Compare observed results to those expected under the null
  If less than 5% chance, reject the null hypothesis

Directionality of the relationship
  A correlational relationship can be positive or negative
  Positive relationship
    High scores on variable X are associated with high scores on variable Y
  Negative relationship
    High scores on variable X are associated with low scores on variable Y

Positive relationship example
  [Scatterplot: y-axis Book reading, x-axis Years of education]

Negative relationship example
  [Scatterplot: y-axis Racial prejudice, x-axis Years of education]

Curvilinear relationships
  Positive and negative relationships are "straight-line" or "linear" relationships
  Relationships can also be strong and curvilinear
  Points conform to a curved line

Curvilinear relationship example
  [Scatterplot: y-axis Family size, x-axis SES]

Curvilinear relationships
  Linear statistics (e.g.
correlation coefficient, regression) can mask a significant curvilinear relationship
  The correlation coefficient would indicate no relationship

Pearson Correlation Coefficient
  Correlation coefficient
  Numerical expression of the strength and direction of a straight-line relationship
  Varies between -1 and 1

Correlation coefficient outcomes
  -1    is a perfect negative relationship
  -.7   is a strong negative relationship
  -.4   is a moderate negative relationship
  -.1   is a weak negative relationship
  0     is no relationship
  .1    is a weak positive relationship
  .4    is a moderate positive relationship
  .7    is a strong positive relationship
  1     is a perfect positive relationship

Pearson's r (correlation coefficient)
  Used for interval or ratio variables
  Reflects the extent to which cases have similar z-scores on variables X and Y
  Positive relationship: z-scores have the same sign
  Negative relationship: z-scores have the opposite sign

Positive relationship z-scores
  Person    Xz       Yz
  A         1.06     1.11
  B         .56      .65
  C         .03      -.01
  D         -.42     -.55
  E         -1.23    -1.09

Negative relationship z-scores
  Person    Xz       Yz
  A         1.06     -1.22
  B         .56      -.51
  C         .03      -.06
  D         -.42     .66
  E         -1.23    1.33

Conceptual formula for Pearson's r
  Multiply each case's z-score on X by its z-score on Y
  Sum the products
  Divide by N

Significance of Pearson's r
  Pearson's r tells us the strength and direction
  Significance is determined by converting the r to a t ratio and looking it up in a t table
  Null: r = .00
  How different is what we observe from the null?
  Is the probability less than .05?

Computer Printout
  Correlations

  Variables:
    A = FAVOR LEGALIZED ABORTION
    B = SHOULD LIVE TOGETHER BEFORE MARRIAGE
    C = PUBLIC HIGH SCH SHOULD DISTRIBUTE CONDOM
    D = IN FAVOR OF LEGALIZING SAME SEX MARRIAGE

                                A         B         C         D
  A    Pearson Correlation      1         .363**    .410**    .399**
       Sig. (1-tailed)          .         .000      .000      .000
       N                        3323      3295      3294      3297
  B    Pearson Correlation      .363**    1         .461**    .366**
       Sig. (1-tailed)          .000      .         .000      .000
       N                        3295      3318      3303      3291
  C    Pearson Correlation      .410**    .461**    1         .428**
       Sig. (1-tailed)          .000      .000      .         .000
       N                        3294      3303      3315      3290
  D    Pearson Correlation      .399**    .366**    .428**    1
       Sig. (1-tailed)          .000      .000      .000      .
       N                        3297      3291      3290      3317

  **. Correlation is significant at the 0.01 level (1-tailed).
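The conceptual formula for Pearson's r (multiply the z-scores, sum the products, divide by N) and the conversion of r to a t ratio can be sketched in Python. Two assumptions are made here that the slides do not spell out: z-scores are computed with the population standard deviation (dividing by N), which is what makes "divide by N" give the usual r, and the t ratio uses the standard conversion t = r x sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom. The education and book-reading values are hypothetical illustration data, not taken from the slides.

```python
import math


def z_scores(values):
    """Standardize values using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Conceptual formula: multiply paired z-scores, sum the products, divide by N."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def r_to_t(r, n):
    """Standard conversion of r to a t ratio with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data: years of education and books read per year for five people.
education = [8, 10, 12, 14, 16]
books = [2, 3, 6, 7, 12]

r = pearson_r(education, books)
t = r_to_t(r, len(education))
print(round(r, 3), round(t, 2))
```

Applied to the printout, `r_to_t(0.363, 3295)` gives a t ratio above 20, which is in line with the reported significance of .000 for that correlation.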