Slides for Session #20
Statistics for Social and Behavioral Sciences
Part IV: Causality
Comparison of Two Groups (Chapter 7)
Prof. Amine Ouazad

Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
• Four Steps of "Thinking Like a Statistician"
• Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling
• Biases: Nonresponse bias, Response bias, Sampling bias
PART II. DESCRIBING DATA (Weeks 2-4)
• Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule
• Bivariate sample statistics: Correlation, Slope
PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
• Estimating a parameter using sample statistics: confidence intervals at 90%, 95%, 99%
• Testing a hypothesis using the CI method and the t method
PART IV. CORRELATION AND CAUSATION: TWO GROUPS, REGRESSION ANALYSIS (Weeks 10-14)
• This is where we talk about ZMapp and Ebola!

Coming up
• "Comparison of Two Groups": this session.
• "Univariate Regression Analysis": next session, Saturday.
• "Association and Causality": Tuesday, Thursday, and extra session.
• "Randomized Experiments (Cont'd), ANOVA": last Tuesday and extra session.
• "Robustness Checks and Wrap Up": last Thursday.

Outline
1. Randomized controlled trials
2. t test for equality of means
Next time: Inference in univariate regressions

Do U.S. Employers Discriminate?
• Employers post job ads, sometimes mentioning that they are an "Equal Opportunity Employer."
• Some employers are federal contractors.
• There is lots of anecdotal evidence…
  – "In hiring, racial bias is still a problem." Forbes.
  – "Protesters allege hiring discrimination by Ferrara Candy", Chicago Tribune, October 28, 2014.
• But we can't trust stories…
• This is a very tough question, so we should be extra careful.
• What about causal evidence from statistical data?

Difference of means
• Two groups: White and African American.
  – m1: sample mean in the first group.
  X1i: observation of individual i in group 1.
  – m2: sample mean in the second group.
  X2i: observation of individual i in group 2.
• What is the expected value of the difference m1 − m2?
• What is the sampling distribution of the difference m1 − m2?
• Standard error of m1 − m2: the standard deviation of the sampling distribution of m1 − m2, estimated by $se = \sqrt{s_1^2/n_1 + s_2^2/n_2}$, where $s_1, s_2$ are the sample standard deviations and $n_1, n_2$ the group sizes.

Confidence Interval for the Difference of Means
• Very similar to the one-group case: $(m_1 - m_2) \pm t \times se$.
• t is also chosen from Table 5.1.
• The degrees of freedom df are given either by the Welch approximation or by the Satterthwaite approximation (see the end of the handout).
In general:
• Using the t distribution assumes that X is normally distributed.
• The method is robust to violations of the normality assumption, especially for proportions.

t statistic for the difference of means
• Built as in the one-group case: $t = (m_1 - m_2)/se$.
• When testing whether the difference of means equals a number v, subtract v in the numerator: $t = (m_1 - m_2 - v)/se$.

Methods for hypothesis testing
H0: μ1 = μ2
Ha: μ1 ≠ μ2
where μ1 and μ2 are the true (population) means of the two groups.
Reject the null hypothesis if either:
• the confidence interval does not include 0, or
• the t statistic is above the critical t score in absolute value.

Application
• Compute the t statistic of the difference for each city.
• Can you recover the p values using Stata?
• Can you reject the null hypothesis that the callback rates for White and African American names are equal?
Fill in the t statistics here. Can we reject the null hypothesis?

Can you recover the p values?
• Use display 2*ttail(df, abs(t)) in Stata, substituting the values of df and t.
• df is approximately n1 + n2 − 2.

[Table: Resume Quality and Callback Rates]

A refinement for sample proportions
• When X1i and X2i are variables that take only the two values 0 and 1,
• m1 and m2 are sample proportions p1 and p2.
• Group 1 size: n1. Group 2 size: n2.
• H0: "p1 = p2"
• Under the null, the two groups share a common proportion p, so their standard deviations are equal, $\sqrt{p(1-p)}$, and thus:
• df = n1 + n2 − 2 and $t = \dfrac{p_1 - p_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}$, where $\hat{p} = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ is the pooled proportion.

Back to Café Firenze?
• What we did before:
  – Confidence interval around the mean m1 of Café Firenze and the mean m2 of Lebanese Express.
  – We showed that:
• Exercise at home:
  – Can we reject the null hypothesis that the true mean rating of Café Firenze is equal to the true mean rating of Lebanese Express?

Another application of t tests for the equality of means
• "In many studies, one group of volunteers will be given an experimental or "test" drug or treatment, while the control group is given either a standard treatment for the illness or an inactive pill, liquid, or powder that has no treatment value (placebo). This control group provides a basis for comparison for assessing effects of the test treatment. In some studies, the control group will receive a placebo instead of an active drug or treatment. In other cases, it is considered unethical to use placebos, particularly if an effective treatment is available. Withholding treatment (even for a short time) would subject research participants to unreasonable risks."

Coming up:
• Reading: Chapter on "Comparing Two Groups".
• Next: Chapter 9, with t tests for slope coefficients.
• Online quiz this weekend on this material.
• Session on Saturday at 12.45 in the same room, to catch up for National Day.
• Make sure you come to sessions and recitations.

For help:
• Amine Ouazad: Office 1135, Social Science building. Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda. Sunday recitations. At the Academic Resource Center, Monday from 2 to 4pm.

Read only if interested: Degrees of freedom for two groups x and y
(s_x, s_y are the sample standard deviations; n_x, n_y are the group sizes.)
• Satterthwaite's approximate formula: $df = \dfrac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}}$
• Welch's approximate formula: $df = -2 + \dfrac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x + 1} + \dfrac{(s_y^2/n_y)^2}{n_y + 1}}$
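To tie the Application steps together, here is a minimal Python sketch (the course itself uses Stata) that computes, from summary statistics, the t statistic for a difference of means, Satterthwaite's and Welch's approximate degrees of freedom, the pooled t test for two proportions, and a two-sided p value playing the role of Stata's `2*ttail(df, abs(t))`. All function names are hypothetical. To stay within the standard library, the p value uses the identity P(|T| > |t|) = I_{df/(df+t²)}(df/2, 1/2) and a textbook continued-fraction routine for the regularized incomplete beta function I; this numerical method is a swapped-in assumption, not something from the slides.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fast, else the symmetry I_x(a,b) = 1 - I_{1-x}(b,a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """P(|T| > |t|) for Student's t with df degrees of freedom (like Stata's 2*ttail(df, abs(t)))."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(m1, s1, n1, m2, s2, n2, v=0.0):
    """t test of H0: mu1 - mu2 = v with unequal variances; returns (t, Satterthwaite df, p)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2 - v) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


def welch_df(s1, n1, s2, n2):
    """Welch's alternative approximation, with n + 1 in the denominators and -2 added."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return -2.0 + (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1))


def proportion_t_test(p1, n1, p2, n2):
    """t test of H0: p1 = p2 using the pooled proportion and df = n1 + n2 - 2."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    t = (p1 - p2) / se
    df = n1 + n2 - 2
    return t, df, two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    t, df, p = welch_t_test(m1=3.9, s1=0.8, n1=40, m2=3.5, s2=1.1, n2=35)
    print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
```

For one city, calling `proportion_t_test(p1, n1, p2, n2)` with the two callback rates and the number of resumes in each group would return the t statistic, df, and p value for the table. In practice a library routine such as `scipy.stats.ttest_ind_from_stats` would be the natural choice; the hand-rolled beta function only keeps this sketch dependency-free.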
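The questions on the Difference of means slide (expected value, sampling distribution, and standard error of m1 − m2) can be answered with one short derivation. This sketch assumes the two groups are sampled independently, with true means μ1, μ2 and true variances σ1², σ2²; these symbols are introduced here for illustration and do not appear on the slides.

```latex
% Independent samples; true means \mu_1, \mu_2 and true variances \sigma_1^2, \sigma_2^2 (assumed notation)
\begin{aligned}
E[m_1 - m_2] &= E[m_1] - E[m_2] = \mu_1 - \mu_2 \\
\operatorname{Var}(m_1 - m_2) &= \operatorname{Var}(m_1) + \operatorname{Var}(m_2) - 2\,\operatorname{Cov}(m_1, m_2) \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
  \qquad \text{(independence gives } \operatorname{Cov}(m_1, m_2) = 0\text{)} \\
se(m_1 - m_2) &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
  \qquad \text{(sample variances } s_1^2, s_2^2 \text{ replace } \sigma_1^2, \sigma_2^2\text{)}
\end{aligned}
```

In large samples, the central limit theorem makes m1 − m2 approximately normal around μ1 − μ2, which is why the t distribution with approximate degrees of freedom is a reasonable reference for the confidence interval and the t test.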