Sociology 601: Midterm review, October 15, 2009

Basic information for the midterm
• Date: Tuesday October 20, 2009
• Start time: 2 pm
• Place: usual classroom, Art/Sociology 3221
• Bring a sheet of notes, a calculator, and two pens or pencils
• Notify me if you anticipate any timing problems

Review for midterm
• terms
• symbols
• steps in a significance test
• testing differences in groups
• contingency tables and measures of association
• equations

Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population.

Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2–2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.

Important terms from chapter 3
3.1) Tabular and graphic description:
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2–3.4) Measures of central tendency and variation:
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.

Important terms from chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.

Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your best guess (in part 3) will typically be to the parameter.

Important terms from chapter 6
6.1–6.3) Statistical inference: significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic

Key idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions.
2.) Hypothesize a value for a population parameter.
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
(A worked sketch of these five steps follows the chapter 6 terms below.)

More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5–6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
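A worked sketch of the five steps, using the A&F problem 6.8 numbers that also appear in the STATA examples later in this review. The commands below are only illustrative hand calculations with display; note that this normal-based p-value differs slightly from the t-based p-value that TTESTI reports.

. * Step 1: assume a random sample, an interval variable, and n = 100, which is
. *         large enough to assume a normal sampling distribution.
. * Step 2: Ho: mu = 500, Ha: mu != 500 (two-sided).
. * Step 3: z = (Ybar - mu0) / (s / sqrt(n)); the next line should display 0.80.
. display (508 - 500) / (100 / sqrt(100))
. * Step 4: two-sided p-value from the standard normal; should display about 0.42.
. display 2 * (1 - normal(0.80))
. * Step 5: the p-value is well above any conventional alpha, so do not reject Ho;
. *         the sample mean of 508 is consistent with a population mean of 500.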
Symbols
Know these symbols and what they stand for:
• Yi, Ȳ, s, s², σ², σ_Ȳ, σ̂_Ȳ
• μ, μ0, π, π̂, π0
• n, df
• z, t, P
• H0, Ha

Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
  – This is pretty much a universal assumption for all significance tests.
• An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test.
  – Some tests assume a normal population distribution.
  – Other tests assume different minimum sample sizes.
  – Some tests do not make this assumption.
• Declare the α level at the start, if you use one.

Significance tests, Step 2: hypothesis
• State the hypothesis as a null hypothesis.
  – Remember that the null hypothesis is about the population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided test.
  – Be sure the statement and equation are consistent.

Significance tests, Step 3: test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
  – Full disclosure maximizes partial credit.
  – I recommend four significant digits at each computational step, but present three as the answer.

Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
• Use the correct table for the type of test;
• use the correct degrees of freedom, if applicable;
• use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.

Significance tests, Step 5: conclusion
Write a conclusion:
• report the p-value and your decision to reject H0 or not;
• state what your decision means;
• discuss the substantive importance of your sample statistic.

Useful STATA outputs
• Immediate test for a sample mean using TTESTI:

. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100     488.1578   527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

        Ho: mean(x) = 500

  Ha: mean < 500          Ha: mean != 500           Ha: mean > 500
      t =  0.8000             t =  0.8000               t =  0.8000
  P < t =  0.7872         P > |t| =  0.4256          P > t =  0.2128

Useful STATA outputs
• Immediate test for a sample proportion using PRTESTI:

. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                       .4960864   .5639136
------------------------------------------------------------------------------
        Ho: proportion(x) = .5

  Ha: x < .5              Ha: x != .5               Ha: x > .5
      z =  1.731              z =  1.731                z =  1.731
  P < z = 0.9582          P > |z| = 0.0835          P > z = 0.0418
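As a check (not from the original slides), the z-statistic and two-sided p-value in the PRTESTI output above can be reproduced by hand with display:

. * z = (phat - pi0) / sqrt(pi0 * (1 - pi0) / n); should display about 1.731
. display (.53 - .50) / sqrt(.50 * .50 / 832)
. * two-sided p-value from the standard normal; should display about 0.083
. display 2 * (1 - normal(1.731))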
Useful STATA outputs
• Comparison of two means using ttesti:

. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9     17.71215   18.48785
       y |    6764        32.6     .221294        18.2     32.16619   33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166     26.67049   27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297                -15.08184  -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 10858.6

        Ho: mean(x) - mean(y) = diff = 0

  Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
      t = -48.8496            t = -48.8496              t = -48.8496
  P < t =  0.0000         P > |t| =  0.0000          P > t =  1.0000

Chapter 6: Significance tests for a single sample

  μ or π        sample size   best test
  mean          large         z-test for Ȳ − μ0
  proportion    large         z-test for π̂ − π0
  mean          small         t-test for Ȳ − μ0
  proportion    small         Fisher's exact test

Equations for tests of statistical significance
• Large-sample mean:  z = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}
• Large-sample proportion:  z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}
• Small-sample mean:  t = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}

Chapter 7: Comparing scores for two groups

  μ or π        sample size   sample scheme   best test
  mean          large         independent     z-test for μ2 − μ1
  proportion    large         independent     z-test for π2 − π1
  mean          small         independent     t-test for μ2 − μ1
  proportion    small         independent     Fisher's exact test
  mean          large         dependent       z-test for D̄
  proportion    large         dependent       McNemar test
  mean          small         dependent       t-test for D̄
  proportion    small         dependent       binomial test

Two independent groups: large samples, means
• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.
• Equal variance assumption? NO
7.1) Difference of two large-sample means:
  z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Two independent groups: large samples, proportions
• Equal variance assumption? YES (if the proportions are equal then so are the variances).
• df = N1 + N2 − 2
7.2) Difference of two large-sample proportions:
  z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n_1} + \frac{\hat{\pi}(1-\hat{\pi})}{n_2}}}
  where π̂ is the pooled proportion from the two samples combined.

Two independent groups: small samples, means
7.3) Difference of two small-sample means:
  t \text{ (or } z\text{)} = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}},
  \qquad \hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
• Equal variance assumption? SOMETIMES (for ease); NO (in computer programs).

Two independent groups: small samples, proportions
Fisher's exact test
• via Stata, SAS, or SPSS
• calculates the exact probability of all possible occurrences

Dependent samples
• Means:  t \text{ (or } z\text{)} = \frac{\bar{D} - 0}{\hat{\sigma}_{\bar{D}}}, \qquad \hat{\sigma}_{\bar{D}} = \frac{s_D}{\sqrt{n}}
• Proportions (McNemar):  z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}

Chapter 8: Analyzing associations
• Contingency tables and their terminology:
  – marginal distributions and joint distributions
  – conditional distribution of R, given a value of E (as counts or percentages in A&F)
  – marginal, joint, and conditional probabilities (as proportions in A&F)
• "Are two variables statistically independent?"

Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms:
  – marginal
  – conditional
  – joint
• Measures of relationships:
  – odds, odds ratios
  – gamma and tau-b

Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a given cell.
• fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
• fe = (row total × column total) / N
  – the equation for fe is a correction for rows or columns with small totals.
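A small sketch of the fe formula with a hypothetical 2×2 table; the counts below are made up for illustration, not from the course materials.

. * Hypothetical table: row totals 60 and 40, column totals 70 and 30, N = 100
. * fe for the row 1, column 1 cell = row total * column total / N; should display 42
. display 60 * 70 / 100
. * fe for the row 2, column 2 cell; should display 12
. display 40 * 30 / 100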
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, fe ≥ 5
• Ho: the variables are statistically independent (crudely, the score for one variable is independent of the score for the other).
• Test statistic:  \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}
• p-value from the χ² table, df = (r − 1)(c − 1)
• Conclusion: reject or do not reject based on the p-value and a prior α-level, if necessary. Then describe your conclusion.

Probabilities, odds, and odds ratios
• Given a probability, you can calculate an odds and a log odds.
  – odds = p / (1 − p)
    • 50/50 gives odds = 1.0
    • odds range from 0 to ∞
  – log odds = log(p / (1 − p)) = log(p) − log(1 − p)
    • 50/50 gives log odds = 0.0
    • log odds range from −∞ to +∞
  – odds ratio = [p1 / (1 − p1)] / [p2 / (1 − p2)]
• Given an odds, you can calculate a probability: p = odds / (1 + odds)
• (A worked sketch of these formulas appears at the end of this review.)

Measures of association with ordinal data
• Concordant observations C:
  – in a pair, one observation is higher on both x and y
• Discordant observations D:
  – in a pair, one observation is higher on x and lower on y
• Ties:
  – in a pair, the observations are the same on x or the same on y
• gamma = (C − D) / (C + D)  (ignores ties)
• tau-b is a gamma that adjusts for ties
  – gamma often increases with more collapsed tables
  – tau-b and gamma both have standard errors in computer output
  – tau-b can be interpreted as a correlation coefficient
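Finally, a worked sketch of the odds formulas above, using hypothetical probabilities p1 = .6 and p2 = .4 chosen only for illustration:

. * odds = p / (1 - p): for p1 = .6 this should display 1.5; for p2 = .4, about 0.67
. display .6 / (1 - .6)
. display .4 / (1 - .4)
. * odds ratio = odds1 / odds2; should display 2.25
. display (.6 / (1 - .6)) / (.4 / (1 - .4))
. * converting an odds back to a probability: p = odds / (1 + odds); recovers .6
. display 1.5 / (1 + 1.5)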