Sociology 601: Midterm review, October 15, 2009

Basic information for the midterm
• Date: Tuesday October 20, 2009
• Start time: 2 pm
• Place: usual classroom, Art/Sociology 3221
• Bring a sheet of notes, a calculator, and two pens or pencils
• Notify me if you anticipate any timing problems

Review for midterm
• terms
• symbols
• steps in a significance test
• testing differences in groups
• contingency tables and measures of association
• equations

Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population.

Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2–2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.

Important terms from chapter 3
3.1) Tabular and graphic description:
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2–3.4) Measures of central tendency and variation:
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.

Important terms from chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.

Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your best guess (in part 3) will typically be to the parameter.

Important terms from chapter 6
6.1–6.3) Statistical inference: significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic

Key idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions.
2.) Hypothesize a value for a population parameter.
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
(A worked sketch of these five steps follows the chapter 6 terms below.)

More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5–6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
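A worked sketch of the five steps, using the A&F problem 6.8 numbers that also appear in the STATA examples later in this review. The commands below are only illustrative hand calculations with display; note that this normal-based p-value differs slightly from the t-based p-value that TTESTI reports.

. * Step 1: assume a random sample, an interval variable, and n = 100, which is
. *         large enough to assume a normal sampling distribution.
. * Step 2: Ho: mu = 500, Ha: mu != 500 (two-sided).
. * Step 3: z = (Ybar - mu0) / (s / sqrt(n)); the next line should display 0.80.
. display (508 - 500) / (100 / sqrt(100))
. * Step 4: two-sided p-value from the standard normal; should display about 0.42.
. display 2 * (1 - normal(0.80))
. * Step 5: the p-value is well above any conventional alpha, so do not reject Ho;
. *         the sample mean of 508 is consistent with a population mean of 500.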
Symbols
Know these symbols and what they stand for:
• Yi, Ȳ, s, s², σ², σ_Ȳ, σ̂_Ȳ
• μ, μ0, π, π̂, π0
• n, df
• z, t, P
• H0, Ha

Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
  – This is pretty much a universal assumption for all significance tests.
• An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test.
  – Some tests assume a normal population distribution.
  – Other tests assume different minimum sample sizes.
  – Some tests do not make this assumption.
• Declare the α level at the start, if you use one.

Significance tests, Step 2: hypothesis
• State the hypothesis as a null hypothesis.
  – Remember that the null hypothesis is about the population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided test.
  – Be sure the statement and equation are consistent.

Significance tests, Step 3: test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
  – Full disclosure maximizes partial credit.
  – I recommend four significant digits at each computational step, but present three as the answer.

Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
• Use the correct table for the type of test;
• use the correct degrees of freedom, if applicable;
• use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.

Significance tests, Step 5: conclusion
Write a conclusion:
• report the p-value and your decision to reject H0 or not;
• state what your decision means;
• discuss the substantive importance of your sample statistic.

Useful STATA outputs
• Immediate test for a sample mean using TTESTI:

. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100     488.1578   527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

        Ho: mean(x) = 500

  Ha: mean < 500          Ha: mean != 500           Ha: mean > 500
      t =  0.8000             t =  0.8000               t =  0.8000
  P < t =  0.7872         P > |t| =  0.4256          P > t =  0.2128

Useful STATA outputs
• Immediate test for a sample proportion using PRTESTI:

. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                       .4960864   .5639136
------------------------------------------------------------------------------
        Ho: proportion(x) = .5

  Ha: x < .5              Ha: x != .5               Ha: x > .5
      z =  1.731              z =  1.731                z =  1.731
  P < z = 0.9582          P > |z| = 0.0835          P > z = 0.0418
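As a check (not from the original slides), the z-statistic and two-sided p-value in the PRTESTI output above can be reproduced by hand with display:

. * z = (phat - pi0) / sqrt(pi0 * (1 - pi0) / n); should display about 1.731
. display (.53 - .50) / sqrt(.50 * .50 / 832)
. * two-sided p-value from the standard normal; should display about 0.083
. display 2 * (1 - normal(1.731))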
Useful STATA outputs
• Comparison of two means using ttesti:

. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9     17.71215   18.48785
       y |    6764        32.6     .221294        18.2     32.16619   33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166     26.67049   27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297                -15.08184  -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 10858.6

        Ho: mean(x) - mean(y) = diff = 0

  Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
      t = -48.8496            t = -48.8496              t = -48.8496
  P < t =  0.0000         P > |t| =  0.0000          P > t =  1.0000

Chapter 6: Significance tests for a single sample

  μ or π        sample size   best test
  mean          large         z-test for Ȳ − μ0
  proportion    large         z-test for π̂ − π0
  mean          small         t-test for Ȳ − μ0
  proportion    small         Fisher's exact test

Equations for tests of statistical significance
• Large-sample mean:  z = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}
• Large-sample proportion:  z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}
• Small-sample mean:  t = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}

Chapter 7: Comparing scores for two groups

  μ or π        sample size   sample scheme   best test
  mean          large         independent     z-test for μ2 − μ1
  proportion    large         independent     z-test for π2 − π1
  mean          small         independent     t-test for μ2 − μ1
  proportion    small         independent     Fisher's exact test
  mean          large         dependent       z-test for D̄
  proportion    large         dependent       McNemar test
  mean          small         dependent       t-test for D̄
  proportion    small         dependent       binomial test

Two independent groups: large samples, means
• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.
• Equal variance assumption? NO
7.1) Difference of two large-sample means:
  z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Two independent groups: large samples, proportions
• Equal variance assumption? YES (if the proportions are equal then so are the variances).
• df = N1 + N2 − 2
7.2) Difference of two large-sample proportions:
  z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n_1} + \frac{\hat{\pi}(1-\hat{\pi})}{n_2}}}
  where π̂ is the pooled proportion from the two samples combined.

Two independent groups: small samples, means
7.3) Difference of two small-sample means:
  t \text{ (or } z\text{)} = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}},
  \qquad \hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
• Equal variance assumption? SOMETIMES (for ease); NO (in computer programs).

Two independent groups: small samples, proportions
Fisher's exact test
• via Stata, SAS, or SPSS
• calculates the exact probability of all possible occurrences

Dependent samples
• Means:  t \text{ (or } z\text{)} = \frac{\bar{D} - 0}{\hat{\sigma}_{\bar{D}}}, \qquad \hat{\sigma}_{\bar{D}} = \frac{s_D}{\sqrt{n}}
• Proportions (McNemar):  z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}

Chapter 8: Analyzing associations
• Contingency tables and their terminology:
  – marginal distributions and joint distributions
  – conditional distribution of R, given a value of E (as counts or percentages in A&F)
  – marginal, joint, and conditional probabilities (as proportions in A&F)
• "Are two variables statistically independent?"

Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms:
  – marginal
  – conditional
  – joint
• Measures of relationships:
  – odds, odds ratios
  – gamma and tau-b

Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a given cell.
• fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
• fe = (row total × column total) / N
  – the equation for fe is a correction for rows or columns with small totals.
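A small sketch of the fe formula with a hypothetical 2×2 table; the counts below are made up for illustration, not from the course materials.

. * Hypothetical table: row totals 60 and 40, column totals 70 and 30, N = 100
. * fe for the row 1, column 1 cell = row total * column total / N; should display 42
. display 60 * 70 / 100
. * fe for the row 2, column 2 cell; should display 12
. display 40 * 30 / 100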
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, fe ≥ 5
• Ho: the variables are statistically independent (crudely, the score for one variable is independent of the score for the other).
• Test statistic:  \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}
• p-value from the χ² table, df = (r − 1)(c − 1)
• Conclusion: reject or do not reject based on the p-value and a prior α-level, if necessary. Then describe your conclusion.

Probabilities, odds, and odds ratios
• Given a probability, you can calculate an odds and a log odds.
  – odds = p / (1 − p)
    • 50/50 gives odds = 1.0
    • odds range from 0 to ∞
  – log odds = log(p / (1 − p)) = log(p) − log(1 − p)
    • 50/50 gives log odds = 0.0
    • log odds range from −∞ to +∞
  – odds ratio = [p1 / (1 − p1)] / [p2 / (1 − p2)]
• Given an odds, you can calculate a probability: p = odds / (1 + odds)
• (A worked sketch of these formulas appears at the end of this review.)

Measures of association with ordinal data
• Concordant observations C:
  – in a pair, one observation is higher on both x and y
• Discordant observations D:
  – in a pair, one observation is higher on x and lower on y
• Ties:
  – in a pair, the observations are the same on x or the same on y
• gamma = (C − D) / (C + D)  (ignores ties)
• tau-b is a gamma that adjusts for ties
  – gamma often increases with more collapsed tables
  – tau-b and gamma both have standard errors in computer output
  – tau-b can be interpreted as a correlation coefficient
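Finally, a worked sketch of the odds formulas above, using hypothetical probabilities p1 = .6 and p2 = .4 chosen only for illustration:

. * odds = p / (1 - p): for p1 = .6 this should display 1.5; for p2 = .4, about 0.67
. display .6 / (1 - .6)
. display .4 / (1 - .4)
. * odds ratio = odds1 / odds2; should display 2.25
. display (.6 / (1 - .6)) / (.4 / (1 - .4))
. * converting an odds back to a probability: p = odds / (1 + odds); recovers .6
. display 1.5 / (1 + 1.5)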