class 7: 10/21/13 intro to statistical methods cont.
• Being wrong in science is fine, and even necessary—as long as scientists recognize that they blew it, report their mistake openly instead of disguising it as a success, and then move on to the next thing—until they come up with the very occasional breakthrough. But as long as careers remain contingent on producing a stream of research that’s dressed up to seem more right than it is, scientists will keep delivering exactly that. Science is a noble endeavor, but it is also a low-yield endeavor. I’m not sure that more than a very small percentage of medical research is ever likely to lead to major improvements in clinical outcomes and quality of life. We should be very comfortable with that fact. (p. 86)
  Freedman, David H. (2010, November). Lies, damned lies, and medical science. The Atlantic, 306(4), 76-86.
all researchers must learn the trick and avoid the mistake
• trick: begin with the question, then figure out the best way to answer it
• mistake: begin with the method and fit the question to the method
more on models
• models should meet three criteria:
  – generality, precision, accuracy
• models can usually satisfy any two, at the cost of sacrificing the third:
  – climatology settles for generality & accuracy
  – ecologists focusing on particular species settle for precision & accuracy
  – rigorous history & ethnography often give up generality for precision & accuracy; the results can still be important
• Kitcher, Philip. (2012, May 24). The trouble with scientism: Why history and the humanities are also a form of knowledge.
The New Republic, 243, 20-25.
• research using measurement description and statistical analysis is critical for answering certain kinds of important questions
strengths of measurement description
• precise descriptions
• often efficient: one can make confident predictions based on relatively small samples, if the samples are good
• increasingly sophisticated ways of analyzing measurement data
• powerful stat packages now available for desktop computers, e.g., Systat, SPSS, SAS
cautions
• measure only what can be measured
  – “to replace the unmeasureable with the unmeaningful is not progress” (Achen, 1977, p. 806)
• value precision, but realize that a precise description may not be an accurate one
• the scientific method (drawing inferences from observations) comprises many research methods; its strength does not come from any one specific method
my personal recommendations
• whatever your Ph.D. research specialization, take at least one stat course, preferably 2 or 3
• whatever your methodological expertise, find people with similar interests but different methodological expertise and work with them; the best research often uses many approaches
• the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world. (Box, p. 792)
• all models are false, but some are useful.
  Box, George E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791-799.
further caution
• Statistics today is in a conceptual and theoretical mess. The discipline is divided into two rival camps, the frequentists and the Bayesians, and neither camp offers the tools that science needs for objectively representing and interpreting statistical data as evidence. (pp. 127-128)
  Royall, Richard. (2004). The likelihood paradigm for statistical evidence. In M. L. Taper & S. R.
Lele (Eds.), The nature of scientific evidence: Statistical, philosophical, and empirical considerations (pp. 119-152). Chicago, IL: University of Chicago Press.
K ch 19: inferential statistics
• inferential statistics allow one to infer the characteristics of a population from a representative sample:
  – estimate characteristics of the population within a determined range with a given probability
  – determine (in general), with a given probability, whether an effect beyond sampling and chance error exists
• parameters: refer to the population
• statistics: refer to the sample
• sampling distribution: the distribution of a descriptive statistic calculated from repeated sampling
• confidence interval: a range that includes the population value with a given probability
confidence level:
• the probability that the interval will contain the population value: conventionally 68%, 95%, and 99% (about 2 to 1, 19 to 1, and 99 to 1, respectively)
• the wider the interval, the more likely it contains the population value (and the less valuable the information)
• hypothesis testing (traditionally takes the form of rejecting the null hypothesis, i.e., that there is no effect beyond sampling and chance error)
• alpha level: the accepted risk of rejecting a true null hypothesis (i.e., of treating a chance result as a real effect); set by the researcher in advance, traditionally .10, .05, .01, .001 (N.B., there is no good reason for these values rather than others)
• p-level: the actual probability level found (the probability of a result at least this extreme if the null hypothesis is true), which is then compared to the alpha level
two-tailed test:
• non-directional; divides the alpha level between both ends of the distribution; used when one does not predict the direction of the result
one-tailed test:
• directional; puts the whole alpha level at one end (chosen by the researcher)
• increases the probability of finding a statistically significant result, provided the effect lies in the predicted direction
common statistical tests
t test of difference between means
• common, simple test for differences between the means of two groups
chi-square
• common test for categorical data and frequencies
  – are the cell values different from what would be expected if the two variables were unrelated?
chi-square examples
Jefferson, Madison Combined: Years in Kindergarten by SES

               2-year   1-year   total
  poor*            24        6      30
  not poor         34       65      99
  total            58       71     129

  chi-square: 19.4 (1 df), p < .0001
  * eligible for free or reduced lunch

Jefferson, Madison Combined: Years in Kindergarten by race

               2-year   1-year   total
  non-white        10        9      19
  white            48       62     110
  total            58       71     129

  chi-square: .530 (1 df), p < .5 (not statistically significant)
ANOVA (analysis of variance)
• experimental designs where two or more groups or multiple conditions are being compared (common in psychology and ed psych, and in educational research in general)
• powerful:
  – accurate measure of error variance
  – tests the significance of each variable as well as their combined effect
  – avoids the inflated Type I error probability that comes from running many separate t tests (not in K)
regression analysis
• explains (predicts) the variability of a dependent variable using information about one or more independent variables
• predicts the expected change in the dependent variable given specific changes in the independent variable(s)
• not used in educational research as much as ANOVA, but more useful for policy purposes
regression example
  achievement* = 77.5 - .80 SES**
  * combined math & reading scores, ITBS
  ** percent of low-income students
errors of inference
• Type I error (alpha error): a concern when theory testing (K: “when validating a finding”)
• Type II error (beta error): a concern when theory building (K: “when exploring”)
• for a given sample size, decreasing the probability of one type increases the probability of the other
• pointless to talk about Type I or II error absent discussion of what is at stake
cost of Type I error in theory testing
• dominant theory not challenged
• knowledge production stopped
cost of Type II error in theory building
• possibly important explanations etc. ignored
• knowledge production stopped
one of the many challenges the late and great Lee Cronbach (1916-2001) made to the accepted wisdom of the day
statistical power: 1 - beta
• increasing statistical power:
  – increase the size of the effect (stronger treatment)
  – increase the sample size
  – reduce variability
statistical & practical significance
• statistical: confidence at a given probability that the result is not due to chance
• practical: is the result important enough, big enough, feasible, affordable? (all value judgments)
  – if one apple a day keeps the doctor away, but it takes three grapefruit, then…?
• no statistic or statistical test can make a practical decision
• whether one risks being wrong cautiously (Type I) or wrong incautiously (Type II) cannot be decided absent cost and risk, needs, what’s at stake, etc.
• no statistical analysis is better than the numbers (descriptions) fed into it: garbage in, garbage out
statistical significance refers only to samples from a population
• it does not refer to the size of an effect: ceteris paribus, larger effects are more likely to be statistically significant, but with large samples even very small effects will be
• if you have the whole population, any effects are real, no matter the size
no proof in science:
• a statistically significant result (assuming appropriate analysis etc.) does not prove that the hypothesis is true, only that it has escaped disconfirmation
• the more often a hypothesis passes the test, and the more demanding the tests it passes, the more certain we can be that we know something (the more we have reduced uncertainty)
other terms
• parametric: assumes random sampling from a distribution with known parameters, often the normal distribution
• nonparametric: used when data do not come from a known distribution, often with nominal or ordinal data
• robust test: accurate even when its assumptions are violated
• effect size: for too long and too often ignored; journals now require estimates of effect size
thinking
• a simple statistical way to find out what people may not be willing to admit: ask people to flip a coin
  – if heads, answer “head: no answer”
  – if tails and have done X, answer “head: no answer”
  – if tails and have not done X, answer “no”
• thus, the number of “no’s” estimates half of those who have not done X
• thus, N minus twice the number of “no’s” gives an estimate of those who have done X
Vogt
• regression toward the mean
• reliability
• self-selection bias
• sleeper effect
• sociogram
• spurious relation (or correlation)
• suppressor variable
Sieber & Tolich ch 8: degrees of nondisclosure
1. double-blind study (neither researcher nor subject knows)
2. researcher knows; subject does not
3. researcher knows; subject is told at the beginning that she will not know everything, that the study has been judged ethical, and that she will be debriefed at the end.
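The coin-flip procedure under “thinking” (a form of randomized response) can be checked with a small simulation. This is a minimal sketch, assuming a fair coin and that every respondent follows the instructions; the function names, the sample size, and the 20% true rate are hypothetical choices for illustration, not data from the course.

```python
import random


def estimate_done_x(n_respondents: int, n_no: int) -> int:
    """Estimate how many respondents have done X.

    Only respondents who flip tails AND have not done X answer "no", so the
    count of "no" answers is about half of those who have not done X.
    Doubling it estimates the not-done-X group; the remainder is the estimate
    of those who have done X.
    """
    return n_respondents - 2 * n_no


def simulate_survey(n_respondents: int, true_rate: float, seed: int = 0) -> int:
    """Simulate the coin-flip survey and return the number of "no" answers.

    Hypothetical setup: a fair coin, full compliance, and the first
    round(true_rate * n_respondents) respondents have done X.
    """
    rng = random.Random(seed)
    n_done = round(true_rate * n_respondents)
    n_no = 0
    for i in range(n_respondents):
        has_done_x = i < n_done
        tails = rng.random() < 0.5
        # Heads, or tails-and-done-X, both produce the uninformative answer
        # "head: no answer"; only tails-and-not-done-X produces "no".
        if tails and not has_done_x:
            n_no += 1
    return n_no


if __name__ == "__main__":
    n = 10_000
    n_no = simulate_survey(n, true_rate=0.20, seed=42)
    est = estimate_done_x(n, n_no)
    print(f'"no" answers: {n_no}; estimated done X: {est} (true: {round(0.20 * n)})')
```

Because the count of “no” answers is doubled, its sampling error is doubled as well, so this estimate is less precise than an honest answer to a direct question would be; that loss of precision is the price of protecting respondents’ privacy.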
• Hawthorne effect: short-lived increase in performance due to an intervention
• Pygmalion effect: the researcher’s expectation produces the expected effects (Rosenthal & Jacobson [1968] strongly criticized for methods)
• I find the discussion of the second degree (pp. 143-147) hard to follow.
• third degree: list on pp. 147-148
• 4 defensible justifications for deception:
  – data unobtainable if subjects knew the real purpose
  – to achieve stimulus control or random assignment
  – to study responses to low-frequency events
  – to avoid serious risk
• dehoaxing: explaining the procedure, carefully
• desensitizing: returning the subject to a frame of mind at least as positive and constructive as when the subject entered the study
Lit review
• review section
  – review the lit, following an explicit and logical scheme
  – 3-5 sections, with subsections, if useful
  – end sections with a discussion
• discussion section
  – synthesize the review (discussion of discussions)
• conclusion section (< 1 p)
  – address the original question(s)
• personal reflections section (1 p)
  – discuss briefly what you learned in the process of doing the lit review
• references
  – make sure all citations are in the references
  – make sure all references are cited
APA
• use first person to talk about yourself, not third person (e.g., not “The researcher . . .”)
• use we (us, our, etc.) only to refer to you and your co-authors (69-70)
• do not italicize Latin abbreviations (e.g., et al., etc.), and use them only within parens
• for seriation see 63-65
• single quotation marks only within double
• periods & commas always inside quotation marks
• italicize new, technical, or key terms or labels the first time, e.g., “The term peer response . . .” (104-106)
• do not separate compound verbs with a comma: “She walked down the block past her house and then turned into the driveway.”
• avoid beginning sentences with “however”
• avoid “throat-clearings” to begin sentences, e.g., furthermore, therefore, also, additionally
colon (80-81)
• use between a grammatically complete introductory clause and a final clause that illustrates, extends, or amplifies the first; if the second clause is a complete sentence, capitalize it
  – Kelly presented two findings: Teachers preferred . . .
• do not use a colon after an introduction that is not a complete sentence
  – The students were Ben, Akiko, Mustafa. . . .
this week
free and cheap
• under construction