Using Statistics in Research
Psych 231: Research Methods in Psychology

Announcements
I will be helping with statistical analyses of group project data during this week's labs.
– Enter your data into an SPSS datafile and e-mail it to me
– Bring your raw data in an organized fashion for easy entry into SPSS
– Think about what the appropriate statistical test should be IN ADVANCE of seeing me

Statistics
Why do we use them?
– Descriptive statistics
• Used to describe, simplify, and organize data sets
– Inferential statistics
• Used to test claims about the population, based on data gathered from samples
• Take sampling error into account: are the results above and beyond what you'd expect by random chance?

Distributions
Recall that a variable is a characteristic that can take different values. The distribution of a variable is a summary of all the different values of that variable – both type (each value) and token (each instance).

Distribution
Example: the distribution of scores on an exam, displayed as a frequency histogram.
[Figure: frequency histogram of exam scores, grouped into 5-point bins from 50–54 up to 95–100]

Distribution
Properties of a distribution
– Shape
• Symmetric vs. asymmetric (skew)
• Unimodal vs. multimodal
– Center
• Where most of the data in the distribution are
– Spread (variability)
• How similar/dissimilar are the scores in the distribution?

Distributions
A picture of the distribution is usually helpful
– Gives a good sense of the properties of the distribution
There are many different ways to display a distribution
– Graphs
• Continuous variable: histogram, line graph (frequency polygon)
• Categorical variable: pie chart, bar chart
– Tables
• Frequency distribution table
• Stem and leaf plot

Graphs for continuous variables
[Figures: histogram and line graph (frequency polygon) of the EXAM2 variable]

Graphs for categorical variables
[Figures: bar chart and pie chart of a categorical variable with categories Cutting, Doe, Smith, and Missing]

Frequency distribution table
Each row lists a value (type), its count, and its percentages. For the variable VAR00003:

Value    Frequency   Percent   Valid Percent   Cumulative Percent
1.00         2          7.7         7.7               7.7
2.00         3         11.5        11.5              19.2
3.00         3         11.5        11.5              30.8
4.00         5         19.2        19.2              50.0
5.00         4         15.4        15.4              65.4
6.00         2          7.7         7.7              73.1
7.00         4         15.4        15.4              88.5
8.00         2          7.7         7.7              96.2
9.00         1          3.8         3.8             100.0
Total       26        100.0       100.0

Descriptive statistics
In addition to pictures of the distribution, numerical summaries are also presented.
Numeric descriptive statistics
– Shape:
• Skew (symmetry) and kurtosis (flatness)
– Measures of center:
• Mean
• Median
• Mode
– Measures of variability (spread):
• Standard deviation (variance)
• Range

Shape
– Symmetric vs. asymmetric (skewed)
• Positive skew: the tail points toward the high end of the scale
• Negative skew: the tail points toward the low end of the scale
– Unimodal (one mode) vs. multimodal (e.g., bimodal)

Center
There are three main measures of center
– Mean (M): the arithmetic average
• Add up all of the scores and divide by the total number of scores
• The most commonly used measure of center
– Median (Mdn): the middle score in terms of location
• The score that cuts off the top 50% of the distribution from the bottom 50%
• Good for skewed distributions (e.g., net worth)
– Mode: the most frequent score
• Good for nominal scales (e.g., eye color)
• A must for multimodal distributions
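The course's analyses are done in SPSS, but as a quick illustration of these three measures of center, here is a minimal Python sketch using the standard library's statistics module; the exam scores below are made-up values, not data from the lecture.

```python
import statistics

# Hypothetical exam scores (made-up data, not from the lecture)
scores = [55, 62, 67, 71, 71, 74, 78, 78, 78, 83, 88, 95]

mean = statistics.mean(scores)      # arithmetic average: sum of scores / number of scores
median = statistics.median(scores)  # middle score by location (here, the average of the 6th and 7th scores)
mode = statistics.mode(scores)      # most frequent score

print(f"Mean = {mean}, Median = {median}, Mode = {mode}")
```

With these scores the mean (75.0) and median (76.0) are close because the made-up distribution is roughly symmetric; in a strongly skewed distribution (like net worth) they would diverge.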
Spread (variability)
How similar are the scores?
– Range: the maximum value minus the minimum value
• Only takes two scores from the distribution into account
• Influenced by extreme values (outliers)
– Standard deviation (SD): (essentially) the average amount that the scores in the distribution deviate from the mean
• Takes all of the scores into account
• Also influenced by extreme values (but not as much as the range)
– Variance: the standard deviation squared

Variability
– Low variability: the scores are fairly similar and cluster tightly around the mean
– High variability: the scores are fairly dissimilar and spread widely around the mean

Relationships between variables
Suppose that you notice that the more you study for an exam, the better your score typically is. This suggests that there is a relationship between study time and test performance.
The correlation coefficient (and regression) provides a numerical description of the relationship between two variables. It may be used for:
– Prediction
– Validity
– Reliability
– Theory verification

Correlation
For the relationship between two continuous variables we use Pearson's r (the Pearson product-moment correlation).
It basically tells us how much our two variables vary together
– As X goes up, what does Y typically do? It may go up, go down, or show no consistent change.

Correlation
Properties of a correlation
– Form
• Linear
• Non-linear
– Direction
• Negative
• Positive
– Strength
• Ranges from -1 to +1; 0 means no relationship

Scatterplot
Plots one variable against the other
– Useful for "seeing" the relationship: its form, direction, and strength
– Each point corresponds to a different individual
– Imagine a line through the data points
[Figure: scatterplot of example X and Y values with a line through the points]

Form
– Linear vs. non-linear

Direction
– Positive: as X goes up, Y goes up; X and Y vary in the same direction; positive r
– Negative: as X goes up, Y goes down; X and Y vary in opposite directions; negative r

Strength
Zero means "no relationship."
– The farther r is from zero, the stronger the relationship
– The strength corresponds to the spread of the points around the line (note the axis scales)
– r² is sometimes reported instead: the percentage of variance in Y accounted for by X
• r = -1.0: "perfect negative correlation," r² = 100%
• r = 0.0: "no relationship," r² = 0%
• r = +1.0: "perfect positive correlation," r² = 100%

Strength
Which relationship is stronger: one with r = 0.5 (r² = 25%) or one with r = -0.8 (r² = 64%)?
– The relationship with r = -0.8 is stronger, because -0.8 is farther from zero than +0.5.

Regression
Compute the equation for the line that best fits the data points:
Y = (X)(slope) + (intercept)
– The slope is the change in Y per unit change in X; in this example the slope is 0.5 and the intercept is 2.0

Regression
We can make specific predictions about Y based on X. For X = 5:
Y = (X)(0.5) + 2.0
Y = (5)(0.5) + 2.0
Y = 2.5 + 2 = 4.5

Regression
We also need a measure of error:
Y = X(0.5) + 2.0 + error
– The same line can describe two different relationships (a strength difference): the larger the error, the weaker the relationship

Multiple regression
You may want to look at how more than one variable is related to Y.
The regression equation gets more complex
– X, Z, and W variables are used to predict Y
– e.g., Y = b1X + b2Z + b3W + b0 + error

Cautions with correlation and regression
– Don't make causal claims
– Don't extrapolate
– Extreme scores can strongly influence the calculated relationship
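To make the correlation and regression ideas above concrete, here is a minimal Python sketch using scipy.stats; the study-time and score values are invented for the example (the course itself uses SPSS for these computations).

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied (X) and exam score (Y) for 8 students
hours = np.array([1, 2, 2, 3, 4, 5, 6, 8])
score = np.array([58, 62, 65, 70, 72, 80, 85, 91])

# Pearson product-moment correlation: direction and strength of the linear relationship
r, p_value = stats.pearsonr(hours, score)
print(f"r = {r:.2f}, r^2 = {r**2:.2f}, p = {p_value:.4f}")

# Best-fitting (least-squares) line: Y = slope*X + intercept
fit = stats.linregress(hours, score)
predicted = fit.slope * 5 + fit.intercept   # predict the score for a student who studied 5 hours
print(f"Y = {fit.slope:.2f}*X + {fit.intercept:.2f}; predicted Y at X = 5 is {predicted:.1f}")
```

Because the invented scores rise fairly steadily with study time, r comes out strongly positive; the cautions above still apply (no causal claim, and no extrapolation beyond the observed range of roughly 1 to 8 hours).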
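For the multiple regression equation Y = b1X + b2Z + b3W + b0 + error, a least-squares fit can be sketched with NumPy alone; the extra predictors Z and W and all of the values here are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical predictors: study hours (X), hours of sleep (Z), prior GPA (W), and the outcome Y
X = np.array([1, 2, 2, 3, 4, 5, 6, 8], dtype=float)
Z = np.array([6, 7, 5, 8, 7, 6, 8, 7], dtype=float)
W = np.array([2.8, 3.0, 2.5, 3.4, 3.1, 3.6, 3.8, 3.5])
Y = np.array([58, 62, 65, 70, 72, 80, 85, 91], dtype=float)

# Design matrix with a column of ones for the intercept (b0)
design = np.column_stack([X, Z, W, np.ones_like(X)])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)   # least-squares estimates of b1, b2, b3, b0
b1, b2, b3, b0 = coefs
print(f"Y = {b1:.2f}*X + {b2:.2f}*Z + {b3:.2f}*W + {b0:.2f} + error")
```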
Inferential statistics
Why?
– Purpose: to make claims about populations based on data collected from samples
What's the big deal?
– Example experiment:
• Group A gets a treatment intended to improve memory
• Group B is the control and gets no treatment
• After the treatment period, both groups are tested for memory
• Results: Group A's average memory score is 80%, while Group B's is 76%
• Is the 4% difference a "real" difference, or is it just sampling error?

Testing hypotheses
Step 1: State your hypotheses
– Null hypothesis (H0)
• There are no differences (effects)
• This is the hypothesis that you are testing
– Alternative hypothesis(es)
• Generally, not all groups are equal
• You aren't out to prove the alternative hypothesis (although it feels like that is what you want to do)
• If you reject the null hypothesis, you're left with support for the alternative(s) (NOT proof!)

Hypotheses
In our memory example experiment:
– H0: mean of Group A = mean of Group B
– HA: mean of Group A ≠ mean of Group B
• (Or, more precisely: Group A > Group B)
– It seems like our theory is that the treatment should improve memory. That's the alternative hypothesis, and it is NOT the one we test with inferential statistics. Instead, we test H0.

Testing hypotheses
Step 2: Set your decision criteria
– Your alpha level will be your guide for when to reject or fail to reject the null hypothesis
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
– Descriptive statistics (means, standard deviations, etc.)
– Inferential statistics (t-tests, ANOVAs, etc.)
Step 5: Make a decision about your null hypothesis
– Reject H0
– Fail to reject H0

Statistical significance
A "statistically significant difference"
– Is declared when you reject your null hypothesis
– Essentially means that the observed difference is above what you'd expect by chance
– "Chance" is determined by estimating how much sampling error there is
– Factors affecting "chance":
• Sample size
• Population variability

Sampling error
Sampling error is the difference between the population mean and a sample mean.
[Figures: a population distribution with samples of N = 1, N = 2, and N = 10 drawn from it; each sample mean misses the population mean by some amount of sampling error]
– Generally, as the sample size increases, the sampling error decreases (see the simulation sketch after the error-type tables below)
– Typically, the narrower the population distribution, the narrower the range of possible samples, and the smaller the "chance" (small population variability vs. large population variability)

Sampling distribution
The sampling distribution is the distribution of all possible sample means of a particular sample size that can be drawn from the population.
[Figure: a population and the distribution of the sample means XA, XB, XC, XD for samples of size n; the average sampling error of this distribution is the "chance" estimate]

Error types
Based on the outcomes of the statistical tests, researchers will either:
– Reject the null hypothesis
– Fail to reject the null hypothesis
Either could be the correct conclusion or an incorrect one
– Two ways to go wrong:
• Type I error: saying that there is a difference when there really isn't one
• Type II error: saying that there is not a difference when there really is one

Error types
The experimenter's conclusion vs. the real world ("truth"):

                      H0 is correct       H0 is wrong
  Reject H0           Type I error        Correct decision
  Fail to reject H0   Correct decision    Type II error

Error types: courtroom analogy
The jury's decision vs. the real world ("truth"):

                      Defendant is innocent   Defendant is guilty
  Find guilty         Type I error            Correct decision
  Find not guilty     Correct decision        Type II error
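Returning to sampling error: a small simulation can make the sample-size point concrete. The sketch below assumes a hypothetical normal population with mean 75 and SD 10 (values chosen for illustration, not taken from the lecture) and shows the average sampling error shrinking as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of memory scores: mean 75, SD 10
population_mean, population_sd = 75.0, 10.0

for n in (1, 2, 10, 100):
    # Draw 10,000 samples of size n and record each sample mean
    sample_means = rng.normal(population_mean, population_sd, size=(10_000, n)).mean(axis=1)
    avg_error = np.mean(np.abs(sample_means - population_mean))   # average |sampling error|
    print(f"n = {n:>3}: average sampling error = {avg_error:.2f}")
```

The spread of these simulated sample means is exactly the sampling distribution described above: the larger the sample, the narrower it gets, and the smaller the difference that can be attributed to "chance."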
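The Type I error cell in the tables above corresponds directly to the alpha level discussed next. As a hedged sketch (again with invented population values, and using the independent samples t-test introduced later in this handout), simulating many experiments in which H0 is true by construction shows that roughly 5% of them reject H0 at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_experiments = 10_000
false_alarms = 0

for _ in range(n_experiments):
    # Both groups are drawn from the SAME population, so H0 is true by construction
    group_a = rng.normal(75, 10, size=20)
    group_b = rng.normal(75, 10, size=20)
    if stats.ttest_ind(group_a, group_b).pvalue < alpha:
        false_alarms += 1   # rejecting H0 here is a Type I error

print(f"Estimated Type I error rate: {false_alarms / n_experiments:.3f} (alpha = {alpha})")
```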
Error types
Type I error: concluding that there is an effect (a difference between groups) when there really isn't one.
– The probability of making this error is the alpha level, sometimes called the "significance level"
– We try to minimize it (keep it low) by picking a low level of alpha
– In psychology, 0.05 and 0.01 are most common
Type II error: concluding that there isn't an effect when there really is one.
– Related to the statistical power of a test
– How likely are you to detect a difference if it is really there?

Significance
A "statistically significant difference" means:
– The researcher is concluding that there is a difference above and beyond chance
– With the probability of making a Type I error at 5% (assuming an alpha level of 0.05)
Note that "statistical significance" is not the same thing as theoretical significance.
– It only means that there is a statistical difference
– It doesn't mean that the difference is an important one

Non-significance
Failing to reject the null hypothesis
– Generally, we are not interested in "accepting the null hypothesis" (remember, we can't prove things, only disprove them)
– Usually you check whether you made a Type II error (failed to detect a difference that is really there)
• Check the statistical power of your test
– Maybe the sample size is too small
– Maybe the effects you're looking for are really small
• Check your controls; maybe there is too much variability

Inferential statistical tests
Different statistical tests
– "Generic" test
– t-test
– Analysis of variance (ANOVA)

"Generic" statistical test
Tests the question: are there differences between groups due to a treatment?
– If H0 is true (no treatment effect), the sample means XA and XB differ only by chance
– If H0 is false (there is a treatment effect), the difference between XA and XB reflects the treatment as well as chance

"Generic" statistical test
Why might the samples be different? (What are the sources of the variability between groups?)
– ER: random sampling error
– ID: individual differences (if a between-subjects factor)
– TR: the effect of a treatment

"Generic" statistical test
The generic test statistic:

  test statistic = observed difference / difference expected by chance
                 = (TR + ID + ER) / (ID + ER)

"Generic" statistical test
The distribution of the test statistic
– To reject H0, you want the computed test statistic to be large
– A large value reflects a large treatment effect (TR)
[Figure: the distribution of the test statistic, divided into a "reject H0" region and a "fail to reject H0" region]

1-tailed or 2-tailed
– 2-tailed tests "look" for any difference
– 1-tailed tests "look" for a difference in a specific direction (e.g., "an increase," "an impairment")
• Statistically more powerful
[Figures: rejection regions in both tails for a 2-tailed test vs. in one tail for a 1-tailed test]

t-tests
Three types
– One sample
– Two independent samples
– Repeated measures
The t-distribution
– Centered on zero, with negative and positive values
– Degrees of freedom
• Based on the number of subjects in the sample(s)
• Tell you which t-distribution to look at

Independent samples t-test
Design
– Two separate groups of participants (e.g., control and treatment)
Degrees of freedom
– df = n1 + n2 - 2
Formula:

  t = (Xtreatment - Xcontrol) / (difference expected by chance)

where the difference expected by chance is based on the variability and size of the samples.

Independent samples t-test
Reporting your results
– The observed difference
– The kind of t-test
– The computed t-statistic
– The degrees of freedom for the test
– The p-value of the test
"The mean of the treatment group was 12 points higher than the control group. An independent samples t-test yielded a significant difference, t(25) = 5.67, p < 0.05."
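Carrying the earlier memory example through an independent samples t-test, here is a minimal Python sketch with scipy.stats; the individual scores are invented so that the group means come out at 80% and 76% (the labs themselves would use SPSS).

```python
from scipy import stats

# Hypothetical memory scores (percent correct) for the two groups in the memory example
group_a = [82, 79, 80, 77, 81, 80, 79, 78, 83, 81]   # treatment, mean = 80
group_b = [76, 74, 79, 73, 77, 75, 78, 76, 74, 78]   # control, mean = 76

result = stats.ttest_ind(group_a, group_b)            # independent samples t-test
df = len(group_a) + len(group_b) - 2                  # df = n1 + n2 - 2
print(f"t({df}) = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

For the repeated measures design described next, scipy.stats.ttest_rel plays the same role, operating on the paired difference scores.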
Repeated measures t-test
Design
– One group of participants tested twice (e.g., pre-test and post-test)
Degrees of freedom
– df = n - 1 (where n = the number of difference scores)
Formula:

  t = (Xpost - Xpre) / (difference expected by chance)

where the difference expected by chance is based on the variability and size of the sample of difference scores.

Repeated measures t-test
Reporting your results
– The observed difference
– The kind of t-test
– The computed t-statistic
– The degrees of freedom for the test
– The p-value of the test
"The mean score of the post-test was 12 points higher than the pre-test. A repeated measures t-test demonstrated that this difference was significant, t(25) = 5.67, p < 0.05."

Analysis of variance (ANOVA)
Designs
– More than two groups
• 1-factor ANOVA, factorial ANOVA
• Both within- and between-groups factors
The test statistic is an F-ratio
Degrees of freedom
– Several to keep track of
– Vary depending on the design

Analysis of variance
More than two groups
– Now we can't just compute a simple difference score, since there is more than one difference
– So we use variance instead of a simple difference
• Variance is essentially an average difference

  F-ratio = observed variance / variance expected by chance

1-factor ANOVA
One factor with more than two levels
– Now we can't just compute a simple difference score, since there is more than one difference
• A - B, B - C, and A - C

1-factor ANOVA
Null hypothesis (this is the one the ANOVA tests!)
– H0: all the groups are equal (XA = XB = XC)
Alternative hypotheses
– HA: not all the groups are equal
• XA ≠ XB ≠ XC
• XA = XB ≠ XC
• XA ≠ XB = XC
• XA = XC ≠ XB

1-factor ANOVA
Planned contrasts and post-hoc tests
– Further tests used to decide among the different alternative hypotheses
– e.g., pairwise tests of A vs. B, A vs. C, and B vs. C can identify which pattern (such as A ≠ B, A ≠ C, B = C) actually holds

1-factor ANOVA
Reporting your results (a code sketch of a 1-way ANOVA appears at the end of this section)
– The observed differences
– The kind of test
– The computed F-ratio
– The degrees of freedom for the test
– The p-value of the test
– Any post-hoc or planned comparison results
"The mean score of Group A was 12, Group B was 25, and Group C was 27. A 1-way ANOVA was conducted and the results yielded a significant difference, F(2,25) = 5.67, p < 0.05. Post hoc tests revealed that the differences between Groups A and B and Groups A and C were statistically reliable (respectively t(1) = 5.67, p < 0.05 and t(1) = 6.02, p < 0.05). Groups B and C did not differ significantly from one another."

Factorial ANOVAs
We covered much of this in our experimental design lecture.
More than one factor
– Factors may be within or between subjects
– The overall design may be entirely within, entirely between, or mixed
Many F-ratios may be computed
– An F-ratio is computed to test the main effect of each factor
– An F-ratio is computed to test each of the potential interactions between the factors

Factorial ANOVA
Reporting your results
– The observed differences
• Because there may be a lot of these, they may be presented in a table instead of directly in the text
– The kind of design
• e.g., a "2 x 2 completely between-subjects factorial design"
– The computed F-ratios
• You may see separate paragraphs for each factor and for the interactions
– The degrees of freedom for each test
• Each F-ratio has its own set of dfs
– The p-value of each test
• You may want to simply state that "all tests were evaluated with an alpha level of 0.05"
– Any post-hoc or planned comparison results
• Typically only the theoretically interesting comparisons are presented
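Following up on the 1-way ANOVA reporting example above, here is a minimal Python sketch with scipy.stats; the scores are fabricated so that the group means equal the 12, 25, and 27 from that example, and the pairwise follow-up t-tests are uncorrected for multiple comparisons (a real analysis, in SPSS or otherwise, would use a proper post-hoc procedure).

```python
from scipy import stats

# Fabricated scores for one factor with three levels (8 participants per group)
group_a = [10, 12, 11, 14, 13, 12, 11, 13]   # mean = 12
group_b = [20, 29, 23, 31, 22, 27, 25, 23]   # mean = 25
group_c = [24, 31, 22, 30, 26, 29, 23, 31]   # mean = 27

# One-way (1-factor) ANOVA: F-ratio = observed variance / variance expected by chance
f_ratio, p_value = stats.f_oneway(group_a, group_b, group_c)
df_between = 3 - 1            # number of groups - 1
df_within = 3 * 8 - 3         # total N - number of groups
print(f"F({df_between},{df_within}) = {f_ratio:.2f}, p = {p_value:.4f}")

# Pairwise comparisons (uncorrected, for illustration only)
pairs = [("A vs B", group_a, group_b), ("A vs C", group_a, group_c), ("B vs C", group_b, group_c)]
for name, g1, g2 in pairs:
    t = stats.ttest_ind(g1, g2)
    print(f"{name}: t({len(g1) + len(g2) - 2}) = {t.statistic:.2f}, p = {t.pvalue:.4f}")
```

With these made-up numbers the outcome should mirror the pattern in the reporting example: Group A differs reliably from Groups B and C, while B and C do not differ significantly from one another.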