Using Statistics in Research
Psych 231: Research Methods in Psychology

Announcements
I will be helping with statistical analyses of group project data during this week's labs.
– Enter your data into an SPSS datafile and e-mail it to me
– Bring your raw data in an organized fashion for easy entry into SPSS
– Think about what the appropriate statistical test should be IN ADVANCE of seeing me

Statistics
Why do we use them?
– Descriptive statistics
• Used to describe, simplify, and organize data sets
– Inferential statistics
• Used to test claims about the population, based on data gathered from samples
• Take sampling error into account: are the results above and beyond what you'd expect by random chance?

Distributions
Recall that a variable is a characteristic that can take different values. The distribution of a variable is a summary of all the different values of that variable – both type (each value) and token (each instance).

Distribution
Example: the distribution of scores on an exam, displayed as a frequency histogram.
[Figure: frequency histogram of exam scores, grouped into 5-point bins from 50–54 up to 95–100]

Distribution
Properties of a distribution
– Shape
• Symmetric vs. asymmetric (skew)
• Unimodal vs. multimodal
– Center
• Where most of the data in the distribution are
– Spread (variability)
• How similar/dissimilar are the scores in the distribution?

Distributions
A picture of the distribution is usually helpful
– Gives a good sense of the properties of the distribution
There are many different ways to display a distribution
– Graphs
• Continuous variable: histogram, line graph (frequency polygon)
• Categorical variable: pie chart, bar chart
– Tables
• Frequency distribution table
• Stem and leaf plot

Graphs for continuous variables
[Figures: histogram and line graph (frequency polygon) of the EXAM2 variable]

Graphs for categorical variables
[Figures: bar chart and pie chart of a categorical variable with categories Cutting, Doe, Smith, and Missing]

Frequency distribution table
Each row lists a value (type), its count, and its percentages. For the variable VAR00003:

Value    Frequency   Percent   Valid Percent   Cumulative Percent
1.00         2          7.7         7.7               7.7
2.00         3         11.5        11.5              19.2
3.00         3         11.5        11.5              30.8
4.00         5         19.2        19.2              50.0
5.00         4         15.4        15.4              65.4
6.00         2          7.7         7.7              73.1
7.00         4         15.4        15.4              88.5
8.00         2          7.7         7.7              96.2
9.00         1          3.8         3.8             100.0
Total       26        100.0       100.0

Descriptive statistics
In addition to pictures of the distribution, numerical summaries are also presented.
Numeric descriptive statistics
– Shape:
• Skew (symmetry) and kurtosis (flatness)
– Measures of center:
• Mean
• Median
• Mode
– Measures of variability (spread):
• Standard deviation (variance)
• Range

Shape
– Symmetric vs. asymmetric (skewed)
• Positive skew: the tail points toward the high end of the scale
• Negative skew: the tail points toward the low end of the scale
– Unimodal (one mode) vs. multimodal (e.g., bimodal)

Center
There are three main measures of center
– Mean (M): the arithmetic average
• Add up all of the scores and divide by the total number of scores
• The most commonly used measure of center
– Median (Mdn): the middle score in terms of location
• The score that cuts off the top 50% of the distribution from the bottom 50%
• Good for skewed distributions (e.g., net worth)
– Mode: the most frequent score
• Good for nominal scales (e.g., eye color)
• A must for multimodal distributions
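The course's analyses are done in SPSS, but as a quick illustration of these three measures of center, here is a minimal Python sketch using the standard library's statistics module; the exam scores below are made-up values, not data from the lecture.

```python
import statistics

# Hypothetical exam scores (made-up data, not from the lecture)
scores = [55, 62, 67, 71, 71, 74, 78, 78, 78, 83, 88, 95]

mean = statistics.mean(scores)      # arithmetic average: sum of scores / number of scores
median = statistics.median(scores)  # middle score by location (here, the average of the 6th and 7th scores)
mode = statistics.mode(scores)      # most frequent score

print(f"Mean = {mean}, Median = {median}, Mode = {mode}")
```

With these scores the mean (75.0) and median (76.0) are close because the made-up distribution is roughly symmetric; in a strongly skewed distribution (like net worth) they would diverge.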
Spread (variability)
How similar are the scores?
– Range: the maximum value minus the minimum value
• Only takes two scores from the distribution into account
• Influenced by extreme values (outliers)
– Standard deviation (SD): (essentially) the average amount that the scores in the distribution deviate from the mean
• Takes all of the scores into account
• Also influenced by extreme values (but not as much as the range)
– Variance: the standard deviation squared

Variability
– Low variability: the scores are fairly similar and cluster tightly around the mean
– High variability: the scores are fairly dissimilar and spread widely around the mean

Relationships between variables
Suppose that you notice that the more you study for an exam, the better your score typically is. This suggests that there is a relationship between study time and test performance.
The correlation coefficient (and regression) provides a numerical description of the relationship between two variables. It may be used for:
– Prediction
– Validity
– Reliability
– Theory verification

Correlation
For the relationship between two continuous variables we use Pearson's r (the Pearson product-moment correlation).
It basically tells us how much our two variables vary together
– As X goes up, what does Y typically do? It may go up, go down, or show no consistent change.

Correlation
Properties of a correlation
– Form
• Linear
• Non-linear
– Direction
• Negative
• Positive
– Strength
• Ranges from -1 to +1; 0 means no relationship

Scatterplot
Plots one variable against the other
– Useful for "seeing" the relationship: its form, direction, and strength
– Each point corresponds to a different individual
– Imagine a line through the data points
[Figure: scatterplot of example X and Y values with a line through the points]

Form
– Linear vs. non-linear

Direction
– Positive: as X goes up, Y goes up; X and Y vary in the same direction; positive r
– Negative: as X goes up, Y goes down; X and Y vary in opposite directions; negative r

Strength
Zero means "no relationship."
– The farther r is from zero, the stronger the relationship
– The strength corresponds to the spread of the points around the line (note the axis scales)
– r² is sometimes reported instead: the percentage of variance in Y accounted for by X
• r = -1.0: "perfect negative correlation," r² = 100%
• r = 0.0: "no relationship," r² = 0%
• r = +1.0: "perfect positive correlation," r² = 100%

Strength
Which relationship is stronger: one with r = 0.5 (r² = 25%) or one with r = -0.8 (r² = 64%)?
– The relationship with r = -0.8 is stronger, because -0.8 is farther from zero than +0.5.

Regression
Compute the equation for the line that best fits the data points:
Y = (X)(slope) + (intercept)
– The slope is the change in Y per unit change in X; in this example the slope is 0.5 and the intercept is 2.0

Regression
We can make specific predictions about Y based on X. For X = 5:
Y = (X)(0.5) + 2.0
Y = (5)(0.5) + 2.0
Y = 2.5 + 2 = 4.5

Regression
We also need a measure of error:
Y = X(0.5) + 2.0 + error
– The same line can describe two different relationships (a strength difference): the larger the error, the weaker the relationship

Multiple regression
You may want to look at how more than one variable is related to Y.
The regression equation gets more complex
– X, Z, and W variables are used to predict Y
– e.g., Y = b1X + b2Z + b3W + b0 + error

Cautions with correlation and regression
– Don't make causal claims
– Don't extrapolate
– Extreme scores can strongly influence the calculated relationship
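To make the correlation and regression ideas above concrete, here is a minimal Python sketch using scipy.stats; the study-time and score values are invented for the example (the course itself uses SPSS for these computations).

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied (X) and exam score (Y) for 8 students
hours = np.array([1, 2, 2, 3, 4, 5, 6, 8])
score = np.array([58, 62, 65, 70, 72, 80, 85, 91])

# Pearson product-moment correlation: direction and strength of the linear relationship
r, p_value = stats.pearsonr(hours, score)
print(f"r = {r:.2f}, r^2 = {r**2:.2f}, p = {p_value:.4f}")

# Best-fitting (least-squares) line: Y = slope*X + intercept
fit = stats.linregress(hours, score)
predicted = fit.slope * 5 + fit.intercept   # predict the score for a student who studied 5 hours
print(f"Y = {fit.slope:.2f}*X + {fit.intercept:.2f}; predicted Y at X = 5 is {predicted:.1f}")
```

Because the invented scores rise fairly steadily with study time, r comes out strongly positive; the cautions above still apply (no causal claim, and no extrapolation beyond the observed range of roughly 1 to 8 hours).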
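For the multiple regression equation Y = b1X + b2Z + b3W + b0 + error, a least-squares fit can be sketched with NumPy alone; the extra predictors Z and W and all of the values here are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical predictors: study hours (X), hours of sleep (Z), prior GPA (W), and the outcome Y
X = np.array([1, 2, 2, 3, 4, 5, 6, 8], dtype=float)
Z = np.array([6, 7, 5, 8, 7, 6, 8, 7], dtype=float)
W = np.array([2.8, 3.0, 2.5, 3.4, 3.1, 3.6, 3.8, 3.5])
Y = np.array([58, 62, 65, 70, 72, 80, 85, 91], dtype=float)

# Design matrix with a column of ones for the intercept (b0)
design = np.column_stack([X, Z, W, np.ones_like(X)])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)   # least-squares estimates of b1, b2, b3, b0
b1, b2, b3, b0 = coefs
print(f"Y = {b1:.2f}*X + {b2:.2f}*Z + {b3:.2f}*W + {b0:.2f} + error")
```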
Inferential statistics
Why?
– Purpose: to make claims about populations based on data collected from samples
What's the big deal?
– Example experiment:
• Group A gets a treatment intended to improve memory
• Group B is the control and gets no treatment
• After the treatment period, both groups are tested for memory
• Results: Group A's average memory score is 80%, while Group B's is 76%
• Is the 4% difference a "real" difference, or is it just sampling error?

Testing hypotheses
Step 1: State your hypotheses
– Null hypothesis (H0)
• There are no differences (effects)
• This is the hypothesis that you are testing
– Alternative hypothesis(es)
• Generally, not all groups are equal
• You aren't out to prove the alternative hypothesis (although it feels like that is what you want to do)
• If you reject the null hypothesis, you're left with support for the alternative(s) (NOT proof!)

Hypotheses
In our memory example experiment:
– H0: mean of Group A = mean of Group B
– HA: mean of Group A ≠ mean of Group B
• (Or, more precisely: Group A > Group B)
– It seems like our theory is that the treatment should improve memory. That's the alternative hypothesis, and it is NOT the one we test with inferential statistics. Instead, we test H0.

Testing hypotheses
Step 2: Set your decision criteria
– Your alpha level will be your guide for when to reject or fail to reject the null hypothesis
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
– Descriptive statistics (means, standard deviations, etc.)
– Inferential statistics (t-tests, ANOVAs, etc.)
Step 5: Make a decision about your null hypothesis
– Reject H0
– Fail to reject H0

Statistical significance
A "statistically significant difference"
– Is declared when you reject your null hypothesis
– Essentially means that the observed difference is above what you'd expect by chance
– "Chance" is determined by estimating how much sampling error there is
– Factors affecting "chance":
• Sample size
• Population variability

Sampling error
Sampling error is the difference between the population mean and a sample mean.
[Figures: a population distribution with samples of N = 1, N = 2, and N = 10 drawn from it; each sample mean misses the population mean by some amount of sampling error]
– Generally, as the sample size increases, the sampling error decreases (see the simulation sketch after the error-type tables below)
– Typically, the narrower the population distribution, the narrower the range of possible samples, and the smaller the "chance" (small population variability vs. large population variability)

Sampling distribution
The sampling distribution is the distribution of all possible sample means of a particular sample size that can be drawn from the population.
[Figure: a population and the distribution of the sample means XA, XB, XC, XD for samples of size n; the average sampling error of this distribution is the "chance" estimate]

Error types
Based on the outcomes of the statistical tests, researchers will either:
– Reject the null hypothesis
– Fail to reject the null hypothesis
Either could be the correct conclusion or an incorrect one
– Two ways to go wrong:
• Type I error: saying that there is a difference when there really isn't one
• Type II error: saying that there is not a difference when there really is one

Error types
The experimenter's conclusion vs. the real world ("truth"):

                      H0 is correct       H0 is wrong
  Reject H0           Type I error        Correct decision
  Fail to reject H0   Correct decision    Type II error

Error types: courtroom analogy
The jury's decision vs. the real world ("truth"):

                      Defendant is innocent   Defendant is guilty
  Find guilty         Type I error            Correct decision
  Find not guilty     Correct decision        Type II error
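Returning to sampling error: a small simulation can make the sample-size point concrete. The sketch below assumes a hypothetical normal population with mean 75 and SD 10 (values chosen for illustration, not taken from the lecture) and shows the average sampling error shrinking as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of memory scores: mean 75, SD 10
population_mean, population_sd = 75.0, 10.0

for n in (1, 2, 10, 100):
    # Draw 10,000 samples of size n and record each sample mean
    sample_means = rng.normal(population_mean, population_sd, size=(10_000, n)).mean(axis=1)
    avg_error = np.mean(np.abs(sample_means - population_mean))   # average |sampling error|
    print(f"n = {n:>3}: average sampling error = {avg_error:.2f}")
```

The spread of these simulated sample means is exactly the sampling distribution described above: the larger the sample, the narrower it gets, and the smaller the difference that can be attributed to "chance."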
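The Type I error cell in the tables above corresponds directly to the alpha level discussed next. As a hedged sketch (again with invented population values, and using the independent samples t-test introduced later in this handout), simulating many experiments in which H0 is true by construction shows that roughly 5% of them reject H0 at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_experiments = 10_000
false_alarms = 0

for _ in range(n_experiments):
    # Both groups are drawn from the SAME population, so H0 is true by construction
    group_a = rng.normal(75, 10, size=20)
    group_b = rng.normal(75, 10, size=20)
    if stats.ttest_ind(group_a, group_b).pvalue < alpha:
        false_alarms += 1   # rejecting H0 here is a Type I error

print(f"Estimated Type I error rate: {false_alarms / n_experiments:.3f} (alpha = {alpha})")
```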
Error types
Type I error: concluding that there is an effect (a difference between groups) when there really isn't one.
– The probability of making this error is the alpha level, sometimes called the "significance level"
– We try to minimize it (keep it low) by picking a low level of alpha
– In psychology, 0.05 and 0.01 are most common
Type II error: concluding that there isn't an effect when there really is one.
– Related to the statistical power of a test
– How likely are you to detect a difference if it is really there?

Significance
A "statistically significant difference" means:
– The researcher is concluding that there is a difference above and beyond chance
– With the probability of making a Type I error at 5% (assuming an alpha level of 0.05)
Note that "statistical significance" is not the same thing as theoretical significance.
– It only means that there is a statistical difference
– It doesn't mean that the difference is an important one

Non-significance
Failing to reject the null hypothesis
– Generally, we are not interested in "accepting the null hypothesis" (remember, we can't prove things, only disprove them)
– Usually you check whether you made a Type II error (failed to detect a difference that is really there)
• Check the statistical power of your test
– Maybe the sample size is too small
– Maybe the effects you're looking for are really small
• Check your controls; maybe there is too much variability

Inferential statistical tests
Different statistical tests
– "Generic" test
– t-test
– Analysis of variance (ANOVA)

"Generic" statistical test
Tests the question: are there differences between groups due to a treatment?
– If H0 is true (no treatment effect), the sample means XA and XB differ only by chance
– If H0 is false (there is a treatment effect), the difference between XA and XB reflects the treatment as well as chance

"Generic" statistical test
Why might the samples be different? (What are the sources of the variability between groups?)
– ER: random sampling error
– ID: individual differences (if a between-subjects factor)
– TR: the effect of a treatment

"Generic" statistical test
The generic test statistic:

  test statistic = observed difference / difference expected by chance
                 = (TR + ID + ER) / (ID + ER)

"Generic" statistical test
The distribution of the test statistic
– To reject H0, you want the computed test statistic to be large
– A large value reflects a large treatment effect (TR)
[Figure: the distribution of the test statistic, divided into a "reject H0" region and a "fail to reject H0" region]

1-tailed or 2-tailed
– 2-tailed tests "look" for any difference
– 1-tailed tests "look" for a difference in a specific direction (e.g., "an increase," "an impairment")
• Statistically more powerful
[Figures: rejection regions in both tails for a 2-tailed test vs. in one tail for a 1-tailed test]

t-tests
Three types
– One sample
– Two independent samples
– Repeated measures
The t-distribution
– Centered on zero, with negative and positive values
– Degrees of freedom
• Based on the number of subjects in the sample(s)
• Tell you which t-distribution to look at

Independent samples t-test
Design
– Two separate groups of participants (e.g., control and treatment)
Degrees of freedom
– df = n1 + n2 - 2
Formula:

  t = (Xtreatment - Xcontrol) / (difference expected by chance)

where the difference expected by chance is based on the variability and size of the samples.

Independent samples t-test
Reporting your results
– The observed difference
– The kind of t-test
– The computed t-statistic
– The degrees of freedom for the test
– The p-value of the test
"The mean of the treatment group was 12 points higher than the control group. An independent samples t-test yielded a significant difference, t(25) = 5.67, p < 0.05."
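Carrying the earlier memory example through an independent samples t-test, here is a minimal Python sketch with scipy.stats; the individual scores are invented so that the group means come out at 80% and 76% (the labs themselves would use SPSS).

```python
from scipy import stats

# Hypothetical memory scores (percent correct) for the two groups in the memory example
group_a = [82, 79, 80, 77, 81, 80, 79, 78, 83, 81]   # treatment, mean = 80
group_b = [76, 74, 79, 73, 77, 75, 78, 76, 74, 78]   # control, mean = 76

result = stats.ttest_ind(group_a, group_b)            # independent samples t-test
df = len(group_a) + len(group_b) - 2                  # df = n1 + n2 - 2
print(f"t({df}) = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

For the repeated measures design described next, scipy.stats.ttest_rel plays the same role, operating on the paired difference scores.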
Repeated measures t-test
Design
– One group of participants tested twice (e.g., pre-test and post-test)
Degrees of freedom
– df = n - 1 (where n = the number of difference scores)
Formula:

  t = (Xpost - Xpre) / (difference expected by chance)

where the difference expected by chance is based on the variability and size of the sample of difference scores.

Repeated measures t-test
Reporting your results
– The observed difference
– The kind of t-test
– The computed t-statistic
– The degrees of freedom for the test
– The p-value of the test
"The mean score of the post-test was 12 points higher than the pre-test. A repeated measures t-test demonstrated that this difference was significant, t(25) = 5.67, p < 0.05."

Analysis of variance (ANOVA)
Designs
– More than two groups
• 1-factor ANOVA, factorial ANOVA
• Both within- and between-groups factors
The test statistic is an F-ratio
Degrees of freedom
– Several to keep track of
– Vary depending on the design

Analysis of variance
More than two groups
– Now we can't just compute a simple difference score, since there is more than one difference
– So we use variance instead of a simple difference
• Variance is essentially an average difference

  F-ratio = observed variance / variance expected by chance

1-factor ANOVA
One factor with more than two levels
– Now we can't just compute a simple difference score, since there is more than one difference
• A - B, B - C, and A - C

1-factor ANOVA
Null hypothesis (this is the one the ANOVA tests!)
– H0: all the groups are equal (XA = XB = XC)
Alternative hypotheses
– HA: not all the groups are equal
• XA ≠ XB ≠ XC
• XA = XB ≠ XC
• XA ≠ XB = XC
• XA = XC ≠ XB

1-factor ANOVA
Planned contrasts and post-hoc tests
– Further tests used to decide among the different alternative hypotheses
– e.g., pairwise tests of A vs. B, A vs. C, and B vs. C can identify which pattern (such as A ≠ B, A ≠ C, B = C) actually holds

1-factor ANOVA
Reporting your results (a code sketch of a 1-way ANOVA appears at the end of this section)
– The observed differences
– The kind of test
– The computed F-ratio
– The degrees of freedom for the test
– The p-value of the test
– Any post-hoc or planned comparison results
"The mean score of Group A was 12, Group B was 25, and Group C was 27. A 1-way ANOVA was conducted and the results yielded a significant difference, F(2,25) = 5.67, p < 0.05. Post hoc tests revealed that the differences between Groups A and B and Groups A and C were statistically reliable (respectively t(1) = 5.67, p < 0.05 and t(1) = 6.02, p < 0.05). Groups B and C did not differ significantly from one another."

Factorial ANOVAs
We covered much of this in our experimental design lecture.
More than one factor
– Factors may be within or between subjects
– The overall design may be entirely within, entirely between, or mixed
Many F-ratios may be computed
– An F-ratio is computed to test the main effect of each factor
– An F-ratio is computed to test each of the potential interactions between the factors

Factorial ANOVA
Reporting your results
– The observed differences
• Because there may be a lot of these, they may be presented in a table instead of directly in the text
– The kind of design
• e.g., a "2 x 2 completely between-subjects factorial design"
– The computed F-ratios
• You may see separate paragraphs for each factor and for the interactions
– The degrees of freedom for each test
• Each F-ratio has its own set of dfs
– The p-value of each test
• You may want to simply state that "all tests were evaluated with an alpha level of 0.05"
– Any post-hoc or planned comparison results
• Typically only the theoretically interesting comparisons are presented
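Following up on the 1-way ANOVA reporting example above, here is a minimal Python sketch with scipy.stats; the scores are fabricated so that the group means equal the 12, 25, and 27 from that example, and the pairwise follow-up t-tests are uncorrected for multiple comparisons (a real analysis, in SPSS or otherwise, would use a proper post-hoc procedure).

```python
from scipy import stats

# Fabricated scores for one factor with three levels (8 participants per group)
group_a = [10, 12, 11, 14, 13, 12, 11, 13]   # mean = 12
group_b = [20, 29, 23, 31, 22, 27, 25, 23]   # mean = 25
group_c = [24, 31, 22, 30, 26, 29, 23, 31]   # mean = 27

# One-way (1-factor) ANOVA: F-ratio = observed variance / variance expected by chance
f_ratio, p_value = stats.f_oneway(group_a, group_b, group_c)
df_between = 3 - 1            # number of groups - 1
df_within = 3 * 8 - 3         # total N - number of groups
print(f"F({df_between},{df_within}) = {f_ratio:.2f}, p = {p_value:.4f}")

# Pairwise comparisons (uncorrected, for illustration only)
pairs = [("A vs B", group_a, group_b), ("A vs C", group_a, group_c), ("B vs C", group_b, group_c)]
for name, g1, g2 in pairs:
    t = stats.ttest_ind(g1, g2)
    print(f"{name}: t({len(g1) + len(g2) - 2}) = {t.statistic:.2f}, p = {t.pvalue:.4f}")
```

With these made-up numbers the outcome should mirror the pattern in the reporting example: Group A differs reliably from Groups B and C, while B and C do not differ significantly from one another.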