Inferential Statistics
Comparison of Group Means

Normal curve
Also called: characteristic curve, bell curve, normal distribution, Gaussian distribution, continuous probability distribution.

[Figure: normal curve with the following scales beneath it]

Standard deviations (z scores)   -3      -2      -1      0       +1      +2      +3
Cumulative percentages           0.1%    2.3%    15.9%   50.0%   84.1%   97.7%   99.9%
IQ scores                        55      70      85      100     115     130     145
SAT scores (sd 209)              400     608     817     1026    1235    1444    1600

Percent of cases in each band: below -3, 0.13%; -3 to -2, 2.14%; -2 to -1, 13.59%; -1 to 0, 34.13%; 0 to +1, 34.13%; +1 to +2, 13.59%; +2 to +3, 2.14%; above +3, 0.13%.
Percentile equivalents marked along the curve: 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99.

Some Definitions
• A population is the largest group to which you wish to generalize.
• Summaries of the measures of the characteristics of individuals in a population are called parameters.
• A group selected from the population for study is called a sample.
• Summaries of the measures of the characteristics of individuals in a sample are called statistics.

Normal curves:
1. Lots of measures in the middle, fewer toward the extremes (bell shaped).
2. Symmetrical.
3. Mean, median, and mode are the same.

Often distributions aren't normal.
[Figures: negatively skewed distribution; positively skewed distribution]
• Distributions where the bulk of the responses are weighted in one direction are called skewed distributions.
• If the majority of the data are toward higher values, with the tail stretching toward lower values, the distribution is negatively skewed.
• If the majority of the data are toward lower values, with the tail stretching toward higher values, the distribution is positively skewed.

Skewed distributions behave badly
[Figure: skewed curve with the mean, median, and mode marked at different points]
• The mean, median, and mode are not identical.
• The distribution is not symmetrical.
• The relationship between the standard deviation and the percent of the responses under the curve is not a constant.

Kurtosis
• The characteristic of a distribution to be sharper (or flatter) than expected is called kurtosis.
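The cumulative percentages and score equivalents shown under the normal curve can be reproduced from z scores. The sketch below is only an illustration: it assumes Python 3.8 or later (for statistics.NormalDist), takes the IQ scale as mean 100, sd 15 and the SAT scale as mean 1026, sd 209 as read off the figure, and assumes the SAT scale is clipped to 400 and 1600. The function names are hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def cumulative_percent(z):
    """Percent of a normal distribution that falls at or below z."""
    return 100.0 * STANDARD_NORMAL.cdf(z)


def scaled_score(z, mean, sd, low=None, high=None):
    """Convert a z score to a test scale, optionally clipped to the scale limits."""
    score = mean + z * sd
    if low is not None:
        score = max(low, score)
    if high is not None:
        score = min(high, score)
    return score


for z in range(-3, 4):
    iq = scaled_score(z, mean=100, sd=15)
    # Assumption: SAT totals cannot fall outside 400 to 1600.
    sat = scaled_score(z, mean=1026, sd=209, low=400, high=1600)
    print(f"z = {z:+d}  cumulative = {cumulative_percent(z):5.1f}%  "
          f"IQ = {iq:.0f}  SAT = {sat:.0f}")
```

This only works because the curve is normal. For a skewed distribution the same z score would not map to the same percentile, which is the point of the slides on skew.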
Why Worry About the Shape
• Inferential statistics are based on the assumption of normal distribution.
• Since we are using samples of populations, some variation from normal would be expected because of sample size.
• The best way to decide if distributions are not normal (problematic) is to produce a histogram and look at it.

Inferences about the relationship of a sample to a population are based on normal distributions.
[Figure: population curve with the mean marked; bands from -3 to +3 standard deviations contain 0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, and 0.13% of cases]

Group of 30
[Figure: population curve with the population mean marked]
We already know how to compute percentile rank for an individual. What would happen if I looked at a group of 30 instead of an individual?

Frequency Distribution for Groups of 30
Compute the means of groups of 30 randomly selected from the population. Theoretically, a new frequency distribution will appear representing the means of groups of 30.

Sampling Distribution of the Mean
[Figure: population distribution with a narrower sampling distribution (n = 30) centered on the population mean]
This is the sampling distribution for random groups of 30.

Sample Sizes
[Figure: population distribution with sampling distributions for n = 30, n = 100, and n = 1000]
• If the sample size were really large, it would be unlikely to get sample group means that were very far from the population mean.
• If the sample size were really small, sample group means would be distributed similarly to the population.
• As the sample size increases, the Sampling Distribution of the Mean gets narrower.

Standard Error of the Mean
[Figure: population distribution with sampling distributions for n = 30, n = 100, and n = 1000]
• The standard deviation of a Sampling Distribution is called the Standard Error.
• The standard error is the standard deviation of the population adjusted for sample size: the population standard deviation divided by the square root of the sample size.
• For any distribution, the larger the sample size, the smaller the numerical standard error.
• The standard error of the sampling distribution is a function of the sample size.
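The narrowing of the sampling distribution as groups get larger can be checked with a small simulation. This is a sketch under stated assumptions, not part of the original slides: the population is assumed normal with mean 100 and standard deviation 15 (the IQ scale), the number of simulated groups is arbitrary, and the function name is hypothetical. It needs Python 3.8 or later for statistics.fmean.

```python
import math
import random
from statistics import fmean, pstdev


def simulated_standard_error(n, groups=1000, pop_mean=100.0, pop_sd=15.0, seed=0):
    """Draw many random groups of size n and return the spread of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    group_means = [
        fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(groups)
    ]
    # The standard deviation of the group means is the simulated standard error.
    return pstdev(group_means)


for n in (30, 100, 1000):
    expected = 15.0 / math.sqrt(n)  # population sd divided by sqrt(n)
    print(f"n = {n:4d}  simulated SE = {simulated_standard_error(n):.2f}  "
          f"sd / sqrt(n) = {expected:.2f}")
```

The simulated values should land close to the population standard deviation divided by the square root of n, and they shrink as n grows.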
[Figures: sampling distributions for n = 30 and n = 100 drawn on the test scale, each with 1 standard error marked]

Standard Error
[Figure: sampling distribution (n = 30) with the sampling distribution mean, 1 standard error, and 2 standard errors marked; 50.0% of group means fall below the mean, 34.13% between the mean and 1 standard error, and 13.59% between 1 and 2 standard errors]
• The relationship of standard error to a sampling distribution is the same as standard deviation to a normal distribution.
• Remember this is not a distribution of scores on the test. It is a distribution of the means of randomly selected groups of 30.

What do we have so far?
• Normal distributions are good because of the relationship of the sd to the area under the curve.
• When a sampling distribution is built from a population, it is approximately normally distributed.
• Standard error is the population standard deviation adjusted for group size.
• The relation of the standard error to a sampling distribution is the same as a standard deviation to a population distribution.

Measuring Group Means Against the Sampling Distribution
[Figure: sampling distribution (n = 30); at 1 standard error above the mean, the probability of a higher mean is 15.87% (.16)]
[Figure: sampling distribution (n = 30); at 2 standard errors above the mean, the probability of a higher mean is 2.28% (.02)]
At any given Standard Error it is possible to compute the likelihood that a higher or lower group mean could have occurred by chance (remember the relationship of z scores to percentile rank).
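The two probabilities in the figures (15.87% at 1 standard error, 2.28% at 2 standard errors) follow directly from the normal curve. The sketch below assumes a population mean of 100 and standard deviation of 15, which are illustrative values and not from the slides; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def prob_higher_mean(group_mean, pop_mean, pop_sd, n):
    """Chance that a random group of size n has a mean above group_mean."""
    standard_error = pop_sd / math.sqrt(n)
    # How many standard errors the group mean sits from the population mean.
    z = (group_mean - pop_mean) / standard_error
    return 1.0 - NormalDist().cdf(z)


standard_error = 15.0 / math.sqrt(30)
for k in (1, 2):
    group_mean = 100.0 + k * standard_error
    p = prob_higher_mean(group_mean, pop_mean=100.0, pop_sd=15.0, n=30)
    print(f"{k} standard error(s) above the mean: p = {p:.4f}")
```

The same function gives the chance of a lower mean as one minus the returned value.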
Measuring Group Means Against the Sampling Distribution
Remember that each group mean that got into the sampling distribution represents a group that was randomly selected.

Probability
• How unlikely does the occurrence of a group mean have to be before we would say that it is so unlikely that it couldn't have happened by chance? It must have occurred for some other reason.

Significance
[Figure: upper tail of the sampling distribution, labeled "Probability of higher mean"]
• If the probability that a given group mean would occur by chance in a sampling distribution is very small, then the occurrence of that group mean is said to be significant.
• To use the z-score analogy: when the percentile ranking of a group mean is really high (or really low), it is significant. It is significant because it is unlikely to occur randomly.

Significance
[Figure: tail of the sampling distribution with the 5% (.05) and 1% (.01) points marked]
Most social science research declares that group means occurring by chance less than 5% of the time are significant. Most medical research uses 1% or much less.
• With a z-score we used a table to look up percentile rank. Since the normal distribution is now a sampling distribution based on a specific group size, the z-table won't work.
• A new table for each possible group size needs to be generated to test the percentile rank of the group mean comparison.
• Fortunately the computer does this for you.

[Figure: tail of the sampling distribution with the points marked that occur randomly 5% of the time (.05) and 1% of the time (.01)]
The probability (p) that a given score or higher could have appeared by chance is .01 (1%, or 1 in 100 times), substantially less than .05 (5%, or 5 in 100 times [1 in 20]).

t-Critical
• The sampling distribution of the mean is a different shape for every sample size (n).
• The point on the distribution where a given mean becomes significant is called t-critical.
• In the old days we would compute a t-score (analogous to a z-score) and then look up the significance point on a table.
• Now you can read the p value directly to get the same information.

Inquiry Science
• Mary wants to know if teaching science with inquiry techniques will do better than more traditional methods. She teaches a unit in her class using inquiry science methods and then compares the end-of-chapter test scores of her students with those of students in John's class, who have been taught the same unit with more traditional methods.
• What is the probability that the group mean differences could have appeared by chance?

Brown Study

Hypothesis Testing
Analysis based on normal distributions
• Dependent variable: the measurement. Outcome data that will be compared.
• Independent variable: how comparison groups are identified. The intervention variable.
• Hypothesis: A tentative explanation that accounts for a set of facts and can be tested by further investigation.
• Problem statement: The purpose of this study is to examine the impact of Direct Instruction reading programs on district test scores.
  – Hypothesis: Students who participate in direct instruction reading programs will have higher scores than those who do not.

Proof
• Gather information on the dependent variable (test scores) and compare the scores based on the independent variable (which scores come from students in which program: DI or not).
• What if most DI students were better but not all?
• What about all the students from whom you did not have scores?
• Proof means for all cases. You cannot prove your hypothesis.

What happens if this is reversed?
• If your hypothesis says that there will be a difference, could you test to see if there was not a difference?
• Null hypothesis: stating that no differences would be found when you really are interested in finding them.
• If the hypothesis says that something is true, then the null hypothesis says that it is not true.
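One way to see what the null hypothesis claims in Mary's comparison is to act it out. If teaching method makes no difference, the class labels are arbitrary, so shuffling them should produce differences as large as the observed one fairly often. The sketch below uses a permutation test, which is a swapped-in technique for illustration and not the t-test the slides go on to use. The scores are invented for the example, and the function name is hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, shuffles=10000, seed=0):
    """One-sided p value for 'group_a has the higher mean', by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(shuffles):
        # Under the null hypothesis any student could have been in either class.
        rng.shuffle(pooled)
        difference = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if difference >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / shuffles


# Hypothetical chapter test scores, invented for illustration only.
inquiry_class = [82, 91, 77, 88, 95, 84, 79, 90, 86, 93]
traditional_class = [75, 83, 80, 72, 88, 79, 70, 85, 77, 81]

difference, p = permutation_p_value(inquiry_class, traditional_class)
print(f"Observed difference in means: {difference:.1f}")
print(f"Share of shuffles with a difference at least that large: {p:.4f}")
```

The returned share plays the role of the p value: a small value means the observed difference rarely appears when the labels are assigned by chance.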
Hypothesis Testing
• Hypothesis: Kids with DI will have higher scores than those from other programs.
• Null Hypothesis: There will be no difference between kids with DI and those from other programs.
• You gather evidence and decide if the evidence says that direct instruction students had better scores.
• You show that the hypothesis is not not true.
• You reject the null hypothesis.

Using Null Hypotheses
• Use a sample (otherwise we would use parameters).
• Stop talking about individuals and only compare groups (mean and standard deviation).

Hypothesis Testing
[Figure: mean of kids without DI and mean of kids with DI marked on a distribution]
In statistical parlance we want to know if the two groups have come from the same population (fail to reject the null) or if they represent samples from different populations (reject the null).

Hypothesis Testing
[Figure: mean of kids without DI and mean of kids with DI marked on the sampling distribution of the mean for the sample size of the kids without DI]
What is the probability that the group mean difference could have appeared by chance?
• Measure a bunch of people, and if there is little difference among groups (the difference was likely to appear by chance) you say, "I have been unable to find evidence that differences exist."
• You have been unable to reject the null hypothesis. You have no evidence that your independent variable had any impact.
• If the differences among groups are large enough, you say the differences could not have happened by chance.
• You have been able to reject the null hypothesis. Your independent variable becomes, for now, the best explanation for the differences.

Why do this?
• Inferential statistics uses samples of populations to determine if differences in group means could occur by chance.
• Inferential statistics is not used to prove hypotheses. It is used to demonstrate whether null hypotheses are not true.

Think about this as steps:
1. You want to know that something is true but you cannot absolutely know because you can't test all cases.
2. Instead you focus on the opposite of what you want to know is true (the null hypothesis).
3. Start gathering evidence.
4. If enough cases show that something is true, then saying it was not true is false.
5. You have rejected the null hypothesis.
6. For the time being, the thing you wanted to show is true (your hypothesis) is the best explanation.

Ok, one more semantic step and we are done:
• What we are interested in knowing is whether something made two groups different.
• The evidence that you gather to see if that is true is the means and standard deviations of the groups.
• The null hypothesis always says there will be no difference between the groups.
• If there is a very low probability that the difference in the groups could occur by chance, that is the evidence that the null hypothesis is wrong.
• Low probabilities (p values) are the evidence that the thing you wanted to show (your hypothesis) is, for now, the better explanation. You have rejected the null hypothesis.

Significance (Alpha Level)
[Figure: tail of the sampling distribution with the 5% (.05) and 1% (.01) points marked]
Choosing the point at which we say that something is significant is subjective. It is up to the researcher. Why let this subjective-sounding judgment enter the process?

Being Wrong
[Figure: test group mean beyond the 5% (.05) point]
• We say that occurring randomly less than 5% of the time is really unlikely, so it is not random. But when chance really is the explanation, that statement will be wrong 5% of the time.
• Type 1 Error: Saying it is not random when it was.

Being Wrong
[Figure: test group mean short of the 5% (.05) point]
• We say that occurring randomly more than 5% of the time is too likely, so we say chance is the best explanation. But sometimes real differences occur even though they look like chance.
• Type 2 Error: Saying it is random when it was not.

Reducing Being Wrong
• Reduce Type 1 errors by lowering the alpha level or using more conservative calculations.
• Reduce Type 2 errors by increasing the sample size.
• Reduce all errors by improving the study design (validity).
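Both kinds of error can be watched in a simulation. The sketch below is an illustration under stated assumptions: the population is normal with a known standard deviation of 15, the null hypothesis says the mean is 100, a one-tailed z test stands in for the t-test, and the "real" effect of 5 points is invented. The function name is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n, alpha=0.05, null_mean=100.0, pop_sd=15.0,
                   trials=2000, seed=0):
    """Share of random groups of size n that would be declared significant."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical_z = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed cutoff
    standard_error = pop_sd / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        group_mean = fmean(rng.gauss(true_mean, pop_sd) for _ in range(n))
        if (group_mean - null_mean) / standard_error > critical_z:
            rejections += 1
    return rejections / trials


# Type 1 error: the null is true (true mean = 100), yet we sometimes reject it.
for alpha in (0.05, 0.01):
    rate = rejection_rate(true_mean=100.0, n=30, alpha=alpha)
    print(f"alpha = {alpha}: Type 1 error rate about {rate:.3f}")

# Type 2 error: a real 5-point difference exists, yet we sometimes miss it.
for n in (30, 100):
    missed = 1.0 - rejection_rate(true_mean=105.0, n=n)
    print(f"n = {n}: Type 2 error rate about {missed:.3f}")
```

The Type 1 rate tracks the alpha level, and the Type 2 rate falls as the sample size rises, which matches the last slide's advice.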
Excel Example
On the course resource site download: Excel Files: Area Test
Compare the pre and post scores using a Paired t-Test.
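For readers who want to see what the paired t-test computes, here is a minimal sketch in Python. It is not the Excel procedure itself, and the pre and post scores are invented for illustration because the Area Test file is not reproduced here. The function name is hypothetical.

```python
import math
from statistics import fmean, stdev


def paired_t(pre, post):
    """Return the paired t statistic and degrees of freedom for matched scores."""
    if len(pre) != len(post):
        raise ValueError("each pre score needs a matching post score")
    # A paired test works on each person's change, not on the two group means.
    differences = [after - before for before, after in zip(pre, post)]
    n = len(differences)
    # Standard error of the mean difference (undefined if every change is equal).
    standard_error = stdev(differences) / math.sqrt(n)
    return fmean(differences) / standard_error, n - 1


# Hypothetical pre and post scores, invented for illustration only.
pre_scores = [12, 15, 9, 14, 11, 13, 10, 16]
post_scores = [15, 18, 11, 15, 14, 17, 12, 18]

t, df = paired_t(pre_scores, post_scores)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

In Excel the same comparison is usually run with the T.TEST worksheet function using the paired type, or with the paired t-test in the Analysis ToolPak, either of which reports the p value directly.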