Inference
• Sampling distributions
• Hypothesis testing

Sample
• Numbers describing a sample are statistics.

Population
• Numbers describing a population are parameters.

Random Sample
• Every unit in the population has an equal probability of being included in the sample.
• A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters.
• All sample statistics have some error in estimating population parameters.
• If repeated samples are taken from a population and the same statistic (e.g., the mean) is calculated from each sample, the statistics will vary; that is, they will have a distribution.

Example
• Average IQ in the population is 100.
• If we take repeated random samples of size 30 from the population, we would expect to get samples with means clustered about 100.
• Some sample means would be far from 100, but we expect fewer means the farther we get from 100.

A larger sample provides more information than a smaller sample, so a statistic from a large sample should have less error than a statistic from a small sample.

Review
• Random samples are the best.
• Statistics have error.
• Statistics have distributions.
• A larger sample size (n) is better: less error.

The distribution of a statistic is called a sampling distribution.

Distribution of X̄ (sample means) when sampling from a normal distribution
• The distribution of means is normal with mean = μ (the population mean) and standard deviation = σ/√n (the population standard deviation divided by the square root of the sample size).

Central Limit Theorem
• If the sample size (n) is large enough, the distribution of means is approximately normal with mean = μ and standard deviation = σ/√n, regardless of the population distribution.

Example
• Average IQ = 100, SD = 15.
• Take random samples of n = 6 adults.
• Then sample means are normally distributed with mean 100 and standard error 6.12 [from SD/sqrt(n) = 15/sqrt(6)].
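The standard error in the example above can be checked numerically. The following is a minimal sketch, not part of the original slides: it computes SD/sqrt(n) directly and then simulates repeated samples of n = 6 from a normal population with mean 100 and SD 15. The function names, the number of simulated samples (20,000), and the random seed are all illustrative assumptions.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard deviation of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def simulate_sample_means(mu, sd, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a normal population
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs repeat
    return [
        statistics.fmean(rng.gauss(mu, sd) for _ in range(n))
        for _ in range(num_samples)
    ]


# Theoretical standard error for the IQ example: 15 / sqrt(6)
se = standard_error(15, 6)

# Simulated sampling distribution of the mean for n = 6
means = simulate_sample_means(100, 15, n=6, num_samples=20000)

print("theoretical standard error:", round(se, 2))
print("mean of simulated sample means:", round(statistics.fmean(means), 2))
print("SD of simulated sample means:", round(statistics.stdev(means), 2))
```

The first printed value is 6.12. The two simulated values should land close to 100 and 6.12 respectively, though they will differ slightly from the theoretical figures because they come from random draws.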
Therefore
• About 68% of samples of n = 6 adults will have an average IQ between 93.88 and 106.12 (within 1 standard error of 100).
• About 95% of samples of n = 6 adults will have an average IQ between 87.75 and 112.25 (within 2 standard errors).
• About 99.7% of samples of n = 6 adults will have an average IQ between 81.63 and 118.37 (within 3 standard errors).

Same example: larger sample
• Average IQ = 100, SD = 15.
• Take random samples of n = 30 adults.
• Then sample means are normally distributed with mean 100 and standard error 2.74 [from SD/sqrt(n) = 15/sqrt(30)].

Therefore
• About 68% of samples of n = 30 adults will have an average IQ between 97.26 and 102.74.
• About 95% of samples of n = 30 adults will have an average IQ between 94.52 and 105.48.
• About 99.7% of samples of n = 30 adults will have an average IQ between 91.78 and 108.22.
• So the larger the sample, the less the sample averages vary.

Hypothesis Testing

Two ways to learn about a population
• Confidence intervals
• Hypothesis testing

Confidence Intervals
• Allow us to use sample data to estimate a population value, like the true mean or the true proportion.
• Give a range of plausible values for the true population value, which is more informative than a single-number estimate.

Hypothesis Testing
• Allows us to use sample data to test a claim about a population, such as testing whether a population mean equals some number.
• e.g., do students spend more than 3 hours per week on their stats homework?

General Idea of Hypothesis Testing
• Make an initial assumption.
• Collect evidence (data).
• Based on the available evidence, decide whether or not the initial assumption is reasonable.

Example
• Population: 5 million college students.
• Sample: 100 college students, M = 2.9.
• Is the average GPA 2.7?
• How likely is it that 100 students would have an average GPA as large as 2.9 if the population average were 2.7?

Making the Decision
• It is either likely or unlikely that we would collect the evidence we did given the initial assumption.
• If it is likely, then we “do not reject” our initial assumption. There is not enough evidence to do otherwise.
• Whether it is likely is determined by a probability.

Making the Decision (cont’d)
• If it is unlikely, then:
  – either our initial assumption is correct and we experienced an unusual event
  – or our initial assumption is incorrect
• In statistics, if it is unlikely, we decide to “reject” our initial assumption.

Idea of Hypothesis Testing: Criminal Trial Analogy
• First, state 2 hypotheses, the null hypothesis (“H0”) and the alternative hypothesis (“HA”):
  – H0: Defendant is not guilty.
  – HA: Defendant is guilty.

Hypotheses
• The null hypothesis always represents the status quo, i.e., the hypothesis that requires no change in current behavior.
• The alternative hypothesis is the conclusion that the researcher is trying to make.

Criminal Trial Analogy (continued)
• Then, collect evidence, such as fingerprints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc.
• In statistics, the data are the evidence.

Criminal Trial Analogy (continued)
• Then, make an initial assumption.
  – Defendant is innocent until proven guilty.
• In statistics, we always assume the null hypothesis is true.

Criminal Trial Analogy (continued)
• Then, make a decision based on the available evidence.
  – If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis. (Behave as if defendant is guilty.)
  – If there is not enough evidence, do not reject the null hypothesis. (Behave as if defendant is not guilty.)

Important Point
• Neither decision entails proving the null hypothesis or the alternative hypothesis.
• No matter what decision we make, there is always a chance we made an error.

Errors in Criminal Trials

                            Truth: Not guilty    Truth: Guilty
Jury decision: Guilty       Error                OK
Jury decision: Not guilty   OK                   Error

Errors in Hypothesis Testing

                               Truth: Null hypothesis    Truth: Alternative hypothesis
Decision: Reject null          TYPE I ERROR (α)          OK
Decision: Do not reject null   OK                        TYPE II ERROR (β)

Definitions: Types of Errors
• Type I error: The null hypothesis is rejected when it is true.
• Type II error: The null hypothesis is not rejected when it is false.
• There is always a chance of error; we want to minimize that chance.

Example
• Population: many, many adults.
• Sample: 80 adults, M = 98.4.
• Is average adult body temperature 98.6 degrees? Or is it lower?
• Average body temperature of the 80 sampled adults is 98.4 degrees.

Example (continued)
• Specify hypotheses.
  – H0: μ = 98.6 degrees
  – HA: μ < 98.6 degrees
• Make initial assumption: μ = 98.6 degrees.
• Collect data: Average body temp of 80 sampled adults is 98.4 degrees.
• How likely is it that a sample of 80 adults would have an average body temp as low as 98.4 if the average body temp of the population were 98.6?

Using the probability to make the decision
• The probability (the p-value) represents how likely we would be to observe a sample at least this extreme if the null hypothesis were true.
• The probability is a number between 0 and 1.
• Close to 0 means “unlikely.”
• So if the probability is “small” (typically, less than 0.05), then reject the null hypothesis.

Example (continued)
• The probability can easily be obtained from statistical software like SPSS.
• Say we calculate a p of 0.0026.

Example (continued)
• The p-value, 0.0026, indicates that, if the average body temperature in the population is 98.6 degrees, it is unlikely that a sample of 80 adults would have an average body temperature as low as 98.4 degrees.
• Decision: Reject the null hypothesis.
• Conclude that the average body temperature is lower than 98.6 degrees.

What type of error might we have made?
• Type I error here is claiming that average body temp is lower than 98.6 when in fact it really isn’t.
• Type II error here is failing to claim that the average body temp is lower than 98.6 when it is.
• We rejected the null hypothesis, i.e., claimed body temp is lower than 98.6, so we may have made a Type I error.
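
The decision rule in the body temperature example can be sketched in a few lines of code. The slides obtain the p-value from software such as SPSS and do not give the sample standard deviation, so the sketch below rests on two assumptions that are not from the source: a hypothetical standard deviation of 0.7 degrees, and a normal (z) approximation for the sample mean in place of whatever test the software ran. Its output is therefore not expected to reproduce the 0.0026 quoted above; it only illustrates how a lower-tail probability leads to the reject / do-not-reject decision.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_tail_p_value(sample_mean, null_mean, sd, n):
    """p-value for H0: mu = null_mean against HA: mu < null_mean,
    treating the sample mean as normal with standard error sd / sqrt(n)."""
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    return normal_cdf(z)


ASSUMED_SD = 0.7  # hypothetical value, NOT given in the slides
ALPHA = 0.05      # the "small probability" cutoff mentioned in the slides

p = lower_tail_p_value(sample_mean=98.4, null_mean=98.6, sd=ASSUMED_SD, n=80)
decision = "reject H0" if p < ALPHA else "do not reject H0"

print("p-value:", round(p, 4))
print("decision:", decision)
```

Under the assumed standard deviation the p-value falls below 0.05, so the sketch reaches the same decision as the slides: reject the null hypothesis. A larger assumed standard deviation would give a larger p-value, which is why the real sample standard deviation is needed before drawing any conclusion from actual data.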