Statistics Chapter 10b: Introduction to Inference
Time: 3 weeks

Confidence Intervals vs. Tests of Significance
The goals:
CI: ____
Tests of Significance: ____

Star Free Throw Shooter
I claim that I make 80% of my free throws. To test my claim, you ask me to shoot 20 free throws. I make only 8 of the 20. “Aha!” you say. “Someone who makes 80% of his free throws would almost never make only 8 out of 20. So I don’t believe your claim.” You can say how strong the evidence against my claim is by giving the probability that I would make as few as 8 out of 20 shots if I really make 80% in the long run. This probability is 0.0001. The small probability convinces you that my claim is false. The basic idea is that an outcome that would rarely happen if a claim were true is good evidence that the claim is not true.

Hypothesis Testing
A hypothesis is a claim or statement about the value of a single ____.

Fritos
For example, Frito-Lay claims that an average bag of Ruffles weighs 14 oz, so we choose a random sample of bags and see if the data support the hypothesis that μ = 14.
Not guilty means ____
Guilty means ____

Null & Alternative Hypothesis
We assume one of the hypotheses to be true. First, the null hypothesis, or the hypothesis that says that there is ____ in the population. ____ or ____
We reject the null hypothesis in favor of the alternative hypothesis only if ____ the null hypothesis.

Possible Outcomes
There are 2 possible outcomes of a test of hypothesis: ____.
We reject Ho when ____. If the sample doesn’t contain such evidence, ____. In other words, we won’t be proving the “innocence”, ____.

General Format:
Ho: ____
Ha: ____

P-Value
...is the probability that the observed statistic value (or an even more extreme value) could occur if the null model were correct. If the P-value is small enough, ____.
How small is small enough? Rule of thumb: ____. That is, chance alone would rarely produce such a result.

Formal Hypothesis Testing in 5 Easy Steps
Name the test, define the variable, and set the α level.
State the hypothesis
State and check the conditions
Calculate the test statistic and P-value (include a picture)
Make a decision based on the P-value (in context)

Errors in Hypothesis Testing
Type I error: ____
Type II error: ____

Bolt Problem
A machine is set to produce bolts with an average diameter of 1 cm. Every hour a sample is inspected and the machine is adjusted if there is convincing evidence that the average diameter is not 1 cm. State the hypothesis and describe both kinds of errors for this testing procedure.

Pregnancy Test
A pregnancy test is designed so that it will correctly detect a pregnancy 99% of the time and correctly determine that a person isn’t pregnant 90% of the time. If the null hypothesis is “not pregnant”, describe both types of errors and find their probabilities.

α and β
If α = .05, then we are using a testing procedure that will make a Type I error about 5% of the time. That is, if we were to take many samples and perform many tests, in about 5 out of every 100 tests, we would reject the null hypothesis when it is actually true.
With the sample size held fixed, if the value of α goes down, the value of β goes up (they are inversely related).

Choosing α
Choose the largest value tolerable (between .01 and .10)
Judicial system
Pregnancy test

Antibacterial Cream
We are testing a new cream on a small cut. We know from previous research that with no medication, the mean healing time is 7.6 days, with a standard deviation of 1.4 days. The claim we want to test is that the new formulation speeds healing. We will use a 5% significance level.

Hypothesis Testing in 5 Easy Steps
Name the test, define the variable, and set the α level.
State the hypothesis
State and check the conditions
Calculate the test statistic and P-value (include a picture)
Make a decision based on the P-value (in context)

Procedure: Cut 25 volunteer college students and apply the new formula to the wound. The mean healing time for these subjects is x̄ = 7.1 days. We will assume that σ = 1.4 days.
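
The calculation step for the antibacterial cream problem can be sketched in Python as a check on hand or calculator work. This is a minimal sketch, not the handout's own solution: it assumes the hypotheses are Ho: μ = 7.6 and Ha: μ < 7.6 (the cream speeds healing), treats σ = 1.4 as known, and uses a helper name, `one_sample_z_test_lower`, that is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test_lower(sample_mean, mu0, sigma, n):
    """Return (z, P-value) for Ho: mu = mu0 vs. Ha: mu < mu0, sigma known.

    Hypothetical helper for illustration only.
    """
    standard_error = sigma / sqrt(n)          # sd of the sample mean
    z = (sample_mean - mu0) / standard_error  # test statistic
    p_value = NormalDist().cdf(z)             # area to the LEFT of z
    return z, p_value


# Numbers from the cream problem; the direction of Ha is an assumption.
alpha = 0.05
z, p_value = one_sample_z_test_lower(sample_mean=7.1, mu0=7.6, sigma=1.4, n=25)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

Under these assumptions z is about -1.79 and the P-value is about 0.037, which is below α = .05, so the data would count as evidence that the cream shortens the mean healing time.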
Hypothesis Tests for a Population Proportion
According to an article in the San Gabriel Valley Tribune (2-13-03), “Most people are kissing the ‘right way’.” That is, according to the study, the majority of couples tilt their heads to the right when kissing.
Define p = ____
A researcher observed 124 couples kissing in various public places and found that 83/124 (66.9%) of the couples tilted to the right. Is this convincing evidence that p > .5?
What is the probability that we get a sample proportion this high by random chance, assuming the null hypothesis is true?

Class Activity
Perform the simulation:
0-4 = kiss to the right
5-9 = kiss to the left
For each run we will generate 124 integers from 0-9 to represent the 124 observed couples. We will then count the number of digits from 0-4. Finally, we will compute p̂, the sample proportion of couples that tilt to the right.
randInt(0,9,124)→L1, then set the window x-scale to 5, graph the histogram, then trace.
Can we reject the null hypothesis and conclude that the majority of couples do tilt to the right when kissing?

P-Value
The probability of getting a value as extreme as or more extreme than the one we observed (assuming the null hypothesis is true) is called a p-value.
In the previous kissing problem, the p-value was ____.
What if the p-value for the kissing problem was .23 instead of 0?

Likely vs. Unlikely to Happen by Random Chance
What is the cut-off?

Calculating P-values
In the kissing example, we start by assuming p = .5
Test conditions: ____
P(p̂ > .669) = ____
Use the standard deviation of the sampling distribution: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$

1-Sample z Test for Population Proportion
Using the following conditions:
Random sample from population of interest
Large sample size: np > 10 and n(1-p) > 10
Note: when checking conditions and calculating $\sigma_{\hat{p}}$, we always use the true value (p) if we know it. Since we assume a value for p (Ho) when doing a hypothesis test, we will always use this value.
With confidence intervals, we do not make any assumptions about the true value of p, so we have to use the value of p̂ to estimate p.

Kissing Example Revisited
Hypothesis Testing in 5 Easy Steps
Name the test, define the variable, and set the α level.
State the hypothesis
State and check the conditions
Calculate the test statistic and P-value (include a picture)
Make a decision based on the P-value (in context)

Eating Alone
Are women less likely to eat alone? Suppose that a restaurant manager observed people eating alone at his restaurant over several days. Of the 48 solo eaters he observed, 20 were women. Do these data give evidence at the .01 level that women are less likely than men to eat alone?

Teenage Births
According to the National Center for Health Statistics, 12.3% of all births in the US were to teenagers in 1999. To see if this percentage is the same in California, a random sample of 1000 CA births was investigated and 111 were to teenage mothers. Can we conclude that the percentage of teenage births is different in CA at the 10% significance level?

Statistically Significant
Results are statistically significant when they are unlikely to happen by chance alone. Whenever we reject the null hypothesis, we have statistically significant results. This does not mean that the results are also practically significant. If results are practically significant, they usually lead to a change of policy.
Caution: when the sample size is really large, even very small differences will give significant results.

Power of a Test
The power of a test is the probability of correctly rejecting Ho when the alternative is really true. That is, the probability of rejecting Ho when it is false.
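
The 1-sample z test for a proportion used in the kissing, eating alone, and teenage births problems can be sketched in Python as follows. This is an illustrative sketch, not the handout's own solution. The function name `one_prop_z_test` and its `alternative` argument are made up for this example, and the null value p = .5 for the eating alone problem is an assumption (women and men equally likely among solo eaters). As the note on conditions says, the standard deviation uses the hypothesized p, not p̂.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes, n, p0, alternative):
    """Return (z, P-value) for a 1-sample z test of Ho: p = p0.

    Hypothetical helper; `alternative` is "greater", "less" or "two-sided".
    """
    # Large-sample condition from the notes, checked with the Ho value of p.
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("need n*p0 > 10 and n*(1 - p0) > 10")

    p_hat = successes / n
    sd_p_hat = sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat
    z = (p_hat - p0) / sd_p_hat

    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # doubled for two tails
    else:
        raise ValueError("unknown alternative")
    return z, p_value


# Kissing: Ho: p = .5, Ha: p > .5
kiss_z, kiss_p = one_prop_z_test(83, 124, 0.5, "greater")
# Eating alone: Ho: p = .5 (assumed), Ha: p < .5, alpha = .01
eat_z, eat_p = one_prop_z_test(20, 48, 0.5, "less")
# Teenage births: Ho: p = .123, Ha: p != .123, alpha = .10
teen_z, teen_p = one_prop_z_test(111, 1000, 0.123, "two-sided")

for name, z, p in [("kissing", kiss_z, kiss_p),
                   ("eating alone", eat_z, eat_p),
                   ("teenage births", teen_z, teen_p)]:
    print(f"{name}: z = {z:.2f}, P-value = {p:.4f}")
```

Under these assumptions the kissing data give z of about 3.77 with a P-value well below .001, while the eating alone (P-value about .12) and teenage births (P-value about .25) data do not reach their stated significance levels.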
Decision & Action
Each cell pairs the true situation with the decision made.

Ho is true (there isn’t a problem; the situation is as it should be) and we decide there is no problem (do not reject Ho):
100% - α
•chance of acquitting an innocent defendant
•quality acceptance sampling: chance of accepting a good lot
•SPC: chance of calling the process in control when it is
•DOE: conclude that there is no difference between the treatments when there isn’t

Ho is true (there isn’t a problem; the situation is as it should be) and we decide that there is a problem (reject Ho):
Type I risk (α): false alarm risk
•risk of crying wolf when there isn’t one
•risk of convicting an innocent defendant
•quality acceptance sampling: risk of rejecting a good lot
•SPC: risk of calling the process out of control when it is in control
•Design of experiments (DOE): risk of concluding that there is a difference between the treatments when there isn’t

Ho is false (there is a problem; the situation requires adjustment) and we decide there is no problem (do not reject Ho):
Type II risk (β): risk of missing the problem
•risk of not seeing the wolf
•risk of acquitting a guilty defendant
•quality acceptance sampling: risk of shipping a bad lot
•SPC: risk of calling the process in control when it is out of control
•DOE: chance of missing a difference between the treatments

Ho is false (there is a problem; the situation requires adjustment) and we decide that there is a problem (reject Ho):
Power (1 - β): a test’s ability to detect a real problem, or difference
•chance of seeing the wolf
•chance of convicting a guilty defendant
•quality acceptance sampling: chance of rejecting a bad lot
•SPC: chance of calling the process out of control when it is
•DOE: chance of detecting a difference between treatments

If Ho is false, P(Type II error) = ____
The greater the power, the greater the chance of detecting the truth.

High School Diploma
Suppose that in the 1990 Census, 83% of Californians had a high school diploma. The Department of Education believes this percentage has gone up since then, so they commission a survey to estimate the true percentage. Define the parameter of interest, state the hypotheses, and describe each kind of error in context. Describe the power in context.

What Affects Power?
The power of a test will be higher if:
1. you increase the significance level (α)
2. you increase the sample size (n)
3. there is a large discrepancy between the null hypothesis and the true value
4. there is little variability in the population
Which can you control?
What are the disadvantages of options 1 & 2?
Option 1: increases the probability of a Type I error
Option 2: increases the cost and time required for the study

TBBMC Problem
Can a 6-month exercise program increase the total body bone mineral content of young women? A team of researchers is planning a study to examine this question. Based on the results of a previous study, they are willing to assume that σ = 2 for the percent change. A change in TBBMC of 1% would be considered important, and the researchers would like to have a reasonable chance of detecting a change this large or larger. Is 25 subjects a large enough sample for this project?

Summary
What is the relationship between confidence intervals and hypothesis tests?
Hypothesis tests are designed to answer the questions: ____
Confidence intervals are designed to answer the question: ____

Chapter 10b problems: # 27, 29, 30, 31, 33, 38, 42, 44, 49, 58, 61, 63, 65, 66, 68, 73, 74

In Quest of the Perfect Hypothesis Test
State what kind of distribution is appropriate for the data
Normal, t, Binomial, Geometric, Chi-Square, Other
AND state what you are comparing
A population to a sample ...called a 1 sample ____ distribution (fill in blank with word “normal”, “z”, “Binomial”, etc.)
OR are you comparing two samples to each other ...called a 2 sample ____ distribution
Choose type of test: 1 tailed test (< or >) or a 2 tailed test (not equal)
a sketch is a good idea
Use correct format of Ho and Ha
Ho: ____ = ____. Ha: ____ >, <, or ≠ ____.
Or state null and alternative hypothesis using sentences.
In general, use Greek symbols in hypothesis, not regular alphabetic symbols
****if you are going to set an α, do it HERE
Correct calculation of standard deviation
This is determined by knowing which distribution you choose.
Correctly calculate test statistic
The z score, or t, or chi-square, etc.
Correct P-value (doubled for two tailed)
Correct decision
In general, a small P-value indicates you should reject Ho. A large P-value indicates you should fail to reject.
Correct and complete statement of conclusion, restating the question.
Example of a perfect conclusion:
“I reject the null hypothesis. Since the probability of having a sample mean of 20.1 is so small when the population mean is 18.2 (P-value of .0002), I conclude that Logan High’s mean ACT is truly different from the national mean ACT.”
Or:
“I fail to reject the null hypothesis. There is not enough evidence to show that the true mean is different from 18.2; a sample mean of 18.1 is consistent with a population mean of 18.2. My P-value of .31 indicates I could have gotten results at least this far from the population mean in random samples 31% of the time.”
Always use excellent organization, clarity and readability.
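
As a closing worked example, the TBBMC question and the “What Affects Power?” list can be explored numerically. The sketch below is only one way to set the problem up: it assumes a one-sided z test of Ho: μ = 0 vs. Ha: μ > 0 on the mean percent change, a significance level of α = .05 (the notes do not state one), and a true mean change of 1%. The function name `power_upper_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_z_test(mu0, mu_alt, sigma, n, alpha):
    """Power of the z test of Ho: mu = mu0 vs. Ha: mu > mu0 when mu = mu_alt.

    Hypothetical helper for illustration; sigma is treated as known.
    """
    standard_error = sigma / sqrt(n)
    # Smallest sample mean that leads to rejecting Ho at level alpha.
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Probability the sample mean lands above that cutoff when mu = mu_alt.
    return 1 - NormalDist(mu_alt, standard_error).cdf(critical_mean)


# TBBMC numbers from the notes: sigma = 2, n = 25, important change = 1%.
# alpha = .05 and the one-sided alternative are assumptions.
power_25 = power_upper_z_test(mu0=0, mu_alt=1, sigma=2, n=25, alpha=0.05)

# Items 1 and 2 of the list: raise alpha, or raise n.
power_alpha10 = power_upper_z_test(mu0=0, mu_alt=1, sigma=2, n=25, alpha=0.10)
power_50 = power_upper_z_test(mu0=0, mu_alt=1, sigma=2, n=50, alpha=0.05)

print(f"n = 25, alpha = .05: power = {power_25:.2f}")
print(f"n = 25, alpha = .10: power = {power_alpha10:.2f}")
print(f"n = 50, alpha = .05: power = {power_50:.2f}")
```

Under these assumptions the power with 25 subjects is about 0.80, and it rises when either α or n is increased, matching items 1 and 2 of the list. Whether 0.80 counts as a “reasonable chance” is a judgment the notes leave to the reader.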