Welcome to Week 08
College Statistics

Now, for something even more profound… TRUTH

Truth
Questions on a true/false test:
1) US presidents are named “Barack”  T ___ F ___
2) US presidents are named “George”  T ___ F ___
3) US presidents are male  T ___ F ___
4) US presidents are at least 35 years old  T ___ F ___

So, it is easier to be “false” than to be “true”
To be “true” a statement must be true in all cases
If not, it is “false”

The Bad News…
We live in a world where the “truth” is not always known

Guilt
In a court of law, you never REALLY know if someone is guilty or not

Even with an eyewitness, the witness could be:
- Mistaken
- Lying

There is always a level of UNCERTAINTY

Two standards of proof of guilt in US courts of law:
1) Criminal cases – “beyond a reasonable doubt”
2) Civil cases – “a preponderance of the evidence”

In probability terms, legal authorities estimate:
- “beyond a reasonable doubt” is 98-99% likelihood of guilt based on the evidence
- “a preponderance of the evidence” is just a hair over 50% likelihood of guilt
(Thanks to Ronald B. Standler)

There is always the possibility of being WRONG

In the US, a suspect is considered “innocent until proven guilty in a court of law”

BUT… if a suspect is not found guilty in court, are they called:
- innocent ?
- not guilty ?

The suspect is called “not guilty” because the defense hasn’t proved their innocence… it is just that the prosecution was unable to prove their guilt!

Questions?

Hypothesis Tests
In science, an “educated guess” is called a: hypothesis

In science, using experimental evidence to see if it supports a hypothesis is called a: hypothesis test

Vikings in Newfoundland?
Our hypothesis was that we wouldn’t find anything…
We rejected that hypothesis!
http://www.cbc.ca/news/canada/newfoundland-labrador/vikings-newfoundland-1.3515747

In practice, we often do hypothesis tests “undercover” as CONFIDENCE INTERVALS

Suppose we had a 95% confidence interval: 5 ≤ μ ≤ 10
Suppose our hypothesis was that μ = 7
Is 7 a likely value for μ given our confidence interval?

Because μ = 7 is in our confidence interval 5 ≤ μ ≤ 10, it is a possible value given our data… but so are
μ = 6
μ = 8
μ = 9.3
μ = 5.1
μ = 6.79431…

What if the hypothesized value for μ was 11?
5 ≤ μ ≤ 10
We are 95% confident that μ cannot be 11 given our evidence

We reject the hypothesis that μ = 11 if 5 ≤ μ ≤ 10 with 95% confidence
We will be wrong to do this 5% of the time (100% - 95%)
The amount of time we are willing to be wrong is called our “α-level”

The confidence interval can be used to test hypothesized values of μ using the mean, standard deviation and sample size of our sample data

Whether we can reject a hypothesis or not depends on how variable our data are!
(Figure panels: “Not too different…” and “Very different!”)
(see why variability is important?)

Questions?

Rejecting a hypothesis is a strong statement
We have evidence to show μ ≠ 11

If the value is included in the confidence interval, you cannot make a strong statement
We haven’t proved μ = 7 (because it could be a wide range of numbers within the interval)

So, we merely “fail to reject” the hypothesis

Our exercise on human temperature last week was a test of the hypothesis that normal human temperature is 98.6°

PROJECT QUESTION
You have a hypothesis that normal human body temperature is 98.6°
You have experimentally found that measured using an IR thermometer, the inside mouth temperature is between 89.9° and 93.6° with 95% confidence
89.9° < temp < 93.6°
What do you decide about your hypothesis that human body temperature is 98.6°?
Reject your hypothesis that human body temperature is 98.6°
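The confidence-interval check used in the slides above can be sketched in a few lines of Python. This is only an illustration: the function name `ci_decision` is made up for this example, and it assumes the interval endpoints have already been computed from the data.

```python
def ci_decision(lower, upper, hypothesized):
    """Compare a hypothesized value with a confidence interval.

    If the value lies inside the interval we can only "fail to reject";
    if it lies outside, we "reject" (and accept being wrong about
    alpha of the time, e.g. 5% for a 95% interval).
    """
    if lower <= hypothesized <= upper:
        return "fail to reject"
    return "reject"


# Values taken from the slides: 95% interval 5 <= mu <= 10
print(ci_decision(5, 10, 7))          # 7 is inside the interval
print(ci_decision(5, 10, 11))         # 11 is outside the interval

# Body temperature question: interval 89.9 to 93.6, hypothesis 98.6
print(ci_decision(89.9, 93.6, 98.6))
```

Note that the function never returns "accept": a value inside the interval is only one of many possible values, which is why the weaker wording is used.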
Hypothesis Tests
PROJECT QUESTION
89.9° < temp < 93.6°
Reject your hypothesis that human body temperature is 98.6°
What is the probability that you are wrong to reject this hypothesis?
5% (the α-level that goes with a 95% confidence interval)

Questions?

You need to answer an "Is there a difference" question:
- Is there any difference between these two populations?
- Does some new process improve results?

There is a TRUE (population) answer to your question

You will NEVER find the true answer to most questions because of variability:
- in your measurements
- in the data itself
- in the measuring tool
- in the samples you get from your population

Are the statistics demons mad at you today?

Reality of life: things aren't clear, certain and constant
They are fuzzy, uncertain and variable

This is the basis of statistics: getting a measurement of the fuzziness, the "variability"

A hypothesis is a statement about the properties of the population
It may be obtained from theory, hearsay, historical studies, etc.
Hypothesis Tests
A null hypothesis states "there is no difference between populations" or "a process has no effect"
It is symbolized: H0

Because it is easier to prove something false than to prove it true… H0 is the hypothesis we want to reject

We want to show the populations are different or the process has an effect. This is called the alternate hypothesis, or Ha

Usually we set Ha before H0, since it is the one we are interested in

Null hypotheses about population means are typically like:
μ = some value

Alternative hypotheses about means can be:
μ ≠ some value (called a two-tailed test)
μ < some value or μ > some value (called one-tailed tests)

A two-tailed test will reject H0 if the experimental values we get are either too high or too low
α is split between the upper and lower tails

A one-tailed test will reject H0 only on the side we think is likely to be true

You will be able to reject H0 more often for a one-tailed test, if you pick the right tail!

PROJECT QUESTION
Your owner's manual says you should be getting 30 mpg highway
After owning the car for six months, you are only getting 27 mpg highway
Is that different enough to reject the company's claim?
What is your α-level? 5% or 0.05
What is H0? μ = 30 mpg
What is Ha? μ < 30 mpg

We could also write it as:
H0: μ ≥ 30 mpg
Ha: μ < 30 mpg

Is this a one-tailed or a two-tailed test? one-tailed
Is it right-tailed or left-tailed? left-tailed

Questions?

The experiment is designed to gather valid information to test the likelihood of that null hypothesis being true

So, since we want to show the null hypothesis is NOT true, we want to show that getting the results we got (if the null hypothesis IS true) is very unlikely

If we get those “unlikely” data, then we reject the null hypothesis, have statistical evidence for our alternative hypothesis, and CELEBRATE!

But any experiment runs the risk of weird results
The objective of hypothesis testing is to estimate the likelihood of weird results

One type of error consists of rejecting a true null hypothesis
We call this a “Type 1 error”

If this happens, people will accuse us of rigging our data to prove Ha
So, we want this to happen very rarely

The probability of a Type 1 error is called the α-level
Typically we use α = 0.05 (5%) or 0.01 (1%)

It is crucial to set your α-level before you do the experiment or gather any data
Otherwise people will accuse you of setting the level to ensure rejecting H0

You can make the opposite mistake: fail to reject H0 when it is false
This is called a Type 2 error
The probability of this kind of error is denoted by β (beta)

We HATE Type 2 errors because they mean we FAILED to prove what we wanted to prove! (Remember, we want to reject H0)

Usually β is computed after the experiment (not determined in advance by the experimenter)

Generally, the larger the α value that you permit, the smaller the β value you will end up with
Conversely, if you demand a smaller α, you will usually get a larger β

Other factors affecting β:
- sample size
- the size of the true difference (it’s harder to detect a difference if it’s really really tiny)

The likelihood of making the right decision and rejecting the (false) null hypothesis is:
1 - β
called the “power of the test”

For a given α value, we would like the test to be as "powerful" as possible, so that it gives us the best chance of rejecting a false null hypothesis

PROJECT QUESTION
Which is more powerful, a one-tailed or a two-tailed test?
one-tailed (if you guess the right side)

This setup allows us only to disprove a null hypothesis, never prove it
We either disprove it, or we fail to disprove it
We NEVER accept the null hypothesis

"Fail to reject" the null hypothesis is the default decision
This results not from evidence in favor of the null hypothesis but from the absence of evidence against it

Rejecting the null hypothesis is a strong conclusion, stating that (with no more than the given α chance of error) the null hypothesis is wrong

The confidence interval for the hypothesis test will be kinda the opposite of what we did before
Now we will create a confidence interval for x̄ based on our hypothesized value for μ and see if our x̄ falls in it

How to do it!
1) Set your α-level (how often you are willing to be wrong)
2) Define your Ha and H0
3) Get your data (for a confidence interval, you need the hypothesized μ, s and n, or se)
4) Find your critical value (for two-sided α = 5% it is ≈ 2)
5) Calculate the confidence interval for x̄, centered on the hypothesized μ rather than on x̄
6) The test will be: Is x̄ in it?

PROJECT QUESTION
Back to our mpg!
H0: μ ≥ 30 mpg
Ha: μ < 30 mpg
x̄ = 27
And suppose we know that: se = 4 mpg
Are we going to reject H0 for values of x̄ greater than 30 or less than 30?
If the critical value for a one-sided confidence interval test at the 5% level is 1.64, create a test of our hypothesis
Reject H0 if x̄ < 30 - (1.64)(4) = 23.44
What is our conclusion?
fail to reject H0 (27 is not below 23.44)

Questions?
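The mpg test above can be reproduced with a short Python sketch. The function name `left_tailed_test` is hypothetical, and the last two lines assume the sampling distribution of x̄ is approximately normal, which is where critical values such as 1.64 and ≈ 2 would come from under that assumption.

```python
from statistics import NormalDist


def left_tailed_test(mu0, x_bar, se, critical):
    """Left-tailed test: reject H0 (mu >= mu0) when x_bar falls below the cutoff."""
    cutoff = mu0 - critical * se
    decision = "reject H0" if x_bar < cutoff else "fail to reject H0"
    return cutoff, decision


# Numbers from the slides: H0 mu >= 30, x_bar = 27, se = 4, critical value 1.64
cutoff, decision = left_tailed_test(mu0=30, x_bar=27, se=4, critical=1.64)
print(round(cutoff, 2), decision)

# Assumption: normal sampling distribution. Critical values for alpha = 0.05:
print(round(NormalDist().inv_cdf(0.95), 2))   # one-sided
print(round(NormalDist().inv_cdf(0.975), 2))  # two-sided (alpha split over both tails)
```

Because the cutoff depends on `se`, the same x̄ = 27 would lead to rejecting H0 if the standard error were small enough.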
Hypothesis Tests
If you reject H0 with an α-level of 0.05, we also say our x̄ value is “significant at the .05 level” or we say we found a “significant difference”

We can make our x̄ more likely to be significant by (as usual): TAKING A LARGER SAMPLE SIZE

Because we can “cheat the system” by taking a huge sample size that will find any teeny, tiny difference to be significant, we have a backup plan

We also set levels of “practical significance”: what numerical difference would be big enough to matter

These levels of practical significance come from our knowledge of the variables we are measuring

If we had taken a sample of 10,000,000 to calculate our mpg average and se, we could easily have had an se of 0.1 mpg
Then even an average of, say, 29.8 mpg would be statistically significant, but probably we wouldn’t really think that was an important difference in mileage

A practically significant difference would be the amount in mpg that you would think is different enough from 30 mpg to be important

We set a level of practical significance at the same time we set the α-level

PROJECT QUESTION
What would be your level of practically significant difference for mpg?

Questions?
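To see how a huge sample can make a tiny difference "significant", here is a rough Python sketch. Everything not on the slides is an assumption made for illustration: the standard deviation s = 12 mpg, the sample average 29.8 mpg, the 1 mpg practical-significance level, and the function names.

```python
import math


def se_of_mean(s, n):
    """Standard error of the sample mean: s divided by the square root of n."""
    return s / math.sqrt(n)


def verdict(mu0, x_bar, se, critical=1.64, practical_diff=1.0):
    """Return (statistically significant?, practically significant?) for a left-tailed test."""
    statistically = x_bar < mu0 - critical * se
    practically = (mu0 - x_bar) >= practical_diff
    return statistically, practically


# The standard error shrinks as the sample grows (s = 12 is a made-up value)
for n in (9, 900, 90000):
    print(n, se_of_mean(12, n))

# Slide values: x_bar = 27, se = 4 -> a 3 mpg gap, but not statistically significant
print(verdict(30, 27, 4))

# Huge sample, se = 0.1 (from the slide); x_bar = 29.8 is hypothetical
print(verdict(30, 29.8, 0.1))
```

The two verdicts point in opposite directions, which is why the slides suggest setting the practical-significance level at the same time as the α-level.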