Math 140 Notes and Activity Packet (Word) Hypothesis Testing & Simulation Math 140 Notes and Activity#1 (Card Activity) Introduction to Hypothesis Testing Articles in Newspapers, Magazines or Online make population claims all the time. Often they take a sample (may not even be random) and then tell the world that their sample value is the same as the population value. We saw in our study of sampling variability how bad this is. Even random samples will be different from the population value. With all these erroneous population claims, we may want to test what someone says about a population and see if it is consistent with data. This is called a “Hypothesis Test”. Before getting into the specifics of formal hypothesis testing it is important to understand the main idea of hypothesis testing and the role that simulation plays in this process. Cards and Candy Activity Your teacher will open a brand new deck of cards, shuffle the cards and start allowing students to pick cards. Remember, a deck of cards has 52 cards, half red and half black. Each student gets to pick a card. Your instructor will identify if red or black is a winning color. If you a pick a card with the winning color, you win a piece of candy, but if you pick a card with the losing color you have to sit back down (no candy). Questions to answer about the Cards and Candy Activity 1. Were there any assumptions you made before the activity started. As we started doing the activity, when did you grow suspicious that the assumptions might not be true? (This is like when someone makes a population claim, but when we start seeing data, we grow suspicious that the population claim might not be true) What happened in the activity that gave you an idea that something might be wrong? 2. In hypothesis testing we often have two hypotheses, the “null hypothesis” and the “alternative hypothesis”. What do you think the null and alternative hypothesis might be for the Cards and Candy activity? 
If our sample data disagrees significantly with the null hypothesis, we will “Reject the null hypothesis” and therefore support the alternative.

3. If the null hypothesis was true, what was the probability of getting 5 of the same color in a row? We couldn’t put our finger on it, but this probability is part of what was bothering us. If the null hypothesis is true and the chance of the sample data happening is very small, it gives us the idea that the null hypothesis might not be true. This probability of getting the sample data, if we assume the null hypothesis is true, is called a “P-value”. What was the approximate P-value for the Cards and Candy activity? Our P-value has to be very low to bother us. People in statistics often like the P-value to be less than 5% to count as a significant disagreement with the null hypothesis. Was our P-value lower than 5%?

4. We now want to write a conclusion about what we think is true about the cards. What do the cards drawn and the probability of that happening (P-value) tell us about the cards?

5. Here are some important questions to consider in hypothesis testing.
a) If the null hypothesis is true, could the sample data have happened by random chance (sampling variability)? (If the answer is no or it is extremely unlikely, then the null hypothesis may be wrong. If the answer is yes or likely, then the null hypothesis may be correct.) Apply this question to the Cards and Candy activity.
b) There is a difference between “convincing evidence” and “proof”. What do you think the difference is between these? In the Cards and Candy activity, which one did we have? Was it possible that we might have got it wrong?
c) You should always think about the ramifications of your hypothesis test conclusion being wrong. Sampling variability is difficult to predict, and many educated statisticians have made wrong conclusions about populations. In the Cards and Candy activity, what were the ramifications of getting the conclusion wrong?
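The probability asked about in question 3 can be worked out directly. The following is a minimal sketch, assuming the null hypothesis is a fair 52-card deck (26 red, 26 black) and that cards are drawn without replacement; the function name `prob_first_k_same_color` is made up for illustration. Whether "5 of the same color" means one particular color or either color is a choice for your class, so both values are computed.

```python
from fractions import Fraction
from math import comb


def prob_first_k_same_color(k, deck_size=52, per_color=26):
    """Probability that k cards drawn without replacement from a fair deck
    are all of one particular color (for example, all the losing color)."""
    # Ways to choose k cards of that color, divided by ways to choose any k cards.
    return Fraction(comb(per_color, k), comb(deck_size, k))


# Assumption: the class saw 5 cards of the same color in a row.
one_color = prob_first_k_same_color(5)   # all five are the losing color
either_color = 2 * one_color             # all five red OR all five black

print(f"One particular color: {float(one_color):.4f}")
print(f"Either color:         {float(either_color):.4f}")
```

Under these assumptions the chance for one particular color comes out to roughly 2.5%, which is below the 5% guideline mentioned in question 3.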
Hypothesis Test Notes: Finding the Null and Alternative Hypothesis

Null Hypothesis: $H_0$
Alternative Hypothesis: $H_A$ or $H_1$
These are competing ideas about the population.

Hypothesis Test: Procedure for checking what someone has said about the population (i.e. checking the “claim”).
Claim: What the person actually said about the population.

Steps for finding the Null and Alternative Hypothesis
1. Write down the claim (what the person said) in symbolic language. Write the word “claim” next to it.
2. Write down the opposite of the claim (opposing view) in symbolic language.
3. The statement that has $=$, $\le$ or $\ge$ is the null hypothesis. Put an $H_0$ next to it.
4. The statement that has $\ne$, $<$ or $>$ is the alternative hypothesis. Put an $H_A$ next to it.

Symbols for population parameters: $\mu$ (population mean), $p$ (population percentage), $\sigma$ (population standard deviation)

Example 1: A mechanic claims that the (mean) average weight of all transmissions is more than 52 kg.
Note: The sample data is never part of the null or alternative hypotheses. Only population statements.
Write down the claim (what the person said) in symbolic language. Write the word “claim” next to it.
$\mu > 52$ (claim)
Write down the opposite of the claim (opposing view) in symbolic language.
$\mu \le 52$
The statement that has $=$, $\le$ or $\ge$ is the null hypothesis. Put an $H_0$ next to it. The statement that has $\ne$, $<$ or $>$ is the alternative hypothesis. Put an $H_A$ next to it.
$H_A: \mu > 52$ (claim)
$H_0: \mu \le 52$

Example 2: The FDA says that at least 2.5% of people that take this medicine will have serious side effects.
Write down the claim (what the person said) in symbolic language. Write the word “claim” next to it.
$p \ge 0.025$ (claim) (Note: 2.5% = 0.025)
Write down the opposite of the claim (opposing view) in symbolic language.
$p < 0.025$
The statement that has $=$, $\le$ or $\ge$ is the null hypothesis. Put an $H_0$ next to it. The statement that has $\ne$, $<$ or $>$ is the alternative hypothesis. Put an $H_A$ next to it.
$H_0: p \ge 0.025$ (claim)
$H_A: p < 0.025$

Example 3: The school board claims that the average SAT score for female high school students is the same as the average SAT score for male high school students.
$\mu_1$: female, $\mu_2$: male
$H_0: \mu_1 = \mu_2$ (claim)
$H_A: \mu_1 \ne \mu_2$

Notes: There are 3 types of Hypothesis Tests.
Right Tailed Test (Alternative is $>$)
Left Tailed Test (Alternative is $<$)
Two Tailed Test (Alternative is $\ne$)

Look at the three previous examples. What type of test is being used?
Example 1: $H_A: \mu > 52$ (claim), $H_0: \mu \le 52$. Type of Test? Right Tailed Test
Example 2: $H_0: p \ge 0.025$ (claim), $H_A: p < 0.025$. Type of Test? Left Tailed Test
Example 3: $H_0: \mu_1 = \mu_2$ (claim), $H_A: \mu_1 \ne \mu_2$. Type of Test? Two Tailed Test

Note: Sometimes you will see the null written with “=” even if technically it is $\le$ or $\ge$. Either answer will be accepted.
Example 1: $H_A: \mu > 52$ (claim), $H_0: \mu \le 52$ OR $H_A: \mu > 52$ (claim), $H_0: \mu = 52$

Math 140 Activity #2 Null and Alternative Hypothesis
For each of the following problems:
a) Write the null and alternative hypothesis.
b) Label whether the null or the alternative is the original claim.
c) Tell whether this is a left tail test, a right tail test, or a two tail test.

1. According to a CNN report, besides cell phones, 93% of Americans also own a traditional phone. But has that percentage decreased as more and more Americans opt to only use a cell phone and throw away their traditional phones?
2. According to a recent newspaper article, people in California spend 1.25 hours a day eating and drinking. Suppose we want to test the claim that the number of hours spent eating and drinking is really 1.25 hours.
3. More and more Americans are becoming financially sound and opting to not own a credit card. According to an article in USA Today, 74% of Americans still have at least one credit card. But this claim seems a little on the low side. We think that more than 74% of Americans own a credit card.
4. It has long been thought that normal body temperature is really 98.6 degrees Fahrenheit.
A recent study is now claiming that normal body temperature is really lower than 98.6 degrees.
5. The standard deviation for the heights of men was thought to be 2.9 inches. New studies disagree with this. Test the claim that the standard deviation for heights of men is not 2.9 inches.
6. Wikipedia suggests that at least 10% of the world population is left handed. Wikipedia may not be very accurate. Test the claim that at least 10% of the world population is left handed.
7. The percent of women that hold CEO level jobs is lower than the percent of men that hold CEO level jobs.
8. The average cholesterol level for American men and women is about the same.
9. The majority of Republicans support decreasing taxes.

Hypothesis Test Notes: Test Statistics

The ability to “test” what someone says about a population is largely dependent on being able to tell if the random sample value is significantly different from the population value. In other words, does the random sample value significantly disagree with what the person said about the population? That is a problem, because it is sometimes impossible to tell with your own eyes.

Let us suppose that the population percentage is 0.25 (25%) and the sample value is 0.22 (22%). Is the sample significantly different? We don’t know. Sometimes 3% is a significant difference and sometimes 3% is not significant.

So how do we tell if our sample data significantly disagrees with a population value? We need to measure the difference in a very special way. Knowing how many miles different or percentage points different is not going to help us. We need to know how many “standard errors” different they are. This is called a Test Statistic.

Example 1
Let’s look at the percentage problem where the population percentage is 0.25 (25%) and the sample value is 0.22 (22%). We know the sample value is 3% lower, but we do not know if that is significant. Another important bit of information is the sample size.
In this case it was 100. To find out whether the difference is significant, calculate the test statistic.

Formulas for test statistics follow a general pattern. Remember, a test statistic counts how many standard errors the sample value is above or below the population value. So the formula looks like the following:

$$\text{Test Statistic} = \frac{\text{Sample Value} - \text{Population Value}}{\text{Standard Error}}$$

Remember in the last unit we saw that statisticians often use formulas to approximate the standard error. Here is the formula for the test statistic when comparing a sample percentage ($\hat{p}$) and a population percentage ($p$). Recall that the number of standard deviations is often represented with a Z-score or T-score, so it is not surprising that the test statistic is often a T or Z. Also remember that z-scores and t-scores are often rounded to the hundredths place.

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$

Let’s plug in our numbers. Remember that the sample percentage is $\hat{p} = 0.22$, the population percentage is $p = 0.25$ and the sample size is $n = 100$.

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(1 - 0.25)}{100}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25 \times 0.75}{100}}} = \frac{-0.03}{0.0433} \approx -0.69$$

So is it significant? We learned in the last unit that for a z-score to be significant it should be around 2 or higher, or -2 or lower. (Recall the 3 famous z-scores of 1.645, 1.96 and 2.576.) So if we want to be 95% confident, we should have a z-score of around 2. To be 90% confident we need about 1.6, and to be 99% confident we need about 2.5. These are just general guidelines. We will get a whole lot more accurate when we talk about P-value and significance levels. For now use 2 and -2 as your guideline.

Our z-score was about -0.69. So our sample percentage (22%) was only 0.69 standard errors below the population percentage (25%). Hence it is not significant.

Example 2
Sample size plays a key role in significance. Let’s look at the same example but with a sample size of 1000. Remember that the sample percentage is $\hat{p} = 0.22$ and the population percentage is $p = 0.25$. Let’s calculate the test statistic for this one.
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(1 - 0.25)}{1000}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25 \times 0.75}{1000}}} = \frac{-0.03}{0.013693} \approx -2.19$$

Notice now our sample value of 22% is significantly lower than our population value of 25%. In fact, our sample value of 22% is 2.19 standard errors below the population percentage of 25%. This is significant since our z-score is less than -2 (more than 2 standard errors away).

Example 3
What do we do if we want to know if a sample mean is significantly different from a population mean? Sometimes 13 pounds is a lot and sometimes 13 pounds is very little.

An article in a health magazine claims that the mean average weight of all men is about 175 pounds. A random sample of 60 men found that the sample mean was 188 pounds with a standard deviation of 96 pounds. So the sample mean of 188 is 13 pounds heavier than the population mean of 175. Is that significant? The answer again is we don’t know. We would need to calculate a test statistic to see if 13 pounds is a lot in this situation.

Here is the formula for calculating test statistics to compare sample and population means. Notice it follows the same general pattern and seeks to count how many standard errors the sample value is above or below the population value. Notice we label the test statistic as a T-score, since again it is the number of standard deviations (errors) and the T-score is more accurate than the Z-score for small quantitative data sets.

$$T = \frac{\text{Sample Value} - \text{Population Value}}{\text{Standard Error}} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Now use the formula to calculate the test statistic. Remember the sample mean $\bar{x}$ is 188, the population mean $\mu$ is 175, the standard deviation $s$ is 96 and the sample size $n$ is 60.

$$T = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{188 - 175}{96 / \sqrt{60}} = \frac{13}{12.39355} = 1.0489 \approx 1.05$$

Notice our sample mean of 188 pounds is not a significant disagreement with the population mean of 175 pounds. Our sample mean of 188 pounds is only 1.05 standard errors above the population mean of 175. Remember it needs to be close to 2 or higher to be significant.
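The three worked examples above can be checked with a few lines of code. This is only a sketch: the function names `z_stat_proportion` and `t_stat_mean` are made up for illustration, and the inputs are the numbers given in Examples 1 to 3.

```python
from math import sqrt


def z_stat_proportion(p_hat, p, n):
    """Number of standard errors the sample proportion p_hat is from p."""
    standard_error = sqrt(p * (1 - p) / n)
    return (p_hat - p) / standard_error


def t_stat_mean(x_bar, mu, s, n):
    """Number of standard errors the sample mean x_bar is from mu."""
    standard_error = s / sqrt(n)
    return (x_bar - mu) / standard_error


z_100 = z_stat_proportion(0.22, 0.25, 100)    # Example 1
z_1000 = z_stat_proportion(0.22, 0.25, 1000)  # Example 2
t_weights = t_stat_mean(188, 175, 96, 60)     # Example 3

print(round(z_100, 2), round(z_1000, 2), round(t_weights, 2))
# prints: -0.69 -2.19 1.05
```

Changing only `n` from 100 to 1000 moves the z-score from about -0.69 to about -2.19, which is the point of Example 2: the same 3% gap can be insignificant or significant depending on the sample size.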
Two Important Notes:

It is important to understand how test statistics work and what the formulas mean. It is not important to calculate these by hand. Statistics programs like StatCrunch can calculate the test statistic in half a second with much better accuracy. It is important that you can explain the meaning of the test statistic and what it tells us about significance.

Test statistics can sometimes be borderline significant, which makes them hard to interpret. Think of a z-score test statistic of 1.90. Is that significant? It is close to 2 standard errors away, but is it close enough to be considered significant? Sometimes the answer to this is yes and sometimes it is no. In the past, statisticians would look up a critical value to compare the test statistic to, so that they could know if it was significant. Critical values are difficult to work with because they change for every situation. We will see that P-value is a much better way to decide significance, especially in borderline cases.

Math 140 Activity #3 Calculating and Interpreting Test Statistics

1. Wikipedia claims that 10% of people are left handed. A sample of 250 randomly selected adults found that 32 of them were left handed.
a) Use the formulas $\hat{p} = \dfrac{x}{n}$ and $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$ to find the z-score test statistic.
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?

2. According to a CNN report, 93% of Americans also own a traditional phone (not a cell phone). We took a sample of 850 randomly selected Americans and found that 785 of them own a traditional phone.
a) Use the formulas $\hat{p} = \dfrac{x}{n}$ and $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$ to find the z-score test statistic.
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?

3. Normal body temperature has long been thought to be 98.6°F.
A sample of 50 randomly selected adults was found to have a mean average temperature of 98.2°F with a standard deviation of 0.765°F.
a) Use the formula $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ to find the t-score test statistic. (Note: A t-score is just like a z-score. It measures the number of standard deviations the sample value is from the population value.)
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?

4. The average height for men has long been thought to be 69.2 inches. A sample of 275 randomly selected adult men was found to have a mean average height of 69.5 inches with a standard deviation of 2.7 inches.
a) Use the formula $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ to find the t-score test statistic. (Note: A t-score is just like a z-score. It measures the number of standard deviations the sample value is from the population value.)
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?

Hypothesis Test Activity 4 Randomized Simulation - Simulating the Null Hypothesis

Notes on Simulation
As we discussed yesterday, the key to hypothesis testing is to see if your random sample data significantly disagrees with the population value being tested. This is difficult to do because of sampling variability. Random samples almost always give different values and will be different than the population value being tested most of the time. The key question in hypothesis testing is the following:

Key Question: Could the sample data be different from the population value just because of sampling variability (random chance)? Or is the sample value so significantly different from the population value that it causes us to think that the population value may be wrong? (i.e. The sample data is not what we would expect by random chance!)

How can we answer this key question? We need to simulate a distribution based on the null hypothesis.
This is often called a randomized simulation or a “randomization technique”. We need to simulate what a distribution should look like if the population value is the same as the null hypothesis. Then we can compare our random sample data to the distribution to see how likely it is to happen. Note: A “Sampling Distribution” is different than a “Randomized Simulation”. A sampling distribution is taking lots and lots of random samples from a population. We use this to understand sampling variability, and to estimate a population value, standard error and confidence intervals. A sampling distribution is not based on the null hypothesis. It is just lots and lots of samples taken from a population. A randomization simulation is simulating the null hypothesis. Eventually we will compare one random sample to the simulation. A simulation assumes that the null hypothesis is true and is totally based on the null hypothesis! It is not designed to estimate a population value, but to test what someone has said about the population value. Directions: We will now look at the following problems and use StatKey at www.lock5stat.com. Look on the right side of the screen where it says “Randomization Hypothesis Tests” and click on the appropriate link (single mean, single proportion, difference of means, difference of proportions). We will be focusing on single mean and single proportion in this activity. 1. Normal body temperature has long thought to be 98.6F . Many scientists now think that normal body temperature may be lower than 98.6F . We want to test if sample evidence supports the scientists claim. Let’s look at a random sample of body temperatures from 50 adults. On the lock5 website, Click on “randomization hypothesis tests” and then “single mean”. The 50 body temperatures have already been entered. If you do not see them there, click on “custom data set” and then “body temperature”. a) What is the null and alternative hypothesis? Which one is the claim? 
b) List the shape, sample size, mean and standard deviation for the “original sample” (actual data not simulated. ) c) We want to simulate a distribution under the assumption that the population value really is 98.6F . Make sure that 98.6 and click on “generate 1 sample” to simulate the null hypothesis. It is easy to get lost. Remember the “original sample” is our actual sample data we listed in part (b). The “randomized sample” is a simulated (made up) data set of the same size 50 from a population with a mean of 98.6F . What is the mean and standard deviation of the first randomized simulation? How far is the first simulated mean from 98.6? d) Now click on generate 1000 samples. Let’s see if we understand what we are looking at. Again, these are not actual samples from a population. Each dot represents the mean of a simulated data set of size 50. You now have 1001 samples and have created a randomization distribution. In a sense, we have predicted how we expect data sets from a population with a mean of 98.6F to behave. What is the shape, center (mean) and spread (estimated standard error) from the distribution? e) Our goal was to know if getting a sample mean of 98.26F was something that could happen by random chance from a population with a population mean of 98.6F ? Look at the distribution. Since this was a left tail test, let’s look at how many dots were 98.26F . Here is an important question. If we are wondering if 98.26F is significantly lower than the population value 98.6F , wouldn’t dots that have a temperature lower than the sample value 98.26F also cause us to doubt the validity of the population value 98.6F ? Of course. So we don’t want to just count how many dots are exactly 98.26F , but how many dots are that or lower. (Left Tail) This is an important idea in Statistics. Click on the button that says “left tail”. In the box at the bottom of the distribution, type in “ 98.26 ”. How many dots were lower than 98.26? 
What percent of the distribution was lower than 98.26? This percent is called a “P-value”. P-value = The probability of getting the sample data or more extreme, if the null hypothesis is really true. The randomized simulation has helped us flush out what we expect to happen if the null hypothesis was really true. f) Decision time. In a simulation of samples from a population with a mean of 98.6F , would a sample mean of 98.26F be likely to happen by random chance? What does this tell us about the validity of the “so-called” population value for normal body temperature 98.6F ? Do you still agree with the population value? How about the scientists that said they think it is lower. Do you have any evidence that supports their claim? Do you have convincing evidence to back up your opinion? Do you have proof? 2. Let’s repeat the activity in #1 with another data set. An article claims that the mean average price of houses in New York is greater than 265 Thousand Dollars. To test this claim, we took a random sample of 30 homes in New York. We listed the prices in thousands of dollars. The sample data is already in the lock5stat website. Again, go to “Randomization Hypothesis Test” and “Test for Single Mean”. On the upper left, click on the button that says “body temperature” and change it to “Home Prices – NY”. We are going to use a randomized simulation to test this population value of 265 Thousand Dollars. We want to simulate what data sets that have a population value of 265 would look like. Click on “generate 1 sample” and then “generate 1000 samples” and answer the following questions. a) What is the null and alternative hypothesis? Which one is the claim? b) List the shape, sample size, mean and standard deviation for the “original sample” (actual data not simulated. ) Is the original sample mean lower or higher than the population value? c) What is the mean and standard deviation of the first randomized simulation? How far is the first simulated mean from 265? 
d) Now click on generate 1000 samples. Let’s see if we understand what we are looking at. Again, these are not actual samples from a population. Each dot represents the mean of a simulated data set of size 30. You now have 1001 samples and have created a randomization distribution. In a sense, we have predicted how we expect data sets from a population with a mean of 265 to behave. What is the shape, center (mean) and spread (estimated standard error) from the distribution? e) Our goal was to know if getting a sample mean of 565.633 was something that could happen by random chance from a population with a population mean of 265? Look at the distribution. Since this was a right tail test, let’s look at how many dots were 565.633. Again, if we are wondering if 565.633 is significantly higher than the population value 265, wouldn’t dots that have a value greater than 565.633 also cause us to doubt the validity of the population value 265? Of course. So we don’t want to just count how many dots are exactly 565.633, but how many dots are that or higher. (Right Tail) Click on the button that says “right tail”. In the box at the bottom of the distribution, type in “ 565.633 ”. How many dots were higher than 565.633? What percent of the distribution was higher than 565.633? This percent is called a “P-value”. P-value = The probability of getting the sample data or more extreme, if the null hypothesis is really true. The randomized simulation has helped us flush out what we expect to happen if the null hypothesis was really true. f) Decision time. In a simulation of samples of size 30 from a population with a mean of 265, would a sample mean of 565.633 be likely to happen by random chance? What does this tell us about the validity of the “so-called” population value of 265 thousand dollars? Do you still agree that 265 thousand dollars is the mean average price of all homes in New York? 
What about the article that said that the actual population value is greater than 265 thousand dollars? Do you agree with the article? Do you have convincing evidence to back up your opinion? Do you have proof? Now let’s look at a simulation involving checking a single population percentage (proportion). People make claims about population percentages all the time. Now we have a way to check their claims. For #3 go to the lock5stat website and under the “Randomization Hypothesis Tests” click on “Test for Single Proportion”. 3. In the last election, we were wondering if president Obama would be re-elected. We took a random poll of 1057 Americans and asked them if they would vote for Obama to be re-elected. Of the 1057 people in the poll, 583 said they would support Obama. Is this evidence convincing enough for us to know if more than 50% of all Americans would vote to re-elect president Obama? To answer this question we will look at simulations from a population with mean average percent of 0.5, create a distribution, and then see how it behaves. If you go to the top left corner you can click on the link for “election poll support Obama” and the numbers will be automatically entered for you. a) What is the null and alternative hypothesis? Which one is the claim? b) What was the original sample percent in the poll? Is the original sample percent lower or higher than the population value 0.5? c) Click on generate 1 sample. The computer has simulated talking to 1057 people when the population percent is 50% (0.5). How many people said they support Obama in the first simulation? What was the simulated percent? d) Now click on generate 1000 samples. Let’s see if we understand what we are looking at. Again, these are not actual samples from a population. Each dot represents the percent of a simulated data set of size 1057. You now have 1001 samples and have created a randomization distribution. 
In a sense, we have predicted how we expect data sets from a population with a percentage of exactly 50% to behave. What is the shape, center (mean) and spread (estimated standard error) from the distribution?

e) Our goal was to know if getting a sample percent of 0.552 was something that could happen by random chance from a population with a population percent of 0.5. Look at the distribution. Since this was a right tail test, let’s look at how many dots were 0.552. Again, if we are wondering if 0.552 is significantly higher than the population value 0.5, wouldn’t dots that have a value greater than 0.552 also cause us to doubt the validity of the population value 0.5? Of course. So we don’t want to just count how many dots are exactly 0.552, but how many dots are that or higher. (Right Tail) Click on the button that says “right tail”. In the box at the bottom of the distribution, type in “0.552”. How many dots were greater than or equal to 0.552? What percent of the simulated distribution was 0.552 or higher? This percent is called a “P-value”.
P-value = The probability of getting the sample data or more extreme, if the null hypothesis is really true.
The randomized simulation has helped us flesh out what we expect to happen if the null hypothesis was really true.

f) Decision time. In a simulation of samples of size 1057 from a population with a population proportion of 0.5, would a sample proportion of 0.552 be likely to happen by random chance? What does this tell us about the validity of the “so-called” population value of 0.5 (50%)? Do you still agree that 50% of people will vote for Obama? Obama’s campaign managers are worried. Do we have convincing evidence that more than 50% of Americans will vote for Obama? (This is the same as asking if we have convincing evidence that Obama will win the re-election.) Do you have proof? (Obama of course did win the re-election.)
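The idea behind the single-proportion simulation in problem 3 can be sketched in a few lines of code. This is not StatKey's actual implementation, only an illustration under stated assumptions: each simulated poll asks 1057 people, each person supports the candidate with probability 0.5 (the null hypothesis), and the function name `simulate_null_proportions` and the fixed seed are made up for the example.

```python
import random


def simulate_null_proportions(n, p_null, num_sims, seed=0):
    """Simulate num_sims polls of n people each, assuming the true
    population proportion is p_null, and return the sample proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_sims):
        # Each simulated person says "yes" with probability p_null.
        successes = sum(1 for _ in range(n) if rng.random() < p_null)
        proportions.append(successes / n)
    return proportions


observed = 583 / 1057  # original sample proportion, about 0.552
sims = simulate_null_proportions(n=1057, p_null=0.5, num_sims=1000)

# Right tail test: count simulated proportions at or above the observed value.
p_value = sum(1 for prop in sims if prop >= observed) / len(sims)

print(f"Observed proportion: {observed:.3f}")
print(f"Simulated P-value:   {p_value}")
```

Each entry of `sims` plays the role of one dot in the randomization distribution, and `p_value` is the fraction of dots in the right tail. The exact value changes with the seed, as it does each time you click "generate 1000 samples".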
Hypothesis Test Notes P-value and Significance Levels Hypothesis Test: Using Random Sample data to decide whether a population claim seems reasonable or if it looks very wrong. Problem: Sampling Variability!! Random samples are usually different and random samples are usually very different than the population value. There are two possibilities and we will need to decide which one we think is correct. Option 1: Is our random sample data different than the population value because all random samples are different (random chance)? In which case the population value might be correct. OR Option 2: Is our random sample data different than the population value because the population value is wrong. Important Note: Think of random chance (sampling variability) as a confounding variable. In order to show that the population value in H 0 is wrong (option 2), we have to rule out random chance (sampling variability). In other words we have to make sure option 1 is not correct (or at least highly unlikely), to be able to say that population value is probably wrong (option 2). P-value to the rescue!! P-value can help us decide between the two options. Definition P-value : The probability of getting the sample data or more extreme by random chance if the null hypothesis is true. Probability & Logic principle: If the probability of an event happening is very low, but the event keeps happening, then we should look for a different explanation. Our assumption about how that event works might be wrong. Assumption: Suppose the population value is correct (in null hypothesis H 0 ). The P-value calculates the probability of getting the sample data by random chance based on that assumption. Low P-value If the P-value is very low, then the sample data probably did not happen by random chance. This means option 1 is very unlikely. So probably option 2 is true and our assumption about population value being correct in the null hypothesis is probably wrong. 
When that happens we say we “Reject the Null Hypothesis”.

High P-value
If the P-value is high, then the sample data could have happened by random chance. The sample data is probably different because of sampling variability (all random samples are different). Option 1 could be true, meaning we cannot rule out random chance (sampling variability) as a confounding variable. So we will not be able to tell if the population value is wrong. Since it is likely that option 1 occurred, the population value in the null hypothesis might be correct. Since we can’t tell if the population value is right or wrong, we say we “Fail to Reject the Null Hypothesis”.

Important Notes:
Failing to reject $H_0$ does not mean that $H_0$ is true!! It means we cannot tell if the population value is right or wrong. Sampling variability has struck again.
A low P-value occurs when the sample value significantly disagrees with the population value in the null hypothesis. In other words, a low P-value corresponds with a large test statistic. Both mean that the population value is probably wrong.
A high P-value occurs when the sample value is pretty close to the population value in the null hypothesis. In other words, a high P-value corresponds with a small test statistic. Both mean that the population value might be correct.

Significance Levels
Sometimes a P-value might be borderline. Remember we want the P-value to be low to ensure that the sample data was unlikely to occur by random chance. But how low do we need it?

Significance Levels (also called alpha levels, after the Greek letter alpha, $\alpha$)
A significance level ($\alpha$) is a number we can compare the P-value to. We will see later that significance levels are also associated with avoiding certain types of errors in statistics. Remember confidence levels? Significance levels ($\alpha$) are the opposite of confidence levels ($1 - \alpha$). If you want to be 95% confident, for example, the significance level would be 100% - 95% = 5%. This is the most common significance level used.
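To make the link between test statistics, P-values and significance levels concrete, here is a minimal sketch. It assumes the P-value for a left tail test is read off a standard normal curve, which is an assumption made for this illustration (the notes themselves get P-values from simulation or software). The function names are made up, and the z-scores are the two from the earlier test statistic examples.

```python
from math import erfc, sqrt


def left_tail_p_value(z):
    """Area under the standard normal curve at or below z
    (assumes a normal approximation is appropriate)."""
    return 0.5 * erfc(-z / sqrt(2))


def significance_level(confidence_level):
    """Significance level alpha that pairs with a confidence level."""
    return 1 - confidence_level


alpha = significance_level(0.95)  # 5% significance level

for z in (-0.69, -2.19):
    p_value = left_tail_p_value(z)
    is_low = p_value <= alpha
    print(f"z = {z}: P-value = {p_value:.4f}, low P-value? {is_low}")
```

The small test statistic (-0.69) gives a high P-value and the large one (-2.19) gives a low P-value, matching the notes above.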
Common Confidence Levels and Significance Levels

Confidence Level     Significance Level
90% (0.90)           10% (0.10)
95% (0.95)           5% (0.05)
99% (0.99)           1% (0.01)

So before you do your hypothesis test you should choose which significance level you want to use. If you are unsure, use 5% as this is the most common.

Summarize:
If the P-value ≤ significance level, reject the null hypothesis.
If the P-value > significance level, fail to reject the null hypothesis.

Take a look at the top half of the “P-value Diagram”. (Can be found on the website on the hypothesis test page.)

Summary:
Low P-value
Sample data significantly different from the population value
Sample data probably did not happen by random chance (sampling variability)
Reject H0

High P-value
Sample data close to the population value (not significantly different)
Sample data could have happened by random chance (sampling variability)
Fail to Reject H0 (Does not mean H0 is true)

Example 1 (Write the null and alternative hypothesis and interpret what we can learn from the given p-value. Assume the problem meets assumptions and use a 5% significance level)
An article online said that the average typing speed for all adults is about 40 (words per minute). We took a large random sample in order to test this claim. Our sample mean was 38 (words per minute) and our P-value was 0.216.
H0: μ = 40 (claim)
HA: μ ≠ 40
P-value 0.216 > 0.05 sig level
Fail to reject H0 (The claim that the average typing speed is 40 might be correct, but we are not sure)
Sample mean was not significantly different from the population value of 40. The random sample mean of 38 (wpm) could have happened by random chance.

Example 2 (Write the null and alternative hypothesis and interpret what we can learn from the given p-value. Assume the problem meets assumptions and use a 1% significance level)
A pharmaceutical company is developing a new medicine to help people with diabetes. They want to see if the medicine will help at least 50% of people that take it.
They took a random sample of people taking the medicine. They got a P-value of 8.74 × 10^-4.
H0: p ≥ 50% (claim)
HA: p < 50%
Notice the P-value is in scientific notation. 8.74 × 10^-4 = 0.000874
P-value 0.000874 < sig level 0.01
Reject H0
It is probably not true that the medicine helps at least 50% of people that take it. Data supports that it is less than 50%.
Sample data was significantly different than population percent (50%). The random sample data is very unlikely to occur by random chance.

Go over the P-value Diagram Hyp Test Notes (PDF online only) and the P-value table below.

Hypothesis Test Notes: P-Value, Test Statistic & Simulation Summary Table

Large Test Statistic (more than 2 standard errors) OR Small P-value (close to zero) OR Sample Value in Tail (when simulating Ho):
Is sample data significant? Significant
Could the sample data happen by random chance? Very unlikely
Reject Ho or Fail to Reject Ho? Reject Ho
Is there evidence? Yes, evidence

Small Test Statistic (about 1 standard error or less) OR Large P-value (over 10%) OR Sample Value not in Tail (when simulating Ho):
Is sample data significant? Not significant
Could the sample data happen by random chance? Could happen
Reject Ho or Fail to Reject Ho? Fail to Reject Ho
Is there evidence? No evidence

Math 140 Activity#5 Exploring the Meaning of P-value
For each of the following problems:
a) Write the null and alternative hypothesis.
b) Use the p-value and the significance level to decide whether we should reject the null hypothesis or fail to reject the null hypothesis.
c) Write a detailed sentence describing the true meaning of the p-value in the context of the problem.
d) Was the sample data significantly different than the population value?
e) How likely is it that the sample data happened by random chance?

1. According to a CNN report, besides cell phones, 93% of Americans also own a traditional phone. But has that percentage decreased as more and more Americans opt to only use a cell phone and throw away their traditional phones? A random sample of 500 Americans was taken and 454 of them owned a traditional phone.
The p-value was found to be 0.0269. Use a 5% significance level.

2. According to a recent newspaper article, people in California spend 1.25 hours a day eating and drinking. Suppose we want to test the claim that the number of hours spent eating and drinking is really 1.25 hours. In order to do this, we take a random sample of 400 people in California. The average number of hours for the sample was 1.22 and a p-value of 0.248 was found. Use a 10% significance level.

3. More and more Americans are becoming financially sound and opting to not own a credit card. According to an article in USA Today, 74% of Americans still have at least one credit card. But this claim seems a little on the low side. In order to verify the claim that more than 74% of Americans have a credit card, a random sample of 900 Americans was taken. In the sample, 76% owned a credit card, and a p-value of 0.0857 was found. Use a 5% significance level.

4. It has long been thought that normal body temperature is really 98.6 degrees Fahrenheit. A recent study is now claiming that normal body temperature is really lower than 98.6 degrees. A random sample of 10,000 adults worldwide was taken. The average temperature was 98.2 degrees, and a p-value of 0.0023 was found. Use a 1% significance level.

Go over Conclusion Notes (PDF online only)

Math 140 Activity#6 Writing Conclusions for Hypothesis Tests
Directions: For each of the following claims:
a) Find the null and alternative hypothesis.
b) If the null hypothesis was rejected, write a detailed conclusion statement.
c) If we failed to reject the null hypothesis, then write a detailed conclusion statement.
(Note: You will be writing two detailed conclusions for each problem.)

1. “The hospital claims that less than 4% of people who received the medication showed symptoms of side effects.”
2. “We think that the average height of women is more than 63.5 inches.”
3. “Latest polls show that the Republican candidate should receive about 54% of the vote.”
4.
“The average electrically powered car weighs less than 2000 pounds.”
5. “The medication Toprol is showing real promise in treating migraines. The majority (more than 50%) of patients taking Toprol have seen an improvement in their Migraine symptoms.”

Math 140 Hypothesis Test Notes: 1 population mean and proportion (percentage)

Steps for any hypothesis test
Null/Alt hypothesis
Check Assumptions
Put sample data into StatCrunch (P-value and Test Statistic)
Reject Ho or Fail to reject Ho
Conclusion

StatCrunch - 1 population percentage (proportion) Hypothesis Test
Stat => Proportion-stat => 1 sample => “with data” or “with summary” => Hypothesis Test
Hypothesized proportion (percentage)? p = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, or ≠ (left tail, right tail or two tail)

StatCrunch - 1 population mean average hypothesis test
Stat => T-stat => 1 sample => “with data” or “with summary” => Hypothesis Test
Hypothesized mean? μ = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, or ≠ (left tail, right tail or two tail)
“Histogram with mean marker”

Assumptions for 1 population proportion (clarification)

Confidence Interval Assumptions (We don’t know the population percentage p so we have to use the sample percentage p̂)
(Random and at least 10 successes and at least 10 failures in sample data)
n × p̂ = x ≥ 10
n × (1 − p̂) = n − x ≥ 10

Hypothesis Test Assumptions (Someone has made a guess at the population percentage p. This formula is also useful before you collect the data to see if you are likely to get 10 successes and 10 failures. If you are not likely to get 10, collect more data.)
(Random and at least 10 expected successes and at least 10 expected failures)
n × p ≥ 10
n × (1 − p) ≥ 10

Significance Levels: (# to compare the P-value with – Is my P-value low enough?)
1%, 5%, 10% Most Common: 5%
If the P-value is lower than the significance level, then the sample data was significantly different and significantly disagrees with the null hypothesis.
If the P-value is higher than the significance level, then the sample data was not significantly different and does not significantly disagree with the null hypothesis.

1 Population Proportion Example
Ex) A doctor thinks that the percent of people in a small rural community that have a certain infection is about 6%. He took some random sample data. He interviewed 175 people and found that 13 of them had the infection. Test the doctor’s claim that exactly 6% have the infection. (Use a 5% significance level.)

Steps for any hypothesis test
Null/Alt hypothesis
Check Assumptions
Put sample data into StatCrunch (P-value and Test Statistic)
Reject Ho or Fail to reject Ho
Conclusion

H0: p = 0.06 (claim)
HA: p ≠ 0.06
Type of hypothesis test? 1 population proportion test (two tail)

Assumptions? Notice that the sample data was random and had 13 successes and 175 − 13 = 162 failures, so it does meet the assumptions. Here are the expected successes and expected failures based on a population percentage of 6% and a sample size of 175. Notice the expected successes were barely over 10. We would have advised the doctor to collect more data to make sure we get at least 10 successes.
n × p = 175 × 0.06 = 10.5
n × (1 − p) = 175 × (1 − 0.06) = 175 × 0.94 = 164.5

StatCrunch - 1 population percentage (proportion) Hypothesis Test
Stat => Proportion-stat => 1 sample => “with data” or “with summary” => Hypothesis Test
Hypothesized proportion (percentage)? p = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, or ≠ (left tail, right tail or two tail)

Hypothesis test results: (from StatCrunch)
Proportion   Count   Total   Sample Prop.   Std. Err.     Z-Stat   P-value
p            13      175     0.074285714    0.017952318   0.7957   0.4262

Test stat z = 0.796
Test Stat Sentence: The sample percent 7.4% was only 0.796 standard errors above the population value 6%.
Z test statistic is very small (needs to be around 2 or higher).
This tells us that the sample value (7.4%) was close to the population value (6%).
Not a significant difference.

P-value = 0.426
P-Value Sentence: If Ho is true, and the population percent really is 6%, we had a 42.6% probability of getting the sample percentage of 7.4% or more extreme by random chance.
P-value was very large.
This tells us that the sample data could have happened by random chance (sampling variability).
Sample value of 7.4% is not significantly different than the population value of 6%.
There is not a significant disagreement between the sample data and the null hypothesis.

P-value (0.426) > sig level 0.05
Fail to reject Ho

Conclusion? There is not significant sample evidence to reject the claim that 6% of the population have the infection. (The doctor might be correct, but we don’t have any evidence.)

1 Population Mean Example
Ex 2. Test the claim that Math 140 students at COC work less than 25 hours per week on average. (Use a 10% significance level.) (Our class made this claim.) (Use the Fall 2015 Math 140 survey data.)

Steps for any hypothesis test
Null/Alt hypothesis
Check Assumptions
Put sample data into StatCrunch (P-value and Test Statistic)
Reject Ho or Fail to reject Ho
Conclusion

HA: μ < 25 (claim)
H0: μ ≥ 25
1 population mean test (left tailed test)

Population Mean Average Assumptions? Random? Sample size at least 30 or bell shaped?
The sample was not random, but it was an incomplete census so we can assume it represents Math 140 students relatively well. The data was skewed (not bell shaped), but because the sample size was 331 (over 30) it does meet the greater than 30 OR normal criteria.
StatCrunch - 1 population mean average hypothesis test
Stat => T-stat => 1 sample => “with data” or “with summary” => Hypothesis Test
Hypothesized mean? μ = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, or ≠ (left tail, right tail or two tail)
“Histogram with mean marker”

Hypothesis test results: (from StatCrunch)
Variable     Sample Mean   Std. Err.   DF    T-Stat    P-value
Hours Work   21.265861     0.8851419   330   -4.2187   <0.0001

Sig level = 10% or 0.1

Test Stat T = -4.219
Test Stat Sentence: The sample mean was 4.219 standard errors below the population mean of 25 hours.
4.2 is a lot for a z-score or t-score (way over 2).
There was a significant difference between the sample mean of 21.3 hours and the population mean of 25 hours.

P-value ≈ 0 (< 0.0001)
P-Value Sentence: If Ho is true and Math 140 students work exactly 25 hours, then there was about 0 probability of getting the sample data or more extreme by random chance.
Data is very unlikely to have happened by random chance (sampling variability).
Sample data is significantly different than the population value.
Sample data significantly disagrees with the null hypothesis.

P-value (≈ 0) < sig level (0.1 or 10%)
Reject Ho

Conclusion: There is significant sample evidence to support the claim that Math 140 students at COC work less than 25 hours.

Math 140 Activity#7 Hypothesis Tests for One Population Means
Directions: Use StatCrunch to perform the hypothesis tests. Make sure to give the null and alternative hypothesis, check the assumptions, give the t test statistic and p-value, state whether or not you reject the null hypothesis, and give a conclusion. Write a sentence explaining the meaning of the test statistic and state whether the sample value was significantly different than the population value or not. Write a sentence explaining the meaning of the p-value and determine if the sample data was likely to happen by random chance or not.

1.
The manager at a local Starbucks wants to make sure that customers wait less than 4 minutes from the time they order to the time that they pick up their coffee. In order to test this, twenty random customers were selected and the staff measured the number of minutes between when the person ordered and when their drink was ready. The sample mean was 2.870 minutes and the sample standard deviation was 1.379 minutes. Here is a histogram of the twenty wait times. Does this data meet the assumptions necessary to perform a hypothesis test? If so, use a 1% significance level to test the claim that the average wait time is less than 4 minutes.
[Histogram of C1: frequency of the twenty wait times; horizontal axis C1 runs from 0 to 6 minutes]

2. Redwood trees are the tallest plants on Earth. California is famous for its giant Redwood trees. But just how tall are they? A random sample of 47 California Redwood trees was taken and their heights measured. (This was not easy by the way.) The sample mean average height was 248 feet with a standard deviation of 26 feet. Does this data meet the assumptions necessary to perform a hypothesis test? If so, use a 5% significance level to test the claim that Redwood trees have an average height greater than 240 feet.

3. Maria is planning to attend UCLA. She is curious what the average age of UCLA students is. Since most students that attend UCLA are in their 20’s yet there are also students up to 70 years old, the population is positively skewed. The college took a random sample of 65 students and found that the sample mean was 29.0 years old with a standard deviation of 5.2 years. Does this data meet the assumptions necessary to perform a hypothesis test? If so, use a 10% significance level to test the claim that the average age of students at UCLA is 30 years old.

4. Mike wants to know the average price of a hamburger. So he randomly selects 24 restaurants and records the price of a regular hamburger.
The sample mean price was $3.88 and the sample standard deviation was $1.14. A histogram of the data is below. Does this data set meet the assumptions necessary to perform a hypothesis test? If so, use a 10% significance level to test the claim that the average price of a hamburger is greater than $3.50.
[Histogram of C1: frequency of the 24 hamburger prices; horizontal axis C1 runs from $2.00 to $6.00]

For #5-7, use the Math 140 Survey data from Fall 2015 and StatCrunch to perform the following hypothesis tests. Use a 5% significance level for all of the problems. Make sure to make a histogram of the data and check if the data set meets the assumptions necessary to perform the hypothesis test.

5. Test the claim that the average age of Math 140 students is higher than 21 years old.
6. Test the claim that the average weight of Math 140 students is less than 160 pounds.
7. Test the claim that the average height of Math 140 students is 64 inches.

Math 140 Activity#8 Hypothesis Testing for One Proportion
Directions: Use StatCrunch to perform the hypothesis tests. Make sure to give the null and alternative hypothesis, check the assumptions, give the z test statistic and p-value, state whether or not you reject the null hypothesis, and give a conclusion. Write a sentence explaining the meaning of the test statistic and state whether the sample value was significantly different than the population value or not. Write a sentence explaining the meaning of the p-value and determine if the sample data was likely to happen by random chance or not.

Notes:
Use the formulas n × p ≥ 10 and n × (1 − p) ≥ 10 when checking assumptions.
If the problem only gives the sample percent, you will need to calculate the number of successes (x value) using the formula x = p̂ × n before you can plug into StatCrunch.

1. The United States has the highest teen pregnancy rate in the industrialized world. The Centers for Disease Control (CDC) says that as of 2011, 33% of girls get pregnant before the age of 20.
We are wondering if the teen pregnancy rate is even higher than what the CDC claims. A random sample of 400 girls is taken. Of the 400 girls randomly selected, 144 of them were pregnant before the age of 20. (Use a 5% significance level.)

2. Campus bookstores have increased the number of digital textbooks this school year, as students weaned on Facebook and iPads seek virtual alternatives to heavy tomes. Digital textbooks are projected to account for approximately 13% of course materials sold by the fall of 2012, compared with just 3 percent of the $5.85 billion sold last year, according to the National Association of College Stores. How accurate is the claim of 13%? Is it too high or too low an estimate? In a random sample of 260 college course materials, 15% of the sample course materials were digital. But is this sample percentage significant enough to contradict the claim that the true population percentage is 13%? (Use a 10% significance level.)

3. Childhood obesity has more than tripled in the past 30 years. The percentage of children aged 6–11 years in the United States who were obese increased from 7% in 1980 to nearly 20% in 2008. If this trend continues, we can expect the percent of young children that are obese in 2012 to be significantly greater than 20%. In order to test this claim, a random sample of 800 children in the U.S. was taken and 179 of them were found to be obese. (Use a 5% significance level.)

4. About 1 in 3 U.S. adults (an estimated 68 million) have high blood pressure, which increases the risk for heart disease and stroke, leading causes of death in the United States. High blood pressure is called the "silent killer" because it often has no warning signs or symptoms, and many people don't realize they have it. That's why it's important to get your blood pressure checked regularly. But is the rate of U.S. adults really this high? Another web site claims that the true percentage of U.S.
adults with high blood pressure is actually dramatically lower than 1 in 3 (33.3%). To test this claim we randomly selected 500 adults across the U.S. and found that 165 of them had high blood pressure. (Use a 1% significance level.)

(#5-6) You will be using the “Math 140 Survey Data from Fall 2015” to test the following claims. The survey was taken in all Math 140 classes, so though it is not random, it is an attempt at a census, so you can assume it represents the population. In StatCrunch, go to Proportion Stats, 1 Sample, and With Data.

5. Test the claim that more than 50% of Math 140 students are female. You will need to get the gender data and paste it into StatCrunch. Type in “Female” in the Success box.
6. Test the claim that exactly 1/3 (33.3%) of Math 140 students take their class at the Canyon Country Campus. You will need to get the campus data and paste it into StatCrunch. Type in “Canyon Country” in the Success box.

Hypothesis Test Notes: Type 1 and Type 2 Errors

Sampling variability can sometimes really mess up a hypothesis test. When that happens, there can be severe consequences. Type 1 and Type 2 errors occur when the sample data is not reflective of the population and gives us a wrong view about the population.

Type 1 Error (Think the alternative hypothesis HA is correct when it is not.)
Rejecting H0 by mistake
A bad random sample gives a low P-value that is not reflective of the population. The person analyzing the data then rejects H0 and supports HA by mistake when, in actuality, H0 is correct.

Type 2 Error (Think the null hypothesis H0 is correct when it is not.)
Failing to reject H0 by mistake
A bad random sample gives a high P-value that is not reflective of the population. The person analyzing the data then fails to reject H0 by mistake (thinks H0 might be correct) when, in actuality, H0 is wrong and HA is correct.

How to stop a Type 1 Error? Lower the significance level!!
Significance level (alpha level) is the probability of type 1 error. So to limit the chances of a type 1 error, simply lower the significance level from 5% to 1%. Remember Type 1 and Type 2 are on a see-saw. As one goes up the other goes down. If you decrease the significance level from 5% to 1%, the probability of type 2 error (beta level) will now increase. How to stop a Type 2 Error? Increase the sample size!! (Collect more sample data) Remember Type 1 and Type 2 are on a see-saw. As one goes up the other goes down. If you increase the significance level from 5% to 10%, the probability of type 1 error will increase, but the probability of type 2 error (beta level) will now decrease. Increasing the significance level is sometimes not an option especially when a type 1 error is really bad. So instead of increasing the significance level, increase the sample size. More data results in a more powerful test and a lower probability of type 2 error (decreased beta level). Significance levels and type 1 and type 2 errors 5% significance level (95% confidence level) is a good balance between type 1 and type 2 errors. Both are relatively low. 1% significance level (99% confidence level) will have a lower probability of type 1 error but a higher probability of type 2 error. 10% significance level (90% confidence level) will have a higher probability of type 1 error but a lower probability of type 2 error. Summary of Type 1 and Type 2 Errors Type 1 error is believing that the alternative hypothesis is correct when it is not. Limit the chances of type 1 error? Decrease the significance level (alpha level) Type 2 error is believing that the null hypothesis is correct when it is not. Limit the chances of type 2 error? Increase the sample size Examples When exploring type 1 and type 2 errors, the key is to write down the null and alternative hypothesis and the consequences of believing the null is true and the consequences of believing the alternative is true. 
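The see-saw between type 1 and type 2 errors can be checked with a simulation. The following is a rough sketch under assumptions that are not from these notes: a right-tailed one-proportion z-test of H0: p = 0.50, a hypothetical "true" population percentage of 55% for the case where H0 is wrong, and made-up sample sizes of 100 and 400. It counts how often simulated random samples lead to each kind of mistake.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_p, n, alpha, p0=0.5, trials=2000, seed=1):
    """Fraction of simulated random samples in which a right-tailed
    one-proportion z-test of H0: p = p0 rejects H0 at level alpha."""
    rng = random.Random(seed)
    z_cutoff = NormalDist().inv_cdf(1 - alpha)   # z needed to reject H0
    standard_error = sqrt(p0 * (1 - p0) / n)     # based on the H0 value
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        z = (successes / n - p0) / standard_error
        if z >= z_cutoff:
            rejections += 1
    return rejections / trials

results = {}
for alpha in (0.05, 0.01):
    for n in (100, 400):
        type1 = rejection_rate(0.50, n, alpha)      # H0 true: rejecting is a type 1 error
        type2 = 1 - rejection_rate(0.55, n, alpha)  # H0 false: not rejecting is a type 2 error
        results[(alpha, n)] = (type1, type2)
        print(f"alpha={alpha} n={n} type 1 rate={type1:.3f} type 2 rate={type2:.3f}")
```

In a run like this, the type 1 rate should sit near the significance level, lowering the significance level should raise the type 2 rate, and the larger sample size should lower the type 2 rate without changing the significance level.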
Remember above all: Type 1 and Type 2 errors are MISTAKES!! Example A pharmaceutical company wants to sell a new medicine in the U.S. To get approval they need to convince the FDA that the medicine is safe and has few side effects. If serious side effects happen in 4% or more of the people taking the medicine, then the FDA may not approve sale of the medicine in the U.S. If serious side effects happen in less than 4% of people taking the medicine, then the FDA may approve sale of the medicine in the U.S. What is the null and alternative hypothesis? Ho: p ≥ 4% (FDA does not allow medicine to be sold in U.S.) Ha: p < 4% (FDA does allow medicine to be sold in U.S.) Describe the consequences of a type 1 error and what we could do to limit the probability of a type 1 error. Because of some biased sample data, we got a low P-value and rejected the null hypothesis by mistake. So we think that the alternative hypothesis is correct when it is not. That would mean that the FDA approved sale of the medicine by mistake. The medicine causes serious side effects in a lot of people. People could die or become very sick. They may sue the pharmaceutical company or the FDA. To make sure this doesn’t happen, lower the significance level to 1%. Describe the consequences of a type 2 error and what we could do to limit the probability of a type 2 error. Because of some biased sample data, we got a high P-value and failed to reject the null hypothesis by mistake. So we think that the null hypothesis is correct when it is not. That would mean that the FDA blocked the sale of a good medicine that rarely causes any side effects. Patients will be deprived of a good medicine and the company will lose a lot of money in potential profits. To make sure this doesn’t happen, increase the sample size. Math 140 Hypothesis Testing Activity 9 Type I and Type II Errors Directions: For each of the following problems, find the null and alternative hypothesis. 
Then write a description of a type I error and the consequences of that error in the context of the problem. Then write a description of a type II error and the consequences of that error in the context of the problem.

1. A new medication has been developed to help alleviate the symptoms of stress. In doing sample testing, the company that created the medicine found that it seems to work fine on men, but not so well on women. The FDA does not want to approve sale of the medicine in the U.S. if it is true that the percent of women that the medicine helps is significantly less than the percent of men that the medicine helps.

2. The Acura car company is debating whether to recall its latest sedan because of a malfunction in its airbags. Acura executives think that the defect rate is probably low, but if the airbags malfunction and do not open in 2% or more of crashes, then they will need to put out a general recall.

3. Mike and his advertisement team have created an advertisement plan for a new flavor of Pepsi. Right now, approximately 4% of soda drinkers are purchasing this type of Pepsi. Mike needs to show his bosses that his advertisement plan will increase the percentage of soda drinkers purchasing this new flavor.

4. What could we do to decrease the chance of a type I error occurring?
5. What could we do to decrease the chance of a type II error occurring?
6. If we increase the significance level from 5% to 10%, what will happen to the probabilities of type I and type II errors?
7. If we decrease the significance level from 5% to 1%, what will happen to the probabilities of type I and type II errors?
8. What significance level achieves a good balance between type I and type II errors?

Hypothesis Test Notes: Two Population Tests

We sometimes would like to know if a population value (a mean or a percentage) is larger or smaller in one population than in another population. This is a two population hypothesis test.

Label which group is population 1 and which is population 2!!!
(It does not matter which group you pick to be population 1 or 2, but however you label it, make sure you put the data into StatCrunch in that order!) Key Question????? We are comparing two populations by looking at sample data. Remember, like any hypothesis test, we have to rule out sampling variability (random chance) to be able to reject the null hypothesis. Key Question: Why are my two samples different? Option 1: (Random Chance) The populations are the same, and the samples are different because all random samples are different. Option 2: (Populations are different) The samples are different because the populations are different. In a two population hypothesis test, to determine if populations are different, we first must rule out option 1 (random chance). How can we rule out random chance??? Important Note: You cannot just look at the two sample values. Remember sometimes a 10 pound difference is a lot and sometimes it is not a lot. Sometimes a 3% difference is a lot and sometimes it is not a lot. Test Statistic, P-value, or Simulation to the rescue!! We are able to rule out random chance when the samples are significantly different and the probability of that significant difference happening is very low. Large Test Statistic (T-stat or Z-stat close to +2 or higher or close to -2 or lower.) Low P-value (P-value is close to zero or less than the significance level) Simulate what samples would look like when the populations are the same. (If our sample difference is in the tail, then our sample difference is significant and the probability of that sample difference (P-value) or more extreme is very low.) Setting up your two population hypothesis test Step 1: Label which group is population 1 and which is population 2 and stick to it. 
For example: Population 1: women Population 2: men

Step 2: Null and Alternative Hypothesis (There are various ways of writing the null and alternative hypothesis. They are all equally correct and you can use any of them.)

Example Claim: Mean average salary for women (μ1) is lower than the mean average salary of men (μ2)
H0: μ1 ≥ μ2
HA: μ1 < μ2 (claim)
By subtracting μ2 from both sides we get the following. Remember saying group 1 is lower than group 2 is the same as saying the difference (group 1 – group 2) is negative.
H0: μ1 − μ2 ≥ 0
HA: μ1 − μ2 < 0 (claim)
If the data is matched pair (husband and wife or same person measured twice) then you will sometimes see μ1 − μ2 written as μd.
H0: μd ≥ 0
HA: μd < 0 (claim)

Example Claim: The percentage of women (p1) is higher than the percentage of men (p2)
H0: p1 ≤ p2
HA: p1 > p2 (claim)
By subtracting p2 from both sides we get the following. Remember saying group 1 is higher than group 2 is the same as saying the difference (group 1 – group 2) is positive.
H0: p1 − p2 ≤ 0
HA: p1 − p2 > 0 (claim)

What does this mean?
H0: μd = 0
HA: μd ≠ 0 (claim)
Think: μd stands for μ1 − μ2. So
H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0 (claim)
What does that mean?
H0: μ1 = μ2
HA: μ1 ≠ μ2 (claim)
So H0: μd = 0 versus HA: μd ≠ 0 (claim) asks whether the two populations are the same (H0) or different (HA).

Assumptions
2 population mean average (Check these twice)
Random
At least 30 or bell shaped (normal)
Matched Pair or Independent? Remember matched pair is a one-to-one pairing (not just something in common)

2 population proportion (percentage) (Check these twice)
Random
At least 10 successes
At least 10 failures
Two groups should be independent

Test Statistics
1 population test statistic sentence: the number of standard errors that the sample value is above or below the population value.
2 population test statistic sentence: the number of standard errors that the sample value from group 1 is above or below the sample value from group 2.
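To see how a two population test statistic sentence comes out of the numbers, here is a small sketch for two independent groups and quantitative data. The summary values are hypothetical, and the unpooled standard error sqrt(s1²/n1 + s2²/n2) is an assumption chosen for illustration; StatCrunch output may differ in details.

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """(sample mean 1 - sample mean 2) / standard error, without pooling
    the two sample variances."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

def describe_test_statistic(t, group1, group2):
    """Write the two population test statistic sentence."""
    direction = "above" if t >= 0 else "below"
    return (f"The sample mean for {group1} was {abs(t):.2f} standard errors "
            f"{direction} the sample mean for {group2}.")

# Hypothetical summary data (not from these notes)
t = two_sample_t(mean1=52.0, sd1=10.0, n1=50, mean2=55.0, sd2=12.0, n2=60)
print(describe_test_statistic(t, "group 1", "group 2"))

# Swapping the labels flips the sign, which is why the labeling must stay fixed.
t_swapped = two_sample_t(mean1=55.0, sd1=12.0, n1=60, mean2=52.0, sd2=10.0, n2=50)
print(round(t, 2), round(t_swapped, 2))
```

The second call shows why Step 1 matters: the size of the test statistic is the same either way, but its sign (and so the left tail or right tail choice) depends on which group was labeled population 1.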
Formula for two population test statistic (Z or T)
Test statistic = (sample value 1 − sample value 2) / standard error

Example: group 1: women, group 2: men
Comparing the percentage of women to the percentage of men.
Test Statistic Z = +2.48
Sample percentage from group 1 (women) is 2.48 standard errors above the sample percentage from group 2 (men).

Example: group 1: Valencia High School, group 2: Saugus High School
Compare the mean average SAT scores.
Test Statistic T = -1.06
Sample mean average for group 1 (Valencia) was 1.06 standard errors below the sample mean average for group 2 (Saugus).

StatCrunch Directions (Alternate null and alternative with “zero”)
Two Population proportion (percentage)
Stat => Proportion-Stats => Two Sample => with data or with summary
Two Population mean average (Independent groups)
Stat => T-Stats => Two Sample => with data or with summary
Two Population mean average (matched pair with raw data)
Stat => T-Stats => Paired => columns?
Two Population mean average (matched pair with summary data: mean of the differences, standard deviation of the differences, and sample size n)
Stat => T-Stats => 1 sample => with summary => put in mean, standard deviation, sample size

Pool or Not to Pool? (That is the question)

1. Pooling in 2 population proportion problems (categorical data)
P-pooled is combining the # of successes and the sample sizes of your two groups into one large sample.
p̂ (pooled) = (x1 + x2) / (n1 + n2)
Note: You are allowed to pool the two sample percentages if the population percentages are equal.
In confidence intervals we do not know if the populations are the same or not. So for 2 population proportion confidence intervals: Do not pool.
In two population proportion hypothesis tests, it is OK to pool, because you are assuming the population percentages are the same in the null hypothesis.
(Some programs ask if you want to pool for two population proportion, but StatCrunch does this automatically.
It automatically pools for the 2 population proportion hypothesis test standard error and automatically does not pool for confidence interval standard error. You will see a slight difference in the standard error for hypothesis test verses confidence interval.) 2. Pooling the variances in 2 population mean average problems. (Quantitative data) You should not pool the sample variances unless you are sure the population variances are equal. Since we rarely know the population variances, do not pool the variances in StatCrunch. Act 11 #1 (Matched Pair with summary data) Group 1: After ACT scores Group 2: Before ACT scores H A : 1 2 (claim) H 0 : 1 2 Note: Alternate way of writing null and alternative H A : 1 2 0 (claim) H 0 : 1 2 0 H A : d 0 (claim) H 0 : d 0 Two Population mean average (matched pair with summary data d , s d , n ) Stat => T-Stats => 1 sample =>with summary => put in mean, standard deviation, sample size T test statistic = +2.9166 Sample mean of after scores were 2.92 standard errors above the sample mean of the before scores. After scores are significantly higher than before scores (class is effective) P-value = 0.0044 If Ho is true, then there is a 0.0044 probability of getting the sample data (sample difference) or more extreme by random chance. (unlikely to happen by random chance, Ho must be wrong.) P-value (0.0044) < sig level (0.05) Reject Ho Conclusion: There is significant sample evidence to support the claim that the ACT prep class is effective. (After > Before) Act 12/#2 Population 1: Marijuana Population 2: Non-marijuana H A : p1 p 2 (claim) H 0 : p1 p 2 Note: in StatCrunch null and alternative H A : p1 p2 0 (claim) H 0 : p1 p 2 0 Z test statistic = 6.85 Percentage of group 1 (marijuana users) was 6.85 standard errors above the percentage of group 2 (non-marijuana users) Percent of marijuana users that use other drugs is significantly greater. 
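The two test statistics reported in these solutions can be reproduced by hand from the summary data in Activity 11 #1 and Activity 12 #2. The following is a minimal Python sketch, not StatCrunch's own code. The function names are made up for illustration, and it assumes the pooled standard error described in the pooling notes for the two-proportion test.

```python
import math

def paired_t_statistic(mean_diff, sd_diff, n):
    """Mean of the differences divided by its standard error, s_d / sqrt(n)."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error

def pooled_two_proportion_z(x1, n1, x2, n2):
    """(p1_hat - p2_hat) divided by a standard error built from the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # combine successes and sample sizes
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

# Activity 11 #1: mean difference 1.5, standard deviation 2.3, 20 students
t_act = paired_t_statistic(1.5, 2.3, 20)

# Activity 12 #2: 87 of 213 marijuana users and 26 of 219 non-users use other drugs
z_drugs = pooled_two_proportion_z(87, 213, 26, 219)

print("T test statistic:", round(t_act, 4))
print("Z test statistic:", round(z_drugs, 2))
```

The printed values should agree with the T = +2.9166 and Z = 6.85 given in the solutions, which is a quick way to confirm which standard error a program used.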
P-value ≈ 0 (< 0.0001)
If Ho is true, there is almost 0 probability of getting the sample data (sample difference) or more extreme by random chance. (Very unlikely to have happened by random chance. Population 1 is significantly different than population 2.) Ho is probably wrong.
Reject Ho
Conclusion: There is significant sample evidence to support the claim that the percent of marijuana users that use other illegal drugs is higher than the percent of non-marijuana users that use other illegal drugs.

Math 140 Hypothesis Test Activity#10
Using Simulation to Understand Hypothesis Tests
Difference of Means and Difference of Proportions

Notes on Using Simulation to compare two groups
The key to two population hypothesis testing is to see if one group’s sample data significantly disagrees with the other group’s sample data. This is difficult to do because of sampling variability. Random samples almost always give different values, so the sample values for both groups can be different yet not indicate that the populations are different. The key question in hypothesis testing is the following:

Key Question: Could the sample data for both groups be different just because of sampling variability (random chance)? Or are the sample values so significantly different that it causes us to think that the population values may be different? (i.e. The sample difference is not what we would expect by random chance!)

How can we answer this key question? We need to simulate a distribution based on the null hypothesis. This is often called a randomized simulation or a “randomization technique”. We need to simulate what a distribution should look like if the two groups are the same. Remember, if the groups are the same then their population mean difference would be zero. When simulating the difference between two groups, we simulate taking samples from populations whose difference is zero.
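To make the randomization idea concrete, here is a minimal Python sketch of one common approach: reshuffling the group labels and recomputing the difference each time. It is an illustration under stated assumptions, not StatKey's actual code, and the function names are hypothetical. It uses the smoker data from problem 2 of this activity (38 of 135 smokers and 206 of 543 non-smokers became pregnant) and subtracts in the order smokers – non-smokers.

```python
import random

def diff_in_proportions(outcomes, n1):
    """Proportion of successes in the first n1 outcomes minus the proportion in the rest."""
    group1, group2 = outcomes[:n1], outcomes[n1:]
    return sum(group1) / len(group1) - sum(group2) / len(group2)

def left_tail_randomization(outcomes, n1, reps=2000, seed=1):
    """Estimate a left-tail P-value by shuffling outcomes between the two groups."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = diff_in_proportions(outcomes, n1)
    shuffled = list(outcomes)
    at_or_below = 0
    for _ in range(reps):
        # If the null hypothesis is true, group membership does not matter,
        # so any reshuffling of the outcomes is equally plausible.
        rng.shuffle(shuffled)
        if diff_in_proportions(shuffled, n1) <= observed:
            at_or_below += 1
    return observed, at_or_below / reps

# 1 = became pregnant, 0 = did not
smokers = [1] * 38 + [0] * (135 - 38)
non_smokers = [1] * 206 + [0] * (543 - 206)

observed_diff, p_left = left_tail_randomization(smokers + non_smokers, n1=len(smokers))
print("Observed difference:", round(observed_diff, 3))
print("Estimated left-tail P-value:", p_left)
```

Most simulated differences land near zero, and the estimated P-value is the share of them that are at or below the real sample difference. Because the simulation is random, the estimate changes a little with a different seed or number of repetitions.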
3 options
If our real sample data (not simulated) shows a difference that is significantly greater than zero, that will give evidence that population 1 is greater than population 2.
If our real sample data (not simulated) shows a difference that is significantly smaller than zero, that will give evidence that population 1 is less than population 2.
If our real sample data (not simulated) shows a difference that is close to zero, then that may indicate that the populations are not significantly different.

Simulation and P-value: The key is that we will need simulation to determine what is significant and what is not. Again, we will be looking at simulated P-values to determine this. The P-value is the probability of getting the sample difference (or more extreme) if the two populations are really the same. The lower the P-value, the more evidence we will have that the two populations are different. The higher the P-value, the less evidence we have of a significant difference between the groups.

Directions: We will now look at the following problems and use StatKey at www.lock5stat.com. Look on the right side of the screen where it says “Randomization Hypothesis Tests” and click on the appropriate link (single mean, single proportion, difference of means, difference of proportions). We will be focusing on difference of means and difference of proportions in this activity.

1. Do men exercise more than women? Let’s test this claim. Go to the Lock website and click on StatKey and “Randomization test for a difference of means”. In the upper left corner change the box to “Male/Female Exercise Hours Per Week”. We took random data from 20 men and 30 women. Population 1 was men and population 2 was women, so the computer is subtracting in the order of men – women.
a) What are the null and alternative hypotheses? Which is the claim? Is this a right tail, left tail, or two tailed test?
b) Are the groups matched pairs or independent?
c) Does this problem meet the assumptions to do the test?
d) If gender does not matter and men and women work out the same, what would we expect the difference between the groups to be?
e) The computer is making simulated samples from two populations that work out the same number of hours. Will all the simulations have a mean difference of zero? Why not?
f) What was the sample mean difference in our real (original) sample data? Do you think this is significant? Discuss why it is difficult to judge significance without simulation, P-value or a Test Statistic.
g) Simulate taking samples of men and women a few thousand times. We want to know if our original non-simulated difference of +3 hours was significant. Plug it in the bottom box. What percent of simulations had a difference of +3 or higher? What is the P-value? Why did we look at the values higher than +3 also and not just +3?
h) Finish the test. (Use a 5% significance level.) Do you reject the null hypothesis or fail to reject the null hypothesis? Write a conclusion that addresses the original claim (question).

2. Do women that don’t smoke have a greater chance of getting pregnant than those that smoke? Let’s test this claim. Go to the Lock website and click on StatKey and “Randomization test for a difference of proportions”. In the upper left corner change the box to “Get Pregnant (by Smoker status)”. We took a random sample of 135 smokers and found that 38 became pregnant. We also took a random sample of 543 non-smokers and found that 206 became pregnant. Population 1 was smokers and population 2 was non-smokers, so the computer is subtracting in the order of smokers – non-smokers.
a) What are the null and alternative hypotheses? Which is the claim? Is this a right tail, left tail, or two tailed test?
b) Does this problem meet the assumptions to do the test?
c) If smoking does not matter and the chances of getting pregnant are the same for smokers and non-smokers, what would we expect the difference between the groups to be?
d) The computer is making simulated samples from two populations that have the same population percentage. Will all the simulations have a difference of zero? Why not?
e) What was the sample percent difference in our real (original) sample data? Do you think this is significant? Discuss why it is difficult to judge significance without simulation, P-value or a Test Statistic.
f) Simulate taking samples of smokers and non-smokers. We want to know if our original non-simulated percent difference of -0.098 was significant. Click on “left tail” and plug it in the bottom box. What percent of simulations had a difference of -0.098 or less? What is the estimated P-value? Why did we look at the values less than -0.098 also and not just -0.098?
g) Finish the test. (Use a 5% significance level.) Do you reject the null hypothesis or fail to reject the null hypothesis? Write a conclusion that addresses the original claim (question).

Math 140 Hypothesis Test Activity#11
Hypothesis Tests for Comparing Two Population Means

Directions: For each of the following problems:
a. Write the null and alternative hypothesis. Is this a right tailed, left tailed or two tailed test? Is it two independent samples or matched pairs?
b. Check whether the problem meets the assumptions necessary to perform the hypothesis test. List all the assumptions and how the problem meets the assumptions or does not meet the assumptions.
c. Use StatCrunch and the information given in the problem to calculate the t test statistic. Write a sentence explaining the meaning of the test statistic.
d. Use StatCrunch and the information given in the problem to calculate the P-value. Write a sentence explaining the meaning of the P-value.
e. State whether you reject or fail to reject the null hypothesis and then write the conclusion for the hypothesis test.

1. The ACT exam is used by many colleges to test the readiness of high school students for college. Many high school students are now taking ACT prep classes.
A local high school offers an ACT prep class, but wants to know if it really helps. Twenty students were randomly selected. They took the ACT exam before and after taking the ACT prep class. For each student the difference between the after and before scores was measured (d = after – before). The mean of the differences was 1.5 with a standard deviation of 2.3. A histogram of the differences yielded a bell shaped distribution. Use a 5% significance level to test the claim that the prep class was effective in raising ACT scores. Make sure to give the null and alternative hypothesis, the test statistic, the P-value and a detailed conclusion.

2. Cotinine is an alkaloid found in tobacco and is used as a biomarker for exposure to cigarette smoke. It is especially useful in examining a person’s exposure to second hand smoke. A random sample of 32 non-smoking American adults was collected. These adults were not smokers and did not live with any smokers. The average cotinine level for this sample was 7.2 ng/mL with a standard deviation of 5.8 ng/mL. A second random sample of 35 non-smoking American adults was then collected. These adults did not smoke themselves, but did live with one or more smokers. The average cotinine level for this sample was 28.5 ng/mL with a standard deviation of 11.4 ng/mL. Use a 1% significance level to test the claim that people that do not live with smokers have a lower cotinine level than those people that do live with smokers. Make sure to give the null and alternative hypothesis, the test statistic, the P-value and a detailed conclusion. What does this tell us about the effects of second hand smoke?

3. Open the Female Health Data set. Copy and paste the systolic and diastolic blood pressure columns into StatCrunch. This is data from 40 randomly selected women throughout the U.S. We want to explore the relationship between a woman’s systolic blood pressure and her diastolic blood pressure.
Use a 1% significance level to test the claim that systolic blood pressure is higher than diastolic blood pressure. Make sure to give the null and alternative hypothesis, the test statistic, the P-value and a detailed conclusion.

4. Now open the Male and Female Health Data set. Copy and paste the Male Cholesterol and Female Cholesterol levels into StatCrunch. (You may want to rename the columns to help distinguish between the two.) Use a 5% significance level to test the claim that the cholesterol levels of men and women are different. Make sure to give the null and alternative hypothesis, the test statistic, the P-value and a detailed conclusion.

5. Now open the Male and Female Health Data set. Copy and paste the Male Systolic Blood Pressure and Female Systolic Blood Pressure into StatCrunch. (You may want to rename the columns to help distinguish between the two.) Use a 5% significance level to test the claim that the average systolic blood pressures for men and women are the same. Make sure to give the null and alternative hypothesis, the test statistic, the P-value and a detailed conclusion.

Math 140 Hypothesis Test Activity#12
Hypothesis Testing for Two Proportions

Directions: For each of the following problems:
a. Write the null and alternative hypothesis. Is this a right tailed, left tailed or two tailed test?
b. Check whether the problem meets the assumptions necessary to perform the hypothesis test. List all the assumptions and how the problem meets the assumptions or does not meet the assumptions.
c. Use StatCrunch and the information given in the problem to calculate the z test statistic. Write a sentence explaining the meaning of the test statistic.
d. Use StatCrunch and the information given in the problem to calculate the P-value. Write a sentence explaining the meaning of the P-value.
e. State whether you reject or fail to reject the null hypothesis and then write the conclusion for the hypothesis test.

1.
The United States has the highest teen pregnancy rate in the industrialized world. In 2008 a random sample of 1014 teenage girls found that 326 of them were pregnant before the age of 20. Has the proportion of teenage pregnancy increased as of 2012? In 2012, a random sample of 1025 teenage girls was taken and 334 were found to be pregnant before the age of 20. (Use a 10% significance level.)

2. While many Americans favor the legalization of marijuana, opponents of legalization argue that marijuana may be a gateway drug. They believe that if a person uses marijuana, then they are more likely to use other more dangerous illegal drugs. Use the table of random data given below to test the claim that marijuana users have a higher percentage of other drug use than non-marijuana users. (Hint: Use a 5% significance level.)

| | Uses Other Drugs | Total |
|---|---|---|
| Uses Marijuana | 87 | 213 |
| Does not use Marijuana | 26 | 219 |

3. An article recently suggested that the percent of women worldwide that abstain from drinking alcohol is significantly higher than the percent of men that abstain from drinking. Use the following sample data to test this claim. We took a random sample of 250 women and found that 137 of them never drink. We took a random sample of 190 men and found that 66 of them never drink. (Use a 5% significance level.)

4. A health magazine claims that marriage status is one of the most telling factors for a person’s happiness. Use the table below to test the claim that the percent of married people that are unhappy is lower than the percent of single or divorced people that are unhappy. The data was collected randomly. (Use a 10% significance level.)

| | Unhappy | Total |
|---|---|---|
| Married | 74 | 200 |
| Single or Divorced | 97 | 200 |

5. A tattoo magazine claimed that the percent of men that have at least one tattoo is greater than the percent of women with at least one tattoo. Test this claim with the following sample data. A random sample of 794 women found that 137 of them had at least one tattoo.
A random sample of 857 men found that 146 of them had at least one tattoo. (Use a 5% significance level.)

6. A body mass index of 20-25 indicates that a person is of normal weight. A random sample of 745 men and 760 women found that 198 of the women and 273 of the men had a normal BMI score. A fitness magazine claims that the percent of women with a normal BMI is lower than the percent of men with a normal BMI. (Use a 10% significance level.)

7. A new medicine has been developed that treats high cholesterol. An experiment was conducted and adults were randomly assigned to two groups. The groups had similar gender, ages, exercise patterns and diet. Of the 420 adults in the placebo group, 38 of them showed a decrease in cholesterol. Of the 410 adults in the treatment group, 49 of them showed a decrease in cholesterol. The drug company claims the medicine is effective; that is, they claim that the percent of adults that have lower cholesterol is greater in the treatment group than in the placebo group. Use the sample data to test this claim. (Use a 1% significance level.)

Use the Math 140 Survey Data for problems 8-10. Remember this was a census of non-PAL Math 140 students from this semester. But what does this data allow us to say about the entire population of Math 140 students?

8. Test the claim that the percent of female Math 140 students is greater than the percent of male Math 140 students.
9. Test the claim that the percent of Math 140 students that are Republican is different than the percent of Math 140 students that are Democrat.
10. Test the claim that the percent of Math 140 students that use Instagram is less than the percent of Math 140 students that use Facebook.

Hypothesis Test Notes
Chi-Squared Test Statistic & Goodness of Fit Test

Remember when comparing a sample percentage to a claimed population percentage we use a 1 proportion hypothesis test and a Z-test statistic.
When comparing a sample percentage for 1 group to a sample percentage from a second group we use a 2 proportion hypothesis test and a Z-test statistic. In both cases the Z-test statistic counts the number of standard errors one thing is from another. But what if we have more than 2 groups we are comparing? Or what if we are comparing multiple variables in multiple groups (two-way table)? The answer to both of these is the “Chi-Squared Test Statistic” $\chi^2$.

The basic idea of any test statistic is to compare the sample data to the null hypothesis. In Chi-Squared, we will calculate the “Expected Values” if the null hypothesis is true.

Example 1
Let’s suppose that the percentage of high school students that graduate is the same at five different high schools. This multiple P hypothesis test is often called a “Goodness of Fit Test”.
$H_0: p_1 = p_2 = p_3 = p_4 = p_5$
$H_A$: at least one is different ($\neq$)

Suppose we have a total of 105 students that graduate. How many would we expect from each school? These are the “Expected Values”. Notice if Ho is true, then all the schools would have the same number of graduates from the 105 total. In other words, the expected values should all be 21 (105 divided by 5).

Now we need to compare what really happened to those expected values. Here is the observed sample data. (Observed Values)
$O_1 = 17$, $O_2 = 24$, $O_3 = 13$, $O_4 = 25$, $O_5 = 26$

So when doing Chi-Squared Hypothesis Tests, think “Expected” means Ho, but “Observed” means sample data.

The formula for the chi-squared test statistic is pretty formidable. Remember, the computer will be doing the heavy lifting. We need to understand the formula and be able to explain it.
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
Notice we are finding the difference between the observed values (sample data) and expected values (null hypothesis). Since we will sometimes get negative numbers, we are squaring the differences. This is why the test statistic is called “Chi-Squared”. We are dividing by the expected value so we are looking at an average of the squares.
Then adding all of these together gives us the total Chi-Squared. Remember this is a way to compare complex categorical data to the null hypothesis.

Chi-Squared Sentence: The sum of the averages of the squares of the difference between the observed sample data and the expected values if the null hypothesis is true.

Let’s calculate the chi-squared test statistic for example 1. Remember all of the expected values are 21 and the observed values are given below.
$O_1 = 17$, $O_2 = 24$, $O_3 = 13$, $O_4 = 25$, $O_5 = 26$

$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(17-21)^2}{21} + \dfrac{(24-21)^2}{21} + \dfrac{(13-21)^2}{21} + \dfrac{(25-21)^2}{21} + \dfrac{(26-21)^2}{21}$
$= \dfrac{(-4)^2}{21} + \dfrac{(3)^2}{21} + \dfrac{(-8)^2}{21} + \dfrac{(4)^2}{21} + \dfrac{(5)^2}{21}$
$= \dfrac{16}{21} + \dfrac{9}{21} + \dfrac{64}{21} + \dfrac{16}{21} + \dfrac{25}{21} = \dfrac{130}{21} \approx 6.19$

Note: While 6.19 is a lot for a Z-score or T-score, 6.19 may not be significant for a Chi-Squared. Remember Chi-Squared comes from adding up squared numbers and can be rather large. We would need to see a simulation or a P-value to see if 6.19 is significant or not.

Let’s simulate what chi-squared test statistics we would expect if the null hypothesis was true. Here is a simulation created with StatKey.

First of all, what is the shape of the chi-squared distribution? Notice the Chi-Squared distribution is not bell shaped (normal). It is always skewed right. Chi-Squared hypothesis tests are always right tailed. Remember squared numbers are never negative, and adding up squared numbers gives you a sum that is never negative. So it is impossible for Chi-Squared to be negative. Chi-Squared hypothesis tests are never left tailed or two tailed. Chi-Squared takes complicated categorical data and condenses it into 1 right tail test.

Now what about the Chi-Squared test statistic of 6.19 that we computed? Is it significant? (In the tail) Could it happen by random chance? What is the estimated P-value?

Remember, like all hypothesis tests, there are two reasons for the sample data being different than the null hypothesis. Either the null may be true and the sample data is different because all samples are different (random chance), or the null hypothesis is wrong.
Which is it in this case? Notice the data is not significant (not in the tail) and could have happened by random chance (estimated P-value of 18.9%). So there is not a significant difference between the observed sample data and the expected values from the null hypothesis. Since we have not ruled out random chance, we cannot be sure if the null hypothesis is indeed wrong. So we would fail to reject the null hypothesis.

Example 2
Let us suppose that someone had a different claim they wanted to test with the school graduation data. They claim that 15% of the graduates come from school 2, 15% from school 4, 15% from school 5, 25% from school 1, and 30% from school 3. This is also a multiple P test (Goodness of Fit Test), though the null hypothesis looks a little different. Notice the groups are checking the percentage for the same success variable (graduating). We are only checking one percentage in each group. This is the trademark of a Goodness of Fit test.
$H_0: p_1 = 25\%,\ p_2 = 15\%,\ p_3 = 30\%,\ p_4 = 15\%,\ p_5 = 15\%$
$H_A$: at least one is different ($\neq$)

Let’s calculate the Chi-Squared test statistic again. Let’s start by calculating the expected values from the null hypothesis. This null hypothesis suggests that each group has a different percentage and therefore a different expected value. Remember our total number of graduates was 105. The null hypothesis suggests that 25% of those will come from school 1, 15% each will come from school 2, school 4 and school 5, and 30% will come from school 3. Remember, to calculate a percentage of a total, simply convert the percentage into a decimal and multiply by the total. Here are our expected values.
$E_1 = 0.25 \times 105 = 26.25$
$E_2 = 0.15 \times 105 = 15.75$
$E_3 = 0.30 \times 105 = 31.5$
$E_4 = 0.15 \times 105 = 15.75$
$E_5 = 0.15 \times 105 = 15.75$
Remember these are what we expect to get if the null hypothesis is true. We can compare these with the observed sample data values.
$O_1 = 17$, $O_2 = 24$, $O_3 = 13$, $O_4 = 25$, $O_5 = 26$

Now let’s calculate the Chi-Squared test statistic.
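As a cross-check on the hand calculations in these notes, here is a minimal Python sketch of the chi-squared formula together with a simple simulation of the null hypothesis. It is an illustration only, not StatKey's or StatCrunch's actual code. The function names, the number of repetitions and the seed are assumptions chosen for the example.

```python
import random

def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulated_p_value(observed, null_proportions, reps=5000, seed=1):
    """Share of simulated chi-squared values at least as large as the real one."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    n = sum(observed)
    expected = [n * p for p in null_proportions]   # E = n * p for each category
    real_stat = chi_squared(observed, expected)
    categories = list(range(len(observed)))
    hits = 0
    for _ in range(reps):
        # Draw n graduates from a population that matches the null hypothesis.
        draws = rng.choices(categories, weights=null_proportions, k=n)
        counts = [draws.count(c) for c in categories]
        # Small tolerance so that ties are not lost to floating-point rounding.
        if chi_squared(counts, expected) >= real_stat - 1e-9:
            hits += 1
    return real_stat, hits / reps

observed = [17, 24, 13, 25, 26]   # graduates from schools 1 through 5

# Example 1: all five schools equal (20% each)
stat_equal, p_equal = simulated_p_value(observed, [0.20] * 5)

# Example 2: 25%, 15%, 30%, 15%, 15%
stat_claim, p_claim = simulated_p_value(observed, [0.25, 0.15, 0.30, 0.15, 0.15])

print("Example 1:", round(stat_equal, 2), p_equal)
print("Example 2:", round(stat_claim, 2), p_claim)
```

The two test statistics should come out near 6.19 and 30.55. The simulated P-value for Example 1 should land in the neighborhood of the 18.9% estimated with StatKey, while almost no simulated values should reach the Example 2 statistic.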
$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(17-26.25)^2}{26.25} + \dfrac{(24-15.75)^2}{15.75} + \dfrac{(13-31.5)^2}{31.5} + \dfrac{(25-15.75)^2}{15.75} + \dfrac{(26-15.75)^2}{15.75}$
$= \dfrac{(-9.25)^2}{26.25} + \dfrac{(8.25)^2}{15.75} + \dfrac{(-18.5)^2}{31.5} + \dfrac{(9.25)^2}{15.75} + \dfrac{(10.25)^2}{15.75}$
$= \dfrac{85.5625}{26.25} + \dfrac{68.0625}{15.75} + \dfrac{342.25}{31.5} + \dfrac{85.5625}{15.75} + \dfrac{105.0625}{15.75}$
$\approx 3.2595 + 4.3214 + 10.8651 + 5.4325 + 6.6706 \approx 30.55$

Is a Chi-Squared Test Statistic of 30.55 significant? Remember Chi-Squared test statistics are squared numbers added up, so they can be very large. Let’s calculate a P-value with StatCrunch this time to determine if it is significant.

To calculate a P-value for a Goodness of Fit test we will need to do the following. First type in the observed sample values in a column of StatCrunch. If the null hypothesis has specific percentages instead of equal, then type these percentages (written as decimals) in another column. Remember the percentage has to coincide with the observed value from the same variable.

Stat => Goodness of Fit => Chi-Squared Test

Tell StatCrunch what column your observed sample data is in. If the null hypothesis is all groups equal, then click the button that says “all cells in equal proportion” under the “Expected” menu. In this case each school had a different percentage in the null hypothesis. So under the “Expected” menu, click the column where the percentages are. Now click compute.

Notice the P-value says “<0.0001”. This is what StatCrunch writes when the P-value is very close to zero.
P-value ≈ 0

Remember, like all hypothesis tests, there are two reasons for the sample data being different than the null hypothesis. Either the null may be true and the sample data is different because all samples are different (random chance), or the null hypothesis is wrong. Which is it in this case? A P-value close to 0 is very significant. Since the P-value is the probability of getting the sample data (or more extreme) by random chance if the null hypothesis is true, this data was very unlikely to happen by random chance. So there is a significant difference between the observed sample data and the expected values from the null hypothesis.
We have ruled out random chance and can Reject the null hypothesis.

Key Points about the Goodness of Fit Test
A Goodness of Fit test checks the same success variable in multiple groups.
The sample data will be a single row or column of observed values. (Not a two-way table.)
Sample Null and Alternative Hypothesis (two types):
$H_0: p_1 = p_2 = p_3 = p_4 = p_5$
$H_A$: at least one is different ($\neq$)
or
$H_0: p_1 = 25\%,\ p_2 = 15\%,\ p_3 = 30\%,\ p_4 = 15\%,\ p_5 = 15\%$
$H_A$: at least one is different ($\neq$)
The Chi-Squared Test Statistic and P-value can be calculated with simulation (StatKey) or with StatCrunch. (Do not calculate this by hand.)
The Chi-Squared distribution is always skewed right. Any hypothesis test using the Chi-Squared test statistic will always be a right tailed test.
Chi-Squared Sentence: The sum of the averages of the squares of the difference between the observed sample data and the expected values if the null hypothesis is true.
What are the assumptions? All Chi-Squared hypothesis tests have the same assumptions:
1. Random
2. All expected values must be at least 5 (observed sample data is large enough)
A large Chi-Squared Test Statistic (in the tail of the simulation) and a small P-value both tell us that the data probably did not happen by random chance and is significant. The observed sample data significantly disagrees with the expected values from the null hypothesis. We can therefore Reject the null hypothesis.
A small Chi-Squared Test Statistic (not in the tail of the simulation) and a large P-value both tell us that the data could have happened by random chance and is not significant. The observed sample data does not significantly disagree with the expected values from the null hypothesis. Since we cannot rule out random chance, we don’t know if the null is right or wrong, so we Fail to Reject the null hypothesis.
Conclusions may be written in the same way as all hypothesis tests.
If the claim is the null hypothesis, then you will either have evidence to reject the claim (small P-value) or not have evidence to reject the claim (large P-value). If the claim is the alternative hypothesis, then you will either have evidence to support the claim (small P-value) or not have evidence to support the claim (large P-value).
Degrees of Freedom = K – 1 (K is # of groups)
Expected Values (automatically calculated with StatCrunch):
$E = \dfrac{n}{K}$ (n = sample size total, K = # of groups; use this for the case when all groups are assumed to be equal in the null hypothesis)
$E = n \cdot p$ (n = sample size total, p = percentage from each group; use this for the case when each group has a different percentage in the null hypothesis)

Math 140 Hypothesis Tests Activity #13
Goodness of Fit Tests

Directions: For numbers 1-3, use StatKey at www.lock5stat.com to simulate the following chi-squared goodness of fit tests. Go to “more advanced randomization tests” at the bottom of the StatKey page. Click on the button that says “chi-squared goodness of fit”.

1. It is a big job to write and grade the AP Statistics exam for high school students each year. It is a difficult multiple choice exam. All questions have five possible answers A-E. Test the claim that the percent of A answers is the same as the percent of B answers, which is the same as C, D and E. The data has already been entered in StatKey.
How many categories are we checking? What are the degrees of freedom? If all the categories are equal, what percent would we expect each of them to be in the null hypothesis? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that is equal. Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis?
Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

2. Alameda County needs to verify that its juries in courts of law are representative of the population. On the top left of StatKey, change the data to “Alameda County Juries”. Alameda County is supposedly made up of 54% Caucasian, 18% African American, 12% Hispanic American, 15% Asian American and 1% other. Test the claim that Alameda’s court juries do not represent these theoretical percentages of race.
How many categories are we checking? What are the degrees of freedom? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that is the same as the county’s theoretical percentages. Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis? Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

3. Open the Math 140 survey data. Copy and paste the column describing the type of transportation data into StatKey. You will need to push the “edit data” button. A person that works at COC claimed that 80% of COC students drive alone, 10% carpool, 5% are dropped off by someone, 2% walk, 1% bike, and 2% use public transportation. Let us check if the Math 140 stat students are different than these claimed percentages.
How many categories are we checking? What are the degrees of freedom?
Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that is the same as the theoretical percentages. Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis? Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

Directions: For numbers 4-7, use StatCrunch to find the chi-squared test statistic and P-value and complete the hypothesis test. You will need to go to the “stat” menu, then “goodness of fit”, then “chi-squared test”.

4. In a random sample of 60 COC students, 29 were liberal, 23 were conservative and 8 were moderate. Test the claim that the percent of people in each political party is equal at COC. Find the expected values for each category. Does this data meet the assumptions necessary to perform a Goodness of Fit Test? If so, test the claim that the probability of being in each party is the same. Make sure to give the null and alternative hypothesis, the Chi-Squared test statistic, the P-value, whether we reject or fail to reject the null hypothesis, and a conclusion.
As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

5. An online sports magazine wrote an article about the favorite sports in America. It said that 43% of Americans prefer Football, 23% of Americans prefer Baseball, 20% of Americans prefer Basketball, 8% of Americans prefer Hockey, and 6% of Americans prefer Soccer.
When 130 randomly selected COC students were asked their favorite sport we found the following: 44 said Football, 26 said Baseball, 29 said Basketball, 13 said Hockey, 18 said Soccer. Test the claim that COC students do not match the distribution claimed in the magazine article. Find the expected values for each category. Does this data meet the assumptions necessary to perform a Goodness of Fit Test? If so, test the claim that the probability of liking each sport is different than what the magazine suggested. Make sure to give the null and alternative hypothesis, the Chi-Squared test statistic, the P-value, whether we reject the null hypothesis or fail to reject it, and a conclusion. As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

6. Thousands of people die from car accidents across the U.S. every year, but are the probabilities of dying in a car accident the same for every day of the week? The following data summary gives the observed number of deaths from car accidents in the U.S. for each day of a randomly selected week. The total number of deaths for the week was 805. Find the expected values for each category. Does this data meet the assumptions necessary to perform a Goodness of Fit Test? If so, test the claim that the probability of dying in a car accident is the same for each day. Make sure to give the null and alternative hypothesis, the Chi-Squared test statistic, the P-value, whether we reject the null hypothesis or fail to reject it, and a conclusion.

Day          Number of Deaths
Monday       106
Tuesday      104
Wednesday    103
Thursday     113
Friday       130
Saturday     132
Sunday       117

As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

7.
The National Highway Traffic Safety Administration (NHTSA) publishes reports about motorcycle fatalities and helmet use. The following distribution shows the proportion of fatalities by location of injury for motorcycle accidents.

Location of Injury     Proportion
Multiple Locations     0.57
Head                   0.31
Neck                   0.03
Thorax                 0.06
Abdomen/Spine          0.03

The sample data below shows the distribution of 2068 randomly selected fatalities from riders that were not wearing a helmet. Use a 0.05 significance level to test the claim that the distribution for the sample does not match the proportions given by the NHTSA. Make sure to find the expected values and verify the assumptions for a Goodness of Fit test. Make sure to give the null and alternative hypothesis, the Chi Squared test statistic, the p-value, whether we reject the null hypothesis or fail, and a conclusion. Where is the largest discrepancy between the observed and expected value? What does this tell us about the importance of wearing helmets?

Location of Injury     Number of Deaths
Multiple Locations     1036
Head                   864
Neck                   38
Thorax                 83
Abdomen/Spine          47

As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

Hypothesis Test Notes
Chi-Squared Independence & Homogeneity Test

The Chi-Squared test statistic is very versatile and can be used for many different categorical hypothesis tests. We often want to study relationships between categorical variables. Making a two-way table and studying conditional probabilities help us to understand the categorical data.

Example 1
A sample of 75 college students was taken to determine if listening to music helps or hurts a person’s ability to retain information. The students were randomly put into three groups. One group got to listen to their favorite music, one group had to listen to music they hated, and one group had no music.
The students were assigned to memorize the same information and were given a test to determine how much of the information they remembered.

                  Liked Music   Disliked Music   No Music   Total
High Retention    10            11               18         39
Low Retention     14            15               7          36
Total             24            26               25         Grand Total = 75

We learned in our unit on probability that conditional probabilities are important to explore when analyzing relationships between categorical variables. We also learned two very important principles.
1. When conditional probabilities were close or equal it indicated that the variables were not related to each other. (Independent)
2. When conditional probabilities were significantly different it indicated that the variables were related to each other. (Dependent)
This is a very important principle that is the guiding idea behind the hypothesis tests for Independence and Homogeneity.

Let’s look at some conditional probabilities in the music problem.
P( high retention | liked music ) = 10/24 = 0.417 or 41.7%
P( high retention | disliked music ) = 11/26 = 0.423 or 42.3%
If we just looked at these two conditional probabilities, we might think that music and retention are not related. (Independent) Here lies the fundamental problem. We are not really taking the entire two-way table and all of the conditional probabilities into account. If we look at another conditional probability, we may come to a different conclusion. Look at these two.
P( high retention | liked music ) = 10/24 = 0.417 or 41.7%
P( high retention | no music ) = 18/25 = 0.72 or 72%
These two probabilities look very different and would lead us to the conclusion that music and retention are related (dependent). So it is difficult to determine if categorical variables are related or not by just looking at two conditional probabilities. We need a better way to do this.

Chi-Squared Hypothesis Test for Independence and Homogeneity
The Chi-Squared test statistic can do a much better job.
Not only will it take into account every conditional probability, but it will also see if the data is significant enough to apply to the population.

Null and Alternative Hypotheses
Remember the guiding principle. If the distribution of conditional probabilities is the same, this indicates independence (not related). If the distribution of conditional probabilities is different, this indicates dependence (related). There are two hypothesis tests that are equivalent statements and can be tested with the same data, same test statistic and same P-value. Homogeneity tests whether the distributions of conditional probabilities are the same versus different. Independence tests whether the variables are independent or dependent. The key is that these are equivalent statements.

Homogeneity Test
H0: The distribution for retaining information is the same for the various music options
HA: The distribution for retaining information is different for the various music options

Independence Test
H0: Music and Retaining Info are Independent (not related)
HA: Music and Retaining Info are Dependent (related)

Notes about the null and alternative hypothesis:
The null hypothesis is that the various conditional probabilities for each variable are the same. This implies that the condition does not matter and the variables are independent (not related).
The alternative hypothesis is that the various conditional probabilities for each variable are different. This implies that the condition does matter and the variables are dependent (related).
When describing lots of probabilities for lots of variables, we often use the word “distribution” and say “the distribution is the same” or “the distribution is different”. It does not mean all the probabilities are the same or close, but only those probabilities from the same row or column are the same or close.
Though the null and alternative hypotheses for the homogeneity test and independence test are equivalent, you should still write the null and alternative hypothesis that matches the claim. If the claim is that the variables are related, not related, dependent or independent, you should give the null and alternative for an Independence Test.
H0: Categorical variables are Independent (not related)
HA: Categorical variables are Dependent (related)
If the claim is that the variables have the same or different conditional probabilities, or if the claim is that the distribution is the same or different, you should give the null and alternative for a Homogeneity Test.
H0: The distribution is the same
HA: The distribution is different

Expected Values
Remember the Chi-Squared test statistic compares observed sample data to the expected values. These “expected values” are what we expect to happen if the null hypothesis is true. For the Independence Test, the null is that they are independent, but remember that is equivalent to the null for the Homogeneity Test that the distribution of conditional probabilities is the same. So if the null is true we should expect the conditional probabilities for each variable to be the same. Let’s work this out for the music and retention problem.

                  Liked Music   Disliked Music   No Music   Total
High Retention    10            11               18         39
Low Retention     14            15               7          36
Total             24            26               25         Grand Total = 75

P (high retention) = 39/75 = 0.52
If the null hypothesis is true, we expect this probability to be the same regardless of the music choice. Remember the expected values are found by a percent of a total (multiplying the total in the column or row by the probability).
$E = n \times p$
Here n is not the grand total; it is the total for each column (music choice). So if the null is true we expect the p for high retention to always be 0.52 and the expected values will be 0.52 x total students for each music choice.
$E_{\text{like/high}} = n \times p = 24 \times 0.52 = 12.48$
$E_{\text{dislike/high}} = n \times p = 26 \times 0.52 = 13.52$
$E_{\text{none/high}} = n \times p = 25 \times 0.52 = 13.0$

P (low retention) = 36/75 = 0.48
If the null hypothesis is true, we expect this probability to be the same regardless of the music choice. Remember the expected values are found by multiplying the amount by the probability.
$E = n \times p$
Here n is not the grand total; it is the total for each column (music choice). So if the null is true we expect the p for low retention to always be 0.48 and the expected values will be 0.48 x total students for each music choice.
$E_{\text{like/low}} = n \times p = 24 \times 0.48 = 11.52$
$E_{\text{dislike/low}} = n \times p = 26 \times 0.48 = 12.48$
$E_{\text{none/low}} = n \times p = 25 \times 0.48 = 12.0$

Calculate the Chi-Squared Test Statistic
We learned that the Chi-Squared test statistic is a comparison of the observed sample values and the expected values from the null hypothesis. Here is the formula again.
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
So Chi-Squared subtracts the observed and expected values to find the difference. Since some differences are negative, it squares the differences. It also divides by E to make it a kind of average of squares and finally it adds up these values for every cell in the table. Here is the sentence to explain Chi-Squared again: “The sum of the averages of the squares of the differences between the observed sample data and the expected values if the null hypothesis were true.”

                  Liked Music   Disliked Music   No Music   Total
High Retention    10            11               18         39
Low Retention     14            15               7          36
Total             24            26               25         Grand Total = 75

In this example the numbers in the two-way table are the observed values. Note: The observed values do not include the totals! This two-way table has 2 rows and 3 columns (not counting totals). So we have a total of 6 observed values and 6 expected values.

                  Liked Music   Disliked Music   No Music
High Retention    10            11               18
Low Retention     14            15               7

Let’s calculate the Chi-Squared test statistic for this problem. Here are the expected values to compare to.
It is good to label them so that you pair each observed value with the correct expected value.
$E_{\text{like/high}} = n \times p = 24 \times 0.52 = 12.48$
$E_{\text{dislike/high}} = n \times p = 26 \times 0.52 = 13.52$
$E_{\text{none/high}} = n \times p = 25 \times 0.52 = 13.0$
$E_{\text{like/low}} = n \times p = 24 \times 0.48 = 11.52$
$E_{\text{dislike/low}} = n \times p = 26 \times 0.48 = 12.48$
$E_{\text{none/low}} = n \times p = 25 \times 0.48 = 12.0$

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
$= \dfrac{(10 - 12.48)^2}{12.48} + \dfrac{(11 - 13.52)^2}{13.52} + \dfrac{(18 - 13)^2}{13} + \dfrac{(14 - 11.52)^2}{11.52} + \dfrac{(15 - 12.48)^2}{12.48} + \dfrac{(7 - 12)^2}{12}$
$= \dfrac{(-2.48)^2}{12.48} + \dfrac{(-2.52)^2}{13.52} + \dfrac{(5)^2}{13} + \dfrac{(2.48)^2}{11.52} + \dfrac{(2.52)^2}{12.48} + \dfrac{(-5)^2}{12}$
$= \dfrac{6.1504}{12.48} + \dfrac{6.3504}{13.52} + \dfrac{25}{13} + \dfrac{6.1504}{11.52} + \dfrac{6.3504}{12.48} + \dfrac{25}{12}$
$\approx 0.49282 + 0.46970 + 1.92308 + 0.53389 + 0.50885 + 2.08333 \approx 6.01167$

Notice again the numbers that were added to get the Chi-Squared test statistic. These are called the contributions to Chi-Squared. Which cells had the greatest contribution to Chi-Squared? These are the ones where the observed values disagreed with the null hypothesis the most.

Assumptions
The assumptions for any Chi-Squared hypothesis test are as follows:
1. Random
2. All Expected Values at least 5 ($E \geq 5$)
Did this problem meet the assumptions? (Notice the data was random and all of the expected values were at least 5.)

Is it significant?
We have had a little experience with Chi-Squared. We know Chi-Squared distributions are always skewed right and the tests are right tailed. The test statistic can be pretty large, so you may not be sure if $\chi^2 = 6.01167$ is significant. There are two ways to handle this question: P-value or Simulation. Let’s start with simulating Chi-Squared test statistics. A simulation of this data with StatKey gave the following. Notice StatKey calls the two-way table hypothesis tests a “Chi-Square Test for Association”. Association means dependent or related.

Was our Chi-Squared test statistic of 6.012 significant (in the tail)? What was the estimated P-value? Was it likely or unlikely that this data happened by random chance? If we were using a 5% significance level, would we reject the null hypothesis or fail to reject?
The test statistic was in the tail. The estimated P-value was 0.048 or (4.8%).
It was unlikely to happen by random chance (4.8%). This is a borderline case. Significance will depend on the significance level. If the P-value is less than the significance level it is significant. At a 5% significance level, we would reject the null hypothesis. (P-value < sig level)

Writing a conclusion
Remember we said that there were two different hypothesis tests that we could do with this data and Chi-Squared test statistic, Independence test or Homogeneity test. Here are the null and alternative hypotheses again. Remember a conclusion must address the claim.

Homogeneity Test
H0: The distribution for retaining information is the same for the various music options
HA: The distribution for retaining information is different for the various music options
Suppose the claim was that the distribution is different for the various music options. (Alternative hypothesis for homogeneity test) Since we rejected the null hypothesis, our conclusion would be: “We have significant sample evidence to support the claim that the distribution for retaining information is different.” (Notice this implies that we also have evidence to support dependence.)

Independence Test
H0: Music and Retaining Info are Independent (not related)
HA: Music and Retaining Info are Dependent (related)
Suppose the claim was that music and retaining information are not related (independent). (Null hypothesis for Independence test). Since we rejected the null hypothesis, the conclusion would be: “We have significant sample evidence to reject the claim that music and retaining information are independent.” (Notice this also implies something about homogeneity. The conditional probabilities must also be significantly different.)

Calculating the Chi-Squared test statistic and P-value with StatCrunch
Remember, do not calculate Chi-Squared by hand with a calculator. That is the job of statistics programs like StatCrunch.
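For readers who want to see the bookkeeping a statistics program does, here is a minimal Python sketch (Python is not part of these notes; the function names are made up for illustration). It rebuilds the expected counts, the contributions and the Chi-Squared test statistic for the music and retention table. The P-value line uses a shortcut that is only valid when the degrees of freedom equal 2, which happens to be the case for this 2 by 3 table; it is not a general method.

```python
import math

# Observed counts from the music and retention example (totals left out).
observed = [
    [10, 11, 18],  # High Retention: liked, disliked, no music
    [14, 15, 7],   # Low Retention:  liked, disliked, no music
]

def expected_counts(table):
    """Expected count per cell: row total x column total / grand total.

    This equals n x p, where n is the column total and p is the overall
    row proportion (for example 24 x 39/75 = 12.48).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def contributions_to_chi_squared(table):
    """(O - E)^2 / E for every cell of the two-way table."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]

expected = expected_counts(observed)
contributions = contributions_to_chi_squared(observed)
chi_squared = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid ONLY for df = 2: the right-tail area is exp(-chi_squared / 2).
p_value = math.exp(-chi_squared / 2) if df == 2 else None

print("Expected counts:", expected)
print("Contributions:", contributions)
print("Chi-squared:", round(chi_squared, 5), "df:", df, "P-value:", p_value)
```

The all-expected-values-at-least-5 assumption can be checked from the same `expected` list before trusting the P-value.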
StatCrunch Directions
To calculate the Chi-Squared test statistic and P-value with StatCrunch, start by typing or pasting in the two-way table exactly as you see it in the problem. Remember do not type in the totals. Then go to the “Stat” menu, click on “Tables” then on “Contingency”. Then click on “with summary”. Hold the control key and highlight all of the columns with numbers. Under “Row Labels” highlight the column where your row labels are. Click on “Expected Count” and “Contribution to Chi-Squared”. Then push compute.

Let’s try it. Put the two-way table from the last example into StatCrunch and calculate the expected values, contribution to Chi-Squared, Chi-Squared test statistic and P-value. You should see the following:

Contingency table results:
Rows: var1
Columns: None

Cell format:
Count
(Expected count)
(Contributions to Chi-Square)

                  Liked Music   Disliked Music   No Music   Total
High Retention    10            11               18         39
                  (12.48)       (13.52)          (13)
                  (0.49)        (0.47)           (1.92)
Low Retention     14            15               7          36
                  (11.52)       (12.48)          (12)
                  (0.53)        (0.51)           (2.08)
Total             24            26               25         75

Chi-Square test:
Statistic     DF   Value     P-value
Chi-square    2    6.01167   0.0495

Notice each cell now gives 3 numbers. The first is the observed sample data. The second is the expected count (expected value). These are the ones you need to write down and check that they are at least 5. The last is the contribution to Chi-Squared. Notice the largest contributions to Chi-Squared came from the “no music” group. We had a lot more “high retention” students than we expected and a lot fewer “low retention” students than we expected. This gives some evidence that when trying to retain information, “no music” seems to be best.

Notes about the Chi-Squared Independence Test and Homogeneity Test
Though the Independence and Homogeneity tests use the same test statistic, you should write the correct null and alternative hypothesis.
If the claim is that the variables are related, not related, dependent, independent, associated, or not associated, you are doing an Independence test. If the claim is that the distribution of conditional probabilities is the same or different, then you are doing a Homogeneity test.

The assumptions for any Chi-Squared hypothesis test are as follows:
1. Random
2. All Expected Values at least 5 ($E \geq 5$)

The Degrees of Freedom for a two-way table are as follows:
$\text{df} = (r - 1)(c - 1)$
Where “r” is the number of rows (not counting totals) and “c” is the number of columns (not counting totals). In the example, the degrees of freedom would be:
$\text{df} = (r - 1)(c - 1) = (2 - 1)(3 - 1) = (1)(2) = 2$

Remember Chi-Squared distribution is always skewed right and can be very large. The hypothesis tests that use Chi-Squared are right tailed tests.
Do not calculate Chi-Squared test statistic by hand with a calculator. Use a statistics program like StatKey or StatCrunch.
Remember Chi-Squared test statistics can be very large. Refer to either the simulation (in the tail) or the P-value (close to zero) to determine significance.
To simulate a Chi-Squared test statistic with StatKey, we go to “Test for Association” under the “advanced randomization tests” menu.
To calculate the Chi-Squared test statistic and P-value with StatCrunch, start by typing or pasting in the two-way table exactly as you see it in the problem. Remember do not type in the totals. Then go to the “Stat” menu, click on “Tables” then on “Contingency”. Then click on “with summary”. Hold the control key and highlight all of the columns with numbers. Under “Row Labels” highlight the column where your row labels are. Click on “Expected Count” and “Contribution to Chi-Squared”. Then push compute.
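The simulation idea in these notes can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: it shuffles the music labels among the 75 students (one common way to simulate “no relationship”), which is not necessarily the exact procedure StatKey uses, and the function names, number of simulations and seed are arbitrary choices.

```python
import random

# Observed two-way table from the music and retention example (no totals).
observed = [
    [10, 11, 18],  # High Retention
    [14, 15, 7],   # Low Retention
]

def chi_squared(table):
    """Sum of (O - E)^2 / E over every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat

def randomization_p_value(table, n_sims=5000, seed=140):
    """Estimate the right-tail P-value by shuffling the column labels."""
    rng = random.Random(seed)
    # One (row label, column label) pair per student in the sample.
    row_labels = [i for i, row in enumerate(table) for _ in range(sum(row))]
    col_labels = [j for row in table for j, count in enumerate(row)
                  for _ in range(count)]
    observed_stat = chi_squared(table)
    n_rows, n_cols = len(table), len(table[0])
    at_least_as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(col_labels)  # break any link between the two variables
        simulated = [[0] * n_cols for _ in range(n_rows)]
        for r, c in zip(row_labels, col_labels):
            simulated[r][c] += 1
        # Small tolerance guards against floating point ties.
        if chi_squared(simulated) >= observed_stat - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

estimated_p = randomization_p_value(observed)
print("Estimated P-value:", estimated_p)
```

The estimate changes a little with the seed and the number of simulations, which is the same reason two StatKey runs rarely give exactly the same P-value.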
Math 140 Hypothesis Test Activity #14
Conditional Probabilities, Simulation and Independence Tests

Directions: For numbers 1 and 2, use StatKey at www.lock5stat.com to simulate the following chi-squared tests for association. Go to “more advanced randomization tests” at the bottom of the StatKey page. Click on the button that says “chi-squared test for association”. Remember a test for association has the same test statistic as a homogeneity test.

1. We want to know if the state a home is built in is related to whether or not the home is large. A random sample of homes in the U.S. was taken. They determined the state the home was built in and whether or not the home was large. In StatKey, you will find this data already entered. If not, look at the top left button. It should say “homes for sale(size by state)”. Test the claim that where a home is built is related to its size. What are the degrees of freedom? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that has the same distribution (i.e. as if the percentage of large homes were the same in every state). Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis? Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

2. Let’s look at another data set in StatKey. We want to know if gender is independent of getting an award. A random sample of people that won famous awards (Olympic, Academia, and Nobel) was taken. They determined the gender of each of the people that won the award. In StatKey, you will find this data already entered.
If not, look at the top left button. It should say “student survey (award by gender)”. Test the claim that gender is totally independent of awards. What are the degrees of freedom? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that has the same distribution (i.e. as if the distribution of awards were the same for each gender). Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis? Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

Directions for #3-6: The table in number 3 describes the gender and majors of 692 randomly selected students at a local college. Recall that two variables are considered independent if one event occurring does not significantly change the probability of the other event occurring. For example, in chapter 5 we said that two events are independent if P(A | B) = P(A | C). In other words, changing the condition from B to C did not matter. The probability stayed the same. But what if P(A | B) is not equal to P(A | C), but it is close? How large does the difference have to be before we stop calling the categories independent? That is the question behind the Chi-Squared independence test. For numbers 3c, 4c, 5c and 6c, use StatCrunch to find the chi-squared test statistic and P-value and complete the hypothesis test. First type the two-way table as you see it into StatCrunch. You will need the row and column labels also. Now go to the “stat” menu, then “tables”, then “contingency”.
Tell StatCrunch the columns your counts are in and under “row labels” the column your row labels are in. Be sure to click on expected counts and contribution to chi-squared also.

3.
          Business   English   History   Music   Biology   Math
Female    89         71        62        48      56        9
Male      112        58        59        53      62        13

a) Find all the row and column totals for the table.
b) Do you think that gender and major are independent (not related) or dependent (related)? Find a couple conditional probabilities to back up your answer. Hint: Use the probability formula: P(A | B) = P(A | C).
c) Now use a Chi-Squared Independence Test on StatCrunch to test the claim that gender and major are dependent (related). Write down all the expected values. Does the problem meet all the assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi-Squared test statistic, P-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #3b. Would you change your mind?

4.
          Type A   Type B   Type AB   Type O
Male      40       9        5         60
Female    30       8        7         50

a) Find all the row and column totals for the table.
b) Do you think that gender and blood type are independent (not related) or dependent (related)? Find a couple conditional probabilities to back up your answer. Hint: Use the probability formula: P(A | B) = P(A | C).
c) Now use a Chi-Squared Independence Test on StatCrunch to test the claim that gender and blood type are independent (not related). Write down all the expected values. Does the problem meet all the assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi-Squared test statistic, P-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #4b. Would you change your mind?

5.
         Med/Surg   ICU   SDS   ER
18-35    19         4     25    16
36-49    27         7     22    9
50-64    17         13    15    17
65+      12         21    8     19

a) Find all the row and column totals for the table.
b) Do you think that age and what part of the hospital the person went to are independent (not related) or dependent (related)? Find a couple conditional probabilities to back up your answer. Hint: Use the probability formula: P(A | B) = P(A | C).
c) Now use a Chi-Squared Independence Test on StatCrunch to test the claim that age and part of the hospital are independent (not related). Write down all the expected values. Does the problem meet all the assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi-Squared test statistic, P-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #5b. Would you change your mind?

6.
       Type A   Type B   Type AB   Type O
Rh+    35       24       11        91
Rh-    12       6        10        21

a) Find all the row and column totals for the table.
b) Do you think that blood type and Rh factor are independent (not related) or dependent (related)? Find a couple conditional probabilities to back up your answer. Hint: Use the probability formula: P(A | B) = P(A | C).
c) Now use a Chi-Squared Independence Test on StatCrunch to test the claim that blood type and Rh factor are dependent (related). Write down all the expected values. Does the problem meet all the assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi-Squared test statistic, P-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #6b. Would you change your mind?

7. Follow-up question: Write a few sentences talking about the difficulty in determining whether there is a “significant” difference between values. Why do we need an independence test? Can’t we just use conditional probabilities to prove independence?

Math 140 Hypothesis Tests Activity #15
Homogeneity Tests and Simulation

Directions: For numbers 1 and 2, use StatKey at www.lock5stat.com to simulate the following chi-squared homogeneity tests.
Go to “more advanced randomization tests” at the bottom of the StatKey page. Click on the button that says “chi-squared test for association”. Remember a test for association has the same test statistic as a homogeneity test.

1. A random sample of people were asked whether bottled water from various companies or filtered water tastes better. In StatKey, you will find this data already entered. If not, look at the top left button. It should say “water taste”. Test the claim that the distributions of percentages that prefer bottled water and filtered water are different depending on the company. What are the degrees of freedom? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that has the same distribution (i.e. as if the companies were the same). Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis? Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

2. Let’s look at another data set in StatKey. Click on the top left button and change the data to “profession by handedness”. This looks at various professions and if the person was left or right handed. Test the claim that the distributions of right and left handed should be the same for the various professions. What are the degrees of freedom? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that has the same distribution (i.e. as if the professions had the same percentages of right and left handed).
Use the right tailed function to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the original sample data, were the observed sample values significantly different than the expected values in the null hypothesis? Could the original sample data have happened by random chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for inference.)

Directions: For numbers 3-6, use StatCrunch to find the chi-squared test statistic and P-value and complete the hypothesis test. First type the two-way table as you see it into StatCrunch. You will need the row and column labels also. Now go to the “stat” menu, then “tables”, then “contingency”. Tell StatCrunch the columns your counts are in and under “row labels” the column your row labels are in. Be sure to click on expected counts and contribution to chi-squared also.

3. Three random samples were taken of Democrats, Republicans and Independents. Use a Homogeneity test to answer the following question. Does the evidence suggest that the proportion of individuals for or against the legalization of marijuana is the same for each political affiliation? Use a 0.05 significance level. Make sure to check if the data meets the assumptions necessary for a homogeneity test. Include the null and alternative hypothesis, chi-square test statistic, P-value, whether or not you reject the null hypothesis and a detailed conclusion. As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

                             Democrat   Republican   Independent
Legalize Marijuana           240        121          292
Do not Legalize Marijuana    326        370          446

4. A random sample of American adults was taken and their health and education status obtained.
Use a Homogeneity Test to test the claim that health is the same regardless of education level. Use a 0.05 significance level. Make sure to check if the data meets the assumptions necessary for a Homogeneity Test. Include the null and alternative hypothesis, chi-square test statistic, P-value, whether or not you reject the null hypothesis and a detailed conclusion.

                                  Excellent Health   Good Health   Fair Health   Poor Health
Less than High School             72                 202           199           62
High School Diploma               465                877           358           108
Some College/Associates Degree    80                 138           49            11
Bachelor’s Degree                 229                276           64            12
Graduate Degree                   130                147           32            2

As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

5. Use a homogeneity test to test the following claim. An obstetrician wants to learn whether the amount of prenatal care is different depending on how wanted the pregnancy was. He randomly selects 939 women and finds the following information about when they started prenatal care (if ever) and how wanted the pregnancy was. Use a 0.05 significance level. Make sure to check if the data meets the assumptions necessary for a Homogeneity Test. Include the null and alternative hypothesis, chi-square test statistic, P-value, whether or not you reject the null hypothesis and a detailed conclusion.

                        Prenatal care in     Prenatal care in   Prenatal care in
                        < 3 months           3-5 months         >5 months (or never)
Intended Pregnancy      593                  26                 33
Unintended Pregnancy    64                   8                  11
Mistimed Pregnancy      169                  19                 16

As a follow up, answer the following two questions: Are the observed values significantly different than the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

6. In 2005, a random sample of 1000 chickens sold in grocery stores were tested for presence of salmonella or campylobacter.
These are dangerous and can cause illness in people. The study was repeated in 2008. Use a Homogeneity test to determine whether there has been a difference in the proportions of outbreaks in 2005 vs 2008. Use a 0.05 significance level. Make sure to check if the data meets the assumptions necessary for a Homogeneity Test. Include the null and alternative hypothesis, chi-square test statistic, P-value, whether or not you reject the null hypothesis and a detailed conclusion.

         Salmonella or Campylobacter Present   Salmonella or Campylobacter Not Present
2005                     86                                    914
2008                     74                                    926

As a follow up, answer the following two questions: Are the observed values significantly different from the expected values from the null hypothesis? Could the sample data have happened by random chance (sampling variability)?

Hypothesis Test Notes
Analysis of Variance (ANOVA)

Recall that the goodness of fit categorical data test can be used when comparing a percentage in 3 or more groups. What if we have quantitative data from 3 or more groups and want to compare the mean averages? The answer to this is the ANOVA test. ANOVA stands for “Analysis of Variance”. It is a favorite of statisticians because it is very versatile and can be used for comparing the means of quantitative data sets.

ANOVA Null and Alternative Hypothesis

The “one-way” ANOVA hypothesis test is used to compare one mean average across several groups that are defined by a single categorical variable. When the groups are defined by two categorical variables, the test is called a “two-way ANOVA”. (We will only cover one-way ANOVA.)

Example 1: Mean average salaries for people living in five states in Australia.

Suppose we want to compare the mean average weekly salary for people living in 5 states in Australia (Northern Territory, New South Wales, Queensland, Victoria, and Tasmania). We think they are different. As with all multiple population hypothesis tests, you should label the populations.
$\mu_1$: Northern Territory
$\mu_2$: New South Wales
$\mu_3$: Queensland
$\mu_4$: Victoria
$\mu_5$: Tasmania

Here is the null and alternative hypothesis for the ANOVA test. Remember, an ANOVA is a multiple-mean test for 3 or more groups.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_A$: at least one is $\neq$ (claim)

When doing an ANOVA test, it is good to find the sample size (n), the sample mean of each group, and the standard deviation or variance for each group. Here is the sample data from StatCrunch.

Summary statistics:
Column             n    Mean         Std. dev.    Variance
North Territory    35   1534.5395    701.52474    492136.96
New South Wales    35   1536.8228    677.14095    458519.87
Queensland         35   1368.2912    536.31969    287638.81
Victoria           35   1149.0504    516.55309    266827.09
Tasmania           35   898.69512    386.35397    149269.39

Note: The key question is this: Are these sample means different because of sampling variability (random chance), OR are they different because at least one of the populations really is different? To answer this, we need a test statistic and a P-value.

ANOVA Test Statistic: F distribution

The T-test statistic can only be used to compare two things, either a sample mean to a population mean or the mean averages from two populations. Either way, the T-test statistic cannot handle 3 or more groups.

F-test statistic to the rescue

The F-test statistic uses variance to measure how different the sample means are. It relies on two specific variances. Remember that variance (standard deviation squared) is a measure of spread that determines how far values are from the mean.
- Variance between the groups (how far the sample means for each group are from the overall mean of all the groups combined)
- Variance within the groups (how far each sample value is from its own sample mean)

The F-test statistic:

$F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}}$

F-test statistic sentence: The ratio of the variance between the groups to the variance within the groups.

Now let’s watch the 3 ANOVA videos on Khan Academy to see how the F is calculated.
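The ratio above can be sketched in a few lines of Python using only the summary statistics in the StatCrunch table for the Australian salary example. This is a minimal sketch, not a description of how StatCrunch works internally: it assumes every group has the same sample size (n = 35 here), in which case the variance between the groups is n times the variance of the sample means and the variance within the groups is the average of the group variances. The function name `f_from_summary` is made up for this illustration.

```python
from statistics import mean, variance

def f_from_summary(n, means, variances):
    """One-way ANOVA F statistic from per-group summary statistics.

    Assumes every group has the same sample size n (true here: n = 35).
    Returns (F, df between, df within).
    """
    k = len(means)                      # number of groups
    # Variance between the groups: n times the sample variance of the group means.
    ms_between = n * variance(means)
    # Variance within the groups: with equal n, the average of the group variances.
    ms_within = mean(variances)
    return ms_between / ms_within, k - 1, k * (n - 1)

# Sample means and variances copied from the summary statistics table.
means = [1534.5395, 1536.8228, 1368.2912, 1149.0504, 898.69512]
variances = [492136.96, 458519.87, 287638.81, 266827.09, 149269.39]

f_stat, df_between, df_within = f_from_summary(35, means, variances)
print(round(f_stat, 2), df_between, df_within)
```

With these inputs the ratio comes out a little under 8, meaning the variance between the groups is almost 8 times the variance within the groups. For groups of unequal size the shortcut above does not apply, and software should be used instead.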
Notes about the F-test statistic

In a fraction, when the numerator is significantly larger than the denominator, the overall fraction is large. So if the variance between the groups is much larger than the variance within the groups, this will give a large F-test statistic (small P-value) and indicates that the sample means are significantly different. (Unlikely to happen by random chance, reject the null hypothesis.)

In a fraction, when the numerator is the same as or smaller than the denominator, the overall fraction is small. So if the variance between the groups is about the same as or smaller than the variance within the groups, this will give a small F-test statistic (large P-value) and indicates that the sample means are not significantly different. (Could have happened by random chance, fail to reject the null hypothesis.)

There are three degrees of freedom in an ANOVA: df within the groups, df between the groups, and df total. Here are the degrees of freedom in our last example of Australia weekly salaries.
df between = # groups - 1 = 5 - 1 = 4
df total = total # of data values from all groups combined - 1 = 35 x 5 - 1 = 175 - 1 = 174
df within = (35 - 1) + (35 - 1) + (35 - 1) + (35 - 1) + (35 - 1) = 34 + 34 + 34 + 34 + 34 = 170
Notice df between (4) + df within (170) = df total (174).

Do not calculate the F-test statistic by hand with a calculator. It is a really difficult calculation. Calculate the F-test statistic with computer software such as StatCrunch or StatKey.

Like Chi-Squared, the F-distribution is always skewed right and the ANOVA test is always a right tailed test.

How to do an ANOVA test with StatCrunch

Copy and paste your raw quantitative data from each group into some columns of StatCrunch.

To check assumptions, you will want to create histograms of all your data sets to check shape. Go to the “graph” menu and click on “histogram”. (Or you can check the dot plot option in the one-way ANOVA menu.) A side by side boxplot is also a nice summary of center and spread.
To calculate the F-test statistic and P-value, go to the “stat” menu, then “ANOVA”, then “one way” (Stat > ANOVA > One Way). Hold the control key down to select the columns where your data is and push compute. Here is the printout we got.

Analysis of Variance results:
Data stored in separate columns.

Column statistics
Column             n    Mean         Std. Dev.    Std. Error
North Territory    35   1534.5395    701.52474    118.57932
New South Wales    35   1536.8228    677.14095    114.45771
Queensland         35   1368.2912    536.31969    90.654574
Victoria           35   1149.0504    516.55309    87.313408
Tasmania           35   898.69512    386.35397    65.305741

ANOVA table
Source     DF    SS          MS           F-Stat       P-value
Columns    4     10484499    2621124.8    7.9217156    <0.0001
Error      170   56249332    330878.43
Total      174   66733832

Let’s see if we understand what we are seeing. Notice the MS (mean sum of squares) is the sum of squares (SS) divided by the degrees of freedom (df).
MS (Columns) is the variance between the groups (2621124.8).
MS (Error) is the variance within the groups (330878.43).

So the F-test statistic is calculated by the formula

$F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}} = \dfrac{2621124.8}{330878.43} \approx 7.9217$

So the variance between the groups is almost 8 times greater than the variance within the groups. Is this significantly large for an F? Again, when unsure about a test statistic, refer to a simulation or the P-value. If the sample data is in the tail of the simulation or if the P-value is close to zero, it is significant.

Notice in our printout from StatCrunch we got the following P-value: “<0.0001”. This means the actual P-value is very close to 0.

P-value $\approx$ 0

From our study of P-values, we know this is very significant. So the F-test statistic is significantly large and the variance between the groups is significantly greater than the variance within the groups. This is highly unlikely to happen by random chance.

Reject the null hypothesis $H_0$.

Conclusion? Recall the claim was that at least one state was different from the others (alternative hypothesis).
Since we rejected the null, we support this claim. Our P-value is very small and our test statistic very large, so we have significant sample evidence.

Conclusion: There is significant sample evidence to support the claim that the mean average salaries of people in Northern Territory, New South Wales, Queensland, Victoria, and Tasmania are different.

Assumptions for ANOVA
- Random
- Sample sizes at least 30 or bell shaped
- Groups independent of each other
- Equal population variances (no group has a standard deviation more than twice as big as any other group)

Check the assumptions for the Australian states problem.

Random? The data was a random sample of people living in these states in Australia.

30 or bell shaped? Some histograms were bell shaped and some a little skewed, but since the sample sizes were 35 (over 30), it does pass this assumption.

Independent groups? The data may fail this one. Salaries from state to state may all be related due to their reliance on the overall economy and unemployment rates of Australia.

Equal population variances? Look at the standard deviations listed in the ANOVA printout. The largest standard deviation was 701.5 and the smallest was 386.4, so no standard deviation was more than twice as large as any other. So it passes this assumption.

Simulating the F-distribution

We can also determine if the F-test statistic is sufficiently large by simulation. Go to www.lock5stat.com and click on “StatKey”. Click on “ANOVA for difference in means”. In the top right corner, change the “ants” problem to “Fish Gill, Gill rates by Calcium”. This is exploring if the amount of calcium is related to how well the gills of a fish work. Here is the null and alternative hypothesis. Remember, when the population means are equal, that means calcium is not related to how well a fish’s gills function. When the population means are different, this means that calcium is related.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_A$: at least one is $\neq$ (claim)

Notice the F-test statistic has already been calculated. Let’s see if F = 4.648 is significant by simulating the F-test statistic assuming the null hypothesis is true and the population means are equal. 2000 simulations gave the following. Notice we are looking for the probability that the sample data (original F-test statistic) or more extreme happened by random chance. This is the P-value.

Notice the original sample F-test statistic did fall in the tail. It also has a very small P-value (0.012). Both of these things tell us that the F-test statistic of 4.648 was significant. Reject the null hypothesis. There is sufficient sample evidence to support the claim that calcium is related to the function of a fish’s gills (the population means are different).

Math 140 Hypothesis Test Act 16
Exploring ANOVA and the F-distribution through Simulation

ANOVA stands for Analysis of Variance. It is a popular test among statisticians. A one-way ANOVA hypothesis test is like a multiple mean average test (usually 3 or more groups). Here is a sample null and alternative hypothesis.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_A$: at least one population mean is different

ANOVA uses a very ingenious test statistic that looks at the ratio of the variance between the groups to the variance within the groups. We call this ratio the F-test statistic, and the distribution it follows is called the “F-distribution”. Recall that variance is the square of the standard deviation and is a measure of spread from the mean.

ANOVA Test Statistic

$F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}}$

So the main idea is this: If the variance between the groups is very large compared to the variance within the groups, then the population means are probably not the same, and the F ratio will be rather large. This will result in a small P-value and rejecting the null hypothesis.
If the variance between the groups is small compared to the variance within the groups, then the population means might be the same, and the F ratio will be rather small. This will result in a large P-value and we will fail to reject the null hypothesis.

Here are the key questions:
- How do I know if the F-test statistic is large enough to be considered significant? (Is the variance between the groups significantly greater than the variance within the groups?)
- Is it likely or unlikely that the sample data occurred by random chance from equal populations, or does this give evidence that the population means really are different?

These questions and more can be answered by studying simulation. Specifically, we are going to simulate random data from groups with equal population means and compare the original sample data to the simulation.

Note: ANOVA does have several assumptions that we check, but for this activity you can assume the assumptions are met. We will focus on understanding the ANOVA test and the F-test statistic through simulation.

Go to www.lock5stat.com and click on the StatKey button. Under the “More advanced randomization tests” menu click on “ANOVA for Difference in Means”.

1. The first data we are going to look at is the “Sandwich Ants” data. It should be entered, but if not, you can click on the button at the top left of the page and click on “Sandwich Ants”. We are studying the number of ants that are drawn to different kinds of food. In this data, we are looking at the mean average number of ants that come to three different types of sandwiches. Test the claim that the number of ants will be different depending on which sandwich is left out.
a) Write the null and alternative hypotheses for the test. Which is the claim?
b) What is the F-test statistic for the sample data? Does it look significant? Let’s find out by simulating.
c) Simulate random F-test statistics from populations with equal means. What shape is the simulated distribution?
d) Estimate the P-value from the simulation. Write a sentence to explain the P-value.
e) Is it likely or unlikely that the original sample F-test statistic happened by random chance?
f) Do you think the variance between the groups is significantly greater than the variance within the groups? Why?
g) Will you reject or fail to reject the null hypothesis?
h) Write a conclusion for the ANOVA hypothesis test.

2. Now click on the Pulse rate and award data. This data looks at the average pulse rates of people who have won Olympic, Academy and Nobel awards. Test the claim that the population mean average pulse rate is the same for the three groups.
a) Write the null and alternative hypotheses for the test. Which is the claim?
b) What is the F-test statistic for the sample data? Does it look significant? Let’s find out by simulating.
c) Simulate random F-test statistics from populations with equal means. What shape is the simulated distribution?
d) Estimate the P-value from the simulation. Write a sentence to explain the P-value.
e) Is it likely or unlikely that the original sample F-test statistic happened by random chance?
f) Do you think the variance between the groups is significantly greater than the variance within the groups? Why?
g) Will you reject or fail to reject the null hypothesis?
h) Write a conclusion for the ANOVA hypothesis test.

3. Now click on the Homes for Sale (price by state) data. This data looks at the average selling price of homes in four different states. Test the claim that the population mean average home price is different in the various states.
a) Write the null and alternative hypotheses for the test. Which is the claim?
b) What is the F-test statistic for the sample data? Does it look significant? Let’s find out by simulating.
c) Simulate random F-test statistics from populations with equal means. What shape is the simulated distribution?
d) Estimate the P-value from the simulation. Write a sentence to explain the P-value.
e) Is it likely or unlikely that the original sample F-test statistic happened by random chance?
f) Do you think the variance between the groups is significantly greater than the variance within the groups? Why?
g) Will you reject or fail to reject the null hypothesis?
h) Write a conclusion for the ANOVA hypothesis test.

Math 140 Hypothesis Test Activity 17
Analysis of Variance (ANOVA) with StatCrunch

One-Way ANOVA test assumptions
- Random samples of three or more groups, each measuring the same quantitative variable.
- Groups are independent of each other.
- Check if all the samples are at least 30 or nearly normal.
- Populations have the same variance. Check if the largest sample standard deviation is less than or equal to twice the smallest standard deviation.

ANOVA Test Statistic

$F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}}$

Directions: Copy and paste the ANOVA test data sets into StatCrunch and answer the following questions.

1. Random samples of black bears were weighed at various times of the year. Some of the bears were weighed in April through July. Others were weighed in August and September or October and November.
a) Create a side by side box plot for the three data sets. Draw the boxplot below and describe the graph. Also find the mean average, standard deviation and variance for each of the three data sets. How do they compare?
b) Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is small, be sure to check the nearly normal assumption by making a histogram.)
c) Use an ANOVA test to test the claim that the average weight of black bears is different depending on what time of year they are measured. (Can you think of a reason why this might be true?) Give the null and alternative hypothesis, the F-test statistic and the P-value. Write a sentence describing the meaning of the F-test statistic and another sentence describing the meaning of the P-value.
Give the degrees of freedom between the groups and the degrees of freedom within the groups. Did you reject the null hypothesis or fail to reject? Write a conclusion for your test.
d) Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups that have the same mean average? (Explain how you know.)

2. Now we are going to look at the relationship between how much sleep Math 075 students get and how many units they have completed at COC. Since the Math 075 data is census data, we can assume it is representative of the population of all 075 students. Julie thinks that the average number of units will be the same no matter how much sleep the person gets. Analyze the data for Julie. The number of units has been broken up into three data sets (less than 6 hours, 6-8 hours, more than 8 hours).
a) Create a side by side box plot for the three data sets. Draw the boxplot below and describe the graph. Also find the mean average and variance for each of the three data sets. How do they compare?
b) Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is small, be sure to check the nearly normal assumption by making a histogram.)
c) Use an ANOVA test to test the claim that the average number of units completed at COC is the same regardless of how much sleep a person gets. Give the null and alternative hypothesis, the F-test statistic and the P-value. Write a sentence describing the meaning of the F-test statistic and another sentence describing the meaning of the P-value. Give the degrees of freedom between the groups and the degrees of freedom within the groups. Did you reject the null hypothesis or fail to reject? Write a conclusion for your test.
d) Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within the groups?
(Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups that have the same mean average? (Explain how you know.)

3. Let’s look again at the Math 075 data and explore the relationship between political party and how much alcohol someone drinks. Since the Math 075 data is census data, we can assume it is representative of the population of all 075 students. The amount of alcohol drunk has been separated into four data sets corresponding to four political affiliations (democrat, republican, independent, other). Is the average amount of alcohol drunk different depending on political affiliation?
a) Create a side by side box plot for the four data sets. Draw the boxplot below and describe the graph. Also find the mean average and variance for each of the four data sets. How do they compare?
b) Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is small, be sure to check the nearly normal assumption by making a histogram.)
c) Use an ANOVA test to test the claim that the average amount of alcohol a person drinks differs depending on political affiliation. Give the null and alternative hypothesis, the F-test statistic and the P-value. Write a sentence describing the meaning of the F-test statistic and another sentence describing the meaning of the P-value. Give the degrees of freedom between the groups and the degrees of freedom within the groups. Did you reject the null hypothesis or fail to reject? Write a conclusion for your test.
d) Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups that have the same mean average? (Explain how you know.)

4. Let’s look at the Math 140 survey from fall 2015. We are analyzing the number of minutes per week spent on social media.
We separated the data by the type of social media being used. Test the claim that the number of minutes spent is the same no matter what social media was being used.
a. Create a side by side box plot for the three data sets. Draw the boxplot below and describe the graph. Also find the mean average and variance for each of the three data sets. How do they compare?
b. Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is small, be sure to check the nearly normal assumption by making a histogram.)
c. Use an ANOVA test to test the claim that the number of minutes spent is the same no matter what social media was being used. Give the null and alternative hypothesis, the F-test statistic and the P-value. Write a sentence describing the meaning of the F-test statistic and another sentence describing the meaning of the P-value. Give the degrees of freedom between the groups and the degrees of freedom within the groups. Did you reject the null hypothesis or fail to reject? Write a conclusion for your test.
d. Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups that have the same mean average? (Explain how you know.)

Correlation Hypothesis Test Notes

Sample Correlation Coefficient: $r$
Population Correlation Coefficient: $\rho$ (rho)

Two variables have correlation if $\rho$ (rho) is close to +1 or -1. (r is significantly different from zero)
Two variables have positive correlation if $\rho$ (rho) is close to +1. (r is significantly greater than zero)
Two variables have negative correlation if $\rho$ (rho) is close to -1. (r is significantly less than zero)
Two variables do not have correlation if $\rho$ (rho) is close to zero.
(r is not significantly different from zero)

Correlation Hypothesis Test Example

X: Amount of Tar (mg)
Y: Amount of CO (ppm)

Test the claim that there is a positive correlation between tar and carbon monoxide.

$H_A: \rho > 0$ (claim) (there is positive correlation)
$H_0: \rho = 0$ (no correlation)

r = 0.9335
P-value < 0.0001

Assumptions: Random, 2 quantitative data sets. The scatterplot shows a linear trend (points are close to the line) and there are no influential outliers. The histogram of the residuals showed a nearly normal distribution and was centered close to zero. The residual plot showed a slight fan shape, so it fails the homoscedasticity requirement. My sample size was less than 30 (29), but the data was bell shaped, so it passes the 30 or normal requirement.

P-value < 0.0001
Sentence: If $H_0$ is true (no correlation), then there is less than a 0.0001 chance of getting the sample data or more extreme.

Could this data happen by random chance from a population with no correlation? Very unlikely (less than 0.0001).

Is the r value of 0.9335 significantly different from zero? Yes. There is a significant difference (low P-value).

Reject $H_0$.

There is significant sample evidence to support the claim that there is a positive correlation between the amount of tar in a cigarette and the amount of carbon monoxide.

Note about the null and alternative hypothesis of a correlation test

You may see the null and alternative hypothesis written differently in various books and programs. Look at the following formula:

$\text{slope} = r \cdot \dfrac{s_y}{s_x}$

When there is no correlation, the r value gets close to zero. The standard deviations of the x and y variables are both positive numbers. So as the correlation coefficient r gets close to zero, so does the slope of the regression line.

There are three ways of writing the correlation null and alternative hypothesis. These are all equivalent statements.
$H_0$: No Correlation, $H_A$: Is Correlation
OR $H_0: \rho = 0$, $H_A: \rho \neq 0$
OR $H_0: \text{Slope} = 0$, $H_A: \text{Slope} \neq 0$

Similarly, there are three ways of writing the positive correlation null and alternative hypothesis (right tail). These are also equivalent statements.

$H_0$: No Correlation, $H_A$: Is Positive Correlation
OR $H_0: \rho = 0$, $H_A: \rho > 0$
OR $H_0: \text{Slope} = 0$, $H_A: \text{Slope} > 0$

Similarly, there are three ways of writing the negative correlation null and alternative hypothesis (left tail). These are also equivalent statements.

$H_0$: No Correlation, $H_A$: Is Negative Correlation
OR $H_0: \rho = 0$, $H_A: \rho < 0$
OR $H_0: \text{Slope} = 0$, $H_A: \text{Slope} < 0$

Note about test statistics used in a correlation test

There are three different test statistics that can be used in a correlation test. A StatCrunch printout will show you all three. The important thing to remember is that all of them give virtually the same P-value.
- The sample correlation coefficient (r) can be used. This works especially well with simulation. You can use simulation to see if r was significant and to calculate an approximate P-value.
- Like a two population mean hypothesis test, you can use a T-test statistic. However, the T will not be measuring how many standard errors one sample is from another. Remember, in correlation the x and y have different units. You cannot directly compare them. Instead, the T-test statistic will measure how many standard errors the slope of the regression line is from zero. Recall that when there is no correlation, the slope of the regression line goes to zero.
- Some statisticians also use an F-test statistic like an ANOVA test, though ANOVA is designed for comparing means from 3 or more groups.

Note: The P-value for a y-intercept hypothesis test is very different from the P-value for a correlation test. Do not use the y-intercept P-value on StatCrunch. That is for testing a y-intercept or initial value.
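To make the T-test statistic for correlation a little more concrete, here is a small sketch that converts a sample correlation coefficient into a T value. It relies on the standard identity $t = r\sqrt{n-2}/\sqrt{1-r^2}$ (with n - 2 degrees of freedom), which is algebraically the same as dividing the slope of the regression line by its standard error. The function name `t_from_r` is invented for this illustration, and this is not a claim about how StatCrunch computes its printout. The inputs r = 0.9335 and n = 29 are taken from the tar and carbon monoxide example.

```python
import math

def t_from_r(r, n):
    """T-test statistic for a correlation test (n - 2 degrees of freedom).

    r is the sample correlation coefficient, n is the number of ordered pairs.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values from the tar / carbon monoxide example: r = 0.9335, n = 29.
t_stat = t_from_r(0.9335, 29)
print(round(t_stat, 1))
```

A T value this many standard errors above zero is consistent with the very small P-value reported in that example.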
Hypothesis Test Activity 18
Understanding Correlation through Simulation

Introduction: We learned yesterday that the correlation coefficient “r” can be used to determine if there is a correlation (linear relationship) between two quantitative variables. Today we are asking: if there is a correlation between two samples, is that correlation coefficient significant enough to show that there is or is not correlation between the populations (i.e. to perform a hypothesis test)? We learned that if r is close to +1, there is strong positive correlation between the samples. If r is close to -1, there is strong negative correlation between the samples. When r is close to 0, that indicates that there is no correlation between the samples. This can help us understand the samples, but how do we know how close the r value has to be to 1 or -1 to be considered significant? Also, is it significant enough to perform a correlation hypothesis test (rho test)? These are important questions to answer through simulation and an understanding of r and the P-value.

Here are some possible null and alternative hypotheses for a correlation rho test. The Greek letter “rho” ($\rho$) is often used for the population correlation coefficient. Remember “r” is a sample correlation coefficient.

$H_0: \rho = 0$ (no correlation), $H_A: \rho \neq 0$ (is correlation)
$H_0: \rho = 0$ (no correlation), $H_A: \rho > 0$ (is positive correlation)
$H_0: \rho = 0$ (no correlation), $H_A: \rho < 0$ (is negative correlation)

We are going to use StatKey on www.lock5stat.com to simulate random correlation data. That way we can compare the correlation coefficient (r) of the original sample data to the simulated r values in the simulation. Through this we can tell if the r from the original sample was significant, and we can also estimate the P-value and determine whether the sample correlation coefficient r could have occurred by random sampling variability (random chance) or whether there really is correlation between the two populations.
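The idea behind this kind of simulation can be sketched in a few lines of Python. This is only a rough sketch of the general randomization approach (shuffle one variable to break the pairing, recompute r, and repeat), not StatKey's actual implementation. The data values and the function names `pearson_r` and `randomization_p_value` are hypothetical and made up purely for illustration.

```python
import random

def pearson_r(x, y):
    """Sample correlation coefficient r for ordered pair data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def randomization_p_value(x, y, reps=2000, seed=1):
    """Estimate a right-tail P-value for H0: rho = 0 vs HA: rho > 0."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    count = 0
    for _ in range(reps):
        rng.shuffle(shuffled)          # break the pairing: simulates "no correlation"
        if pearson_r(x, shuffled) >= observed:
            count += 1                 # simulated r at least as extreme as the sample r
    return observed, count / reps

# Hypothetical ordered pair data (made up for illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.4, 7.7, 9.3, 9.8]

r, p_value = randomization_p_value(x, y)
print(round(r, 3), p_value)
```

The proportion of shuffled r values that land at or beyond the original sample r plays the role of the estimated P-value. For a left tailed test the comparison would be reversed, and for a two tailed test the absolute values of r would be compared.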
Go to www.lock5stat.com and click on the “StatKey” button. Under “Randomization Hypothesis Tests” click the one that says “Test for Slope, Correlation”. Make sure the top of the graph says “Randomization Dotplot of correlation”. Notice the null hypothesis is rho = 0. Remember rho looks like “$\rho$”, but it is not a “P”. Normally we would of course check assumptions, but for this activity we will just focus on understanding the simulation. You can assume the assumptions are met.

1. Let’s look at the Uniform Violence data from the NFL. This should already be entered, but if not, you can find it in the top left button. It should say “Malevolent Uniforms”. They looked at NFL team uniforms and measured how scary the uniforms are on a scale of about 2 to 5. They then looked at how much those teams are penalized. Is there a positive correlation between having a scarier uniform and being more prone to penalties during the games?
a) Write the null and alternative hypotheses for the test. Which is the claim? Is this a right tail, left tail or two tailed test?
b) What is the r value for the sample data? Do you think the r value will be significant enough to show correlation between populations?
c) Perform the simulation and estimate the P-value from the simulation. Write a sentence to explain the P-value. Is it likely or unlikely that the original sample r value happened by random chance?
d) Will you reject or fail to reject the null hypothesis?
e) Write a conclusion for the correlation hypothesis test.

2. Is there a negative correlation between the pH (acidity) of Florida lakes and the amount of mercury in the lake? This is the question we are striving to answer in the next simulation. On the top left button, click on the “Florida Lakes” data. The data was part of a study about dangerous mercury levels in many of the Florida lakes.
a) Write the null and alternative hypotheses for the test. Which is the claim? Is this a right tail, left tail or two tailed test?
b) What is the r value for the sample data? Do you think this value will be significant enough to show correlation between populations?
c) Perform the simulation and estimate the P-value from the simulation. Write a sentence to explain the P-value. Is it likely or unlikely that the original sample r value happened by random chance?
d) Will you reject or fail to reject the null hypothesis?
e) Write a conclusion for the correlation hypothesis test.

3. In the top left button in StatKey, change to the “ICU Admission” data. Test the claim that there is no correlation between systolic blood pressure and heart rate.
a) Write the null and alternative hypotheses for the test. Which is the claim? Is this a right tail, left tail or two tailed test?
b) What is the r value for the sample data? Do you think this value will be significant enough to show correlation between populations?
c) Perform the simulation and estimate the P-value from the simulation. Write a sentence to explain the P-value. Is it likely or unlikely that the original sample r value happened by random chance?
d) Will you reject or fail to reject the null hypothesis?
e) Write a conclusion for the correlation hypothesis test.

Math 140 Hypothesis Test Activity 19
Correlation Hypothesis Test with StatCrunch

Directions: For each of the following problems, find if there is a linear relationship (correlation) between the quantitative variables by performing a correlation rho test. For each of the following data sets, decide which data set should be the explanatory variable and which should be the response variable. Go to the “Stat” menu, and click on “Regression”, then “Simple Linear”. Put in the columns for the explanatory (x) and the response (y). Click on Fitted line plot, Residuals versus x variable, and a Histogram of the residuals.
You should get the scatterplot, the plot of residuals versus the x variable, and a histogram of the residuals in a Word document, together with the r value, r-squared, the standard deviation of the residuals, and the equation of the regression line. You do not have to save the graphs in a Word document, but you will need to look at the graphs when you check the assumptions. Be sure to check the assumptions. You may assume the data have been collected randomly. Be sure to give the null and alternative hypotheses, the correlation coefficient, and the P-value. Also state whether or not you reject the null hypothesis and give a conclusion.

Assumptions for Correlation Rho Test
Two quantitative variables.
Random ordered-pair data with a sample size of at least 30.
Scatterplot shows a linear shape. (Points in the scatterplot follow a general linear pattern.)
There are NO influential outliers in the scatterplot.
Histogram of the residuals is nearly normal (close to bell shaped).
Histogram of the residuals is centered at zero.
Points in the residual plot are evenly spread out from the regression line. (Homoscedasticity: no fan shape in the plot of residuals versus x values.)

1. Open the women’s health data. Test the claim that there is a positive linear correlation between the height of a woman and her weight.
2. Open the women’s health data. Test the claim that there is no linear correlation between the diastolic blood pressure of a woman and her systolic blood pressure.
3. Open the women’s health data. Maria claims that there is a linear correlation between the age of a woman and her cholesterol. Test Maria’s claim.
4. Open the Bear data. A park ranger claims that there is not a linear correlation between the chest size of a bear and the width of its skull. Test this claim.
5. Open the Bear data. A park ranger claims that there is a positive linear correlation between the neck size of a bear and its weight. Test this claim.
6. Open the Bear data. A small boy named Joe went to the zoo.
When looking at the bears, he claims that you will not be able to show there is a correlation between the age of a bear and the length of a bear, because the bear data do not meet the assumptions necessary for a linear correlation hypothesis test. (Joe is a really smart kid.) A zookeeper disagrees with him. Do you agree with Joe or the zookeeper? Why?
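
For students who want to see what a randomization simulation for correlation is doing behind the scenes, here is a minimal Python sketch. It assumes the simulation works by shuffling the y values (a common way to simulate a world where rho = 0), recomputing r each time, and counting how often the shuffled r is at least as extreme as the sample r. The function names pearson_r and randomization_p_value are hypothetical, and the data below are made up for illustration. They are not one of the StatKey or StatCrunch data sets.

```python
import math
import random


def pearson_r(x, y):
    """Sample correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def randomization_p_value(x, y, tail="right", reps=5000, seed=1):
    """Estimate the P-value for H0: rho = 0 by shuffling the y values.

    tail is "right" (Ha: rho > 0), "left" (Ha: rho < 0), or "two" (Ha: rho != 0).
    """
    if tail not in ("right", "left", "two"):
        raise ValueError("tail must be 'right', 'left', or 'two'")
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    observed = pearson_r(x, y)     # r value of the original sample
    shuffled = list(y)             # copy, so the original data are not changed
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)      # break any real link between x and y
        r = pearson_r(x, shuffled)
        if tail == "right":
            extreme += r >= observed
        elif tail == "left":
            extreme += r <= observed
        else:
            extreme += abs(r) >= abs(observed)
    # proportion of shuffles at least as extreme as the sample r
    return observed, extreme / reps


# Made-up illustrative data (an assumption, not a course data set).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8, 9.1, 9.7]

r, p = randomization_p_value(x, y, tail="right")
print(f"sample r = {r:.3f}, estimated P-value = {p:.4f}")
```

A small estimated P-value means that shuffled data rarely produced an r value as extreme as the sample r, so the sample r is unlikely to have happened by random chance if the null hypothesis is true. Change tail to "left" or "two" to match the alternative hypothesis you wrote.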