Chapter 5 Hypothesis Testing

Recall that statistics, the science of analyzing data, has two main branches: descriptive and inferential statistics. In descriptive statistics we describe data, or tell a story hidden in the data. In inferential statistics, we infer a story about a population based on data obtained from a sample. One of the things we do in inferential statistics is estimate the values of certain population parameters, something we learnt in the previous chapter. The other main thing we do in inferential statistics is hypothesis testing, the subject of this chapter. Although hypothesis testing may sound very technical, it is something you and I do every day. In fact, it is part of your survival mechanism on a moment-by-moment basis: you are constantly engaged in hypothesis testing. In order to survive in your environment, you need to make sure that your environment is conducive to your survival and safety. If it deviates from being conducive to your survival and safety, your senses will alert you to it and you will do something about it. For example, say you are working on your computer and suddenly you hear an extraordinarily loud explosion. Your senses will alert you to it, you will leave whatever you were doing, and you will want to know what happened, what caused the explosion, and whether you need to take any action to ensure your safety. Your mind might entertain all kinds of ideas: could a fire break out? Has some country declared war? And if a fire could break out, could it affect your building? Do you need to evacuate, call emergency services, or make sure someone has already called them? In other words, whenever a rare event happens, you take notice and do something about it, especially if the rare event affects you in some way. 
In this example, on a moment-to-moment basis you assume that your environment is “normal” until one of your senses alerts you to a sufficient deviation from normality, which causes you to take notice and perhaps even take action, particularly if the deviation affects you in some way. This is an example of the hypothesis testing you are constantly engaged in. Your default hypothesis (the null hypothesis) is that your environment is “normal”. Your alternative hypothesis is that it isn’t. Your senses are constantly sending you signals. The moment a signal suggests sufficient evidence to believe that an abnormal condition exists, you reject the null hypothesis that all is normal. Let me give you another example, a softer one that does not involve explosions. Suppose you are having a nice dinner conversation with your friends, and one of them tells you that she saw a man almost 8 ft. tall the other day. I am sure that, at least for a moment, you stop eating, gasp, and give your friend an amazed look, as if she has accomplished a very rare feat. It is very rare for someone to be 8 ft. tall to begin with, it is even more rare that such a person happens to be around your town, and it is rarer still for your friend to run into this person, so hearing about such a rare event draws an unusual reaction from you. In fact, you must have noticed that at parties, during conversations, people try to impress others by telling them things that are out of the ordinary. Sometimes there is almost a competition going on as to who can narrate the most extraordinary true incident, one that will cause everyone to gasp and make the narrator feel important. After all, if your friend had told you that she saw someone who was 5’ 8”, not only would you not gasp, you might wonder what was wrong with your friend for wasting your time on such a mundane piece of information. 
In this example, your null hypothesis is that everyone has a normal height of around 5’ 8” with a standard deviation of around 3”, and that most of the adult population falls somewhere between 4’ 6” and 7 ft. (some basketball players go beyond 7 ft.). The alternate hypothesis is that someone does not belong in this normal range of heights. Any news or evidence of someone being outside this normal expected range rejects your null hypothesis, and it invokes a reaction and possibly some action from you. As another example, in courts of law in most countries, everyone is considered “innocent” unless proven “guilty”. To establish guilt, there must be sufficient “evidence”, for example the defendant’s fingerprints found on the murder weapon. It is harder to prove or establish “innocence”. Attorneys often try to establish innocence with alibis, but we all know that alibis can be designed, and many criminals do design alibis in order to establish innocence in case they are charged with a crime. In this example, the null hypothesis (the default hypothesis) is that the defendant is innocent, and the alternative hypothesis is that the defendant is guilty. When sufficient evidence of guilt is found, the null hypothesis is rejected; until then, the null hypothesis is not rejected. As another example, suppose you go to a doctor with symptoms that, in your mind, are sufficiently severe to raise a flag that something is wrong with you. The doctor will assume that you are a normal, healthy person. But since you are complaining, he or she will look for scientific evidence that you really need medical attention. So the doctor will order some tests to try to establish sufficient evidence that you indeed have an abnormal condition. Without any evidence to the contrary, you are assumed to be normal. 
In this example, the null hypothesis is that you are healthy, and the alternative is that you are not; you need sufficient evidence to reject the null hypothesis. This scientific process of testing hypotheses has been used for hundreds of years by scientists, criminologists, doctors, sociologists, statisticians, and really by everyone, including you, on a day-to-day basis. Students find the subject of hypothesis testing very difficult to understand, and perhaps it was very difficult in pre-Excel days, but these days there is no reason to find testing of hypotheses a difficult topic. Basically, the line of reasoning is this: if the probability of some event is so small that the assumption “all is normal” is unlikely to be true, then you reject that assumption and go with the alternate hypothesis that “all is not normal”. In the explosion example, the probability of a loud explosion when all is normal is so small that you had to take notice that something was not “normal”. In the dinner table conversation, the probability of someone’s height being 8 ft. was so small that you had to take notice that something was not “normal”. In the criminal case example, the probability of finding the defendant’s fingerprints on the murder weapon, if the defendant were indeed innocent, is so small that you want to reject the null hypothesis that he (or she) is innocent. In the medical example, if the x-rays reveal a large ulcer in your stomach, then that might be sufficient evidence that not all is normal. So basically, one has to establish the probability of the “rare” event under the assumption that all is normal. This probability of the rare event is called the p-value. If the p-value is very small (say less than 5%, or less than 1%), then you may say that you have sufficient evidence against the null. 
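To see just how small such a p-value can be, here is an optional sketch in Python (the chapter itself uses printed tables and Excel, not Python). It computes the tail probabilities for the dinner-table example, assuming, as in the height example above, that adult heights are roughly normal with a mean of 68” and a standard deviation of 3”:

```python
from statistics import NormalDist

# Assumed model from the text: adult heights ~ Normal(mean 68 in, sd 3 in)
heights = NormalDist(mu=68, sigma=3)

# Tail probabilities for meeting a very tall person under "all is normal"
p_7ft = 1 - heights.cdf(84)   # at least 7 ft (84 in)
p_8ft = 1 - heights.cdf(96)   # at least 8 ft (96 in) -- rarer still

print(f"P(height >= 7 ft) = {p_7ft:.2e}")
print(f"P(height >= 8 ft) = {p_8ft:.2e}")
```

Both probabilities come out astronomically small, which is exactly why such a report makes you gasp: under the “all is normal” model, the event is essentially impossible.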
So remember this: a small p-value means a small probability (p stands for probability), and small probabilities imply rare, “not normal” events. Therefore a low p-value is evidence against the null hypothesis, because the null hypothesis assumes that all is normal. If you cannot follow the line of reasoning explained in the last two pages, you will have a lot of difficulty understanding the material in the rest of this course. So if you do not understand the last two pages, please go back and read them again until you understand them. Please note that in all the discussion in this chapter so far, the word “normal” has nothing to do with the “normal probability distribution”; I am using the term as in nothing abnormal, nothing out of the ordinary. In hypothesis testing, we deal with two kinds of hypotheses: the null hypothesis and the alternate hypothesis. The null hypothesis is usually represented by the symbol H0. The alternate hypothesis is denoted by H1 or Ha. Hypotheses are usually about a population parameter, and we use a sample statistic to test a hypothesis. The reason for this is very simple: if all of the data about a population were available, there would be no need to hypothesize about the values of its parameters, because you could compute them exactly. Since data about an entire population is not usually available, we need to hypothesize about the possible values of its parameters. For example, we may make a null hypothesis that the average height of all the people in your population is 68”. In symbols, we write H0: µ = 68”, where µ is a Greek symbol usually used for a population average. The alternative hypothesis might be that the average height of all the people in your population is greater than 68”; in symbols, Ha: µ > 68”. You can also have an alternate hypothesis like Ha: µ < 68” or Ha: µ ≠ 68”. 
So how do we test this set of null and alternate hypotheses? We collect some data and obtain the mean of the sample (since the hypothesis is about the population mean). If the sample mean is close to 68”, such as 68.5”, we may not have sufficient evidence against the null hypothesis, and we will therefore not reject it. After all, if the mean of the population is 68”, then there is a reasonably high probability of obtaining a sample whose mean happens to be 68.5”. If, however, the sample mean is very different from 68”, such as 74”, the probability of obtaining it would be very small if the null hypothesis were true, so we would conclude that the null hypothesis is probably not true, and our decision would be to reject the null hypothesis. Remember, this is the same line of reasoning that we used earlier. So the question is: what is the probability of obtaining a sample mean of 74” or more if the mean height were indeed 68”, i.e., if the null hypothesis were true? The answer to this question is the p-value. If the p-value is small enough, then we can reject the null hypothesis. So how do we compute this p-value? Recall that in an earlier chapter we learnt how to compute probabilities for different ranges of values of a normal random variable when the probability distribution of that variable is known. For example, if we know the mean and the standard deviation of a normal random variable, then we can find the probability of any range of values for that variable. As a refresher, in Figure 1, I reproduce a few rows of the standard normal table.

 z      Cum Prob.    z      Cum Prob.    z      Cum Prob.    z      Cum Prob.
-3.00   0.00135     -1.50   0.06681     0.00    0.50000     1.50    0.93319
-2.95   0.00159     -1.45   0.07353     0.05    0.51994     1.55    0.93943
-2.90   0.00187     -1.40   0.08076     0.10    0.53983     1.60    0.94520
-2.85   0.00219     -1.35   0.08851     0.15    0.55962     1.65    0.95053
-2.80   0.00256     -1.30   0.09680     0.20    0.57926     1.70    0.95543
-2.75   0.00298     -1.25   0.10565     0.25    0.59871     1.75    0.95994
-2.70   0.00347     -1.20   0.11507     0.30    0.61791     1.80    0.96407
-2.65   0.00402     -1.15   0.12507     0.35    0.63683     1.85    0.96784
-2.60   0.00466     -1.10   0.13567     0.40    0.65542     1.90    0.97128
-2.55   0.00539     -1.05   0.14686     0.45    0.67364     1.95    0.97441
-2.50   0.00621     -1.00   0.15866     0.50    0.69146     2.00    0.97725
-2.45   0.00714     -0.95   0.17106     0.55    0.70884     2.05    0.97982

Figure 1: Partial Standard Normal Table

Using this table, we can answer questions like: what is P(z < 2.0)? Or what is P(z > 2.0)? I hope you remember how to do it. P(z < 2.0) is 0.97725, and P(z > 2.0) is 1 minus 0.97725 = 0.02275. Similarly, P(z > 1.65) = 1 minus 0.95053 = 0.04947. Also recall that you can use the standard normal distribution to answer these types of questions for any normal random variable, as long as you know the mean and the standard deviation of its distribution. You do this by standardizing the value of the normal random variable. As a refresher, let us say the normal variable about which I need some probabilities is called X, and it has a normal distribution with a mean of 40 and a standard deviation of 2. What is P(X > 43)? To standardize X to z, we compute (43 – 40)/2 = 1.5. So P(X > 43) is the same as P(z > 1.5), which is 1 minus 0.93319 = 0.06681. So how do we test a hypothesis such as H0: µ = 68” vs. Ha: µ > 68”? Basically, we get a sample of data and find the sample mean; call it x-bar. Then we find P(sample mean > x-bar), assuming the null hypothesis is true. This probability is called the p-value. If the p-value is very small, say smaller than 0.05, then we say we have sufficient evidence to reject the null hypothesis at a significance level of 0.05. If this probability is greater than 0.05, then we fail to reject the null hypothesis at a significance level of 0.05. 
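The table lookups above can also be done in software. As an optional aside (the chapter itself works from printed tables and Excel), Python’s standard-library NormalDist reproduces both the cumulative probabilities and the standardization step:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

print(round(z.cdf(2.0), 5))        # P(z < 2.00) = 0.97725
print(round(1 - z.cdf(2.0), 5))    # P(z > 2.00) = 0.02275
print(round(1 - z.cdf(1.65), 5))   # P(z > 1.65) = 0.04947

# Standardizing: X ~ Normal(mean 40, sd 2); what is P(X > 43)?
x = NormalDist(mu=40, sigma=2)
z_score = (43 - 40) / 2            # = 1.5
print(round(1 - z.cdf(z_score), 5))  # P(z > 1.5) = 0.06681
print(round(1 - x.cdf(43), 5))       # same answer without standardizing
```

The last two lines illustrate that standardizing first or letting the software work with X directly gives the same probability.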
Let’s say we collect a sample of size 36, and we find that the sample mean (x-bar) comes out to be 68.9” and the sample standard deviation comes out to be 3 inches. The question is: what is P(mean > 68.9”) if the null hypothesis were true? To answer this question, we need the probability distribution of the sample mean under the null hypothesis. This sampling distribution of the mean is normal, with a mean of 68” and a standard deviation of 3/sqrt(36) = 3/6 = 0.5”. So the question is: what is P(mean > 68.9”) if the mean has a normal distribution with a mean of 68” and a standard deviation of 0.5”? We normalize 68.9” as (68.9 – 68)/0.5 = 0.9/0.5 = 1.8. So P(mean > 68.9”) is the same as P(z > 1.80), which is 1 minus 0.96407 = 0.03593. Since this probability is less than 0.05, we have sufficient evidence to reject the null hypothesis at a significance level of 0.05. Note that we could also reject the null hypothesis at a significance level of 0.04, or even 0.036, or any value greater than 0.03593, but we could not reject it at a significance level of 0.01, 0.02, 0.03, or any value less than 0.03593. Let’s say that instead of 68.9”, our sample mean came out to be 68.8”. Now we need P(mean > 68.8”). The normalized value of 68.8” is 0.8/0.5 = 1.6, so P(mean > 68.8”) = P(z > 1.6) = 1 minus 0.94520 = 0.0548, which is greater than 0.05. So we say that we did not find sufficient evidence to reject the null at a significance level of 0.05, though we could reject the null at a significance level of 0.06 or 0.07 or any value higher than 0.0548. What is the significance of the significance level? The lower the significance level at which you can reject the null, the greater the confidence with which you can reject it. So when the sample mean was 68.9” we were more confident in rejecting the null (smallest significance level of 0.03593) than when the sample mean was 68.8” (smallest significance level of 0.0548). 
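The two height calculations above can be cross-checked the same way. This is only a verification of the worked example, again using Python’s standard library rather than the table:

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()
mu0, s, n = 68, 3, 36        # hypothesized mean, sample sd, sample size
se = s / sqrt(n)             # standard deviation of the sample mean = 0.5

for xbar in (68.9, 68.8):
    test_stat = (xbar - mu0) / se
    p_value = 1 - z.cdf(test_stat)   # right-tailed p-value
    print(f"xbar = {xbar}: z = {test_stat:.2f}, p-value = {p_value:.5f}")
```

This gives z = 1.80 with a p-value of about 0.03593 (reject at the 0.05 level) and z = 1.60 with a p-value of about 0.0548 (fail to reject at the 0.05 level), matching the table-based results.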
Intuitively, you should also understand that the greater the deviation between the hypothesized mean (according to the null hypothesis) and the sample mean, the greater the evidence against the null. In this example, since 68.9” deviated more from 68” than 68.8” did, we were more confident in rejecting the null with a sample mean of 68.9” than with 68.8”; in fact, we couldn’t even reject the null (at a significance level of 0.05) when the sample mean was 68.8”. I really hope you understood the above logic of hypothesis testing. I want to point out one thing about the above examples. In the process of generating the p-values, we first standardized the values of the sample means: we got a standardized value of 1.8 in the first example and 1.6 in the second. Using 1.8, we got P(z > 1.8) = 0.03593, and using 1.6, we got P(z > 1.6) = 0.0548. We then said that since 0.03593 is less than 0.05, the desired significance level, we have sufficient evidence to reject the null; in the second example, since 0.0548 is greater than 0.05, we failed to reject the null. Going from the standardized value to the p-value required the extra step of reading the standard normal table. Many textbooks and authors recommend bypassing this extra step and making the decision based on the standardized value itself. For the above example, at a significance level of 0.05, if the standardized value is greater than or equal to 1.645, the null hypothesis can be rejected. At a significance level of 0.01, if the standardized value is greater than or equal to 2.33, the null can be rejected. At a significance level of 0.10, if the standardized value is greater than or equal to 1.28, the null can be rejected. If we carried out the extra step of reading the standard normal table to find the p-value, we would reach the exact same decision. 
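The claim that the two approaches always agree can be illustrated with a short sketch (optional; not part of the chapter’s table-based method). For each test statistic, the decision from “compare the statistic to the cutoff” and the decision from “compare the p-value to α” come out the same, because the normal cumulative distribution is strictly increasing:

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05
z_crit = z.inv_cdf(1 - alpha)        # about 1.645 for a right-tailed test

for test_stat in (1.8, 1.6, 3.0, 0.5):
    p_value = 1 - z.cdf(test_stat)
    reject_by_critical = test_stat > z_crit
    reject_by_p_value = p_value < alpha
    # The two decision rules always agree
    print(test_stat, reject_by_critical, reject_by_p_value)
```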
Note also that in the above examples the alternate hypothesis was Ha: µ > 68”. This is considered a one-tailed hypothesis. Why? Because the sign in the alternate hypothesis points in one direction only (in this case “>”). An example of a two-tailed alternate hypothesis would be Ha: µ ≠ 68”. With a two-tailed hypothesis, the null would be rejected if the sample mean were either sufficiently higher than 68” or sufficiently lower than 68”. With a one-tailed hypothesis such as Ha: µ > 68”, the null would be rejected only if the sample mean were sufficiently higher than 68”. A one-tailed hypothesis can point in the other direction as well (such as Ha: µ < 68”); in that case, the null would be rejected only if the sample mean were sufficiently lower than 68”. Also, in the above example, note that the sample size was 36, which can be considered large; any sample size larger than 30 may be considered a large sample. The standardized value of the sample mean in the above examples is also called the test statistic: it is a statistic (because it summarizes the sample data), and it is used as the basis for testing the hypothesis. The range of test statistic values above which (or below which) the null hypothesis can be rejected is called the rejection region. Figure 2 gives the rejection regions for various types of hypotheses for large sample sizes:

Significance   One-Tailed      One-Tailed      Two-Tailed
Level          (Lower Tail)    (Upper Tail)
0.10           z < -1.280      z > 1.280       z < -1.645 or z > 1.645
0.05           z < -1.645      z > 1.645       z < -1.960 or z > 1.960
0.01           z < -2.330      z > 2.330       z < -2.575 or z > 2.575

Figure 2: Rejection Regions for Various Types of Null Hypotheses for Large Sample Sizes

You should understand that P(z < -1.280) is 0.10, P(z > 1.280) is 0.10, and P(z < -1.645) + P(z > 1.645) = 0.10. 
This is why the rejection regions are what they are in Figure 2. Similarly, P(z < -1.645) = 0.05, P(z > 1.645) = 0.05, and P(z < -1.96) + P(z > 1.96) = 0.05. Likewise, P(z < -2.33) = 0.01, P(z > 2.33) = 0.01, and P(z < -2.575) + P(z > 2.575) = 0.01.

Hypothesis Testing for Small Samples

In the first part of this chapter we learnt how to perform a test of hypothesis for a large sample (n > 30). In this part, we will look at how to test a hypothesis when the sample size is small (n < 30). What is the main difference when dealing with a small sample as opposed to a large sample? If you recall, for large samples we specified the rejection region for various significance levels. For example, for a one-tailed test (upper tail) at a significance level of 0.05, the rejection region was z > 1.645. We got 1.645 from the standard normal table; basically, P(z > 1.645) = 0.05. The main difference is that for small samples we get the cutoff values for the rejection region from the t-tables instead of the z-tables. In the t-tables, the cutoffs depend on two things: the significance level and the degrees of freedom. The degrees of freedom for the kinds of hypotheses we have seen so far is (n – 1), where n is the sample size. That is the only difference. In an earlier chapter we learnt how to use the t-tables. Next we will see lots of examples of hypothesis testing. Before going further, let me go over some jargon.

Critical Value: The critical value is basically the cutoff value above which (or below which) you reject the null hypothesis. For hypotheses about the mean of a single population, for a large sample size we get the critical value from the z-table; for a small sample size, we get it from the t-table. For other types of hypotheses, we get the critical value from other types of tables that we have not studied yet, but will study in later chapters.

Test Statistic: The summary value that we compute from the sample is called the test statistic. 
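The critical values in Figure 2 come from inverting the standard normal cumulative distribution. As an optional aside, the following sketch recomputes them; note that the 1.28, 2.33, and 2.575 in the figure are roundings of 1.2816, 2.3263, and 2.5758:

```python
from statistics import NormalDist

z = NormalDist()

for alpha in (0.10, 0.05, 0.01):
    one_tail = z.inv_cdf(1 - alpha)       # upper-tail cutoff; negate it for the lower tail
    two_tail = z.inv_cdf(1 - alpha / 2)   # cutoff on each side for a two-tailed test
    print(f"alpha = {alpha}: one-tailed z > {one_tail:.4f} "
          f"(or z < {-one_tail:.4f}), two-tailed |z| > {two_tail:.4f}")
```

For small samples the same idea applies, but the cutoffs come from the t distribution with n – 1 degrees of freedom (for instance via SciPy’s `scipy.stats.t.ppf`, if that library is available).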
For hypotheses about the mean of a single population, the test statistic is basically the number of standard deviations the sample mean is from the hypothesized value, where the standard deviation is that of the sampling distribution of the mean.

p-value: For hypotheses about the mean of a single population, the p-value is the probability, assuming the null hypothesis is true, of the sample mean being greater than (for a right-tailed hypothesis) or less than (for a left-tailed hypothesis) its observed value.

Rejection Region: The range of values of the test statistic for which we reject the null hypothesis. For example, for a right-tailed test at a significance level of 0.05, the rejection region is z > 1.645.

Let me now talk about two new terms: Type-I error and Type-II error.

Type-I Error: Suppose you get very unlucky in that you get a sample that does not represent the population very well. But you don’t know that you got unlucky. You believe that the sample is good, and you reason that if the null hypothesis is true, then the probability of getting this sample is very small, so you reject the null hypothesis. But the problem was that the sample was bad. At any rate, your decision to reject the null was an error, because in fact the null hypothesis was true. This type of error is called a Type-I error: you reject the null incorrectly. For example, say a patient does not have cancer, and the null hypothesis is also that the patient has no cancer. Suppose the tests suggest that the patient has cancer and the doctor rejects the null hypothesis. This would be an example of a Type-I error. A defendant who is actually innocent but is found guilty is a victim of a Type-I error; in that example, the null hypothesis was that the defendant was innocent.

Type-II Error: Suppose the null hypothesis is false, but your sample data suggest that it is not false. So when you should have rejected the null, you don’t reject the null. This is a Type-II error. 
For example, if a patient is declared cancer free when he does in fact have cancer, or if a defendant is not found guilty when he is in fact guilty, we have examples of Type-II errors. Note that, in general, one of the two types of error is more expensive than the other. For example, making a Type-II error in the case of a patient having cancer can cost his life, whereas making a Type-I error on a cancer-free patient will at most force the patient through cancer treatment, which, although very expensive, will not kill him. So making a Type-II error in this case is more expensive. In the case of a criminal trial, making a Type-I error is very expensive because you might be executing an innocent man, whereas when you make a Type-II error a guilty person goes unpunished and remains at large, which has a cost to society, but at least no innocent man was killed. You have to evaluate which type of error is more expensive, depending on your situation.

Desired Significance Level: The highest probability of a Type-I error that you are willing to tolerate is called the desired significance level. For example, if you say you are testing a hypothesis at a desired significance level of 0.05, you are saying that you are willing to tolerate a probability of Type-I error of up to 5%. The desired significance level is represented by the Greek symbol alpha (α). As the cost of a Type-I error gets higher, your desired significance level should get smaller, because the lower your desired significance level, the lower the probability of making a Type-I error.

Actual Significance Level: Another name for the p-value.

Now I am going to tell you an extremely important thing, so please pay close attention. Here we go: whenever the test statistic is in the rejection region, the actual significance level (the p-value) is less than the desired significance level (α).

Steps in Hypothesis Testing
1. State the null hypothesis.
2. State the alternate hypothesis.
3. Establish the desired significance level (α) and the critical value. Also figure out the rejection region.
4. Collect the data.
5. Calculate the test statistic and the p-value.
6. Make your decision, using either the critical-region approach or the p-value approach.
7. State your conclusion (in terms of the original problem).
8. Assess the consequences of making a Type-I error and a Type-II error.

A few important things about each of the above steps are in order.

Step 1: State the null hypothesis. The null hypothesis usually assumes that all is well, or that there is no difference, or that there is no relationship between two variables, or that something is equal to something. Usually you will see an equality sign. Here are some examples of valid null hypotheses:

H0: µ = 68    H0: µ ≥ 68    H0: µ ≤ 68    H0: µ1 = µ2    H0: µ1 ≤ µ2    H0: µ1 ≥ µ2

Following are some examples of invalid null hypotheses:

H0: µ > 68    H0: µ < 68    H0: µ ≠ 68    H0: µ1 ≠ µ2    H0: µ1 < µ2    H0: µ1 > µ2

Note that the invalid null hypotheses have no equality signs, while the valid null hypotheses do (the ≥ and ≤ signs include equality).

Step 2: State the alternate hypothesis. The alternate hypothesis assumes that not all is well, or that there is a difference, or that there is a relationship between two variables, or that something is not equal to something. All the examples of invalid null hypotheses above are actually valid examples of alternate hypotheses. The alternate hypothesis is also called the research hypothesis. Usually a researcher is happy to find evidence in favor of the research hypothesis, because then the researcher has something to say, having found something out of the ordinary. After all, what good is a dinner conversation if everyone is telling everyone straightforward, ordinary, well-known facts? What good is a research report that says a treatment had no effect? 
When you look at the alternate hypothesis, you should be able to tell whether it is a two-tailed or a one-tailed hypothesis and, if one-tailed, whether it is left-tailed or right-tailed. For example, if the alternate hypothesis looks like Ha: µ ≠ 68, it is a two-tailed hypothesis. If it looks like Ha: µ > 68 or Ha: µ < 68, it is one-tailed; the first is right-tailed and the second is left-tailed.

Step 3: Establish the desired significance level (α) and the critical value. Also figure out the rejection region. The most popular desired significance level is 0.05, or 5%. But it is just a convenient round number; there is nothing sacred about it. Other, less popular levels are 0.10 and 0.01. These levels are decided by the researcher, and the choice depends on the cost of a Type-I error. If the cost is high, you want the desired level of significance to be lower, say 0.01.

Step 4: Collect the data. This basically means getting a good, representative sample and measuring the values of the random variable of interest. Care must be taken in sampling; the sample should be as random as possible.

Step 5: Calculate the test statistic and the p-value. With Excel and computers, this step is easy. There are straightforward formulas for performing this step. Note that the formulas change for different types of hypotheses. We will learn the formulas for some types of hypotheses in this chapter and for other types in later chapters.

Step 6: Decision. The decision is basically either to reject the null or to fail to reject the null. We never say that we accept the alternate hypothesis or that we accept the null hypothesis, and we never reject the alternate hypothesis. We either reject the null or fail to reject the null.

Step 7: State your conclusion in the context of your original problem. Many students confuse conclusion and decision. 
When I ask for a conclusion, do not say whether the null is rejected or not rejected; rejecting or not rejecting the null is a decision, not a conclusion. The conclusion is always stated in the context of the real-world problem about which the hypotheses are being tested. For example, if the null hypothesis is that the mean height is 68” and the alternate is that the mean height is greater than 68”, and your decision, based on the sample data, was to reject the null, then a statement of your conclusion will be something like this: there was sufficient evidence, at a significance level of α (where α is whatever it is), that the average height of the population is greater than 68”. If the decision was to fail to reject the null, then a statement of your conclusion will be something like: there was not sufficient evidence, at a significance level of α, that the average height of the population is greater than 68”. Note that you will not say that there was sufficient evidence that the average height of the population was 68”.

Step 8: Assess the consequences of making a Type-I error and a Type-II error. This is an important exercise, and it forces you to think about what happens if you made a mistake in your decision and hence in your conclusion. It is not enough just to declare your decision and your conclusions, because there is always some probability that they are incorrect. For example, if you made a Type-I error in diagnosing a patient with a certain disease, then a consequence will be that you give the patient a treatment that was never needed, which has a cost and perhaps even some side effects.

Examples of Hypothesis Testing

We will look at several examples of hypothesis testing now.

Example 1: A lot of students prefer to work after their undergraduate degree but wish to return for graduate education after acquiring some experience in the real world. 
A researcher is interested in knowing the average age at which students return for graduate studies. The research hypothesis is that this age is greater than 27. To test this hypothesis the researcher randomly interviews 49 graduate students on a campus and finds that the average age of this sample is 28.5 years, with a sample standard deviation of 3.5 years. Using this information, carry out all the steps of hypothesis testing at a significance level of 0.05.

Step 1: H0: µ = 27, where µ represents the average age of all graduate students who are returning to graduate school after working for some years after college.
Step 2: Ha: µ > 27. (Note that this is a one-tailed hypothesis; it is right-tailed.)
Step 3: α = 0.05 (this was given to us). The rejection region is z > 1.645.
Step 4: Data on 49 students was collected. Sample mean xbar = 28.5, sample standard deviation s = 3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28.5 – 27)/(3.5/sqrt(49)) = 1.5/(3.5/7) = 3.0. The p-value corresponding to this test statistic is P(z > 3.0) = 1 – 0.99865 = 0.00135.
Step 6: Decision. Using the critical value approach, since 3.0 > 1.645, we reject the null hypothesis. Using the p-value approach, since 0.00135 is less than 0.05, we reject the null hypothesis.
Step 7: There was sufficient evidence at a significance level of 0.05 that the average age of students returning to graduate school after working for some years is greater than 27 years.
Step 8: Since we are rejecting the null hypothesis, there is a chance of making a Type-I error, i.e. the true average age may be less than or equal to 27 years although our test indicates otherwise. A consequence might be that the courses are designed for a more mature audience when in fact the audience is not that mature. This may result in an expectation gap for the faculty teaching the courses. If the age was indeed above 27 and our test had failed to show it, there would be a Type-II error. 
A consequence would be that the courses would be designed for a not-so-mature audience whereas the students are expecting more advanced courses.

Example 2: In the above example, what if the sample average turned out to be 27.5 years instead of 28.5? All other things are the same. Carry out all the steps.

Steps 1, 2 and 3 are the same as above.
Step 4: Data on 49 students was collected. Sample mean xbar = 27.5, sample standard deviation s = 3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (27.5 – 27)/(3.5/sqrt(49)) = 0.5/(3.5/7) = 1.0. The p-value corresponding to this test statistic is P(z > 1.0) = 1 – 0.84134 = 0.15866.
Step 6: Decision. Using the critical value approach, since 1.0 < 1.645, we fail to reject the null hypothesis. Using the p-value approach, since 0.15866 is greater than 0.05, we fail to reject the null hypothesis.
Step 7: There was not sufficient evidence at a significance level of 0.05 that the average age of students returning for graduate school after working for some years is greater than 27 years.
Step 8: Similar to the example above.

Example 3: In Example 1, assume that the sample average turned out to be 28 years and the sample standard deviation was 5 years instead of 3.5 years.

Steps 1, 2 and 3 are the same as above.
Step 4: Data on 49 students was collected. Sample mean xbar = 28, sample standard deviation s = 5.0 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28 – 27)/(5/sqrt(49)) = 1/(5/7) = 7/5 = 1.4. The p-value corresponding to this test statistic is P(z > 1.4) = 1 – 0.91924 = 0.08076.
Step 6: Decision. Using the critical value approach, since 1.4 < 1.645, we fail to reject the null hypothesis. Using the p-value approach, since 0.08076 is greater than 0.05, we fail to reject the null hypothesis.
Step 7: There was not sufficient evidence at a significance level of 0.05 that the average age of students returning for graduate school after working for some years is greater than 27 years.
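The computations in Examples 1 through 3 can be checked with a short script. Here is an illustrative sketch (the function name is mine, not from the text), using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

def right_tailed_z_test(xbar, mu0, s, n):
    """Return the z test statistic and right-tail p-value P(Z > z)."""
    z = (xbar - mu0) / (s / sqrt(n))
    return z, 1 - NormalDist().cdf(z)

z1, p1 = right_tailed_z_test(28.5, 27, 3.5, 49)  # Example 1: z = 3.0, p ≈ 0.00135
z2, p2 = right_tailed_z_test(27.5, 27, 3.5, 49)  # Example 2: z = 1.0, p ≈ 0.15866
z3, p3 = right_tailed_z_test(28.0, 27, 5.0, 49)  # Example 3: z = 1.4, p ≈ 0.08076
```

Comparing each p-value to α = 0.05 reproduces the decisions above: reject the null in Example 1, fail to reject in Examples 2 and 3.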
Step 8: Similar to the example above.

Example 4: In Example 1, what if the research hypothesis was Ha: µ < 27? Everything else is the same, and the sample average came out to be 26.5 years with a standard deviation of 3.5 years.

Step 1: Same as Example 1.
Step 2: Ha: µ < 27 (Note that this is a one-tailed hypothesis. It is a left-tailed hypothesis.)
Step 3: α = 0.05 (this was given to us). The rejection region is z < -1.645.
Step 4: Data on 49 students was collected. Sample mean xbar = 26.5, sample standard deviation s = 3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (26.5 – 27)/(3.5/sqrt(49)) = -0.5/(3.5/7) = -1.0. The p-value corresponding to this test statistic is P(z < -1.0) = 0.15866.
Step 6: Decision. Using the critical value approach, since -1.0 > -1.645, we fail to reject the null hypothesis. Using the p-value approach, since 0.15866 is greater than 0.05, we fail to reject the null hypothesis.
Step 7: There was not sufficient evidence at a significance level of 0.05 that the average age of students returning for graduate school after working for some years is less than 27 years.

Example 5: In Example 1, what if the research hypothesis was Ha: µ ≠ 27? Everything else is the same.

Step 1: H0: µ = 27, where µ represents the average age of all graduate students who are returning to graduate school after working for some years after college.
Step 2: Ha: µ ≠ 27 (Note that this is a two-tailed hypothesis.)
Step 3: α = 0.05 (this was given to us). The rejection region is either z < -1.96 or z > 1.96.
Step 4: Data on 49 students was collected. Sample mean xbar = 28.5, sample standard deviation s = 3.5 yrs.
Step 5: The test statistic = (xbar – hypothesized µ)/(s/sqrt(n)) = (28.5 – 27)/(3.5/sqrt(49)) = 1.5/(3.5/7) = 3.0. The p-value corresponding to this test statistic is 2*P(z > 3.0) = 2*(1 – 0.99865) = 2*0.00135 = 0.0027.
Step 6: Decision. Using the critical value approach, since 3.0 > 1.96, we reject the null hypothesis.
Using the p-value approach, since 0.0027 is less than 0.05, we reject the null hypothesis.
Step 7: There was sufficient evidence at a significance level of 0.05 that the average age of students returning for graduate school after working for some years is not 27 years.
Step 8: Since we are rejecting the null hypothesis, there is a chance of making a type-I error, i.e. the true average age may be 27 years although our test indicates otherwise.

Let me summarize the above five examples in a table:

|                  | Example 1 | Example 2 | Example 3 | Example 4 | Example 5 |
|------------------|-----------|-----------|-----------|-----------|-----------|
| H0               | µ = 27 | µ = 27 | µ = 27 | µ = 27 | µ = 27 |
| Ha               | µ > 27 | µ > 27 | µ > 27 | µ < 27 | µ ≠ 27 |
| Tails            | One (right) | One (right) | One (right) | One (left) | Two |
| α                | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| Rejection region | z > 1.645 | z > 1.645 | z > 1.645 | z < -1.645 | z < -1.96 or z > 1.96 |
| Sample mean      | 28.5 | 27.5 | 28 | 26.5 | 28.5 |
| Sample std. dev. | 3.5 | 3.5 | 5 | 3.5 | 3.5 |
| Sample size      | 49 | 49 | 49 | 49 | 49 |
| Test statistic   | (28.5-27)/(3.5/sqrt(49)) = 3.0 | (27.5-27)/(3.5/sqrt(49)) = 1.0 | (28-27)/(5/sqrt(49)) = 1.4 | (26.5-27)/(3.5/sqrt(49)) = -1.0 | (28.5-27)/(3.5/sqrt(49)) = 3.0 |
| p-value          | P(z>3.0) = 0.00135 | P(z>1.0) = 0.15866 | P(z>1.4) = 0.08076 | P(z<-1.0) = 0.15866 | P(z<-3)+P(z>3) = 2*P(z>3) = 0.0027 |
| Decision         | Reject null | Fail to reject | Fail to reject | Fail to reject | Reject |

Now let's look at an example involving a small sample (< 30).

Example 6: What if in Example 1, the sample size was 16 instead of 49? Everything else is the same.

|                  | Example 6 |
|------------------|-----------|
| H0               | µ = 27 |
| Ha               | µ > 27 |
| Tails            | One (right) |
| α                | 0.05 |
| Rejection region | t > 1.753 |
| Sample mean      | 28.5 |
| Sample std. dev. | 3.5 |
| Sample size      | 16 |
| Test statistic   | (28.5-27)/(3.5/sqrt(16)) = 1.5*4/3.5 = 1.714 |
| p-value          | P(t>1.714) = 0.0535 |
| Decision         | Fail to reject null |

Instead of writing paragraphs about it, I have shown it in tabular form above, similar to the first five examples. I will point out the differences here. In this example, note that the rejection region of t > 1.753 came from the t-table of Chapter 3, for 15 degrees of freedom under the 0.05 column.
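That same critical value can be reproduced in code. Here is a minimal sketch, assuming SciPy is installed, using its t-distribution inverse CDF:

```python
from scipy.stats import t  # SciPy assumed installed

# 95th percentile of the t distribution with 15 degrees of freedom,
# i.e. the right-tail critical value for alpha = 0.05 when n = 16
critical = t.ppf(0.95, df=15)  # ≈ 1.753
```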
You can also use the Excel function T.INV like this: =T.INV(0.95, 15), where 0.95 is 1 – 0.05 and 15 is the degrees of freedom, which in this case is 16 minus 1.
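Finally, the left-tailed, two-tailed, and small-sample p-values of Examples 4 through 6 can be verified the same way. A sketch, using the Python standard library for the normal tails and SciPy (an assumed dependency) for the t distribution; the helper name z_stat is mine:

```python
from math import sqrt
from statistics import NormalDist  # standard library
from scipy.stats import t          # SciPy assumed installed

Z = NormalDist()

def z_stat(xbar, mu0, s, n):
    """Standardized test statistic (xbar - mu0)/(s/sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

# Example 4 (left-tailed): the p-value is the lower-tail area
z4 = z_stat(26.5, 27, 3.5, 49)   # -1.0
p4 = Z.cdf(z4)                   # P(Z < -1.0) ≈ 0.15866

# Example 5 (two-tailed): double the tail area beyond |z|
z5 = z_stat(28.5, 27, 3.5, 49)   # 3.0
p5 = 2 * (1 - Z.cdf(abs(z5)))    # ≈ 0.0027

# Example 6 (n = 16 < 30): use the t distribution with n - 1 = 15 df
t6 = z_stat(28.5, 27, 3.5, 16)   # ≈ 1.714
p6 = t.sf(t6, df=15)             # P(T > 1.714) ≈ 0.0535
```

Only Example 5's p-value falls below 0.05, matching the decisions in the tables above: reject the null in Example 5, fail to reject in Examples 4 and 6.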