The t Test

In biology you often want to compare two sets of replicated measurements to see whether they are the same or different. For example, are plants treated with fertilizer taller than those without? If the means of the two sets are very different, the decision is easy, but often the means are quite close and it is difficult to judge whether the two sets are the same or significantly different. The t test compares two sets of data and gives a probability (P) that is used to judge whether the two sets are basically the same.

1.1.5 Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables. (3)

If you carry out a statistical significance test, such as the t-test, the result is a P value, where P is the probability of getting a difference at least as large as the one observed if there is really no difference between the two samples.

A. When there is no difference between the two samples: a small difference in the results gives a higher P value, which suggests that there is no true difference between the two samples. By convention, if P > 0.05 you conclude that the result is not significant (the two samples are not significantly different).

B. When there is a difference between the two samples: a larger difference in results gives a lower P value, which makes you suspect there is a true difference (assuming you have a good sample size). By convention, if P < 0.05 you say the result is statistically significant. If P < 0.01 you say the result is highly significant, and you can be more confident you have found a true effect.

As always with statistical conclusions, you could be wrong! It is possible that there really is no effect and you had the bad luck to get results that suggest a difference where there is none, or that a real effect was missed. Of course, even if results are statistically highly significant, it does not mean they are necessarily biologically important. Remember this when drawing conclusions.

Correlation does not imply causation!

Causation and correlation?
1.1.6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

Typically in biology your experiment may involve a continuous independent variable and a continuous dependent variable, e.g. the effect of enzyme concentration on the rate of an enzyme-catalyzed reaction. The statistical analysis would set out to test the strength of the relationship (correlation). Once a correlation between two factors has been established from experimental data, the research must be taken further to determine what the causal relationship might be.

Causation

It is important to realize that if the statistical analysis of data indicates a correlation between the independent and dependent variable, this does not prove any causation. Only further investigation will reveal the causal effect between the two variables.

Correlation does not imply causation!

• Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter).
• The number of cavities in elementary school children and vocabulary size have a strong positive correlation.

Clearly there is no direct interaction between the factors involved; the two sets of data simply vary together.

Correlation vs. Causation

We have been discussing correlation. We have looked at situations where there exists a strong positive relationship between our variables x and y. However, just because we see a strong relationship between two variables, this does not imply that a change in one variable causes a change in the other variable.

Consider the following: In the 1990s, researchers found a strong positive relationship between the number of television sets per person (x) and the life expectancy (y) of the citizens in different countries.
That is, countries with many TV sets had higher life expectancies. Does this imply causation? By increasing the number of TVs in a country, can we increase the life expectancy of its citizens? Are there any hidden variables that may explain this strong positive correlation?

There is a strong positive correlation between ice cream sales and shark attacks. That is, as ice cream sales increase, the number of shark attacks increases. Is it reasonable to conclude that ice cream consumption causes shark attacks?

All of the previous examples show a strong positive correlation between the variables. However, in each example it is not the case that one variable causes a change in the other variable. For example, increasing ice cream sales does not increase the number of shark attacks. There are outside factors, also known as lurking variables, which cause the correlation between these variables.

Correlation does not always mean that one thing causes the other (causation), because something else might have caused both. For example, on hot days people buy ice cream, and people also go to the beach, where some are attacked by sharks. There is a correlation between ice cream sales and shark attacks (in this case both go up as the temperature goes up). But a rise in ice cream sales does not cause more shark attacks.

Correlation does not imply causation!

You may be interested to know that global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s. For your interest, I have included a graph of the approximate number of pirates versus the average global temperature over the last 200 years. As you can see, there is a statistically significant inverse relationship between pirates and global temperature.

What is a t-test?
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true.

What is a t-test used for?

It can be used to determine whether two sets of data are significantly different from each other. It is most commonly applied when the test statistic would follow a normal distribution if the true spread of the data were known, but the spread has to be estimated from the samples.

Why is it called a Student's t-test?

The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name). Gosset had been hired due to Claude Guinness's policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes. Gosset devised the t-test as a cheap way to monitor the quality of stout. The t-test work was submitted to and accepted by Biometrika, the journal that Karl Pearson had co-founded and of which he was Editor-in-Chief; the article was published in 1908. Guinness had a company policy that chemists were not allowed to publish their findings, so the company allowed Gosset to publish his mathematical work only if he used a pseudonym, which was "Student".

Time for Student's t-test… Statistics makes the finest stout!

How do you do a t-test?

T-test values can be calculated with equations, but we will calculate them using EXCEL. You need to decide on:

• Type: 1 or 2 (Type 1: matched pairs; Type 2: unpaired)
• Number of tails: 1 or 2
• df: degrees of freedom
• Significance level (α): usually P = 0.05

t-test to Compare Two Sample Means

Student's t-Test

Student's t-test is the most common (and simplest) way of testing whether there is a significant difference between two independent groups. The t-test statistic is calculated from the means, the number of samples in each group (n1 and n2), and the variance of each group (s1² and s2²), according to the following equation. The variance is simply the standard deviation squared.
Equation for the t value for two means:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Although you can use the equation, we will use EXCEL.

T-Test using EXCEL

Using the TTEST function (note: this only gives the P value):
(1) Make a data table in EXCEL.
(2) Add a cell P = Insert / Function / TTEST, with Type 1 for paired data or Type 2 for unpaired data.

Group A tumor mass (g): 0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.70, 0.63, 0.71, 0.73   Mean = 0.675
Group B tumor mass (g): 0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.80, 0.78   Mean = 0.742
P = 0.0269

Using the Data Analysis ToolPak (you need to activate it under Add-ins):
(1) Label a cell TTEST.
(2) Click on the adjacent cell.
(3) Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

Group 1 mass (g): 12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11   Mean = 12.05
Group 2 mass (g): 12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9   Mean = 9.3

t-Test: Two-Sample Assuming Equal Variances
                                Variable 1    Variable 2
Mean                            12.05         9.3
Variance                        1.858         4.233
Observations                    10            10
Pooled Variance                 3.0458
Hypothesized Mean Difference    0
df                              18
t Stat                          3.5234
P(T<=t) one-tail                0.0012
t Critical one-tail             1.7341
P(T<=t) two-tail                0.0024
t Critical two-tail             2.1009

Hypothesis Testing Using t-tests

What is Hypothesis Testing?

Hypothesis testing is used to obtain information about a population parameter. A hypothesis is created about the population parameter, and then a sample from the population is collected and analyzed. The data found will either support or not support the hypothesis.

A statistic is any value that is computed from the data in the sample. A test statistic is a statistic that can be used to find evidence in a hypothesis test. If a hypothesis test is conducted to find information about the population mean, the sample mean is a logical choice of statistic.

Steps for Hypothesis Testing

Statistical test of difference using the t-Test. There are a few steps for evaluating a dataset or comparing multiple sets of data (the statistical inference process). These steps are summarized here:

1. State the null hypothesis and the alternative hypothesis based on your research question. Define the hypothesis as to whether your means or standard deviations are significantly different.

Null Hypothesis: 'There is no significant difference between the height of shells in sample A and sample B.' H0: μA = μB
Alternative Hypothesis: 'There is a significant difference between the height of shells in sample A and sample B.' HA: μA ≠ μB

Hypothesis Testing
• The intent of hypothesis testing is to formally examine two opposing conjectures (hypotheses), H0 and HA.
• These two hypotheses are mutually exclusive and exhaustive, so that one is true to the exclusion of the other.
• We accumulate evidence (collect and analyze sample information) for the purpose of deciding which of the two hypotheses the data support.

The Null and Alternative Hypothesis

The null hypothesis, H0:
• States the (numerical) assumption to be tested
• Is assumed to be TRUE at the start of the test
• Always contains the '=' sign

The alternative hypothesis, HA:
• Is the opposite of the null hypothesis
• Challenges the status quo
• Never contains just the '=' sign
• Is generally the hypothesis that the researcher believes to be true

The null hypothesis, denoted H0, is the statement that is being tested. Usually the null hypothesis is the "status quo" or "no change" hypothesis. The hypothesis test looks for evidence against the null hypothesis. The alternative hypothesis, denoted HA or H1, is the statement that we are hoping is true or that we wish to support. It is the "opposite" of the null hypothesis. Since the alternative hypothesis is the one we wish to support, we usually write the alternative hypothesis first and then the null hypothesis.

2. Set the critical P level (also called the alpha (α) level). Usually it will be P = 0.05 (5%).

The p-value is the probability of observing an outcome as extreme as, or more extreme than, the observed sample outcome if the null hypothesis is true.
• Decide if the test should be 1- or 2-tailed.
• Determine the number of degrees of freedom: df = n1 + n2 - 2

3. Calculate the value of the appropriate statistic. Use the t-test for comparing means.

Level of Significance

Most hypothesis tests fall in the category of significance tests. Before the test is started (before the sample is chosen and anything is computed), a significance level, α, is chosen. The most commonly used significance levels are α = 0.10, 0.05, or 0.01. If a significance level isn't specified, α = 0.05 is the most common choice. The significance level sets how much evidence is needed to reject the null hypothesis. For example, if α = 0.05 is chosen, the evidence is considered strong enough to reject the null hypothesis if data like those in the sample would occur only 5% of the time, or less, when the null hypothesis is true. That means that the null hypothesis will only be rejected when the data in the sample are not very likely if the null hypothesis is true.

4. Write the decision rule for rejecting the null hypothesis.

In biology the critical probability is usually taken as 0.05 (or 5%). This may seem very low, but it reflects the fact that biology experiments are expected to produce quite varied results.
If P > 5% then the two sets are not significantly different (i.e. fail to reject the null hypothesis).
If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
For the t test to work, the number of repeats should be as large as possible, and certainly > 5.

5. Write a summary statement based on the decision.
Example: The null hypothesis is rejected since the calculated P = 0.003 < α = 0.05 (two-tailed test).

Depending on whether the calculated value is greater than or less than the tabulated value, you reject or fail to reject your null hypothesis, and can thereby conclude whether your data are significantly different or not.

6. Write a statement of results in standard English.

There is a significant difference between the height of shells in sample A and sample B.

What are degrees of freedom?

The "df" in the t-distribution means "degrees of freedom". In comparing 2 means, df = n1 + n2 - 2.

Probabilities for the t-distribution are measured as areas under its curve.

The normal distribution

[Figure: normal distribution curve for a two-tailed test]

The central region on this graph is the acceptance area and the tail is the rejection region, or regions. In this particular graph of a two-tailed test, the rejection region is shaded blue. The total area of the rejection region is "alpha", the significance level against which the p-value (probability value) is compared. The area in the tail can be described with z-scores. For example, if the area of the tails is 5% (2.5% on each side), the rejection regions lie beyond z = -1.96 and z = +1.96.

The t-distribution looks almost identical to the normal distribution curve, only it's a bit shorter and fatter. The t-distribution can be used for small samples. The larger the sample size, the more the t-distribution looks like the normal distribution. In fact, for sample sizes larger than 20, the t-distribution is very close to the normal distribution.

For a single sample, the "df" in the t-distribution is just the sample size minus one (n - 1).

[Figure: three t-distributions with different degrees of freedom]

This graph shows what three different t-distributions look like. With a larger sample size (black line, infinite degrees of freedom), the t-distribution looks identical to the normal curve. But with a smaller sample size of four (df = 3), the t-distribution curve is shorter and fatter.
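The "shorter and fatter" shape described above can be checked numerically. The following is a minimal sketch in Python (a choice made here; the document itself works in EXCEL), and the helper names `t_pdf` and `normal_pdf` are hypothetical. It evaluates the standard density formulas at the centre (x = 0) and out in a tail (x = 3) for df = 3, 20 and 1000.

```python
import math

def t_pdf(x, df):
    """Height of the Student's t curve with `df` degrees of freedom at x."""
    # Work with logarithms of the gamma function so large df do not overflow.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def normal_pdf(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the centre (x = 0) and a tail (x = 3) of each curve.
for df in (3, 20, 1000):
    print(f"t, df={df:4d}: centre={t_pdf(0, df):.4f}  tail={t_pdf(3, df):.4f}")
print(f"normal    : centre={normal_pdf(0):.4f}  tail={normal_pdf(3):.4f}")
```

With df = 3 the centre of the t curve is lower than the normal curve and the tail is higher, and as df grows the two curves become nearly indistinguishable, which is the pattern the graph is meant to show.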
How to Find a Critical Value from the t-Distribution

Step 1: Calculate the df, or degrees of freedom (for example, DF = 8).
Step 2: Look up the df in the left-hand side of the t-distribution table. Locate the column under your alpha level (the alpha level is usually given to you in the question).

For DF = 8 at the 0.05 level, t crit = 2.306.

In general, statistical tests are used for comparing two means or two standard deviations to see if they are significantly different. You can also compare a mean from measured data to an accepted value to see if your sample measurements match the literature values.

There are two main types of t-tests we will use:

The usual form of the t test is for "unmatched pairs" (type = 2), where the two sets of data are from different individuals. For example, leaves grown in the sun and leaves grown in the shade.

The other form of the t test is for "matched pairs" (type = 1), where the two sets of data are from the same individuals. A good example of this is a "before and after" test. For example, the pulse rate of 8 individuals was measured before and after eating a large meal, with the results shown in the table. The mean pulse rate is certainly higher after eating, but is it significantly higher?

Pulse (bpm)
Before eating    After eating
105              109
79               87
79               86
103              109
87               100
74               82
73               80
83               90
Mean 85.4        Mean 92.9

Hint: type 1 has 1 group

1. Set up the null and alternative hypothesis.
Ho: there is no difference in the heart rate before and after eating a meal.
HA: the heart rate is higher after eating.

2. Set the critical P level (also called the alpha (α) level): P = 0.05

3. Calculate the value of the appropriate statistic. Which kind of t-test should be used… paired or unpaired?
Type 1 (paired).

Calculate the degrees of freedom (DF):
DF = number of pairs of data - 1 = n - 1 = 8 - 1 = 7

t-Test: Paired Two Sample for Means
                                Variable 1    Variable 2
Mean                            85.4          92.9
Variance                        152.6         135.0
Observations                    8             8
Pearson Correlation             0.9790
Hypothesized Mean Difference    0
df                              7
t Stat                          -8.275
P(T<=t) one-tail                0.000
t Critical one-tail             1.895
P(T<=t) two-tail                0.000
t Critical two-tail             2.365

Find the critical value: 1.895 (one-tailed, df = 7).

Determine if there is a difference or not. Using the absolute value of t, t > t critical (8.275 > 1.895), so the null hypothesis is rejected and the alternative hypothesis is accepted.

Conclusion: Eating a meal increases the heart rate.

Hint: type 2 has 2 independent groups

Evaluation of Means for small samples - The t-test

The t-test, and any statistical test of this sort, consists of three steps.
1. Define the null and alternate hypotheses.
2. Calculate the t-statistic for the data.
3. Compare tcalc to the tabulated t-value for the appropriate significance level and degrees of freedom. If |tcalc| > ttab, we reject the null hypothesis and accept the alternate hypothesis. Otherwise, we fail to reject the null hypothesis.

The t-test can be used to compare a sample mean to an accepted value (a population mean), or it can be used to compare the means of two sample sets.

Rejecting or Failing to Reject the Null Hypothesis

If the p-value is less than the significance level, reject the null hypothesis. For example, if α = 0.05 and the p-value is 0.03, reject the null hypothesis, because we expect to see an outcome at least as extreme as the observed one only 3% of the time if the null hypothesis is true. So the observed outcome isn't very likely. More specifically, the probability of an outcome this extreme is less than 5% if the null hypothesis is true.
So reject the null hypothesis in favor of the alternative hypothesis and say, "there is sufficient evidence to reject the null hypothesis". To summarize in non-technical language: if something is not very likely, reject it.

If the p-value is greater than the significance level, fail to reject the null hypothesis. For example, if α = 0.05 and the p-value is 0.15, fail to reject the null hypothesis. An outcome at least as extreme as the observed one is expected 15% of the time if the null hypothesis is true. This may not seem very likely, but it is more likely than 5%, so the conclusion is to fail to reject the null hypothesis, and we say, "there is not sufficient evidence to reject the null hypothesis."

Why shouldn't the conclusion be, "there is sufficient evidence to accept the null hypothesis"? It is a convention based on the fact that in mathematics, statements are not proved with examples. A claim can be disproven with one example, but even one million examples in favor of the claim can't prove it. To borrow a common phrase, "Absence of evidence is not evidence of absence". However, hypothesis tests don't actually prove anything anyway. They are just a method of judging the evidence for or against a hypothesis. Yet the tradition is strong enough that a conclusion should never be, "there is sufficient evidence to accept the null hypothesis".

Analogy

Until the 17th century Europeans thought every swan was white because for centuries, every swan they saw was white. Then a black swan was discovered in Australia, instantly disproving the hypothesis that all swans are white.

Analogy

Suppose a person thinks that there might have been a skunk in his yard the previous night. A null hypothesis is that there was no skunk in the yard (status quo). The alternative hypothesis would then be that there was a skunk in the yard.

H0: There was no skunk in the yard.
HA: There was a skunk in the yard.
He could go outside the next day and look for evidence that there was a skunk. If he finds skunk fur or smells a skunk, then he would have evidence to reject the null hypothesis in favor of the alternative hypothesis (that there was a skunk). On the other hand, if he doesn't find evidence that a skunk was there, that does not mean that the null hypothesis is true. A skunk could have been there without leaving evidence. That is why he shouldn't say he accepts the null hypothesis. He doesn't know for sure that there wasn't a skunk. He just doesn't have evidence to support the claim that there was a skunk. So he says there is not sufficient evidence to reject the null hypothesis, or that he fails to reject the null hypothesis.

If he rejects the null hypothesis, he could technically say that he accepts the alternative hypothesis. However, tradition dictates that conclusions are always stated as rejecting or failing to reject hypotheses rather than accepting hypotheses. If he finds skunk fur in the yard, he would reject the null hypothesis. Yet he still hasn't proved that there was a skunk. The dog could have brought the fur into the yard. Because he hasn't proved the alternative hypothesis (he might have strong evidence that there was a skunk, but he hasn't proven it), he shouldn't say that he accepts the alternative hypothesis.

P-values
• Calculate a test statistic from the sample data that is relevant to the hypothesis being tested.
• After calculating a test statistic, convert it to a P value by comparing its value to the distribution of the test statistic under the null hypothesis.
• The P value is a measure of how likely the test statistic value is under the null hypothesis.

P-value ≤ α ⇒ Reject H0 at level α
P-value > α ⇒ Do not reject H0 at level α

1- vs 2-Tailed Tests

Example: This drug makes tumors smaller (one-tailed). [Figure: rejection region in one tail]
Example: This drug makes rats grow bigger (one-tailed). [Figure: rejection region in one tail]

Example: This drug changes blood pressure (two-tailed). [Figure: rejection regions in both tails]

Type 1. Hint: type 1 has 1 group. DF = n - 1
Type 2. Hint: type 2 has 2 independent groups. DF = n1 + n2 - 2

Example #1: In an investigation to determine the effectiveness of sequencing fingerprint treatments, 10 prints are enhanced with DFO and then with ninhydrin. The points of detail (minutiae) are recorded. Is there a difference at the 95% confidence level?

DFO    DFO + Ninhydrin
8      10
12     15
11     12
6      6
9      13
11     14
7      9
8      9
10     15
9      12

In biometrics and forensic science, minutiae are major features of a fingerprint, with which one print can be compared with another.

t-test for matched pairs

1. Set up the null and alternative hypothesis.
Ho: there is no difference in the number of minutiae when using ninhydrin.
HA: there are more minutiae after enhancement with ninhydrin.

Is this 1 or 2 tailed? 1 tailed.

2. Set the critical P level (also called the alpha (α) level). 95% confidence level: P = 0.05

3. Calculate the value of the appropriate statistic. Which kind of t-test should be used… paired or unpaired?
Type 1 (paired).

Calculate the degrees of freedom (DF):
DF = number of pairs of data - 1 = n - 1 = 10 - 1 = 9

(1) Label a cell TTEST.
(2) Click on the adjacent cell.
(3) Tools | Data Analysis | t-Test: Paired Two Sample for Means

DFO: 8, 12, 11, 6, 9, 11, 7, 8, 10, 9   Average = 9.1
DFO & Ninhydrin: 10, 15, 12, 6, 13, 14, 9, 9, 15, 12   Average = 11.5

t-Test: Paired Two Sample for Means
                                Variable 1    Variable 2
Mean                            9.1           11.5
Variance                        3.6555556     8.7222222
Observations                    10            10
Pearson Correlation             0.8953207
Hypothesized Mean Difference    0
df                              9
t Stat                          -5.0410083    (use the absolute value: t = 5.04)
P(T<=t) one-tail                0.0003494
t Critical one-tail             1.8331129
P(T<=t) two-tail                0.0006988
t Critical two-tail             2.2621572

Find the critical value: 1.833.

Determine if there is a difference or not. t > t critical (5.04 > 1.833), so the null hypothesis is rejected and the alternative hypothesis is accepted.

Conclusion: The ninhydrin does make a positive difference.

You can also use EXCEL to solve directly for the P value:
(1) Make a data table in EXCEL (the same DFO and DFO & Ninhydrin data as above).
(2) Add a cell P = Insert / Function / TTEST, with Type 1 for paired data or Type 2 for unpaired data.

P = 0.0003494

The mean for DFO only is significantly less than the mean for DFO + Ninhydrin because P < 0.05. The null hypothesis is rejected.

Conclusion: there are more minutiae after enhancement with ninhydrin.

t-test for unmatched (independent) pairs

If there is no before-and-after relationship between the samples, then the independent samples test is used.

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Example 2: Some brown dog hairs were found on the clothing of a victim at a crime scene involving a dog. The diameters of the five hairs were measured: 46, 57, 54, 51, 38 µm.

A suspect is the owner of a dog with similar brown hairs. A sample of the hairs has been taken and their widths measured: 31, 35, 50, 35, 36 µm.

Is it possible that the hairs found on the victim were left by the suspect's dog? Test at the 5% level.
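As a cross-check on Example 2, the unpaired formula can be evaluated directly. The sketch below is only an illustration in Python (the document itself uses EXCEL); the function name `unpaired_t` and the variable names are hypothetical. It uses the sample standard deviation of each group, DF = n1 + n2 - 2, and the tabulated two-tailed critical value of 2.306 for DF = 8 at the 0.05 level.

```python
import math
from statistics import mean, stdev

# Hair diameters from Example 2, in micrometres.
victim_hairs = [46, 57, 54, 51, 38]
suspect_dog_hairs = [31, 35, 50, 35, 36]

def unpaired_t(a, b):
    """Return (t, df) for two independent samples.

    t  = (mean_a - mean_b) / sqrt(s_a^2 / n_a + s_b^2 / n_b)
    df = n_a + n_b - 2
    """
    standard_error = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    t = (mean(a) - mean(b)) / standard_error
    return t, len(a) + len(b) - 2

t_stat, df = unpaired_t(victim_hairs, suspect_dog_hairs)

# Two-tailed critical value for df = 8 at the 0.05 level, read from a t table.
T_CRITICAL = 2.306

print(f"t = {t_stat:.3f}, df = {df}")
if abs(t_stat) > T_CRITICAL:
    print("Significant difference at the 0.05 level: reject Ho")
else:
    print("No significant difference at the 0.05 level: fail to reject Ho")
```

The critical value is typed in by hand here because the Python standard library has no t table; a statistics package could supply it instead.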
t-test for unmatched (independent) pairs

1. Set up the null and alternative hypothesis.
Null Hypothesis (Ho): 'There is no significant difference between the hairs from Dog A and Dog B.'
Alternative Hypothesis (HA): 'There is a significant difference between the hairs from Dog A and Dog B.'

Is this 1 or 2 tailed? 2 tailed.
Is this a Type 1 or Type 2 study? Type 2… 2 independent groups.

2. Calculate the mean and standard deviation for the data sets.

            DOG A hair (µm)    DOG B hair (µm)
            46                 31
            57                 35
            54                 50
            51                 35
            38                 36
Total       246                187
Mean        49.2               37.4
STD DEV     7.463              7.301

3. Calculate the magnitude of the difference between the two means.

$$ \bar{X}_1 - \bar{X}_2 = 49.2 - 37.4 = 11.8 $$

4. Calculate the standard error in the difference.

$$ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{7.463^2}{5} + \frac{7.301^2}{5}} = \sqrt{11.14 + 10.66} = 4.669 \approx 4.67 $$

5. Calculate the value of t.

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{11.8}{4.669} = 2.527 \approx 2.53 $$

6. Calculate the degrees of freedom (DF).

DF = n1 + n2 - 2 = 5 + 5 - 2 = 8

7. Choose your level of significance and find the critical value using the table. At the 0.05 level, t crit = 2.306.

8. Determine if there is a difference or not.
If t < critical value then there is no significant difference between the two data sets.
If t > critical value then there is a significant difference between the two data sets.

t > t critical (2.53 > 2.306)

So, at the 0.05 level there is a significant difference between the two data sets… we reject the null hypothesis… that the hairs came from the same dog.

Conclusion: The hairs are from different dogs (HA). "I told you I wuz framed!"

Example #3: A researcher wishes to test if a certain kind of growth hormone will produce faster growth in mice. She injects 10 mice with the hormone and uses another 10 as a control.
Three weeks later, she weighs the mice and discovers that the mean weight of mice that have received the injections is 12.05 g and the mean weight of control mice is 9.3 g.

Group 1: Hormone, mass (g)    Group 2: No Hormone, mass (g)
12.5                          12
13                            8.5
12                            10
12                            8
13                            8
14                            13.5
13                            9
10.5                          8.5
9.5                           6.5
11                            9
Mean = 12.05                  Mean = 9.3

These values indicate that the mice receiving the hormone are heavier. Is her value of 12.05 significantly different from 9.3? Is it possible that the hormone has no effect, and that the weight difference between the two groups is due to chance? This is like flipping a coin 10 times. You expect 5 heads and 5 tails but you might get 6 heads or 7 heads or perhaps 8 heads. Similarly, if the hormone does not work, you expect the means for the two groups to be similar, but they may not be exactly the same.

Is this a Type 1 or Type 2 study? Two independent groups… Type 2.

How many degrees of freedom are there?
DF = n1 + n2 - 2 = 10 + 10 - 2 = 18

What is the chance that the two means would be as different as 12.05 g and 9.3 g if the hormone really did not work?

Statistical tests test whether differences in the data are real differences or whether they are due to chance. In the example above, we test if the mean of group 1 is significantly different from the mean of group 2. The alternative is that the difference is due to chance or random fluctuations and the hormone did not cause additional weight gain. The test gives the probability (P) that a difference at least this large would arise by chance if the hormone had no effect. If this probability is less than 1 out of 20 (P < 0.05), then we conclude that the difference is real. If the probability is greater than 0.05, we conclude that the difference is not significant; it could be due to chance.

There are several tests available for testing means. A commonly used test for data that are normally distributed is the t-test.
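The comparison of the two group means can also be worked through outside EXCEL. The following is a minimal sketch in Python, assuming the equal-variance (pooled) form of the unpaired t-test; the function name `pooled_t` is hypothetical. With equal group sizes, as here, the pooled standard error equals the one from the formula sqrt(s1²/n1 + s2²/n2).

```python
import math
from statistics import mean, variance

# Mouse masses in grams from Example #3.
hormone = [12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11]
control = [12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9]

def pooled_t(a, b):
    """Return (t, df, pooled variance) for two independent samples,
    assuming both groups share the same underlying variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df, pooled_var

t_stat, df, pooled_var = pooled_t(hormone, control)
print(f"means: {mean(hormone):.2f} g vs {mean(control):.2f} g")
print(f"pooled variance = {pooled_var:.4f}, df = {df}, t = {t_stat:.3f}")
```

The resulting t is then compared with the tabulated critical value for 18 degrees of freedom, one- or two-tailed according to the hypothesis being tested.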
Null hypothesis (Ho): There is no difference in growth.

Sara's hypothesis (HA) is that newborn mice injected with the hormone will be heavier after 3 weeks of growth than mice without the hormone.

Is this a one or two tailed test? One tailed.

Number of Tails

The number of tails in a test refers to the number of ways that the two groups can differ.

The following hypothesis would lead us to perform a two-tailed test: The mean weight of mice injected with the hormone will be different from the mean weight of the control mice. This is two-tailed because the hypothesis allows two possible outcomes. The hypothesis is true if the weight of the hormone mice is greater than the weight of the control mice. The hypothesis is also true if the weight of the hormone mice is less than the weight of the control mice.

The following hypothesis would lead us to perform a one-tailed test: The mean weight of mice injected with the hormone will be greater than the mean weight of the control mice.

The following hypothesis would also lead us to perform a one-tailed test: The mean weight of mice injected with the hormone will be less than the mean weight of the control mice. This is a one-tailed test because the hypothesis proposes a difference in only one direction: the weight of the hormone mice will be less than the weight of the control mice.

The calculations for the test can be performed by hand, but computer software can do them very quickly. To perform the test, the weight data for the two groups of mice above are entered into a t-test program.

Using P = 0.05, what is the critical value?
1.734

(1) Label a cell TTEST
(2) Click on the adjacent cell
(3) Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

Group 1 with Hormone, mass (g)    Group 2 Control, mass (g)
12.5                              12
13                                8.5
12                                10
12                                8
13                                8
14                                13.5
13                                9
10.5                              8.5
9.5                               6.5
11                                9
Mean = 12.05                      Mean = 9.3

t-Test: Two-Sample Assuming Equal Variances
                                Variable 1    Variable 2
Mean                            12.05         9.3
Variance                        1.858         4.233
Observations                    10            10
Pooled Variance                 3.046
Hypothesized Mean Difference    0
df                              18
t Stat                          3.523
P(T<=t) one-tail                0.00121
t Critical one-tail             1.734
P(T<=t) two-tail                0.00243
t Critical two-tail             2.101

Determine if there is a difference or not.
If t < critical value, there is no significant difference between the two data sets.
If t > critical value, there is a significant difference between the two data sets.

t > t critical (3.523 > 1.734)
So, at the 0.05 level there is a significant difference between the two data sets. We reject the null hypothesis that the mice had the same mass gain.
Alternate hypothesis: The mice that received the hormone injection gained more weight.

You could also come to this conclusion using the P value.
Insert / Function / TTEST
Means: 12.05 (Group 1) and 9.3 (Group 2); P = 0.0012

The software reveals that P = 0.0012. If the hormone had no effect, the probability of a difference as large as the one between the two means (12.05 and 9.3) arising by chance (random effects) would be 0.0012 (or 12 out of 10,000).
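The t Stat, pooled variance and df in the spreadsheet output above can be cross-checked by hand. The following is a minimal sketch in Python, assuming the standard pooled-variance (equal variances) formula that the Excel tool name implies; the function name pooled_t_test is hypothetical and is not part of Excel or of the original slides.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(a), len(b)
    var1 = statistics.variance(a)  # sample variance (divides by n - 1)
    var2 = statistics.variance(b)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, df


# Mouse masses (g) from the example.
hormone = [12.5, 13, 12, 12, 13, 14, 13, 10.5, 9.5, 11]
control = [12, 8.5, 10, 8, 8, 13.5, 9, 8.5, 6.5, 9]

t_stat, df = pooled_t_test(hormone, control)
print(f"t = {t_stat:.3f}, df = {df}")  # t = 3.523, df = 18
```

The result agrees with the t Stat and df reported by the spreadsheet. The P value itself needs the t distribution, which is why the slides rely on a table or on TTEST for that step.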
Because P < 0.05 (5 out of 100), we conclude that the two means really are different and that the difference is unlikely to be due to chance.

Conclusion: The researcher accepts her hypothesis that the hormone produces faster growth.

If P had been greater than 0.05, we would not reject Ho: we would conclude that the two means are not significantly different, and we would have no evidence that the hormone caused one group to be heavier.

The word "significant" has a slightly different meaning in statistics than it does in general usage. In a statistical test of two means, if the difference is unlikely to be due to chance, we conclude that the two means are significantly different. In this example, the mean weight of group 1 is significantly greater than the mean weight of group 2.

More Examples of Statistical Analysis Using a t-test

Example #4: A researcher wishes to learn if a certain drug slows the growth of tumors. She obtained mice with tumors and randomly divided them into two groups. She then injected one group of mice with the drug and used the second group as a control. After 2 weeks, she sacrificed the mice and weighed the tumors. The weight of the tumors for each group of mice is below.

The researcher is interested in learning if the drug reduces the growth of tumors. Her hypothesis is: The mean weight of tumors from mice in group A will be less than the mean weight of tumors from mice in group B.

Group A - Treated with Drug    Group B - Control, Not Treated
0.72                           0.71
0.68                           0.83
0.69                           0.89
0.66                           0.57
0.57                           0.68
0.66                           0.74
0.70                           0.75
0.63                           0.67
0.71                           0.80
0.73                           0.78
Mean = 0.675                   Mean = 0.742

A t-test can be used to test the probability that the two means do not differ.

1. Set up the null and alternative hypothesis
Null hypothesis Ho: The two means of the tumor masses do not differ.
Alternative hypothesis HA: The tumors from the group treated with the drug will weigh less than the tumors from the control group.

Is this a one or two tailed test?
This is a one-tailed test because the researcher is interested in whether the drug decreased tumor size. She is not interested in whether the drug merely changed tumor size.

The values from the table above (Group A with drug and Group B control, tumor mass in g) are entered into the spreadsheet. You need to activate: Add-ins: Data Analysis ToolPak.

(1) Label a cell TTEST
(2) Click on the adjacent cell
(3) Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

t-Test: Two-Sample Assuming Equal Variances
                                Variable 1    Variable 2
Mean                            0.675         0.742
Variance                        0.0022944     0.00824
Observations                    10            10
Pooled Variance                 0.0052672
Hypothesized Mean Difference    0
df                              18
t Stat                          -2.0642818
P(T<=t) one-tail                0.0268544
t Critical one-tail             1.7340636
P(T<=t) two-tail                0.0537089
t Critical two-tail             2.100922

Using P = 0.05 what is the critical value? 1.734

Determine if there is a difference or not. Compare the absolute value of t with the critical value.
If |t| < critical value, there is no significant difference between the two data sets.
If |t| > critical value, there is a significant difference between the two data sets.

|t| > t critical (2.06 > 1.734), and t is negative, which is the direction the hypothesis predicts (drug group lighter than control).

Decision: So, at the 0.05 level there is a significant difference between the two data sets. We reject the null hypothesis that both groups would have tumors of the same mass.
Conclusion: The drug reduced the size of the tumors.
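Because this test is one-tailed, the sign of t matters as well as its size. The sketch below recomputes t for the tumor data and applies the decision rule with an explicit direction check. It assumes the same pooled-variance formula as the Excel tool; the constant T_CRIT_ONE_TAIL is the tabulated value quoted above (df = 18, P = 0.05, one-tailed), and the variable names are illustrative only.

```python
import math
import statistics

# Tumor masses (g) from Example #4.
drug = [0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.70, 0.63, 0.71, 0.73]
control = [0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.80, 0.78]

n1, n2 = len(drug), len(control)
df = n1 + n2 - 2
pooled_var = (
    (n1 - 1) * statistics.variance(drug)
    + (n2 - 1) * statistics.variance(control)
) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (statistics.mean(drug) - statistics.mean(control)) / std_err

T_CRIT_ONE_TAIL = 1.734  # tabulated value: df = 18, P = 0.05, one-tailed

# HA predicts drug < control, so t must be negative AND |t| must
# exceed the one-tailed critical value before Ho is rejected.
reject_h0 = t_stat < 0 and abs(t_stat) > T_CRIT_ONE_TAIL

print(f"t = {t_stat:.3f}, reject Ho: {reject_h0}")
```

Had the drug group come out heavier than the control (positive t), this one-tailed test would not reject Ho no matter how large t was.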
Decision: The t-test shows that tumors from the drug group were significantly smaller than the tumors from the control group because P < 0.05 (one-tailed P = 0.0269). Reject the null hypothesis!
Conclusion: The researcher therefore accepts her hypothesis (HA) that the drug reduces the growth of tumors.

T-Test using EXCEL
Note: This only gives the P value.
(1) Make a data table in EXCEL (Group A with drug and Group B control, tumor mass in g; means 0.675 and 0.742)
(2) Add a cell P = Insert / Function / TTEST
    Type 1: paired
    Type 2: unpaired
Result: P = 0.0269

Example #5: A researcher wishes to learn whether the pH of soil affects seed germination of a particular herb found in forests near her home. She filled 10 flower pots with acid soil (pH 5.5) and 10 flower pots with neutral soil (pH 7.0) and planted 100 seeds in each pot. The percentage of seeds that germinated in each pot is given in the table.

Table 1: % of seeds germinated at different pH
Acid Soil pH 5.5    Neutral Soil pH 7.0
% germination       % germination
42                  43
45                  51
40                  56
37                  40
41                  32
41                  54
48                  51
50                  55
45                  50
46                  48
Mean = 43.5         Mean = 48

The researcher is testing whether soil pH affects germination of the herb. A t-test can be used to test the probability that the two means do not differ. Her hypothesis is: The mean germination at pH 5.5 is different from the mean germination at pH 7.0.

What is the null hypothesis?
Null hypothesis (Ho): The mean germination at pH 5.5 is the same as the mean germination at pH 7.0.
Alternative hypothesis (HA): The mean germination at pH 5.5 is different from the mean germination at pH 7.0.

Is this a one or two tailed test?
This is a two-tailed test because the researcher is interested in whether soil acidity changes germination percentage. She does not specify whether it increases or decreases germination.

Is this a Type 1 or Type 2 study? Two independent groups: Type 2.

How many degrees of freedom are there?
DF = n1 + n2 - 2 = 10 + 10 - 2 = 18

Choose your level of significance and find the critical value using the table.
At the 0.05 level, t crit = 2.101

Calculate the value of t (use EXCEL).
Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

t-Test: Two-Sample Assuming Equal Variances
                                Variable 1     Variable 2
Mean                            43.5           48
Variance                        15.83333333    57.333333
Observations                    10             10
Pooled Variance                 36.58333333
Hypothesized Mean Difference    0
df                              18
t Stat                          -1.66362669
P(T<=t) one-tail                0.056749718
t Critical one-tail             1.734063592
P(T<=t) two-tail                0.113499436
t Critical two-tail             2.100922037

Use the absolute value! |t| = 1.66

Determine if there is a difference or not.
If |t| < critical value, there is no significant difference between the two data sets.
If |t| > critical value, there is a significant difference between the two data sets.

|t| < t critical (1.66 < 2.101)
Decision: So, at the 0.05 level there is not a significant difference between the two data sets. We fail to reject the null hypothesis that both groups would have the same % germination.
Conclusion: The mean germination at pH 5.5 is not significantly different from the mean germination at pH 7.0.

The t-test shows that the mean germination of the two groups does not differ significantly because P > 0.05 (two-tailed P = 0.113). The researcher concludes that there is no significant evidence that pH affects germination of the herb.

Example #6: Suppose a researcher wished to learn if a particular chemical is toxic to a certain species of beetle. She believes the chemical might interfere with the beetle's reproduction. She obtained beetles and divided them into two groups. She then fed one group of beetles with the chemical and used the second group as a control. After 2 weeks, she counted the number of eggs produced by each beetle in each group. The egg counts and the mean for each group of beetles are below.

Group 1 - fed chemical    Group 2 - not fed chemical (control)
33                        35
31                        42
34                        43
38                        41
32
28
Mean = 32.7               Mean = 40.3

1. Set up the null and alternative hypothesis
The researcher believes the chemical interferes with beetle reproduction. She suspects that the chemical reduces egg production. Her hypothesis is: The mean number of eggs in group 1 is less than the mean number of eggs in group 2. A t-test can be used to test the probability that the two means do not differ (null hypothesis).

Ho = The mean number of eggs is the same for both groups.
HA = The mean number of eggs is less for group 1 than group 2.

Is this a one or two tailed test?
This is a 1-tailed test because her hypothesis proposes that group 2 will have greater reproduction than group 1. If she had proposed that the two groups would have different reproduction but was not sure which group would be greater, then it would be a 2-tailed test.

Is this a Type 1 or Type 2 study? Type 2: there are 2 independent groups.
Note: the two groups can be different sizes.

Calculate the degrees of freedom.
DF = n1 + n2 - 2 = 6 + 4 - 2 = 8

Choose your level of significance and find the critical value using the table.
At the 0.05 level, t crit = 1.86

Calculate the value of t (use EXCEL).
Tools | Data Analysis | t-Test: Two-Sample Assuming Equal Variances

t-Test: Two-Sample Assuming Equal Variances
                                Variable 1    Variable 2
Mean                            32.67         40.25
Variance                        11.07         12.92
Observations                    6             4
Pooled Variance                 11.76
Hypothesized Mean Difference    0
df                              8
t Stat                          -3.426
P(T<=t) one-tail                0.005
t Critical one-tail             1.860
P(T<=t) two-tail                0.009
t Critical two-tail             2.306

Determine if there is a difference or not. Use the absolute value of t.
If |t| < critical value, there is no significant difference between the two data sets.
If |t| > critical value, there is a significant difference between the two data sets.

|t| > t critical (3.43 > 1.86)
So, at the 0.05 level there is a significant difference between the two data sets. We reject the null hypothesis that both beetle groups had the same number of eggs.
HA: The mean number of eggs is less for group 1 than group 2.
Conclusion: The chemical reduces the number of beetle eggs produced.

You can also determine the answer by finding the P value.
Insert / Function / TTEST
Group 1: 33 31 34 38 32 28
Group 2: 35 42 43 41
P = 0.0045 (0.00450564)

The results of her t-test are copied below.
t-Test: Two-Sample Assuming Equal Variances
                                Variable 1    Variable 2
Mean                            32.67         40.25
Variance                        11.07         12.92
Observations                    6             4
Pooled Variance                 11.76
Hypothesized Mean Difference    0
df                              8
t Stat                          -3.426
P(T<=t) one-tail                0.005
t Critical one-tail             1.860
P(T<=t) two-tail                0.009
t Critical two-tail             2.306

The researcher concludes that the mean of group 1 is significantly less than the mean of group 2 because the value of P < 0.05. She accepts her hypothesis that the chemical reduces egg production because group 1 had significantly fewer eggs than the control.

Hypothesis Testing Errors

Type I Error: A type I error happens when a true null hypothesis is rejected. The probability of a type I error is denoted α.
Type II Error: A type II error occurs when a false null hypothesis is not rejected. The probability of a type II error is denoted β.

Decision             H0 is true          H0 is false
Reject H0            Type I Error        Correct Decision
Fail to Reject H0    Correct Decision    Type II Error

The goal is to make both α and β as small as possible. Unfortunately, for a fixed sample size, decreasing α will increase β and vice versa. It is necessary to choose which error is more important to decrease based on the scenario. In most cases, β is difficult to calculate, so α is set between 0.01 and 0.10.

The probability of a type I error, α, is also the level of significance. If α = 0.05 is chosen, then the null hypothesis is rejected if a result at least as extreme as the sample would happen no more than 5% of the time when the null hypothesis is true. That means that there is a probability of 0.05 that the null hypothesis will be rejected when it is true.

Analogy
Trials are like hypothesis tests. Since a person is innocent until proven guilty, innocence is the status quo.
H0 = the defendant is innocent
HA = the defendant is guilty
If a jury convicts an innocent man, the jury has made a type I error. If the jury comes to the conclusion that a man is innocent, but he was actually guilty, they have made a type II error.
In this case, a decision has to be made about whether it is better to minimize the probability of a type I error or a type II error. Is it better to send an innocent man to jail or to release a guilty man? In criminal trials the precedent is that a man is only convicted if the evidence is beyond a reasonable doubt. We don't want to convict an innocent man, so the courts try to minimize the probability of a type I error. Of course, this means that more guilty people are not convicted, so the probability of a type II error is higher.

[Figure: Species area curve.]
[Figure EMC.sp-12. Number of wildlife species or groups using different tree/snag species in Eastside Mixed Conifer Forest Wildlife Habitat Type.]

Data Evaluation and Comparisons
http://www.chem.utoronto.ca/coursenotes/analsci/StatsTutorial/AdvStats.html

T-Test Calculation: Excel 2007 (calculating P)
In Excel 2007 the TTEST function to calculate P is accessed by following the routine shown on the slide [screenshots not available]. Note that this directly calculates P and not t Stat. After step 5 a dialog box opens; enter the settings as provided.

In Excel 2003 the t test is performed using the formula: =TTEST(range1, range2, tails, type). For the examples you'll use in biology, tails is always 2, and type can be:
1, paired
2, two samples, equal variance
3, two samples, unequal variance

The cell with the t test P can be formatted as a percentage (Format menu > cell > number tab > percentage).
This automatically multiplies the value by 100 and adds the % sign. This can make P values easier to read and understand. It's also a good idea to plot the means as a bar chart with error bars of standard error (or SD) to show the variability in the data.

Sample Excel Data

t-Test: Two-Sample Assuming Unequal Variances
                                Variable 1    Variable 2
Mean                            31.4          41.6
Variance                        32.0          18.5
Observations                    10            10
Hypothesized Mean Difference    0
df                              17
t Stat                          -4.54
P(T<=t) one-tail                0.00
t Critical one-tail             1.74
P(T<=t) two-tail                0.00
t Critical two-tail             2.11

t-test to Compare One Sample Mean to an Accepted Value
We have already seen how to do the first step, and have null and alternate hypotheses. The second step involves the calculation of the t-statistic for one mean, using the formula:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

where s is the standard deviation of the sample, not the population standard deviation. In our case, with a sample mean of 4 ppm, μ0 = 2 ppm, s = 0.9 ppm and n = 7,

$$ t = \frac{4 - 2}{0.9 / \sqrt{7}} = 5.88 $$

t Test (non-matched pairs)
[Equation image not available.]
where
n is the number of points
X̄A is the mean of set A
sA is the standard deviation of set A
X̄B is the mean of set B
sB is the standard deviation of set B
The probability is then found from a table of t values.

t Test (matched pairs)
[Equation image not available.]
where
n is the number of points
X̄ is the mean of the differences
s is the standard deviation of the differences

t-test to Compare Two Sample Means
In this case, we require two separate sample means, standard deviations and sample sizes. The number of degrees of freedom is computed using the formula

$$ \text{d.o.f.} = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{s_1^4}{n_1^2 (n_1 - 1)} + \dfrac{s_2^4}{n_2^2 (n_2 - 1)}} $$

and the result is rounded to the nearest whole number.
Once these quantities are determined, the same three steps for determining the validity of a hypothesis are used for two sample means.

Table of Critical t-Values for 95% Confidence Level

ν = n - 1    tcrit
1            12.706
2            4.303
3            3.182
4            2.776
5            2.571
6            2.447
7            2.365
8            2.306
9            2.262
10           2.228
12           2.179
14           2.145
15           2.131
16           2.120
18           2.101
20           2.086
25           2.060
30           2.042

Table of critical values for a 2-tailed t-test at 95% confidence level, generated from Excel using the TINV function.

t-test to Compare One Sample Mean to an Accepted Value
In the example, the mean of the arsenic concentration measurements was m = 4 ppm, for n = 7, with sample standard deviation s = 0.9 ppm. We established suitable null and alternative hypotheses:
Null Hypothesis H0: μ = μ0
Alternate Hypothesis HA: μ > μ0
where μ0 = 2 ppm is the allowable limit and μ is the population mean of the measured soil.

For the third step, we need a table of t-values for the significance level and degrees of freedom, such as the one found in your lab manual or most statistics textbooks. Referring to a table for a 95% confidence limit for a 1-tailed test, we find t6,95% = 1.94 (ν = n - 1 = 6). (The difference between 1- and 2-tailed distributions was covered in a previous section.)

We are now ready to accept or reject the null hypothesis. If tcalc > ttab, we reject the null hypothesis. In our case, tcalc = 5.88 > ttab = 1.94, so we reject the null hypothesis and say that our sample mean is indeed larger than the accepted limit, and not due to random chance, so we can say that the soil is indeed contaminated. (The same conclusion holds against the stricter 2-tailed value of 2.45 from the table above.)
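The arithmetic behind tcalc = 5.88 can be reproduced in a few lines. This is a minimal sketch, assuming the one-sample formula t = (mean - μ0) / (s / √n) given for this test; the function name one_sample_t is hypothetical, and T_TAB is simply the 1-tailed table value quoted in the text for ν = 6.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for comparing one sample mean with an accepted value."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Arsenic example: mean 4 ppm, limit 2 ppm, s = 0.9 ppm, n = 7.
t_calc = one_sample_t(sample_mean=4.0, mu0=2.0, s=0.9, n=7)

T_TAB = 1.94  # 1-tailed, 95% confidence, 6 degrees of freedom (from the text)

print(f"t_calc = {t_calc:.2f}")               # t_calc = 5.88
print("reject H0" if t_calc > T_TAB else "fail to reject H0")
```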
One and Two Sided Tests
• Hypothesis tests can be one or two sided (tailed)
• One tailed tests are directional:
  H0: μ1 - μ2 ≤ 0
  HA: μ1 - μ2 > 0
• Two tailed tests are not directional:
  H0: μ1 - μ2 = 0
  HA: μ1 - μ2 ≠ 0

Some Notation
• In general, critical values for an α level test are denoted as:
  One sided test: Xα
  Two sided test: Xα/2
  where X depends on the distribution of the test statistic
• For example, if X ~ N(0,1):
  One sided test: zα (e.g., z0.05 = 1.64)
  Two sided test: zα/2 (e.g., z0.05/2 = z0.025 = ±1.96)

1- vs 2-Tailed Tests

One-tailed tests
In the previous pages, you learned how to define the hypothesis for a statistical test, then how to perform a t-test to compare means. In the example t-test we performed, we defined an alternate hypothesis to test whether one mean was greater than the other: μ > μ0. In this situation, we tested whether one mean was higher than the other. We were not interested in whether the first mean was lower than the other, only if it was higher. So we were only interested in one side of the probability distribution, which is shown in the image below:

[Image not available: one-tailed probability distribution.]

In this distribution, the shaded region shows the area represented by the null hypothesis, H0: μ = μ0. This actually implies μ ≤ μ0, since the only unshaded region in the image shows μ > μ0.
Because we were only interested in one side of the distribution, or one "tail", this type of test is called a one-sided or a one-tailed test. When you are using tables for probability distributions, you should check whether they are for one-tailed or two-tailed tests. Depending on which they are for, you need to know how to switch to the one you need.

A one-tailed test uses an alternate hypothesis that states either H1: μ > μ0 OR H1: μ < μ0, but not both. If you want to test both, using the alternate hypothesis H1: μ ≠ μ0, then you need to use a two-tailed test.

Two-tailed tests
We would use a two-tailed test to see if two means are different from each other (i.e., from different populations), or from the same population.

Example 1
As an example, let's assume that we want to check if the pH of a stream has changed significantly in the past year. Water from the stream was analyzed using a pH electrode, with six samples taken. It was found that the mean pH reading was 6.5 with standard deviation s_old = 0.2. A year later, six more samples were analyzed, and the mean pH of these readings was 6.8 with standard deviation s_new = 0.1.

We could use a one-tailed test to see if the stream has a higher pH than one year ago, for which we would use the alternate hypothesis HA: μprev < μcurrent. However, we may want a more rigorous test, for the hypothesis HA: μprev ≠ μcurrent. This covers both possibilities, μprev < μcurrent and μprev > μcurrent, so a difference in either direction counts as evidence against the null hypothesis. The probability distribution for a 90% confidence level, two-tailed test looks like this:

[Image not available: two-tailed probability distribution.]

Continuing the example, we define the null hypothesis Ho: μprev = μcurrent, and the alternate hypothesis HA: μprev ≠ μcurrent. The d.o.f. for a two sample mean t-test is ν = 7.35 ≈ 7, since the d.o.f. must be a whole number.
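The value ν = 7.35 follows from the degrees-of-freedom formula for two sample means with unequal variances given earlier in the tutorial. The sketch below is one way to check it; the function name welch_df is hypothetical, and the inputs are the standard deviations and sample sizes of the stream example.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for a two-sample t-test with unequal variances."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    # v1^2 / (n1 - 1) is the same as s1^4 / (n1^2 * (n1 - 1)).
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Stream pH example: s = 0.2 (previous year), s = 0.1 (current), six samples each.
dof = welch_df(s1=0.2, n1=6, s2=0.1, n2=6)
print(f"d.o.f. = {dof:.2f}, rounded to {round(dof)}")  # d.o.f. = 7.35, rounded to 7
```

With equal sample sizes and equal standard deviations this formula would give n1 + n2 - 2 = 10; the unequal spread of the two years is what pulls the value down to about 7.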
The t-value for the two sample test is

t = (6.8 - 6.5)/sqrt((0.1)^2/6 + (0.2)^2/6) = 3.29

If we consult a two-tailed t-test table, for a 95% confidence limit, we find that t7,95% = 2.36. Since tcalc > t7,95%, we reject the null hypothesis, accept the alternate hypothesis that μprev ≠ μcurrent, and can say that the means are significantly different.

Using Tables for One- and Two-Tailed Tests
Some tables of critical t-values only give you the values for either a one- or two-tailed test, but not both. Because of this, you will need to know how to use one-tailed tables for two-tailed tests, and vice versa. The conversion is actually quite simple:

Table you have    Operation          To get ...
One-tailed        Multiply P by 2    Two-tailed test for 2P
Two-tailed        Divide P by 2      One-tailed test for P/2

For example, assume you have a table for a one-tailed test at the 98% confidence level and want to perform a two-tailed test. For the 98% confidence level, P = 0.02. In a two-tailed test the same critical value cuts off 0.02 in each tail, so multiply P by 2 to get 0.04, which is a 96% confidence level. So you would compare tcalc to the value from the 98% one-tailed table, and it would be equivalent to a two-tailed test at the 96% confidence level. This agrees with the values quoted earlier for 6 degrees of freedom: 1.94 is the 95% one-tailed value, while the 95% two-tailed value is the larger 2.45.

Cell B12 has the t-test probability, which is a tiny 0.000065, and indicates that the difference between the before and after results is very highly significant. If we had used the normal unmatched pairs t-test, we would have obtained a P of 0.225, which is higher than 0.05, and so would indicate that the apparent increase in pulse rate with eating is not significant! This shows the importance of choosing the right test.
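To see why the matched pairs test is so much more sensitive here, the paired t statistic can be computed from the differences between each subject's two readings. The sketch below uses the pulse readings tabulated for this example; the pairing of before and after values is an assumption, reconstructed by reading the flattened table row by row, and it reproduces the stated means of 85.4 and 92.9. The formula used, t = (mean of differences) / (s of differences / √n), is the standard matched pairs form built from the quantities the slides define.

```python
import math
import statistics

# Pulse (bpm) for the same 8 subjects, before and after eating.
before = [105, 79, 79, 103, 87, 74, 73, 83]
after = [109, 87, 86, 109, 100, 82, 80, 90]

# A paired test works on the per-subject differences, which removes
# the large person-to-person variation in resting pulse.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(f"mean difference = {mean_diff}, t = {t_stat:.2f}, df = {df}")
```

Every subject's pulse rose by between 4 and 13 bpm, so the differences are very consistent and t is large, even though the two columns overlap heavily when treated as independent groups.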
Pulse (bpm)
Before eating    After eating
105              109
79               87
79               86
103              109
87               100
74               82
73               80
83               90
mean 85.4        mean 92.9

Writing a conclusion: The results of the dependent t-test can be seen in the resultant table. The value of t (t Stat) is -4.53744, which we can round off to -4.54. The probability of this result being due to chance can be read from the table as 0.000291 (two-tail), which means that this result is significant at the .0003 level. We will set our alpha level at .05, so we will say that p < .05 rather than that p = .0003. We could also look up the critical (cut-off) value for t, which the table lists as t Critical two-tail = 2.110, in a printed table without using the spreadsheet.

P varies from 0 (not likely) to 1 (certain). The higher the probability, the more likely it is that the two sets are the same, and that any differences are just due to random chance. The lower the probability, the more likely it is that the two sets are significantly different, and that the differences are real. Where do you draw the line between these two conclusions? In biology the critical probability is usually taken as 0.05 (or 5%). This may seem very low, but it reflects the fact that biology experiments are expected to produce quite varied results. So if P > 0.05 the two sets are treated as the same (not significantly different), and if P < 0.05 the two sets are treated as different. For the t test to work, the number of repeats should be as large as possible, and certainly > 5.

In Excel the t test is performed using the formula: =TTEST(range1, range2, tails, type). For the examples you'll use in biology, tails is always 2 (for a "two-tailed" test), and type can be 1 or 2 depending on the circumstances.

Statistical hypotheses form the basis of statements and conclusions that we can make about sets of data. A hypothesis is a statement designed to be proven or disproven, such as "The sample means of two sets of data are statistically the same and the samples come from the same overall population."
Taking the example above comparing sample means, we would define the hypothesis H: "two sets of data (1 and 2), with sample means m1 and m2, are both part of the same population, so that their population means are equal, μ1 = μ2." If we accept this hypothesis, we are saying that despite the fact that the samples came from two different measurements, they are part of the same overall population, or that the measurement is being made on the same general system. If we reject the hypothesis, we are saying the population means are different, and that we are dealing with two separate systems.

We use statistical tests of significance to determine whether to accept the hypothesis or not, and we choose the test depending on whether we are comparing two or more means, standard deviations, or variances. Two tests are covered in this tutorial: the t-test and the F-test. Other tests, such as Z-tests, χ2-tests, and Analysis of Variance (ANOVA), are described in most statistics textbooks.

First, we will see how to construct a hypothesis. Referring to the above example of a comparison between means, assume we want to analyze some soil to determine its arsenic content and to see if it exceeds the allowable amount. We have run a series of n = 7 tests on various soil samples and find that the mean arsenic concentration is 4 ppm, with a standard deviation of s = 0.9 ppm. If the allowable limit is 2 ppm arsenic, we wish to construct a hypothesis to determine whether the soil is indeed contaminated, or if the difference between the sample mean and the allowable limit is due to random error. There are two possibilities:
1. The true mean of the soil arsenic concentration μ is greater than the allowable limit: μ > 2 ppm = μ0
2. The true mean of the soil arsenic concentration μ is the same as or less than the allowable limit, and any deviation is due to random error: μ ≤ 2 ppm

To set up the hypothesis, we make what is called the null hypothesis, which says there is no difference between the means.
We also set up an alternate hypothesis, which is the hypothesis we adopt if the null hypothesis is rejected.
Null Hypothesis H0: μ = μ0
Alternate Hypothesis HA: μ > μ0
where μ0 = 2 ppm is the allowable limit.

If our statistical test does not reject the null hypothesis, we conclude that there is no evidence that the arsenic concentration in the soil is above the allowable limit. If the test rejects the null hypothesis, then we accept the alternate hypothesis, and can conclude that the arsenic concentration in the soil is indeed above the allowable limit. The statistical test we would use in this case is the t-test, which we will explore on the next page.

It is important to note that the hypothesis we just established is a one-tailed test, since we were looking at the probability that the sample mean was either "greater than", or "less than or equal to" 2 ppm. It is also possible to have a two-tailed test, where we would try to establish "equal to" or "not equal to." This concept is covered in the pages on confidence levels and one- and two-tailed tests.

Evaluation of Means for small samples - The t-test
In the previous example, we set up a hypothesis to test whether a sample mean was close to a population mean or desired value for some soil samples containing arsenic. On this page, we establish the statistical test to determine whether the difference between the sample mean and the population mean is significant. It is called the t-test, and it is used when comparing sample means when only the sample standard deviation is known.

Parametric and Non-Parametric Tests
• Parametric Tests: Rely on theoretical distributions of the test statistic under the null hypothesis and on assumptions about the distribution of the sample data (i.e., normality)
• Non-Parametric Tests: Referred to as "Distribution Free" as they do not assume that data are drawn from any particular distribution