T-tests and ANOVA using JMP
Kristopher Patton
April 7, 2015
*http://gipedu.org/virginia-polytechnicinstitute-state-university-virginia-tech/

Laboratory for Interdisciplinary Statistical Analysis
LISA helps VT researchers benefit from the use of statistics:
Designing Experiments • Analyzing Data • Interpreting Results • Grant Proposals • Using Software (R, SAS, JMP, Minitab...)
• Collaboration: From our website, request a meeting for personalized statistical advice.
• Walk-In Consulting:
    OSB 103: Mon. – Fri. from 1:00 to 3:00
    GLC Room A: Tues., Thurs., Fri. from 10:00 to 12:00
    Hutcheson 403-J: Wed. from 10:00 to 12:00
• Short Courses: Designed to help graduate students apply statistics in their research.
Great advice right now: Meet with LISA before collecting your data.
All services are FREE for VT researchers. We assist with research, not class projects or homework.
www.lisa.stat.vt.edu

Hypothesis Test
A hypothesis test is a detailed protocol for making a decision about a population by examining a sample from that population.

Hypothesis Tests vs. Criminal Trials
Burden of proof: the obligation to shift the conclusion using evidence.
• Hypothesis test: Assume the initial hypothesis is true until the data suggest otherwise.
• Trial: Innocent until proven guilty.

Steps in a Hypothesis Test
1. Test
2. Assumptions
3. Hypotheses
4. Mechanics
5. Conclusion

One Sample t-Test
• Used to test whether the population mean is different from a specified value.

Medical Example
• In a glaucoma study, the following intraocular pressure (mm Hg) values were recorded from a sample of 21 elderly subjects. Based on these data, can we conclude that the mean intraocular pressure of the population from which the sample was drawn differs from 14 mm Hg?*

Intraocular Pressure (mm Hg)
14.5  12.9  14    16.1  12    17.5  14.1
12.9  17.9  12    16.4  24.2  12.2  14.4
17    10    18.5  20.8  16.2  14.9  19.6

$\bar{y} = 15.6238$, $s = 3.383$

*Daniel, W. W. Biostatistics: A Foundation for Analysis in the Health Sciences. 5th ed. New York: John Wiley & Sons, 1991.
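The summary statistics for the medical example can be checked directly from the 21 listed values. The sketch below is a minimal check in Python using only the standard library (Python is an assumption here; the workshop itself uses JMP). It also computes the one-sample t statistic, the sample mean minus the hypothesized mean of 14 mm Hg divided by $s/\sqrt{n}$, and approximates the two-sided p-value by numerically integrating the t density. The helper names `t_pdf` and `two_sided_p` are illustrative and not part of any library.

```python
import math
import statistics

# Intraocular pressure values (mm Hg) transcribed from the slide.
pressure = [14.5, 12.9, 14.0, 16.1, 12.0, 17.5, 14.1,
            12.9, 17.9, 12.0, 16.4, 24.2, 12.2, 14.4,
            17.0, 10.0, 18.5, 20.8, 16.2, 14.9, 19.6]

n = len(pressure)
y_bar = statistics.mean(pressure)   # sample mean
s = statistics.stdev(pressure)      # sample standard deviation (n - 1 divisor)
mu_0 = 14.0                         # hypothesized mean from the slide

# One-sample t statistic with n - 1 degrees of freedom.
t_obs = (y_bar - mu_0) / (s / math.sqrt(n))


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


p_value = two_sided_p(t_obs, n - 1)
print(f"n = {n}, mean = {y_bar:.4f}, sd = {s:.3f}")
print(f"t_obs = {t_obs:.2f}, two-sided p-value = {p_value:.4f}")
```

If SciPy happens to be installed, `scipy.stats.ttest_1samp(pressure, 14)` should report the same statistic and p-value; the hand-rolled integration is used here only to keep the sketch dependency-free.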
Assumptions
• The data are randomly sampled from the population.
• The data are approximately normally distributed.
• Our data are representative of the variable of interest, which is also referred to as the response variable.

Hypotheses
• The "null hypothesis" is a statement describing a claim about a population parameter (a fixed constant such as the mean).
    - The null hypothesis is denoted $H_0$.
• The "alternative hypothesis" is a statement describing the researcher's suspicions about the claim. It is also called the "research hypothesis".
    - The alternative hypothesis is denoted $H_a$.
Medical Example hypotheses: $H_0: \mu = 14$ vs. $H_a: \mu \neq 14$

Hypotheses
• There are three versions of a hypothesis test; the context of the research question determines which one applies.
    • Left Tailed Hypothesis Test (less than)
    • Right Tailed Hypothesis Test (greater than)
    • Two Tailed or Two Sided Hypothesis Test (not equal to)

Mechanics
• Rejection Rule: Reject the null hypothesis ($H_0$) if the p-value $\le \alpha$.
• Test Statistic: Compute the test statistic, which is a standardization of the sample mean and is needed to compute the p-value.
• P-value: The probability of observing your sample results, or more extreme results, assuming that the null hypothesis is true. If this probability is "small", you may decide that the claim in the null hypothesis is false.

Test Statistic for Medical Example
• In many cases, including the medical example, the population standard deviation $\sigma$ is unknown because it is a population parameter that must be estimated.
• The best estimate for $\sigma$ is $s$.
• Our standardized value becomes $t_{obs}$.

    $\mu_0$: hypothesized mean
    $\bar{y}$: sample mean
    $s$: sample standard deviation
    $n$: sample size
    $t_{obs}$: observed t test statistic

Test statistic for a one sample t-test:

$$t_{obs} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$$

The observed test statistic $t_{obs}$ follows a t distribution with $n - 1$ degrees of freedom.

Test Statistic for Medical Example
• In the example it was given that $\bar{y} = 15.6238$ and $s = 3.383$.

$$t_{obs} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{15.6238 - 14}{3.383/\sqrt{21}} = 2.20$$

P-value
• How the p-value is computed depends on the direction (sign) of the alternative hypothesis.
1. $H_a: \mu \neq \mu_0$. The p-value is the area in both tails of the t distribution.
   [Figure: t density with half of the p-value shaded in each tail, beyond -t_obs and t_obs]
2. $H_a: \mu < \mu_0$. The p-value is the area to the left of the observed test statistic.
   [Figure: t density with the p-value shaded to the left of t_obs]
3. $H_a: \mu > \mu_0$. The p-value is the area to the right of the observed test statistic.
   [Figure: t density with the p-value shaded to the right of t_obs]

Medical Example
• $H_0: \mu = 14$ vs. $H_a: \mu \neq 14$
• P-value $= 0.01986 + 0.01986 = 0.03972$
[Figure: t density with an area of 0.01986 shaded in each tail, beyond -2.2 and 2.2]

Conclusion
• Conclusions should always include:
    • Decision: reject $H_0$ or fail to reject $H_0$ (never "accept" $H_0$).
    • Context: what your decision means in the context of the problem.
• Medical Example: With a p-value of 0.0398, which is less than 0.05, we reject $H_0$. There is sufficient sample evidence to conclude that the true mean intraocular pressure differs from 14 mm Hg.

Summary of One Sample t-test

                          2-Tailed Test            Right-Tailed             Left-Tailed
Null hypothesis           $H_0: \mu = \mu_0$       $H_0: \mu \le \mu_0$     $H_0: \mu \ge \mu_0$
Alternative hypothesis    $H_a: \mu \neq \mu_0$    $H_a: \mu > \mu_0$       $H_a: \mu < \mu_0$

• Test Statistic: $t_{obs} = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}$
• Degrees of Freedom: $n - 1$
• Assumption: The population from which the sample is drawn is normal or approximately normal.

Importing Data into JMP
*http://nuke.progettiesistemi.com/SimpleBusiness/tabid/97/Default.aspx

Egyptian Skulls Data Set
• Four measurements of male Egyptian skulls from 5 different time periods. Thirty skulls are measured from each time period.
• Variables
    • MB: Maximal Breadth of Skull
    • BH: Basibregmatic Height of Skull
    • BL: Basialveolar Length of Skull
    • NH: Nasal Height of Skull
    • Year: Approximate Year of Skull Formation (negative = B.C., positive = A.D.)
*Thomson, A. and Randall-Maciver, R. (1905) Ancient Races of the Thebaid, Oxford: Oxford University Press.
*http://members.ozemail.com.au/~rdunlop/CoplandMain/MathsLG/CollandEntDataLG.htm

Hypothesis Test for a Single Mean in JMP
• JMP Demonstration
    • Open data set.
    • Analyze → Distribution
    • Complete the dialog box as shown and select OK.
    • Select the red arrow next to "Pressure" and select Test Mean.
    • Complete the dialog box as shown and select OK.
    • Select the red arrow next to "Pressure" and select Confidence Interval → 0.95.

Two Sample T-Test
• The major goal is to determine whether a difference exists between two populations.
• Examples:
    • Compare blood pressure for males and females.
    • Compare the proportion of smokers and nonsmokers with lung cancer.
    • Compare weight before and after treatment.
    • Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?

Hypotheses for 2 Samples
• The population means of the two groups are not equal.
    $H_0: \mu_1 = \mu_2$    $H_a: \mu_1 \neq \mu_2$
• The population mean of group 1 is greater than the population mean of group 2.
    $H_0: \mu_1 = \mu_2$    $H_a: \mu_1 > \mu_2$
• The population mean of group 1 is less than the population mean of group 2.
    $H_0: \mu_1 = \mu_2$    $H_a: \mu_1 < \mu_2$

Two Sample Assumptions
• The two samples are random and independent.
• The populations from which the samples are drawn are approximately normal.
• The populations have the same standard deviation.

Test Statistic for TWO Samples

$$t_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

• After calculating the test statistic, we can calculate the p-value and draw our conclusion.
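The pooled two-sample statistic can be sketched in a few lines of Python. This is a minimal illustration of the formula for $t_{obs}$ and $s_p$, assuming equal population standard deviations as the slides do; the function name `pooled_t` and the two small groups of numbers are made up for easy hand checking and do not come from the workshop data.

```python
import math
import statistics


def pooled_t(sample_1, sample_2):
    """Return (t_obs, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    var_1 = statistics.variance(sample_1)  # s_1 squared (n - 1 divisor)
    var_2 = statistics.variance(sample_2)  # s_2 squared (n - 1 divisor)
    df = n1 + n2 - 2
    # Pooled standard deviation s_p.
    s_p = math.sqrt(((n1 - 1) * var_1 + (n2 - 1) * var_2) / df)
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    t_obs = diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t_obs, df


# Hypothetical measurements for two independent groups (not from the slides).
group_1 = [1.0, 2.0, 3.0]
group_2 = [2.0, 3.0, 4.0]

t_obs, df = pooled_t(group_1, group_2)
print(f"t_obs = {t_obs:.4f} on {df} degrees of freedom")
```

For these made-up groups both sample variances equal 1, so $s_p = 1$ and the statistic reduces to $-1/\sqrt{2/3}$, which makes the function easy to verify by hand.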
Summary: Two Sample t-Test

               2-Tailed Test                  Right-Tailed                   Left-Tailed
Null           $H_0: \mu_1 - \mu_2 = 0$       $H_0: \mu_1 - \mu_2 \le 0$     $H_0: \mu_1 - \mu_2 \ge 0$
Alternative    $H_a: \mu_1 - \mu_2 \neq 0$    $H_a: \mu_1 - \mu_2 > 0$       $H_a: \mu_1 - \mu_2 < 0$

• Test Statistic:

$$t_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

• Degrees of Freedom: $n_1 + n_2 - 2$
• Assumption: The populations from which both samples are drawn are normal or approximately normal.

VA Lung Cancer Data Set
• Veteran's Administration lung cancer trial.
• Variables
    • stime: Survival or follow-up time in days.
    • status: Dead or Censored.
    • treat: Treatment type of either Standard or Test.
    • age: Patient's age in years.
    • Karn: Karnofsky score of patient's performance on a scale of 0 (dead) to 100 (perfectly normal).
    • diag.time: Time since diagnosis in months at entry to the trial.
    • cell: One of four cell types.
    • prior: Did the patient receive prior therapy?
*Kalbfleisch, J.D. and Prentice R.L. (1980) The Statistical Analysis of Failure Time Data. Wiley.
*http://lungcancernewstoday.com/2015/03/05/fda-grants-licensingapplication-to-opdivofor-the-treatmentadvanced-squamousnsclc/

JMP
• JMP Demonstration: Analyze → Fit Y By X
    • Y, Response: Karnofsky Score (Karn)
    • X, Factor: Treatment (treat)
    • Select: Means/ANOVA/Pooled t

Paired t-Test
• The objective of paired comparisons is to minimize sources of variation that are not of interest in the study by pairing observations with similar characteristics.
• Example: A researcher would like to determine whether background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one in complete silence and one with background noise, and records the time each subject takes to complete each test.

Hypotheses for Paired t-Test
• The population mean difference is not equal to zero.
    $H_0: \mu_{difference} = 0$    $H_a: \mu_{difference} \neq 0$
• The population mean difference is greater than zero.
    $H_0: \mu_{difference} = 0$    $H_a: \mu_{difference} > 0$
• The population mean difference is less than zero.
    $H_0: \mu_{difference} = 0$    $H_a: \mu_{difference} < 0$

Assumptions for Paired t-Test
• The sample is random.
• The data are matched pairs.
• The differences have a normal distribution.

Test Statistic for Paired t-Test
• Test Statistic:

$$t_{obs} = \frac{\bar{y}_d}{s_d/\sqrt{n}}$$

where $\bar{y}_d$ is the mean of the differences, $s_d$ is the standard deviation of the differences, and $n$ is the number of pairs.
• After calculating the test statistic, we can calculate the p-value and draw our conclusion.

Summary of Paired t-Test

               2-Tailed               Right-Tailed           Left-Tailed
Null           $H_0: \mu_d = 0$       $H_0: \mu_d \le 0$     $H_0: \mu_d \ge 0$
Alternative    $H_a: \mu_d \neq 0$    $H_a: \mu_d > 0$       $H_a: \mu_d < 0$

• Test Statistic: $t_{obs} = \dfrac{\bar{y}_d}{s_d/\sqrt{n}}$
• Degrees of Freedom: $n - 1$
• Assumption: The population of differences is normal or approximately normal.

Egyptian Skulls Data Set
• The same data set described earlier (variables MB, BH, BL, NH, and Year) is used for the paired example.

Paired T-Test Example
• JMP Analysis:
    • Create a new column Diff = MB - BH
    • Analyze → Distribution
    • Y, Columns: Diff
    • Test Mean
    • Specify Hypothesized Mean: 0

One-Way ANOVA
• ANOVA is used to determine whether three or more populations have different means.
[Figure: groups A, B, and C plotted by Medical Treatment]

ANOVA Strategy
• The first step is to use the ANOVA F test to determine whether there are any significant differences among the population means.
• If the ANOVA F test shows that the population means are not all the same, then follow-up tests can be performed to see which pairs of population means differ.
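The first step of this strategy, the ANOVA F test, compares the variation between group means with the variation within groups: $F = MSG/MSE$, where MSG is the between-group sum of squares divided by $r - 1$ and MSE is the within-group sum of squares divided by $n - r$ ($r$ groups, $n$ total observations). The sketch below is a minimal Python illustration of that ratio; the function name `anova_f` and the three small groups are hypothetical values chosen for hand checking, not data from the workshop.

```python
import statistics


def anova_f(groups):
    """Return (F, MSG, MSE) for a one-way ANOVA on a list of samples."""
    r = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: n_i * (group mean - grand mean)^2.
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: (n_i - 1) * s_i^2.
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msg = ssg / (r - 1)
    mse = sse / (n - r)
    return msg / mse, msg, mse


# Hypothetical responses for three treatments A, B, and C (not from the slides).
groups = [
    [1.0, 2.0, 3.0],  # A
    [2.0, 3.0, 4.0],  # B
    [3.0, 4.0, 5.0],  # C
]

f_stat, msg, mse = anova_f(groups)
print(f"MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
```

A large F indicates that the group means vary more than the within-group scatter would explain; the p-value would come from the F distribution with $r - 1$ and $n - r$ degrees of freedom, which this sketch does not compute.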
One-Way ANOVA Model

$$y_{ij} = \mu_i + \varepsilon_{ij}$$

where
    $y_{ij}$ is the response of the $j$th trial on the $i$th factor level,
    $\mu_i$ is the mean of the $i$th group,
    $\varepsilon_{ij} \sim N(0, \sigma^2)$,
    $i = 1, \dots, r$ and $j = 1, \dots, n_i$.
In other words, for each group the observed value is the group mean plus some random variation.

One-Way ANOVA Hypothesis
• Test whether there is a difference in the population means.
    $H_0: \mu_1 = \mu_2 = \cdots = \mu_r$
    $H_a$: The $\mu_i$ are not all equal.

ANOVA Assumptions
• The samples are random and independent of each other.
• The populations are normally distributed.
• The populations all have the same standard deviation.
• The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.

Step 3: ANOVA F Test
[Figure: two panels showing groups A, B, and C by Medical Treatment]
Compare the variation within the samples to the variation between the samples.

ANOVA Test Statistic

$$F = \frac{\text{Variation between groups}}{\text{Variation within groups}} = \frac{MSG}{MSE}$$

• Variation within groups small compared with variation between groups → Large F
• Variation within groups large compared with variation between groups → Small F

MSG
• The mean square for groups, MSG, measures the variability of the sample averages.
• SSG stands for sum of squares for groups.
• $r$ = number of groups

$$MSG = \frac{SSG}{r - 1} = \frac{n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_r(\bar{y}_r - \bar{y})^2}{r - 1}$$

MSE
• Mean square error, MSE, measures the variability within the groups.
• SSE stands for sum of squares for error.
• $n$ = total number of observations

$$MSE = \frac{SSE}{n - r} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_r - 1)s_r^2}{n - r}, \qquad \text{where } s_i = \sqrt{\frac{\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$

ANOVA in JMP
• JMP demonstration
    • Analyze → Fit Y By X
    • Y, Response: MB
    • X, Factor: Year (change to nominal)
    • Normal Quantile Plot → Plot Actual by Quantile
    • Means/ANOVA

Follow-Up Test
• If the F test results in a significant p-value, we can then use Tukey's HSD test to determine which pairs of group means differ significantly.

Tukey Tests
• Tukey's test simultaneously tests $H_0: \mu_i = \mu_{i'}$ vs. $H_a: \mu_i \neq \mu_{i'}$ for all pairs of factor levels.
• JMP demonstration:
    • Oneway ANOVA → Compare Means → All Pairs, Tukey HSD

Two-Way ANOVA
• We are interested in the effect of two categorical factors on the response.
• We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
• An interaction effect means that the effect of one factor on the response depends on the level of the other factor.

Interaction
[Figure: two panels, "Interaction" and "No Interaction", each plotting Improvement against Dosage (Low, High) for Drug A and Drug B]

Two-Way ANOVA Model

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where
    $y_{ijk}$ is the response of the $k$th trial on the $i$th factor A level and the $j$th factor B level,
    $\mu$ is the overall mean,
    $\alpha_i$ is the main effect of the $i$th level of factor A,
    $\beta_j$ is the main effect of the $j$th level of factor B,
    $(\alpha\beta)_{ij}$ is the interaction effect of the $i$th level of factor A and the $j$th level of factor B,
    $\varepsilon_{ijk} \sim N(0, \sigma^2)$,
    $i = 1, \dots, a$, $j = 1, \dots, b$, and $k = 1, \dots, n_{ij}$.

VA Lung Cancer Data Set
• The same data set described earlier is used for the two-way example (response Karn; factors treat and status).

Two-Way ANOVA in JMP
• JMP demonstration
    • Analyze → Fit Model
    • Y: Karn
    • Highlight treat and status and click Macros → Factorial to Degree
    • Run Model

Acknowledgements
• Tonya Pruitt, LISA Administrative Specialist, VT Department of Statistics
• Dr. Chris Franck, Assistant Research Professor, VT Department of Statistics
• Dr. Anne Ryan Driscoll, Assistant Research Professor, VT Department of Statistics