Inferences for a Single Population Mean
4 - Inference Using t-Distributions (Ch. 2 of text)

4.1 - Estimation of a Population Mean ($\mu$)

In your introductory statistics course(s) you should have examined confidence intervals as a means of estimating population parameters, e.g. the population mean ($\mu$). The confidence interval for the population mean is summarized below.

$100(1-\alpha)\%$ Confidence Interval for $\mu$ (e.g. $\alpha = .05$ gives 95% confidence)

The basic form of a confidence interval is as follows:

$$(\text{estimate}) \pm (\text{table value}) \times SE(\text{estimate})$$

For the population mean ($\mu$) we have

$$\bar{X} \pm (t\text{-table value}) \times SE(\bar{X}) \quad \text{or} \quad \bar{X} \pm t\,\frac{s}{\sqrt{n}}$$

The appropriate columns in the t-distribution table (Appendix A, Table A.2) for the different confidence levels are as follows:

90% Confidence: look in the .95 column (if n is “large” we can use 1.645)
95% Confidence: look in the .975 column (if n is “large” we can use 1.960)
99% Confidence: look in the .995 column (if n is “large” we can use 2.576)

4.2 - Review of the Basic Steps in a Hypothesis Test

Before we look at hypothesis testing for a single population mean we will examine the five basic steps in a hypothesis test and introduce some important terminology and concepts.

Steps in a Hypothesis Test
1. State the null and alternative hypotheses.
2. Choose the test statistic.
3. Compute the test statistic.
4. Find the p-value.
5. Make a decision and interpret.

4.3 - Hypothesis Test for a Single Population Mean ($\mu$)

Null Hypothesis ($H_o$) | Alternative Hypothesis ($H_a$) | p-value area
$\mu = \mu_o$ | $\mu > \mu_o$ | Upper-tail
$\mu = \mu_o$ | $\mu < \mu_o$ | Lower-tail
$\mu = \mu_o$ | $\mu \neq \mu_o$ | Two-tailed (perform test using CI for $\mu$)

Test Statistic (in general)

In general the basic form of most test statistics is given by:

$$\text{Test Statistic} = \frac{(\text{estimate}) - (\text{hypothesized value})}{SE(\text{estimate})} \quad (\text{think “z-score”})$$

which measures the discrepancy between the estimate from our sample and the hypothesized value under the null hypothesis. Intuitively, if our sample-based estimate is “far away” from the hypothesized value assuming the null hypothesis is true, we will reject the null hypothesis in favor of the alternative or research hypothesis.
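The interval from Section 4.1 and the general test statistic above can be sketched in a few lines of Python. This is only an illustration, not part of the JMP workflow used in these notes: the function names and the summary values (a sample mean of 50, standard deviation of 10, n = 25, hypothesized mean of 47) are hypothetical, and the t-table value 2.064 is the .975 quantile for df = 24.

```python
import math


def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def mean_confidence_interval(xbar, s, n, t_table_value):
    """(estimate) +/- (table value) * SE(estimate) for a population mean."""
    margin = t_table_value * standard_error(s, n)
    return xbar - margin, xbar + margin


def standardized_statistic(estimate, hypothesized_value, se):
    """How many standard errors the estimate lies from the hypothesized value."""
    return (estimate - hypothesized_value) / se


# Hypothetical summary values, not taken from the text.
xbar, s, n = 50.0, 10.0, 25
t_975_df24 = 2.064  # .975 column of a t-table with df = n - 1 = 24

lower, upper = mean_confidence_interval(xbar, s, n, t_975_df24)
t_stat = standardized_statistic(xbar, 47.0, standard_error(s, n))

print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
print(f"test statistic against a hypothesized mean of 47: {t_stat:.2f}")
```

The table value is passed in explicitly so that the sketch mirrors looking up a column in Table A.2 rather than hiding the quantile inside a library call.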
Extreme test statistic values occur when our estimate is a large number of standard errors away from the hypothesized value under the null. The p-value is the probability that, by chance variation alone, we would get a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true. If this probability is “small” then we have evidence against the null hypothesis; in other words, we have evidence to support our research hypothesis.

Type I and Type II Errors ($\alpha$ & $\beta$)

Decision | Truth: $H_o$ true | Truth: $H_a$ true
Reject $H_o$ | Type I error ($\alpha$) | Correct decision
Fail to Reject $H_o$ | Correct decision | Type II error ($\beta$)

Example: Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA

EPA guidelines suggest that drinking water should not have a perchlorate level exceeding 4 ppb (parts per billion). Perchlorate contamination in California water (ground, surface, and well) is becoming a widespread problem. The Olin Corp., a manufacturer of road flares in the Morgan Hill area from 1955 to 1996, is the source of the perchlorate contamination in this area. Suppose you are a resident of the Morgan Hill area. Which alternative do you want well testers to use, and why?

$H_o: \mu = 4$ ppb, $H_a: \mu > 4$ ppb
or
$H_o: \mu = 4$ ppb, $H_a: \mu < 4$ ppb

Test Statistic for Testing a Single Population Mean ($\mu$) (t-test)

$$t = \frac{\bar{X} - \mu_o}{SE(\bar{X})} \quad \text{or} \quad t = \frac{\bar{X} - \mu_o}{s/\sqrt{n}} \;\sim\; t\text{-distribution with } df = n - 1.$$

Assumptions: When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when our sample size is sufficiently “large”. How large the sample size needs to be depends on how “non-normal” the population distribution is.

Example 1: Length of Stay in a Nursing Home (Datafile: LOS.JMP)

In the past the average number of nursing home days required by elderly patients before they could be released to home care was 17 days. It is hoped that a new program will reduce this figure.
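As a rough sketch of the one-sample t statistic defined above, the following Python applies it to the length-of-stay sample listed with Example 1. It assumes the sample consists of 16 observations (this reading reproduces the interval (8.38, 22.49) quoted in the example) and a lower-tail alternative, since the program is hoped to reduce the mean. The table values 2.131 and 1.753 are the .975 and .95 quantiles of the t-distribution with df = 15; the variable names are illustrative only.

```python
import math
import statistics

# Length-of-stay sample (days), read here as 16 observations (an assumption).
los_days = [3, 5, 12, 7, 22, 6, 2, 18, 9, 8, 20, 15, 3, 36, 38, 43]
mu_0 = 17.0  # mean under the null hypothesis

n = len(los_days)
xbar = statistics.mean(los_days)
s = statistics.stdev(los_days)      # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)               # SE of the sample mean
t_stat = (xbar - mu_0) / se         # t = (xbar - mu_0) / (s / sqrt(n))
df = n - 1

# 95% CI using the .975 quantile for df = 15.
t_975 = 2.131
ci = (xbar - t_975 * se, xbar + t_975 * se)

# Lower-tail test at alpha = .05: reject when t falls below -t_.95.
t_95 = 1.753
reject_null = t_stat < -t_95

print(f"n = {n}, mean = {xbar:.4f}, s = {s:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.3f} with df = {df}")          # t is about -0.47
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")       # about (8.38, 22.49)
print(f"reject H_o at alpha = .05 (lower tail)? {reject_null}")
```

Under these assumptions the statistic is less than half a standard error below 17, so the critical-value comparison does not reject the null hypothesis. The exact p-value still has to come from a t-probability calculator such as the one in JMP.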
Do these data support the research hypothesis?

3 5 12 7 22 6 2 18 9 8 20 15 3 36 38 43

Normality does not appear to be satisfied here! Notice the CI for the mean length of stay is (8.38 days, 22.49 days).

Hypothesis Test:
1) $H_o$: ______  $H_A$: ______
2) Choose test statistic
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret

To perform a t-test in JMP, select Test Mean from the LOS pull-down menu and enter the value for the mean under the null hypothesis, 17.0 in this example.

Conclusion: ______

In JMP, the hypothesized mean is the value for the population mean under the null hypothesis. If normality is questionable or the sample size is small, a nonparametric test may be more appropriate. We will discuss nonparametric tests later in the course.

The p-value graphic is obtained by selecting P-value animation from the pull-down menu next to Test Mean=value.
* Click Low Side for a lower-tail test, and similarly for the other two types of alternatives.

4.4 - Comparing Two Population Means Using Dependent or Paired Samples (Section 2.2.4, pgs. 35-37)

When using dependent samples each observation from population 1 has a one-to-one correspondence with an observation from population 2. One of the most common cases where this arises is when we measure the response on the same subjects before and after treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race, gender, socio-economic status, height, weight, etc. to control for the influence these characteristics might have on the response of interest. When this is done we say that we are “controlling for the effects of race, gender, etc.”
By using matched pairs of subjects we are in effect removing the effect of potential confounding factors, thus giving us a clearer picture of the difference between the two populations being studied.

DATA FORMAT

Matched Pair | $X_{1i}$ | $X_{2i}$ | $d_i = X_{1i} - X_{2i}$
1 | $X_{11}$ | $X_{21}$ | $d_1$
2 | $X_{12}$ | $X_{22}$ | $d_2$
3 | $X_{13}$ | $X_{23}$ | $d_3$
... | ... | ... | ...
n | $X_{1n}$ | $X_{2n}$ | $d_n$

For the sample paired differences (the $d_i$'s) find the sample mean ($\bar{d}$) and standard deviation ($s_d$).

The general hypotheses are
$H_o: \mu_d = \delta_o$
$H_a: \mu_d > \delta_o$ or $H_a: \mu_d < \delta_o$ or $H_a: \mu_d \neq \delta_o$

Note: While 0 is usually used as the hypothesized mean difference under the null, we can actually hypothesize any size difference we want for the mean of the paired differences. For example, if we wanted to show that a certain diet resulted in at least a 10 lb. decrease in weight, then we could test whether the paired differences d = Initial weight - After diet weight had mean greater than 10 ($H_a: \mu_d > 10$ lbs.).

Test Statistic for a Paired t-Test

$$t = \frac{(\text{estimate of mean paired difference}) - (\text{hypothesized mean difference})}{SE(\text{estimate})} = \frac{\bar{d} - \delta_o}{s_d/\sqrt{n}} \;\sim\; t\text{-distribution with } df = n - 1$$

where $\delta_o$ = the hypothesized value for the mean paired difference under the null hypothesis.

$100(1-\alpha)\%$ CI for $\mu_d$

$$\bar{d} \pm t\,\frac{s_d}{\sqrt{n}}$$

where t comes from the appropriate quantile of the t-distribution with df = n - 1. This interval has a $100(1-\alpha)\%$ chance of covering the true mean paired difference.

Example: Effect of Captopril on Blood Pressure (Datafile: Blood.JMP)

In order to estimate the effect of the drug Captopril on blood pressure (both systolic and diastolic) the drug is administered to a random sample of n = 15 subjects. Each subject's blood pressure was recorded before taking the drug and then 30 minutes after taking the drug. The data are shown below.
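As an aside before the Captopril data, the paired t statistic defined above can be sketched in Python. The weights below are hypothetical values for the diet illustration (they are not the Captopril measurements), and `paired_t` is an illustrative helper name rather than anything from JMP.

```python
import math
import statistics


def paired_t(first, second, hypothesized_diff=0.0):
    """Paired t statistic for the differences d_i = first_i - second_i.

    Returns (d_bar, s_d, t, df), where
    t = (d_bar - hypothesized_diff) / (s_d / sqrt(n)) and df = n - 1.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)   # sample SD of the paired differences
    se = s_d / math.sqrt(n)
    t_stat = (d_bar - hypothesized_diff) / se
    return d_bar, s_d, t_stat, n - 1


# Hypothetical weights in pounds for five dieters (illustration only).
initial_weight = [200, 190, 210, 185, 195]
after_diet_weight = [185, 180, 195, 178, 180]

# H_a: mean weight loss exceeds 10 lb, so the hypothesized difference is 10.
d_bar, s_d, t_stat, df = paired_t(initial_weight, after_diet_weight,
                                  hypothesized_diff=10.0)

print(f"mean difference = {d_bar:.1f} lb, s_d = {s_d:.3f}")
print(f"t = {t_stat:.3f} with df = {df}")
```

Because the differences are formed first, the computation is the same one-sample t statistic as before, applied to the $d_i$'s instead of the raw measurements.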
Syspre - initial systolic blood pressure
Syspost - systolic blood pressure 30 minutes after taking the drug
Diapre - initial diastolic blood pressure
Diapost - diastolic blood pressure 30 minutes after taking the drug

Research Questions:
Is there evidence to suggest that Captopril results in a systolic blood pressure decrease of at least 10 mmHg on average in patients 30 minutes after taking it?
Is there evidence to suggest that Captopril results in a diastolic blood pressure decrease of at least 5 mmHg on average in patients 30 minutes after taking it?

For each blood pressure we need to consider paired differences of the form $d_i = BPpre_i - BPpost_i$. For paired differences defined this way, positive values correspond to a reduction in blood pressure ½ hour after taking Captopril.

To answer the research questions above we need to conduct the following hypothesis tests:

$H_o: \mu_{syspre-syspost} = 10$ mmHg and $H_o: \mu_{diapre-diapost} = 5$ mmHg
$H_a: \mu_{syspre-syspost} > 10$ mmHg and $H_a: \mu_{diapre-diapost} > 5$ mmHg

Below are the relevant statistical summaries of the paired differences for both blood pressure measurements. The t-statistics for both tests are given below.

[JMP output: summaries of paired differences and t-statistics for Systolic BP and Diastolic BP]

We can use the t-Probability Calculator in JMP to find the associated p-values or, better yet, use JMP to conduct the entire t-test.

[JMP t-test output: Systolic Blood Pressure, Diastolic Blood Pressure]

Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to suggest that taking Captopril will result in a mean decrease in systolic blood pressure exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure exceeding 5 mmHg (p = _______). Furthermore, we estimate that the mean change in systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg, and that the mean change in diastolic blood pressure could be as large as ______ mmHg.

4.5 - Comparing Two Pop. Means Using Independent Samples (Section 2.3, pgs. 37-44)

Example 1: Prior Knowledge of Instructor and Lecture Rating (Datafile: Instructor Rating Study)

How powerful are rumors? Frequently, students ask friends and/or look at instructor evaluations to decide if a class is worth taking. Kelley (1950) found that instructor reputation has a profound impact on actual teaching ratings. Towler and Dipboye (1998) replicated and extended this study by asking: “Does an instructor's prior reputation affect student ratings?”

Towler, A., & Dipboye, R. L. (1998). “The effect of instructor reputation and need for cognition on student behavior.”

Experimental Design: Subjects were randomly assigned to one of two conditions. Before viewing the lecture, students were given a summary of the instructor's prior teaching evaluations. There were two conditions: Charismatic instructor and Punitive instructor.

Summary given in the "Charismatic instructor" condition: Frequently at or near the top of the academic department in all teaching categories. Professor S was always lively and stimulating in class, and commanded respect from everyone. In class, she always encouraged students to express their ideas and opinions, however foolish or half-baked. Professor S was always innovative. She used differing teaching methods and frequently allowed students to experiment and be creative. Outside the classroom, Professor S was always approachable and treated students as individuals.

Summary given in the "Punitive instructor" condition: Frequently near the bottom of the academic department in all important teaching categories. Professor S did not show an interest in students' progress or make any attempt to sustain student interest in the subject. When students asked questions in class, they were frequently told to find the answers for themselves. When students felt they had produced a good piece of work, very rarely were they given positive feedback.
In fact, Professor S consistently seemed to grade students harder than other lecturers in the department.

Then all subjects watched the same twenty-minute lecture given by the exact same lecturer. Following the lecture, subjects rated the lecturer. Subjects answered three questions about the leadership qualities of the lecturer. A summary rating score was computed and used as the variable "rating" here.

Research Question: Does an instructor's prior reputation affect student ratings of a lecture given by a professor?

Summary Statistics

Condition | Mean | Std. Dev. | n
Charismatic | $\bar{x}_C = 2.613$ | $s_C = .533$ | $n_C = 25$
Punitive | $\bar{x}_P = 2.236$ | $s_P = .543$ | $n_P = 24$

Intuitive Decision

In order to judge whether the null or alternative hypothesis is true, you could review the summary statistics for the variable you are interested in testing across the two groups. Remember, these summary statistics and/or graphs are for the observations you sampled, and to make decisions about all observations of interest, we must apply some inferential technique (i.e. hypothesis tests or confidence intervals).

One of the best graphical displays for this situation is side-by-side boxplots. To get side-by-side boxplots, select Analyze > Fit Y by X. Place Prior Info in the X box and Rating in the Y box. Place mean diamonds & histograms on the plot, and we may also want to jitter the points. The more separation there is in the mean diamonds, the more likely we are to reject the null hypothesis (i.e. the data tend to support the alternative hypothesis).

To answer the question of interest formally we need inferential tools for comparing the mean rating given to a lecture when students are told the professor is a charismatic individual vs. the mean rating given when students are given the punitive instructor prior opinion, i.e. compare charismatic to punitive.

Hypothesis Testing ($\mu_1$ vs. $\mu_2$)

The general null hypothesis says that the two population means are equal, or equivalently their difference is zero.
The alternative or research hypothesis can be any one of the three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can perform the test by using a confidence interval for the difference in the population means and determining whether 0 is contained in the confidence interval.

$H_o: \mu_1 = \mu_2$ or equivalently $(\mu_1 - \mu_2) =$ hypothesized difference (typically 0)
$H_a: \mu_1 > \mu_2$ or equivalently $(\mu_1 - \mu_2) >$ hypothesized difference (upper-tail)
or
$H_a: \mu_1 \neq \mu_2$ or equivalently $(\mu_1 - \mu_2) \neq$ hypothesized difference (two-tailed, USE CI!)
etc.

Test Statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\text{hypothesized difference})}{SE(\bar{X}_1 - \bar{X}_2)} \;\sim\; t\text{-distribution with appropriate degrees of freedom}$$

where $SE(\bar{X}_1 - \bar{X}_2)$ and the degrees of freedom for the t-distribution come from one of the two cases described below.

Confidence Interval for the Difference in the Population Means

$100(1-\alpha)\%$ Confidence Interval for $(\mu_1 - \mu_2)$

$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$

where t comes from the t-table with appropriate degrees of freedom (see the two cases below).

Case 1 ~ Equal Population Variances/Standard Deviations ($\sigma_1^2 = \sigma_2^2 = \sigma^2$ = common variance to both populations)

Rule of Thumb for Checking Variance Equality: If the larger sample variance is more than twice the smaller sample variance, do not assume the variances are equal.

Assumptions: For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.

Assuming the assumptions listed above are all satisfied, we have the following for the standard error of the difference in the sample means:

$$SE(\bar{X}_1 - \bar{X}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \;\text{ if } n_1 \neq n_2, \qquad s_p^2 = \frac{s_1^2 + s_2^2}{2} \;\text{ if } n_1 = n_2.$$

$s_p^2$ is called the “pooled estimate of the common variance ($\sigma^2$)”.
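The pooled standard error and test statistic can be sketched in Python using the summary statistics reported for the instructor rating study (means 2.613 and 2.236, standard deviations .533 and .543, sample sizes 25 and 24). The function names are illustrative, and the degrees of freedom are taken as $n_1 + n_2 - 2$, the value used for the pooled case; the notes themselves carry out this test in JMP.

```python
import math


def variances_look_equal(s1, s2):
    """Rule of thumb: larger sample variance at most twice the smaller."""
    larger = max(s1 ** 2, s2 ** 2)
    smaller = min(s1 ** 2, s2 ** 2)
    return larger <= 2 * smaller


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized_diff=0.0):
    """Pooled two-sample t statistic from summary statistics.

    Returns (s_p^2, SE, t, df) with df = n1 + n2 - 2.
    """
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t_stat = ((xbar1 - xbar2) - hypothesized_diff) / se
    return sp2, se, t_stat, n1 + n2 - 2


# Summary statistics from the instructor rating example.
xbar_c, s_c, n_c = 2.613, 0.533, 25   # charismatic condition
xbar_p, s_p, n_p = 2.236, 0.543, 24   # punitive condition

print("equal variances plausible?", variances_look_equal(s_c, s_p))
sp2, se, t_stat, df = pooled_t(xbar_c, s_c, n_c, xbar_p, s_p, n_p)
print(f"pooled variance = {sp2:.4f}, SE = {se:.4f}")
print(f"t = {t_stat:.2f} with df = {df}")   # t is roughly 2.45 on 47 df
```

Working from summary statistics keeps the sketch independent of the raw data file; the p-value would still be read from a t-table or from JMP's output.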
The degrees of freedom for the t-distribution in this case is $df = n_1 + n_2 - 2$.

Example 1: Prior Knowledge of Instructor and Lecture Rating (cont’d)

Case 1 - Equal Variances
To perform the “pooled t-Test” select the Means/Anova/Pooled t option from the Oneway Analysis pull-down menu.

Case 2 - Unequal Variances
If you do not want to assume the population variances are equal, then select the t Test option. To formally test whether we can assume the population variances are equal, select UnEqual Variances from the pull-down menu.

[t-Test results from JMP]

Discussion: In the previous example we chose to use a pooled t-test assuming the population variances were equal, based upon the visual evidence and applying the “rule of thumb”. To formally test this assumption, choose the UnEqual Variances option from the Oneway Analysis pull-down menu. The results are shown below.

Interpretation of Results: ______

Example 2: Normal Human Body Temperatures Females vs. Males (Datafile: Bodytemp.JMP)

Do men and women have the same normal body temperature? Putting this into a statement involving parameters that can be tested:

$H_o: \mu_F = \mu_M$ or $(\mu_F - \mu_M) = 0$
$H_a: \mu_F \neq \mu_M$ or $(\mu_F - \mu_M) \neq 0$

$\mu_F$ = mean body temperature for females.
$\mu_M$ = mean body temperature for males.

Assumptions
1. The two groups must be independent of each other.
2. The observations from each group should be normally distributed.
3. Decide whether or not we wish to assume the population variances are equal.

Checking Assumptions

Assessing Normality of the Two Sampled Populations (Assumption 2)
To assess normality we select Normal Quantile Plot from the Oneway Analysis pull-down menu. Normality appears to be satisfied here.

Checking the Equality of the Population Variances
To test the equality of the population variances select Unequal Variances from the Oneway Analysis pull-down menu. The test is:

$H_o: \sigma_F = \sigma_M$
$H_a: \sigma_F \neq \sigma_M$

JMP gives four different tests for examining the equality of population variances.
To use the results of these tests simply examine the resulting p-values. If any/all are less than .10 or .05, then worry about the assumption of equal variances and use the unequal variance t-Test instead of the pooled t-Test.

[JMP output: p-values for testing variances]

Example 2: Normal Human Body Temperatures Females vs. Males (cont’d)

To perform the two-sample t-Test for independent samples:
* assuming equal population variances, select the Means/Anova/Pooled t option from the Oneway Analysis pull-down menu.
* assuming unequal population variances, select t-Test from the Oneway Analysis pull-down menu.

Because we have no evidence against the equality of the population variances assumption, we will use a pooled t-Test to compare the population means. Several new boxes of output will appear below the graph once the appropriate option has been selected, some of which we will not concern ourselves with. The relevant box, labeled t-Test, is shown below for the mean body temperature comparison. Because we have concluded that the equality of variance assumption is reasonable for these data, we can refer to the output for the t-Test assuming equal variances.

What is the test statistic value for this test?
What is the p-value?
What is your decision for the test?
Write a conclusion for your findings.

Interpretation of the CI for $(\mu_F - \mu_M)$: ______

Case 2 - Unequal Population Variances/Standard Deviations ($\sigma_1 \neq \sigma_2$)

Assumptions: For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal. (This can be formally tested, or use the rule of thumb.)
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.

Test Statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)} \;\sim\; t\text{-distribution with } df = \text{(see formula below)}$$

where $SE(\bar{X}_1 - \bar{X}_2)$ is as defined below.
$100(1-\alpha)\%$ Confidence Interval for $(\mu_1 - \mu_2)$

$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$

where

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

and

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad \text{rounded down to the nearest integer.}$$

The t-quantiles are the same as those we have seen previously.

Example: Cell Radii of Malignant vs. Benign Breast Tumors (Datafile: Breast-Diag.JMP)

These data come from a study of breast tumors conducted at the University of Wisconsin-Madison. The goal was to determine if malignancy of a tumor could be established by using shape characteristics of cells obtained via fine needle aspiration (FNA) and digitized scanning of the cells. The sample of tumor cells was examined under an electron microscope and a variety of cell shape characteristics were measured. One of the goals of the study was to determine which cell characteristics are most useful for discriminating between benign and malignant tumors.

The variables in the data file are:
ID - patient identification number (not used)
Diagnosis - determined by biopsy: B = benign or M = malignant
Radius = radius (mean of distances from center to points on the perimeter)
Texture = texture (standard deviation of gray-scale values)
Smoothness = smoothness (local variation in radius lengths)
Compactness = compactness (perimeter^2 / area - 1.0)
Concavity = concavity (severity of concave portions of the contour)
Concavepts = concave points (number of concave portions of the contour)
Symmetry = symmetry (measure of symmetry of the cell nucleus)
FracDim = fractal dimension ("coastline approximation" - 1)

Medical literature citations:
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17 No. 2, pages 77-87, April 1995.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery 1995;130:511-516.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computer-derived nuclear features distinguish malignant from benign breast cytology. Human Pathology, 26:792-796, 1995.

See also:
http://www.cs.wisc.edu/~olvi/uwmp/mpml.html
http://www.cs.wisc.edu/~olvi/uwmp/cancer.html

In this example we focus on the potential differences in the cell radius between benign and malignant tumor cells.

The cell radii of the malignant tumors certainly appear to be larger than the cell radii of the benign tumors. The summary statistics support this, with sample means/medians of roughly 17 and 12 units respectively. The 95% CIs for the mean cell radius for the two tumor groups do not overlap, which further supports the existence of a significant difference in cell radii.

Testing the Equality of Population Variances

Because we conclude that the population variances are unequal, we should use the non-pooled version of the two-sample t-test. No one does this by hand, so we will use JMP.

Conclusion: ______
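Although the notes rely on JMP for the unequal-variance test, the Case 2 standard error and degrees-of-freedom formula are short enough to sketch in Python. The summary values below are hypothetical (the group means of 17 and 12 only echo the rough magnitudes mentioned for the tumor radii; the standard deviations and sample sizes are invented for illustration), and `unequal_variance_se_and_df` is an illustrative name.

```python
import math


def unequal_variance_se_and_df(s1, n1, s2, n2):
    """SE and degrees of freedom for the unequal-variance two-sample t-test.

    SE = sqrt(s1^2/n1 + s2^2/n2); df follows the Case 2 formula and is
    rounded down to the nearest integer.
    """
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, math.floor(df)


# Hypothetical summary statistics, for illustration only.
xbar1, s1, n1 = 17.0, 4.0, 10
xbar2, s2, n2 = 12.0, 2.0, 15

se, df = unequal_variance_se_and_df(s1, n1, s2, n2)
t_stat = ((xbar1 - xbar2) - 0) / se

print(f"SE = {se:.4f}, df = {df}")
print(f"t = {t_stat:.3f}")
```

Note that some software reports the unrounded degrees of freedom, so a p-value computed elsewhere may differ slightly from one based on the rounded-down value.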