5 – Review of Basic Statistical Inference

5.1 – Review of the Sampling Distribution Concept with Applications and Examples

When we take a sample of size n from a population and calculate summary statistics like the sample mean (X̄), the sample median (Med), the sample variance (s²), the sample standard deviation (s), or the sample proportion (p̂), we must realize that these quantities will _________________________________________________________________ and hence are themselves ________________________________________. Any random variable in statistics has a probability distribution that determines the likelihood of certain values of the random variable being obtained. The distribution of a summary statistic, e.g. the sample mean (X̄), is called the ______________________________________. In this handout we explore the sampling distributions of the sample mean (X̄) and the sample proportion (p̂).

5.1.1 - Sampling Distribution of X̄ and Applications

The sample mean (X̄) is a random quantity that varies from sample to sample. The probability distribution the sample mean follows is called the sampling distribution of X̄. The sampling distribution demo I showed in class is found at the following web address:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/

The Central Limit Theorem (CLT) ~ tells us about the sampling distribution of the sample mean (X̄). There is also a version (which we will see later) that tells us about the sampling distribution of the sample proportion (p̂). The CLT for X̄ says the following:
1.
2.
3. The sampling distribution will be ___________ if either of the conditions below is met:
or if

We now consider applications of the central limit theorem (CLT).

Applications to Decision Making

Example 1: Cholesterol levels of adult males (50-60 yrs. old)
The mean blood cholesterol level of adult males (50-60 yrs. old) is 200 mg/dl with a standard deviation of 30 mg/dl.
Assume also that blood cholesterol levels are approximately normally distributed in this population.
a) What can be said about the sampling distribution of the sample mean (X̄) when drawing a sample of size n = 25 from this population?
b) Give a range of values within which we would expect the sample mean (X̄) to fall approximately 95% of the time.
c) Suppose we took a sample of adult males between the ages of 50 – 60 who are also strict vegetarians and obtained a sample mean of X̄ = 182 mg/dl. Does this provide evidence that the subpopulation of vegetarians has a lower mean cholesterol level than the greater population of men in this age group? Explain.

Example 2: S/R Ratio
The objectives of a study by Skjelbo et al. (1996) were to examine (a) the relationship between chloroguanide metabolism and efficacy in malaria prophylaxis and (b) the mephenytoin metabolism and its relationship to chloroguanide metabolism among Tanzanians. From information provided by urine specimens from the n = 216 subjects, the investigators computed the ratio of unchanged S-mephenytoin to R-mephenytoin (S/R ratio). Is there evidence that the S/R ratio of vaccinated Tanzanians is greater than .275?

Confidence Intervals for the Population Mean

Example: Suppose we are trying to estimate the birth weight of infants born to women who smoke during pregnancy. A sample of n = 73 women who smoked during pregnancy and the birth weight of their babies was obtained, yielding a sample mean of X̄ = 6.08 lbs. This is called a _____________________ for the population mean (μ) because it yields a single value for this unknown quantity. A better estimate might be 6.08 lbs. give or take _____ lbs., i.e. ______ up to _______. This is called an __________________________ as it gives a range or interval of plausible values for the population mean. How do we know if this is a good interval estimate? What properties should a good interval estimate have?
1)
2)

The central limit theorem states that if our sample size (n) is sufficiently large, then X̄ ~ N(μ, σ/√n), which also says, by standardizing, that Z = (X̄ − μ)/(σ/√n) ~ N(0,1).

This means that when we collect our data the probability our observed sample mean will fall within two standard errors of the mean is approximately .95, or more precisely

P(−2 < Z < 2) = P(−2 < (X̄ − μ)/(σ/√n) < 2) = P(μ − 2σ/√n < X̄ < μ + 2σ/√n) ≈ .9544

To make this 95% exactly, we simply use 1.96 in place of 2.00 in the expression above, because P(−1.96 < Z < 1.96) = .9500. For 99% confidence we use ________ and for 90% we use ________ in place of 1.96.

Starting with the statement

P(μ − 1.96σ/√n < X̄ < μ + 1.96σ/√n) = .9500

we can perform similar algebraic manipulations to those above to isolate the population mean μ in the middle of the inequality instead. By doing this we will obtain an interval that has an approximate 95% chance of covering the true population mean (μ).

This says that the interval from X̄ − 1.96σ/√n up to X̄ + 1.96σ/√n has a 95% chance of covering the true population mean μ. This interval is simply the sample mean plus or minus roughly two standard errors. However, this interval cannot be calculated in practice! WHY?

A "simple fix" would be to replace ____ by the estimated standard deviation from our data, _____. The problem with our "simple fix" is that the distribution of (X̄ − μ)/(s/√n) is not standard normal, i.e. N(0,1)!!!

FACT: If the population we are sampling from is approximately normal then (X̄ − μ)/(s/√n) has a t-distribution with degrees of freedom df = n – 1.

What does a t-distribution look like?
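The t-distribution's shape can be sketched numerically: it is bell-shaped and centered at 0 like the standard normal, but with heavier tails for small df, and it approaches N(0,1) as df grows. The density formulas below are the standard ones from mathematical statistics (not something given in these notes), evaluated with only the Python standard library:

```python
from math import lgamma, log, exp, sqrt, pi

def t_pdf(x, df):
    # density of the t-distribution with df degrees of freedom
    logc = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(logc - (df + 1) / 2 * log(1 + x * x / df))

def norm_pdf(x):
    # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

heavy_tail = t_pdf(3, 5)      # t with df = 5 puts more mass out at x = 3 ...
normal_tail = norm_pdf(3)     # ... than the standard normal does
near_normal = t_pdf(0, 1000)  # with large df the t density is essentially N(0,1)
```

Comparing `heavy_tail` to `normal_tail` shows the fatter tails; comparing `near_normal` to `norm_pdf(0)` shows the convergence to the normal.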
Facts about the t-distribution:

Examples: Using the t-table to find confidence intervals
a) n = 20 and 95% confidence, t =
b) n = 20 and 99% confidence, t =
c) n = 50 and 90% confidence, t =
d) n = 10 and 95% confidence, t =

The basic form of most confidence intervals is:
(estimate) ± (table value)·(SE of estimate), where (table value)·(SE of estimate) is the MARGIN OF ERROR.

General Form for a Confidence Interval for the Mean
For the population mean μ we have
X̄ ± (t table value)·SE(X̄)   or   X̄ ± t·s/√n

The appropriate columns in the t-distribution table for the different confidence intervals are as follows:
90% Confidence look in the .05 column (if n is "large" we can use 1.645)
95% Confidence look in the .025 column (if n is "large" we can use 1.960)
99% Confidence look in the .005 column (if n is "large" we can use 2.576)

Example: Suppose we are trying to estimate the birth weight of infants born to women who smoke during pregnancy. A sample of n = 73 women from Baltimore who smoked during pregnancy and the birth weight of their babies was obtained, yielding a sample mean of X̄ = 6.08 lbs. with a sample standard deviation of s = 1.45 lbs.
a) Use this information to find a 95% CI for the mean birth weight of infants born to mothers who smoked during pregnancy, assuming that birth weights for this population are normally distributed.
b) Suppose a sample of n = 113 Baltimore mothers who did not smoke during pregnancy was obtained and a sample mean birth weight of X̄ = 6.71 lbs. with a standard deviation of s = 1.66 lbs. was obtained. Find a 95% confidence interval for the mean birth weight of infants born to nonsmoking mothers.
c) Does this interval in conjunction with the interval obtained for mothers who smoked during pregnancy provide evidence that infants born to smoking mothers have a lower mean birth weight?
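As a check on the birth-weight example, both intervals can be computed directly. The t critical values below (1.993 for df = 72 and 1.981 for df = 112) are read from a t-table, so treat them as approximate:

```python
from math import sqrt

def t_ci(xbar, s, n, t_crit):
    # CI for the mean: xbar +/- t * s / sqrt(n)
    me = t_crit * s / sqrt(n)
    return xbar - me, xbar + me

lo_smoke, hi_smoke = t_ci(6.08, 1.45, 73, 1.993)   # smokers, df = 72
lo_non, hi_non = t_ci(6.71, 1.66, 113, 1.981)      # nonsmokers, df = 112
```

The intervals come out to roughly (5.74, 6.42) lbs. and (6.40, 7.02) lbs.; because they barely overlap, the comparison asked for in part (c) is borderline, and a formal two-sample procedure would be preferable.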
5.1.2 – Sampling Distribution of p̂ and Applications

Approximate Sampling Distribution of the Sample Proportion (p̂)
As with the sample mean (X̄), the sample proportion (p̂) is also random, as it too varies from sample to sample. The sampling distribution of p̂ has the following properties:
1. The mean of the sampling distribution is the population proportion (p).
2. The standard deviation of the sampling distribution, or the standard error of p̂, is given by:
SE(p̂) = √(p(1 − p)/n), where p = population proportion (unknown) and n = sample size.
3. The sampling distribution is approximately normal provided n is "sufficiently large":
np ≥ 5 and n(1 − p) ≥ 5
* Note: some recommend using 10 in place of 5.
Note: When estimating proportions, large sample sizes are generally used (e.g. n > 100).

Exact Sampling Distribution of the Sample Proportion (p̂)
The sample proportion comes from a binomial probability experiment. A binomial probability experiment satisfies the following conditions:
1. There are a fixed number of n trials carried out.
2. The outcome of a given trial is either a "success" or "failure".
3. The probability of success (p) remains constant from trial to trial: P(success) = p and P(failure) = 1 − p = q.
4. The trials are independent, i.e. the outcome of a trial is not affected by the outcome of any other trial.

A binomial random variable X is defined to be the number of "successes" in n independent trials where P(success) = p is constant. The sample proportion is defined in terms of X, the number of successes observed in our random sample of size n: p̂ = X/n.

Binomial Probability Function
P(X = x) = [n!/(x!(n − x)!)]·p^x·q^(n−x),  x = 0, 1, ..., n
The coefficient in front denotes the number of ways to obtain x successes in n trials.

The binomial distribution function gives P(X = x). We can use the binomial distribution in making inferences about the true population proportion (or binomial probability of "success") (p).
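The binomial probability function above can be evaluated directly (`math.comb` gives the n-choose-x coefficient). As a sanity check, the probabilities sum to 1 and the mean works out to np; the tail sum at the end is exactly the kind of probability used for inference about p. The values n = 10, p = .3 are arbitrary illustration choices, not from any example in these notes:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)                                  # probabilities sum to 1
mean = sum(x * px for x, px in enumerate(pmf))    # E(X) = n*p = 3
upper_tail = sum(binom_pmf(x, n, p) for x in range(8, n + 1))  # P(X >= 8)
```

A small `upper_tail` (here about .0016) is the sort of evidence that would make the hypothesized p look implausible.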
For example, suppose we are trying to determine if a coin is fair, i.e. p = P(Head) = .50. What would have to happen in order for you to believe that the coin is biased in favor of landing heads, i.e. p = P(Head) > .50?

USING THE SAMPLING DISTRIBUTION TO MAKE INFERENCES ABOUT THE POPULATION PROPORTION (both the approximate and exact approaches)

Example: New Method for Treating a Certain Illness/Disease
Suppose the current treatment method for a certain disease has a 70% success rate. A new method has been proposed that will hopefully have a higher success rate. The new method is administered to a sample of n = 50 patients and 40 have successful treatment. Can we conclude on the basis of this result that the new method has a higher success rate?

Using the Normal Approximation to the Sampling Distribution

Using the Normal Probability Calculator in JMP

Using the Exact Binomial Sampling Distribution
Here n = 50, the observed number of successes is X = 40, and if the new method is not better then the hypothesized proportion (i.e. assumed true initially) is p = .70.

CONFIDENCE INTERVALS FOR THE POPULATION PROPORTION (p)

Motivating Example: In a study conducted to investigate the non-clinical factors associated with the method of surgical treatment received for early-stage breast cancer, some patients underwent a modified radical mastectomy while others had a partial mastectomy accompanied by radiation therapy. We are interested in determining whether the age of the patient affects the type of treatment she receives. In particular, we want to know whether the proportions of women under 55 are identical in the two treatment groups. A sample of n = 658 women who underwent a partial mastectomy and subsequent radiation therapy contains 292 women under 55, which is a sample percentage of 44%. A better estimate might be 44% give or take 4%, i.e. estimating that the actual percentage of women under the age of 55 who receive this form of treatment is between 39% and 48%.
This is called an "interval estimate", as it gives a range or interval of plausible values for the population proportion/percentage. As with the population mean discussed earlier, we wish this interval to be narrow enough to provide useful information about this unknown percentage, yet have a high probability or chance of covering the actual percentage of women under 55 amongst those opting for this course of treatment for early-stage breast cancer.

The central limit theorem for proportions states that if our sample size (n) is sufficiently large, then p̂ ~ N(p, √(p(1 − p)/n)). This means that when we take our sample and find our sample proportion, p̂, the probability our observed sample proportion will fall within approximately two standard errors of the population proportion is roughly 95%, or more precisely

P(p − 1.96·√(p(1 − p)/n) < p̂ < p + 1.96·√(p(1 − p)/n)) = .9500    Recall: P(−1.96 < Z < 1.96) = .9500

Starting with this statement we can perform some algebraic manipulations to isolate the population proportion, p, in the middle of the inequality above. By doing this we will see that the resulting interval will have a 95% chance of covering the true population proportion (p).

After a Wonderfully Simple Mathematical Derivation:
This says that the interval from p̂ − 1.96·√(p(1 − p)/n) up to p̂ + 1.96·√(p(1 − p)/n) has a 95% chance of covering the true population proportion p. This interval is simply the sample proportion plus or minus roughly two standard errors, i.e. p̂ ± 1.96·SE(p̂). However, this interval cannot be calculated in practice! WHY?

A simple fix is to replace ______ by our sample-based estimate ________. Provided the sample size is sufficiently large, the resulting interval will still have an approximate 95% chance of covering the true population proportion.
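With that plug-in fix, the motivating example's intervals can be computed directly. The first group's counts (n = 658, 292 under 55) were given above; the second group's counts (n = 1580, 397 under 55) come from the continuation of this example:

```python
from math import sqrt

def prop_ci(x, n, z=1.96):
    # p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n), the plug-in (estimated-SE) interval
    phat = x / n
    me = z * sqrt(phat * (1 - phat) / n)
    return phat - me, phat + me

lo_partial, hi_partial = prop_ci(292, 658)    # partial mastectomy + radiation
lo_radical, hi_radical = prop_ci(397, 1580)   # modified radical mastectomy
```

The intervals are roughly (.406, .482) and (.230, .273); since they do not overlap, the age make-up of the two treatment groups appears to differ.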
This gives what we should technically call the estimated standard error of the proportion, but when we say "standard error of the proportion" it is assumed this estimated version is the one we are talking about, because in reality the population proportion p is NOT known. If p were known we would not be conducting a study in the first place!

General Form for a CI for the Population Proportion (p)
estimate ± (table value)·(estimated standard error of estimate)
p̂ ± (normal table value)·√(p̂(1 − p̂)/n)   or   p̂ ± z·√(p̂(1 − p̂)/n)
Margin of Error = z·√(p̂(1 − p̂)/n)

Normal Table Values:
95% Confidence we use z = 1.96
90% Confidence we use z = 1.645
99% Confidence we use z = 2.576

Again we see the confidence interval has the basic form:
ESTIMATE ± (TABLE VALUE)·(STANDARD ERROR OF THE ESTIMATE), where (TABLE VALUE)·(STANDARD ERROR) is the MARGIN OF ERROR.
In other words, we take our estimate plus or minus a certain number of standard errors to obtain the confidence interval, i.e. plus or minus the margin of error.

Example: Early-Stage Breast Cancer Treatment Method and Age (cont'd)
A sample of n = 658 women who underwent a partial mastectomy and subsequent radiation therapy contains 292 women under 55, which is a sample percentage of 44.4%. Find a 95% CI for the true proportion of women under 55 in this population.

In a sample of n = 1580 women who received a modified radical mastectomy, 397 women were under 55, which is a sample percentage of 25.1%. Find a 95% CI for the true proportion of women under 55 in this population.

Do these intervals suggest that the proportion of women under the age of 55 differs significantly for these two courses of treatment of early-stage breast cancer?

One-Sided Confidence Intervals

One-Sided CI's for the Population Mean (μ)
Lower Bound for μ: X̄ − t·s/√n
Upper Bound for μ: X̄ + t·s/√n
where t comes from the t-distribution with df = n – 1.
The appropriate columns in Table A.4 for the different confidence intervals are as follows:
90% Confidence look in the .10 column
95% Confidence look in the .05 column
99% Confidence look in the .01 column

One-Sided CI's for the Population Proportion (p)
Lower Bound for p: p̂ − z·√(p̂(1 − p̂)/n)
Upper Bound for p: p̂ + z·√(p̂(1 − p̂)/n)
where z comes from the standard normal distribution. The appropriate values for the different confidence intervals are as follows:
90% Confidence use z = 1.280
95% Confidence use z = 1.645
99% Confidence use z = 2.330

5.2 – Review of the Basic Steps in a Hypothesis Test

Several of the examples in the different sections of 5.1 above are really examples of a hypothesis test. In this section we review the formal process of carrying out a hypothesis test. As the course progresses we relax the formality of the process, but keep in mind the steps listed below are always there even if they are not written out or discussed explicitly. Before we look at hypothesis testing for a single population mean (μ) we will examine the basic steps in a hypothesis test and introduce some important terminology and concepts.

Steps in a Hypothesis Test
1.
2.
3.
4.
5.
* 6.

5.3 - Hypothesis Test for a Single Population Mean (μ)

Null Hypothesis (Ho)
Alternative Hypothesis (Ha)
p-value area
o Upper-tail
o Lower-tail
o Two-tailed (perform test using CI for μ)

Test Statistic (in general)
In general the basic form of most test statistics is given by:
Test Statistic = [(estimate) − (hypothesized value)] / SE(estimate)    (think "z-score")
which measures the discrepancy between the estimate from our sample and the hypothesized value under the null hypothesis. Intuitively, if our sample-based estimate is "far away" from the hypothesized value assuming the null hypothesis is true, we will reject the null hypothesis in favor of the alternative or research hypothesis.
Extreme test statistic values occur when our estimate is a large number of standard errors away from the hypothesized value under the null. The p-value is the probability, by chance variation alone, of getting a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true. If this probability is "small" then we have evidence against the null hypothesis; in other words, we have evidence to support our research hypothesis.

Type I and Type II Errors (α & β)
                         Truth
Decision                 Ho true        Ha true
Reject Ho
Fail to Reject Ho

Example: Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA
EPA guidelines suggest that drinking water should not have a perchlorate level exceeding 4 ppb (parts per billion). Perchlorate contamination in California water (ground, surface, and well) is becoming a widespread problem. The Olin Corp., a manufacturer of road flares in the Morgan Hill area from 1955 to 1996, was the source of the perchlorate contamination in this area. Suppose you are a resident of the Morgan Hill area; which alternative do you want well testers to use and why?

Ho: μ ≤ 4 ppb vs. Ha: μ > 4 ppb    or    Ho: μ ≥ 4 ppb vs. Ha: μ < 4 ppb

Test Statistic for Testing a Single Population Mean (μ) ~ (t-test)
t = (X̄ − μo)/SE(X̄) = (X̄ − μo)/(s/√n) ~ t-distribution with df = n – 1

Assumptions: When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when our sample size is sufficiently "large". How large the sample size needs to be depends upon how "non-normal" the population distribution is.

Example 1: Length of Stay in a Nursing Home
In the past the average number of nursing home days required by elderly patients before they could be released to home care was 17 days. It is hoped that a new program will reduce this figure. Do these data support the research hypothesis?
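The data for this example (the same 16 length-of-stay values listed below) and the one-sample t statistic for Ho: μ = 17 can be reproduced outside JMP in a few lines:

```python
from math import sqrt
from statistics import mean, stdev

los = [3, 5, 12, 7, 22, 6, 2, 18, 9, 8, 20, 15, 3, 36, 38, 43]
n = len(los)
xbar, s = mean(los), stdev(los)     # about 15.44 days and 13.24 days
t = (xbar - 17) / (s / sqrt(n))     # test statistic for H0: mu = 17
```

Here t ≈ −0.47 with df = 15, a small discrepancy, consistent with the wide confidence interval JMP reports for these data.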
3 5 12 7 22 6 2 18 9 8 20 15 3 36 38 43

Normality does not appear to be satisfied here! Notice the CI for the mean length of stay is (8.38 days, 22.49 days).

Hypothesis Test:
1) Ho:    HA:
2) Choose test statistic
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret

To perform a t-test in JMP, select Test Mean from the LOS pull-down menu and enter the value for the mean under the null hypothesis, 17.0 in this example.

Conclusion:

In JMP the hypothesized mean is the value for the population mean under the null hypothesis. If normality is questionable or the sample size is small, a nonparametric test may be more appropriate. We will discuss nonparametric tests later in the course. The graphic on the left is obtained by selecting P-value animation from the pull-down menu next to Test Mean=value. * Click Low Side for a lower-tail test, and similarly for the other two types of alternatives.

5.4 - Determining the Sample Size Necessary for a Desired Margin of Error When Estimating the Population Mean (μ)

In the discussion above we found that the interval X̄ − 1.96σ/√n up to X̄ + 1.96σ/√n had a 95% chance of covering the population mean. The margin of error for this interval is
Margin of Error (E) = 1.96·σ/√n
If we wanted this to be at most E units, what sample size should we use? This says that to obtain a 95% CI for μ with a margin of error no larger than E we should use a sample size of
n = (1.96·σ/E)²
However, we cannot calculate this in practice unless we know σ, which of course we don't; furthermore we don't even know s, the sample standard deviation, until we have our data in hand. Thus in order to use this result we need to plug in a "best guess" for σ. This guess might come from:
A pilot study, where s = the sample standard deviation is calculated
Prior studies
An approximation based on the range: σ ≈ Range/4.
Granted, we don't know the range until the data are collected, but we might be able to guess the largest and smallest values we would expect to see when we collect our data. In general, using a σ which is too large is better than using one that is too small.

Example: What sample size would be necessary to estimate the mean age of DUI offenders in MN with a 95% confidence interval that has a margin of error no larger than 2 years?

5.5 - Power and Sample Size for Testing the Population Mean (μ)

In designing a study, we oftentimes have prior knowledge about how large a difference or effect we want to be able to detect as significant. We can use this knowledge to help us determine the sample size to use in conducting our study. Without going through the derivation, the formula we use for determining n is

n = [(z_α + z_β)·σ/(μ1 − μo)]²,  rounded up to the next integer value,

where:
z_α = standard normal value corresponding to α, the Type I error probability. For one-tailed hypotheses these values are z.01 = 2.33, z.05 = 1.645, z.10 = 1.28. For two-sided alternatives these values are z.01 = 2.576, z.05 = 1.96, z.10 = 1.645.
z_β = standard normal value corresponding to β, the Type II error probability: z.01 = 2.33, z.05 = 1.645, z.10 = 1.28, etc., basically the one-tailed values above.
σ = conservative "guess" for the true population standard deviation (σ ≈ Range/4)
μ1 = population mean assuming the alternative hypothesis is true
μo = population mean assuming the null hypothesis is true
(μ1 − μo) = difference we wish to be able to detect as significant with probability (1 − β) using a significance level α test.

Power = P(Reject Ho | Ho is false) = 1 − β    (α = P(Reject Ho | Ho is true))

Example: Suppose in the nursing home stay study we wanted to have a 95% chance of detecting a mean of 14 days as being significantly less than 17 days using a significance test with α = .05. What sample size would be required if we believe the range of length of stay will be between 2 days and 50 days?
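Plugging into the sample-size formula above (one-tailed z.05 = 1.645 for α, z.05 = 1.645 for β since power = .95, and σ ≈ (50 − 2)/4 = 12):

```python
from math import ceil

z_alpha = 1.645         # one-tailed alpha = .05
z_beta = 1.645          # power = .95, so beta = .05
sigma = (50 - 2) / 4    # range/4 guess for sigma = 12
delta = 17 - 14         # difference we want to detect as significant
n = ceil(((z_alpha + z_beta) * sigma / delta) ** 2)   # round up to next integer
```

So roughly 174 subjects would be required.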
Use the power calculator in JMP: DOE > Sample Size and Power

If you leave two of the fields empty amongst Power, Sample Size, and Difference to Detect, you will get a plot of the two left empty versus one another for the case where the specified field is as chosen. For example, in the previous example we could just specify the power as .95 and then obtain a plot of sample size (n) vs. difference to detect (δ). From this we can see that the approximate sample size needed to detect a reduction of 5 days is around n = 70 to 75. Below we have specified a difference to detect δ = 3 days and left the sample size and power fields empty, which gives us a plot of power (1 − β) vs. sample size (n). For a sample size of n = 125 we find a power of around 80%.

5.6 - Statistical Inference for a Population Proportion (p)

We have already discussed the confidence interval as a means of making a decision about the value of the population proportion, p. The CI results are summarized below.

General Form for a CI for the Population Proportion (p)
estimate ± (table value)·(estimated standard error of estimate)
p̂ ± (normal table value)·√(p̂(1 − p̂)/n)   or   p̂ ± z·√(p̂(1 − p̂)/n)
Margin of Error = z·√(p̂(1 − p̂)/n)

Normal Table Values:
Confidence Level    95% (α = .05)    90% (α = .10)    99% (α = .01)
z                   1.96             1.645            2.576

Hypothesis Tests for p
Ho: p = po
Ha: p > po or p < po or p ≠ po (use a CI for the two-sided case, which is rarely of interest for p anyway)

Test Statistic
z = (p̂ − po)/√(po(1 − po)/n) ~ standard normal N(0,1), provided npo ≥ 5 and n(1 − po) ≥ 5.

When our sample size is small or we want an exact test, we can use the binomial distribution to calculate the p-value as follows:
Reject Ho in favor of Ha: p > po if P(X ≥ x | n, po) < α
Reject Ho in favor of Ha: p < po if P(X ≤ x | n, po) < α
Reject Ho in favor of Ha: p ≠ po if either P(X ≥ x | n, po) < α/2 or P(X ≤ x | n, po) < α/2
(This is called the Binomial Exact Test for p)

Example: Hypertension During Finals Week
In the college-age population in this country (18 – 24 yr.
olds), about 9.2% have hypertension (systolic BP > 140 mmHg and/or diastolic BP > 90 mmHg). Suppose a sample of n = 196 WSU students is taken during finals week and 29 have hypertension. Do these data provide evidence that the percentage of students who are hypertensive during finals week is higher than 9.2%?

Hypothesis Test:
1) Ho:    Ha:
2) Choose test statistic
3) Compute test statistic
4) Find p-value (use Normal Probability Calculator.JMP)
Binomial Exact Test: use n = 196 and p = .092 (hypothesized value under Ho). Exact p-value =
5) Make decision and interpret
6) Confidence interval for p

5.7 – Sample Size and Power for Inference about the Population Proportion (p)

CI for the Population Proportion (p)
In the discussion above we found that the interval p̂ − 1.96·√(p(1 − p)/n) up to p̂ + 1.96·√(p(1 − p)/n) had a 95% chance of covering the population proportion. The margin of error for this interval is
Margin of Error = 1.96·√(p(1 − p)/n)
If we wanted this to be at most E units, what sample size should we use? This says that to obtain a 95% CI for p with a margin of error no larger than E we should use a sample size of
n = 1.96²·p(1 − p)/E²
However, we cannot calculate this in practice unless we know p, which of course we don't; furthermore we don't even know p̂, the sample proportion, until we have our data in hand. In order to use this result we need to plug in a "best guess" for p. This guess might come from:
A pilot study where p̂ = the sample proportion is calculated
Prior studies
The worst-case scenario, by noting that p(1 − p) ≤ .25, with equality when p = .50. Using p = .50 simplifies the formula to
n = 1.96²/(4E²)
If you have no "best guess" for p, this conservative approach is the one you should take.
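Both versions of this sample-size formula can be sketched numerically. The planning values below (p = .70, E = .05) are arbitrary illustration choices, not from any example in these notes:

```python
from math import ceil

z, E = 1.96, 0.05
p_guess = 0.70
n_guess = ceil(z**2 * p_guess * (1 - p_guess) / E**2)   # using a planning guess for p
n_worst = ceil(z**2 / (4 * E**2))                       # conservative choice p = .50
```

The conservative formula always gives the larger n (here 385 vs. 323), which is the price of assuming nothing about p.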
Example: How many patients would be needed to estimate the success rate of a medical procedure, if researchers initially believe the success rate is no smaller than 85% and wish to estimate the true success rate using a 95% confidence interval with a margin of error no larger than E = .03?

What if they wish to assume nothing about the success rate initially?

Power Considerations
In designing a study, we oftentimes have prior knowledge about how large a difference or effect we want to be able to detect as significant. We can use this knowledge to help us determine the sample size to use in conducting our study. Suppose we wish to conduct a test at the α level, having a power = 1 − β of detecting a difference between the true proportion (p1) and the hypothesized proportion (po) of |po − p1|. Then the sample size necessary to achieve that goal is given by

n = [ (z_α·√(po(1 − po)) + z_β·√(p1(1 − p1))) / (p1 − po) ]²

z_α = standard normal value corresponding to α, the Type I error probability. For one-tailed hypotheses these values are z.01 = 2.33, z.05 = 1.645, z.10 = 1.28. For two-sided alternatives these values are z.01 = 2.576, z.05 = 1.96, z.10 = 1.645.
z_β = standard normal value corresponding to β, the Type II error probability: z.01 = 2.33, z.05 = 1.645, z.10 = 1.28, etc., basically the one-tailed values above.

Example: Suppose we view an increase from 9.2% to 12.0% to be a meaningful increase in the percentage of college students exhibiting hypertension. Suppose we wish to have an 80% chance of detecting such an increase as statistically significant at the α = .05 level; what sample size do we need?
n = 659

5.8 - Comparing Two Population Means Using Dependent or Paired Samples

When using dependent samples each observation from population 1 has a one-to-one correspondence with an observation from population 2. One of the most common cases where this arises is when we measure the response on the same subjects before and after treatment. This is commonly called a "pre-test/post-test" situation.
However, sometimes we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race, gender, socio-economic status, height, weight, etc. to control for the influence these characteristics might have on the response of interest. When this is done we say that we are "controlling for the effects of race, gender, etc.". By using matched pairs of subjects we are in effect removing the effect of potential confounding factors, thus giving us a clearer picture of the difference between the two populations being studied.

DATA FORMAT
Matched Pair    X1i     X2i     di = X1i − X2i
1               X11     X21     d1
2               X12     X22     d2
3               X13     X23     d3
...             ...     ...     ...
n               X1n     X2n     dn

For the sample paired differences (di's), find the sample mean (d̄) and standard deviation (sd).

The general hypotheses are
Ho: μd = δo
Ha: μd > δo   or   Ha: μd < δo   or   Ha: μd ≠ δo
where μd = mean for the population of paired differences.

Note: While 0 is usually used as the hypothesized mean difference under the null, we can actually hypothesize any size difference for the mean of the paired differences that we want. For example, if we wanted to show a certain diet resulted in at least a 10 lb. decrease in weight, then we could test whether the paired differences d = initial weight − after-diet weight had mean greater than 10 (Ha: μd > 10 lbs.).

Test Statistic for a Paired t-Test
t = [(estimate of mean paired difference) − (hypothesized mean difference)] / SE(estimate)
  = (d̄ − δo)/(sd/√n) ~ t-distribution with df = n − 1
where δo = the hypothesized value for the mean paired difference under the null hypothesis.

100(1 − α)% CI for μd
d̄ ± t·sd/√n, where t comes from the appropriate quantile of the t-distribution with df = n – 1. This interval has a 100(1 − α)% chance of covering the true mean paired difference.
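A paired t computation can be sketched end to end. The eight pre/post readings below are made-up illustration values, not the Captopril data that follow:

```python
from math import sqrt
from statistics import mean, stdev

# hypothetical pre/post readings (illustration only)
pre = [175, 179, 165, 170, 160, 180, 177, 168]
post = [160, 168, 158, 155, 152, 166, 165, 158]

d = [a - b for a, b in zip(pre, post)]   # positive = decrease after treatment
dbar, s_d = mean(d), stdev(d)
n = len(d)
t = (dbar - 10) / (s_d / sqrt(n))        # test H0: mu_d = 10 vs Ha: mu_d > 10
```

Note that the 16 individual readings collapse to a one-sample problem on the n = 8 differences, which is the whole point of pairing.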
Example: Effect of Captopril on Blood Pressure
In order to estimate the effect of the drug Captopril on blood pressure (both systolic and diastolic), the drug is administered to a random sample of n = 15 subjects. Each subject's blood pressure was recorded before taking the drug and then 30 minutes after taking the drug. The data are shown below.
Syspre – initial systolic blood pressure
Syspost – systolic blood pressure 30 minutes after taking the drug
Diapre – initial diastolic blood pressure
Diapost – diastolic blood pressure 30 minutes after taking the drug

Research Questions:
Is there evidence to suggest that Captopril results in a systolic blood pressure decrease of at least 10 mmHg on average in patients 30 minutes after taking it?
Is there evidence to suggest that Captopril results in a diastolic blood pressure decrease of at least 5 mmHg on average in patients 30 minutes after taking it?

For each blood pressure we need to consider paired differences of the form di = BPpre_i − BPpost_i. For paired differences defined this way, positive values correspond to a reduction in blood pressure ½ hour after taking Captopril. To answer the research questions above we need to conduct the following hypothesis tests:
Ho: μsyspre−syspost = 10 mmHg    and    Ho: μdiapre−diapost = 5 mmHg
Ha: μsyspre−syspost > 10 mmHg         Ha: μdiapre−diapost > 5 mmHg

Below are the relevant statistical summaries of the paired differences for both blood pressure measurements. The t-statistics for both tests are given below:
Systolic BP
Diastolic BP

We can use the t-Probability Calculator in JMP to find the associated p-values, or better yet use JMP to conduct the entire t-test.
Systolic Blood Pressure
Diastolic Blood Pressure

Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to suggest that taking Captopril will result in a mean decrease in systolic blood pressure exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure exceeding 5 mmHg (p = _______).
Furthermore we estimate that the mean change in systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg, and that the mean change in diastolic blood pressure could be as large as ______ mmHg.
5.9 – Comparing Two Pop. Means Using Independent Samples
Example 1: Prior Knowledge of Instructor and Lecture Rating
How powerful are rumors? Frequently, students ask friends and/or look at instructor evaluations to decide if a class is worth taking. Kelley (1950) found that instructor reputation has a profound impact on actual teaching ratings. Towler and Dipboye (1998) replicated and extended this study by asking: “Does an instructor's prior reputation affect student ratings?”
Towler, A., & Dipboye, R. L. (1998). “The effect of instructor reputation and need for cognition on student behavior”
Experimental Design: Subjects were randomly assigned to one of two conditions. Before viewing the lecture, students were given a summary of the instructor's prior teaching evaluations. There were two conditions: Charismatic instructor and Punitive instructor.
Summary given in the "Charismatic instructor" condition:
Frequently at or near the top of the academic department in all teaching categories. Professor S was always lively and stimulating in class, and commanded respect from everyone. In class, she always encouraged students to express their ideas and opinions, however foolish or half-baked. Professor S was always innovative. She used differing teaching methods and frequently allowed students to experiment and be creative. Outside the classroom, Professor S was always approachable and treated students as individuals.
Summary given in the "Punitive instructor" condition:
Frequently near the bottom of the academic department in all important teaching categories. Professor S did not show an interest in students' progress or make any attempt to sustain student interest in the subject.
When students asked questions in class, they were frequently told to find the answers for themselves. When students felt they had produced a good piece of work, very rarely were they given positive feedback. In fact, Professor S consistently seemed to grade students harder than other lecturers in the department.
Then all subjects watched the same twenty-minute lecture given by the exact same lecturer. Following the lecture, subjects rated the lecturer. Subjects answered three questions about the leadership qualities of the lecturer. A summary rating score was computed and used as the variable "rating" here.
Research Question: Does an instructor's prior reputation affect student ratings of a lecture given by a professor?
Summary Statistics
x̄_C = 2.613   x̄_P = 2.236
s_C = 0.533   s_P = 0.543
n_C = 25      n_P = 24
Intuitive Decision
In order to determine whether the null or alternative hypothesis is true, you could review the summary statistics for the variable you are interested in testing across the two groups. Remember, these summary statistics and/or graphs are for the observations you sampled; to make decisions about all observations of interest, we must apply some inferential technique (i.e. hypothesis tests or confidence intervals).
One of the best graphical displays for this situation is the side-by-side boxplot. To get side-by-side boxplots, select Analyze > Fit Y by X. Place Prior Info in the X box and Rating in the Y box. Place mean diamonds & histograms on the plot, and we may also want to jitter the points. The more separation there is in the mean diamonds, the more likely we are to reject the null hypothesis (i.e. the data tend to support the alternative hypothesis).
To answer the question of interest formally we need inferential tools for comparing the mean rating given to a lecture when students are told the professor is a charismatic individual vs. the mean rating given when students are given the punitive instructor prior opinion, i.e. compare charismatic to punitive.
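The "intuitive decision" step amounts to comparing group summaries before any formal test. A minimal sketch, using made-up rating values purely for illustration (the actual analysis is done in JMP on the real data):

```python
from statistics import mean, median, stdev

# Hypothetical ratings on a 1-5 scale, invented for illustration only;
# they are NOT the study data.
charismatic = [2.0, 2.2, 2.4, 2.5, 2.6, 2.7, 2.8, 3.0, 3.1, 3.3]
punitive    = [1.6, 1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.7, 2.9]

# Per-group summaries: the numeric analogue of the mean diamonds
# and boxplots JMP draws side by side.
for name, grp in [("Charismatic", charismatic), ("Punitive", punitive)]:
    print(f"{name:12s} n={len(grp):2d} mean={mean(grp):.3f} "
          f"median={median(grp):.3f} sd={stdev(grp):.3f}")
```

A visible gap between the group means (with similar spreads) is exactly the pattern that makes rejection of the null hypothesis plausible, but only the formal test that follows can confirm it.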
Hypothesis Testing (μ_1 vs. μ_2)
The general null hypothesis says that the two population means are equal, or equivalently their difference is zero. The alternative or research hypothesis can be any one of the three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can perform the test by using a confidence interval for the difference in the population means and determining whether 0 is contained in the confidence interval.
H_o: μ_1 = μ_2, or equivalently (μ_1 − μ_2) = hypothesized difference (typically 0)
H_a: μ_1 > μ_2, or equivalently (μ_1 − μ_2) > hypothesized difference (upper-tail)
or
H_a: μ_1 ≠ μ_2, or equivalently (μ_1 − μ_2) ≠ hypothesized difference (two-tailed, USE CI!)
etc....
Test Statistic
t = [(X̄_1 − X̄_2) − (hypothesized difference)] / SE(X̄_1 − X̄_2) ~ t-distribution with appropriate degrees of freedom
where SE(X̄_1 − X̄_2) and the degrees of freedom for the t-distribution come from one of the two cases described below.
Confidence Interval for the Difference in the Population Means
100(1 − α)% Confidence Interval for (μ_1 − μ_2):
(X̄_1 − X̄_2) ± t · SE(X̄_1 − X̄_2)
where t comes from the t-table with appropriate degrees of freedom (see the two cases below).
There are two cases one needs to consider when comparing two population means using independent samples.
Case 1 ~ Equal Population Variances/Standard Deviations (σ_1² = σ_2² = σ², a common variance to both populations)
Rule of Thumb for Checking Variance Equality
If the larger sample variance is more than twice the smaller sample variance, do not assume the variances are equal.
Assumptions: For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.
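Both cases can be computed directly from the Example 1 summary statistics. A sketch, assuming the summary values quoted above (x̄_C = 2.613, s_C = 0.533, n_C = 25; x̄_P = 2.236, s_P = 0.543, n_P = 24) and using the pooled (Case 1) and Welch (Case 2) formulas given in the following pages:

```python
import math

# Summary statistics from the instructor-reputation example
x1, s1, n1 = 2.613, 0.533, 25   # charismatic condition
x2, s2, n2 = 2.236, 0.543, 24   # punitive condition

# Rule of thumb: if the larger sample variance exceeds twice the
# smaller, do not assume equal variances.
ratio = max(s1**2, s2**2) / min(s1**2, s2**2)
print(f"variance ratio = {ratio:.2f}")   # well under 2 here

# Case 1: pooled (equal-variance) t-test, df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2) * math.sqrt(1/n1 + 1/n2)
t_pooled  = (x1 - x2) / se_pooled
df_pooled = n1 + n2 - 2

# Case 2: Welch (unequal-variance) t-test with Satterthwaite df,
# rounded down to the nearest integer
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch  = (x1 - x2) / se_welch
df_welch = math.floor((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))

print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch : t = {t_welch:.2f}, df = {df_welch}")
```

The two procedures barely differ here because the sample variances are nearly equal (ratio ≈ 1.04), which is exactly when the rule of thumb says the pooled test is appropriate; either t is then compared to a t-table quantile on its df (about 2.01 for a two-tailed test at α = .05 with 46 or 47 df).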
Case 1 – Equal Variances (cont’d)
Assuming the assumptions listed above are all satisfied, we have the following for the standard error of the difference in the sample means:
SE(X̄_1 − X̄_2) = s_p √(1/n_1 + 1/n_2)
where
s_p² = [(n_1 − 1)s_1² + (n_2 − 1)s_2²] / (n_1 + n_2 − 2),   and   s_p² = (s_1² + s_2²)/2 if n_1 = n_2.
s_p² is called the “pooled estimate of the common variance (σ²)”. The degrees of freedom for the t-distribution in this case is df = n_1 + n_2 − 2.
Example 1: Prior Knowledge of Instructor and Lecture Rating (cont’d)
Case 1 – Equal Variances
To perform the “pooled t-Test” select the Means/Anova/Pooled t option from the Oneway Analysis pull-down menu.
Case 2 – Unequal Variances
If you do not want to assume the population variances are equal then select the t Test option. To formally test whether we can assume the population variances are equal select UnEqual Variances from the pull-down menu.
t-Test Results from JMP
Discussion: In the previous example we chose to use a pooled t-test assuming the population variances were equal based upon the visual evidence and applying the “rule of thumb”. To formally test this assumption, choose the UnEqual Variances option from the Oneway Analysis pull-down menu. The results are shown below.
Interpretation of Results
Example 2: Normal Human Body Temperatures – Females vs. Males
Do men and women have the same normal body temperature? Putting this into a statement involving parameters that can be tested:
H_o: μ_F = μ_M or (μ_F − μ_M) = 0
H_a: μ_F ≠ μ_M or (μ_F − μ_M) ≠ 0
μ_F = mean body temperature for females.
μ_M = mean body temperature for males.
Assumptions
1. The two groups must be independent of each other.
2. The observations from each group should be normally distributed.
3. Decide whether or not we wish to assume the population variances are equal.
Checking Assumptions
Assessing Normality of the Two Sampled Populations (Assumption 2)
To assess normality we select Normal Quantile Plot from the Oneway Analysis pull-down menu as shown below.
Normality appears to be satisfied here.
Checking the Equality of the Population Variances
To test the equality of the population variances select Unequal Variances from the Oneway Analysis pull-down menu. The test is:
H_o: σ_F = σ_M
H_a: σ_F ≠ σ_M
JMP gives four different tests for examining the equality of population variances. To use the results of these tests, simply examine the resulting p-values. If any/all are less than .10 or .05 then worry about the assumption of equal variances and use the unequal variance t-Test instead of the pooled t-Test.
p-values for testing variances
Example 2: Normal Human Body Temperatures – Females vs. Males (cont’d)
To perform the two-sample t-Test for independent samples:
assuming equal population variances, select the Means/Anova/Pooled t option from the Oneway Analysis pull-down menu.
assuming unequal population variances, select t-Test from the Oneway Analysis pull-down menu.
Because we have no evidence against the equality of the population variances assumption, we will use a pooled t-Test to compare the population means. Several new boxes of output will appear below the graph once the appropriate option has been selected, some of which we will not concern ourselves with. The relevant box for us is labeled t-Test and is shown below for the mean body temperature comparison. Because we have concluded that the equality of variance assumption is reasonable for these data, we can refer to the output for the t-Test assuming equal variances.
What is the test statistic value for this test?
What is the p-value?
What is your decision for the test?
Write a conclusion for your findings.
Interpretation of the CI for (μ_F − μ_M)
Case 2 – Unequal Population Variances/Standard Deviations (σ_1 ≠ σ_2)
Assumptions: For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal. (This can be formally tested, or use the rule of thumb.)
3.
The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.
Test Statistic
t = [(X̄_1 − X̄_2) − 0] / SE(X̄_1 − X̄_2) ~ t-distribution with df given by the formula below
where SE(X̄_1 − X̄_2) is as defined below.
100(1 − α)% Confidence Interval for (μ_1 − μ_2):
(X̄_1 − X̄_2) ± t · SE(X̄_1 − X̄_2)
where
SE(X̄_1 − X̄_2) = √(s_1²/n_1 + s_2²/n_2)
and
df = (s_1²/n_1 + s_2²/n_2)² / [ (s_1²/n_1)²/(n_1 − 1) + (s_2²/n_2)²/(n_2 − 1) ],   rounded down to the nearest integer.
The t-quantiles are the same as those we have seen previously.
Example: Cell Radii of Malignant vs. Benign Breast Tumors
These data come from a study of breast tumors conducted at the University of Wisconsin-Madison. The goal was to determine if malignancy of a tumor could be established by using shape characteristics of cells obtained via fine needle aspiration (FNA) and digitized scanning of the cells. The sampled tumor cells were examined under an electron microscope and a variety of cell shape characteristics were measured. One of the goals of the study was to determine which cell characteristics are most useful for discriminating between benign and malignant tumors. The variables in the data file are:
ID = patient identification number (not used)
Diagnosis = determined by biopsy: B = benign or M = malignant
Radius = radius (mean of distances from center to points on the perimeter)
Texture = texture (standard deviation of gray-scale values)
Smoothness = smoothness (local variation in radius lengths)
Compactness = compactness (perimeter^2 / area - 1.0)
Concavity = concavity (severity of concave portions of the contour)
Concavepts = concave points (number of concave portions of the contour)
Symmetry = symmetry (measure of symmetry of the cell nucleus)
FracDim = fractal dimension ("coastline approximation" - 1)
Medical literature citations:
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Analytical and Quantitative Cytology and Histology, Vol. 17 No. 2, pages 77-87, April 1995.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery 1995;130:511-516.
W.H. Wolberg, W.N. Street, D.M. Heisey, and O.L. Mangasarian. Computer-derived nuclear features distinguish malignant from benign breast cytology. Human Pathology, 26:792-796, 1995.
See also:
http://www.cs.wisc.edu/~olvi/uwmp/mpml.html
http://www.cs.wisc.edu/~olvi/uwmp/cancer.html
In this example we focus on the potential differences in cell radius between benign and malignant tumor cells.
The cell radii of the malignant tumors certainly appear to be larger than the cell radii of the benign tumors. The summary statistics support this, with sample means/medians of roughly 17 and 12 units respectively. The 95% CIs for the mean cell radius of the two tumor groups do not overlap, which further supports that a significant difference in the cell radii exists.
Testing the Equality of Population Variances
Because we conclude that the population variances are unequal, we should use the non-pooled version of the two-sample t-test. No one does this by hand, so we will use JMP.
Conclusion:
5.9 – Effect Size (d), Variance Explained, and Polyserial Correlation