Sampling Distribution
Pharmaceutical Statistics
Lecture 8: Distributions of the Difference Between Two Sample Means

Comparing two populations is very common, so we need to learn how to compare two sample means drawn from two different populations.

[Figure: the sampling distribution of $\bar{X}_1$ (population 1), the sampling distribution of $\bar{X}_2$ (population 2), and the resulting distribution of the difference $\bar{X}_1 - \bar{X}_2$.]

The distribution of the difference between two sample means (populations 1 and 2) has:
mean: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
standard error: $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

If we know the mean and the standard deviation of this new probability distribution, we can use the standard normal distribution (SND) approach to find the probability that the difference between the means will be less than, greater than, or between any specified values (as we did before).

Case of A − B: $z = \dfrac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}}$
Case of B − A: $z = \dfrac{(\bar{X}_B - \bar{X}_A) - (\mu_B - \mu_A)}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}}$

Note: this approach is valid for normally distributed populations, even when they have different (known) variances and/or different sample sizes.
Note: for populations that are not normally distributed, we need to draw large samples so that, by the central limit theorem (CLT), the sampling distribution of each sample mean (and hence of their difference) is approximately normal; we can then use this approach.

Example I (Textbook, Chp. 5, p. 143)
Consider two populations. Population 1 has experienced some condition associated with mental retardation; population 2 has not. Intelligence scores in each population are believed to be normally distributed, with equal means and a common standard deviation of 20. A sample of 15 individuals is drawn from each population. Compute the probability that the difference between the two sample means is equal to or larger than 13.
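Before the hand calculation, the standardization above can be checked numerically. The sketch below is not part of the lecture: it assumes Python 3.8+ (for `statistics.NormalDist`), and the helper name `diff_of_means_tails` is hypothetical. It returns both tails, P(X̄_A − X̄_B ≥ d) and P(X̄_A − X̄_B ≤ −d), so the two cases used in these examples are reported separately. Because it does not round z to two decimals, its results can differ from z-table values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_means_tails(mu_a, sigma_a, n_a, mu_b, sigma_b, n_b, d):
    """Return (P(Xbar_A - Xbar_B >= d), P(Xbar_A - Xbar_B <= -d)) for d >= 0.

    Assumes normal populations (or large samples) with known standard deviations.
    """
    mean_diff = mu_a - mu_b                              # mu of (Xbar_A - Xbar_B)
    se = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)       # standard error of the difference
    snd = NormalDist()                                   # standard normal distribution
    upper = 1 - snd.cdf((d - mean_diff) / se)            # first case: A exceeds B by at least d
    lower = snd.cdf((-d - mean_diff) / se)               # second case: B exceeds A by at least d
    return upper, lower


# Example I: equal means, sigma = 20 and n = 15 in each group, difference of 13
up, low = diff_of_means_tails(0, 20, 15, 0, 20, 15, 13)
print(round(up, 4), round(low, 4), round(up + low, 4))

# Example II: population A (45, 15, 35) and population B (43, 20, 40), difference of 5
print(diff_of_means_tails(45, 15, 35, 43, 20, 40, 5))
```

For Example I each tail comes out at about 0.0375 and the two together at about 0.075; for Example II the two tails are about 0.23 and 0.04. For a one-sided question such as "B exceeds A by at least d", pass population B as the first set of arguments and read only the first value.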
Solution:
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 0$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{20^2/15 + 20^2/15} = 7.3$
First case, $\bar{x}_1 - \bar{x}_2 \ge 13$: $z = \dfrac{13 - 0}{7.3} = 1.78$
Second case, $\bar{x}_2 - \bar{x}_1 \ge 13$, i.e. $\bar{x}_1 - \bar{x}_2 \le -13$: $z = \dfrac{-13 - 0}{7.3} = -1.78$

[Figure: distribution of $\bar{X}_1 - \bar{X}_2$ (mean 0, standard error 7.3) and the corresponding SND, with an area (AUC) of 0.0375 to the left of z = −1.78 and 0.0375 to the right of z = +1.78.]

From the textbook, p. 143: "The probability of obtaining a difference between sample means as large as or larger than 13 is 0.0375." Do you agree? I do not: the probability should be 0.0375 × 2 = 0.075, because the area to the right of z = +1.78 also satisfies the condition that the two means differ by 13 or more. The textbook considered only the second case above.

Example II
Population A: μA = 45 min, σA = 15 min
Population B: μB = 43 min, σB = 20 min
If we select 35 observations from population A (sample A) and 40 observations from population B (sample B), what is the probability that the means of samples A and B will differ by 5 minutes or more?

$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 2$
$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B} = \sqrt{15^2/35 + 20^2/40} \approx 4.0$
First case, $\bar{X}_A - \bar{X}_B \ge 5$: $z = \dfrac{5 - 2}{4.0} = 0.75$, $P(z \ge 0.75) = 1 - P(z < 0.75) = 0.23$
Second case, $\bar{X}_B - \bar{X}_A \ge 5$, i.e. $\bar{X}_A - \bar{X}_B \le -5$: $z = \dfrac{-5 - 2}{4.0} = -1.75$, $P(z \le -1.75) = 0.04$

[Figure: distribution of $\bar{X}_A - \bar{X}_B$ (mean 2, standard error 4) with cut-offs at −5 and +5, and the SND with AUC = 0.04 to the left of z = −1.75 and AUC = 0.23 to the right of z = +0.75.]
Overall probability = AUC(right) + AUC(left) = 0.23 + 0.04 = 0.27

Example III
Population A: μA = 45 min, σA = 15 min
Population B: μB = 30 min, σB = 20 min
If we select 35 observations from population A (sample A) and 40 observations from population B (sample B), what is the probability that the means of samples A and B will differ by 20 minutes or more?
$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 45 - 30 = 15$
$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{15^2/35 + 20^2/40} = 4.05$
First case, $\bar{X}_A - \bar{X}_B \ge 20$: $z = \dfrac{20 - 15}{4.05} = 1.23$, $P(z \ge 1.23) = 0.11$
Second case, $\bar{X}_B - \bar{X}_A \ge 20$, i.e. $\bar{X}_A - \bar{X}_B \le -20$: $z = \dfrac{-20 - 15}{4.05} = -8.64$, $P(z \le -8.64) \approx 0$

[Figure: distribution of $\bar{X}_A - \bar{X}_B$ (mean 15, standard error 4.05) with cut-offs at −20 and +20, and the SND with AUC ≈ 0.00 to the left of z = −8.64 and AUC = 0.11 to the right of z = +1.23.]
The probability that the mean of sample B exceeds the mean of sample A by 20 or more is practically zero.
Overall probability = AUC(right) + AUC(left) = 0.11 + 0.00 = 0.11

Example IV
The capsule weights of two hard-gelatin capsule batches, A and B, are normally distributed with the following parameters:
- Batch A: μ = 250 mg; σ = 25 mg
- Batch B: μ = 300 mg; σ = 35 mg
If we withdraw a random sample of 30 capsules from A and a random sample of 40 capsules from B, what is the probability that the mean of sample B is larger than the mean of sample A by at least 55 mg?

Case of B − A:
$\mu_{\bar{X}_B - \bar{X}_A} = \mu_B - \mu_A = 300 - 250 = 50$
$\sigma_{\bar{X}_B - \bar{X}_A} = \sqrt{35^2/40 + 25^2/30} = 7.17$
$\bar{X}_B - \bar{X}_A \ge 55$: $z = \dfrac{55 - 50}{7.17} = 0.70$, AUC to the right of z = +0.70 ≈ 0.24

Example IV (another way to solve it)
Same batches, samples, and question as above, now worked with the difference A − B.
Case of A − B:
$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 250 - 300 = -50$
$\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{25^2/30 + 35^2/40} = 7.17$
$\bar{X}_B - \bar{X}_A \ge 55$ is the same event as $\bar{X}_A - \bar{X}_B \le -55$: $z = \dfrac{-55 - (-50)}{7.17} = -0.70$, AUC to the left of z = −0.70 ≈ 0.24

Example IV (summary)
Both cases use the same standard error (7.17) and give the same probability; only the sign of z changes:
Case of B − A: $z = \dfrac{(\bar{X}_B - \bar{X}_A) - (\mu_B - \mu_A)}{\sigma_{\bar{X}_B - \bar{X}_A}} = \dfrac{55 - 50}{7.17} = 0.70$
Case of A − B: $z = \dfrac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sigma_{\bar{X}_A - \bar{X}_B}} = \dfrac{-55 - (-50)}{7.17} = -0.70$

Pharmaceutical Statistics
Lecture 9: Estimating a Single Population Mean: Point Estimate and Confidence Interval

Statistical Inference: "Estimation"
From this lecture on, we move from descriptive statistics to inferential statistics. In descriptive statistics, we simply summarize the information available in the data we are given. In inferential statistics, we draw conclusions about a population based on a sample and a known or assumed sampling distribution. Implicit in statistical inference is the assumption that the data were gathered as a random sample from the population.
Statistical inference includes: estimation, hypothesis testing, and prediction.

The Process of Sample Inference (Estimation)

Estimation of a Population Mean: Point Estimate and Confidence Interval
• If we wish to estimate the mean (μ) of some normally distributed population, we draw a random sample of size n from the population and compute $\bar{x}$, which can be used as a point estimate of μ.
• Although the sample mean is a good estimator of the population mean, random sampling inherently involves chance, so the sample mean cannot be expected to equal the population mean exactly.
• It is more meaningful to estimate μ by an interval that communicates information about the probable magnitude of μ.
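A small simulation makes the interval idea concrete. The sketch below is an illustration, not part of the lecture: it repeatedly draws samples from a hypothetical normal population (μ = 22, σ² = 45, n = 10), builds the interval $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma/\sqrt{n}$ around each sample mean, and counts how often the interval captures the true μ. It assumes Python 3.8+; the helper names `z_interval` and `coverage` are hypothetical, and z is computed with `NormalDist().inv_cdf` rather than read from a table.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for mu when the population sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # reliability coefficient z_(1-alpha/2)
    half_width = z * sigma / sqrt(n)                     # reliability coefficient x standard error
    xbar = fmean(sample)                                 # point estimate of mu
    return xbar - half_width, xbar + half_width


def coverage(mu, sigma, n, trials=10_000, confidence=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mu."""
    rng = random.Random(seed)                            # seeded so the run is reproducible
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, confidence)
        hits += lo <= mu <= hi                           # True counts as 1
    return hits / trials


print(coverage(mu=22, sigma=sqrt(45), n=10))
```

The printed fraction should be close to 0.95: any single interval either contains μ or does not, but about 95% of intervals built this way do.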
Estimation of a Population Mean: Point Estimate and Confidence Interval
[Figure: a sample taken from the parent population; the original variable distribution (X) and the sample mean distribution ($\bar{x}$); the sample mean is the point estimate, and the interval around it is the confidence interval.]

Remember:
$P(\mu - 2\sigma_{\bar{x}} \le \bar{X} \le \mu + 2\sigma_{\bar{x}}) \approx 0.95$
[Figure: the sample mean distribution with $\mu \pm 2\sigma_{\bar{x}}$ marked, and intervals $\bar{x} \pm 2\sigma_{\bar{x}}$ drawn around several sample means $\bar{x}_1, \dots, \bar{x}_8$.]
Imagine constructing an interval about every possible value of the mean computed from all possible samples of size n from the population of interest, each interval having the form $\bar{x} \pm 2\sigma_{\bar{x}}$. About 95% of these intervals contain μ, so we can say that we are 95% confident that the population mean lies within an interval of the form $\bar{x} \pm 2\sigma_{\bar{x}}$ computed from a randomly selected sample.

First Case: Estimation of the Population Mean When the Population Variance Is Known
Known: the sample mean $\bar{x}$, the sample size n, and the population variance $\sigma^2$ (or standard deviation σ).
1. Calculate the standard deviation of the sampling distribution of the mean: $\sigma_{\bar{x}} = \sqrt{\sigma^2/n}$.
2. Use the C.I. formula $\bar{x} \pm 2\sigma_{\bar{x}}$ to estimate the population mean.

Example: Suppose a researcher interested in estimating the average level of some enzyme in a certain human population takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of 22. Suppose further that the variable of interest is known to be approximately normally distributed with a variance of 45. Estimate the mean level of this enzyme (μ) in the population.
Sample mean = 22, population variance = 45, sample size = 10
$\sigma_{\bar{x}} = \sqrt{45/10} = 2.1213$
$\bar{x} \pm 2\sigma_{\bar{x}} = 22 \pm 2(2.1213) = 22 \pm 4.24$
C.I.: (17.76, 26.24)

General Formula
$\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$
• $\bar{x}$ is the point estimate of μ.
• $z_{(1-\alpha/2)}$ is the value of z with an area (AUC) of 1 − α/2 to its left and α/2 to its right; it is called the reliability coefficient.
• (1 − α) is called the confidence level.
• $\sigma_{\bar{x}}$ is the standard error, i.e. the standard deviation of the sampling distribution of the sample mean.
C.I.: Estimator ± (Reliability Coefficient) × (Standard Error)

Common Levels of Confidence
α = 0.10: confidence level 90% (0.90), $z_{0.95} = 1.645$
α = 0.05: confidence level 95% (0.95), $z_{0.975} = 1.96$
α = 0.01: confidence level 99% (0.99), $z_{0.995} = 2.576$
90% C.I. of μ: $\bar{X} \pm 1.645\,\sigma/\sqrt{n}$
95% C.I. of μ: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
99% C.I. of μ: $\bar{X} \pm 2.576\,\sigma/\sqrt{n}$
Any other z can be used (from the z-table).
In repeated sampling, approximately 90%, 95%, or 99% of the intervals constructed with half-widths of 1.645 SE, 1.96 SE, or 2.576 SE, respectively, will include the population mean. Equivalently, we are 90%, 95%, or 99% confident that the population mean is enclosed within the calculated confidence interval.

Example: A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals. Strength scores are assumed to be approximately normally distributed with a variance of 144. A sample of 15 subjects who participated in the study yielded a mean of 84.3.
• The z value corresponding to a confidence level of 0.99 is 2.58 (this is the reliability coefficient).
• The 99% C.I. for μ is $\bar{X} \pm 2.58\,\sigma/\sqrt{n} = 84.3 \pm 2.58(12/\sqrt{15}) = 84.3 \pm 8$, i.e. (76.3, 92.3).

Pharmaceutical Statistics
Lecture 10: Estimating a Single Population Mean: The t-Distribution

The t Distribution
• If the population standard deviation σ is known, we learned in the last lecture how to estimate a population mean from a sample mean.
• Usually, however, the population standard deviation is unknown, just like the population mean. What should we do in this case?
• We may use the sample standard deviation s in place of σ. The z-statistic then becomes a t-statistic.
Note:
- When we have small samples, we must use the t distribution to construct the confidence interval.
- When the sample size is large (n > 30), s is usually a good approximation of σ, and we may be justified in using standard normal distribution theory to construct a confidence interval for the population mean.

Standard normal distribution: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ (z-distribution, used if σ is known)
Student's t distribution: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ (t-distribution, used if σ is unknown and n is small)

The t-Distribution: Properties
1. The variable t ranges from −∞ to +∞.
2. It has a mean of 0.
3. It is symmetrical about the mean.
4. In general, its variance is greater than 1, but the variance approaches 1 as the sample size becomes larger.
5. Compared to the normal distribution, the t distribution is less peaked in the center and has heavier tails.
6. The t distribution approaches the normal distribution as n increases.
7. Degrees of freedom (D.F.) = n − 1.

The t Distribution Table
• The t distribution, like the SND, has been extensively tabulated.
• To find t-scores (critical values) in the table of the t distribution, we must take into account both the cumulative probability (the AUC from −∞ to $t_{df}$) and the degrees of freedom.
[Figure: reading the t-table, e.g. for D.F. = n − 1 = 11.]
Note: distinguish between the cumulative probability and the confidence level.

z-table versus t-table
• Parameters: the z-table has only one (the z-score); the t-table has two (the cumulative probability and the D.F.).
• Use: with the z-table we calculate a z-score and then find the probability (AUC); with the t-table we decide on the cumulative probability (from the confidence level) and then find the t-score (reliability coefficient).
• Values: z-table entries are always smaller than the corresponding t-table entries at the same probability; as n approaches ∞, the t-table entries match the z-table entries.
• Shape: for the z-curve, μ = 0 and σ = 1; for the t-curve, μ = 0 and σ > 1, approaching 1 when n is large.

The t Distribution
The general procedure for constructing confidence intervals is not affected by having to use the t distribution rather than the SND:
estimator ± (reliability coefficient) × (standard error)
$\bar{X} \pm z_{(1-\alpha/2)}\,\sigma/\sqrt{n}$ or $\bar{X} \pm t_{(1-\alpha/2)}\,s/\sqrt{n}$, i.e. $\bar{X} \pm z_{(1-\alpha/2)} \cdot SE$ or $\bar{X} \pm t_{(1-\alpha/2)} \cdot SE$

Example: 20 tablets were chosen randomly from a batch.
Their weights (mg) were: 300, 321, 306, 321, 310, 322, 315, 325, 316, 323, 316, 325, 317, 325, 319, 327, 320, 331, 320, 336. Estimate the batch mean weight with a confidence level of 95%.
• Use the t-distribution: σ is unknown and n is small.
• Sample mean and standard deviation: $\bar{x} = 319.75$ mg, $s = \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2/(n-1)} = 8.2$ mg
• D.F. = 19, so $t^* = 2.093$ (t-table)
• 95% C.I. of μ: $319.75 \pm 2.093\,(8.2/\sqrt{20}) = 319.75 \pm 3.84$, i.e. 315.91 to 323.59 mg

Should I use the z- or the t-table?

HW5
Q1. We wish to estimate the average number of heartbeats per minute for a certain population. The average number of heartbeats per minute for a sample of 49 subjects was found to be 90. Assume that these 49 patients constitute a random sample and that the population is normally distributed with a standard deviation of 10.
1. Construct the 90%, 95%, and 99% C.I.s for the population mean.
2. How do you interpret these C.I.s?
3. Which one do you prefer to use? Why?
4. How does sample size affect the width of the C.I.s? Consider n = 49 and then n = 490.
Q2. Use the t-distribution to find the reliability factor for a confidence interval based on the following confidence levels and sample sizes:
(a) confidence level 0.95, sample size 24
(b) confidence level 0.99, sample size 8
(c) confidence level 0.90, sample size 30
(d) confidence level 0.95, sample size 15
Q3. A sample of 16 girls had a mean weight of 71.5 pounds and a standard deviation of 12 pounds. Assuming a normal distribution, find the 90%, 95%, and 99% confidence intervals for μ.
Q4. A simple random sample of 16 subjects yielded the following values of urinary excreted arsenic (mg/day):
Subject  Value    Subject  Value
1        0.007    9        0.012
2        0.030    10       0.006
3        0.025    11       0.010
4        0.008    12       0.032
5        0.030    13       0.006
6        0.038    14       0.009
7        0.007    15       0.014
8        0.005    16       0.011
Construct a 95% confidence interval, in μg/day, for the population mean.

Pharmaceutical Statistics
Lecture 11: Estimating the Difference between Two Population Means

Estimating the Difference between Two Population Means
• In the previous lectures, we learned how to estimate the mean of a single population and how to construct a C.I. for that mean.
• In some cases we are interested in estimating the difference between the means of two populations.
• This estimate helps us decide whether or not the two population means are likely to be equal.
• If the constructed confidence interval includes zero, we can say that the population means may be equal; if it does not include zero, they may not be equal.
[Figure: samples from population 1 ($n_1, \bar{X}_1, s_1$) and population 2 ($n_2, \bar{X}_2, s_2$) give a C.I. for the difference between means; a C.I. that does not include zero means the population means may NOT be equal, while a C.I. that includes zero means the population means may be equal.]

A) Two samples drawn from two normally distributed populations with known σ1 and σ2
• Populations: normally distributed
• Population variances: known
• Samples: small or large
• Table: z-table
C.I.: Estimator ± (Reliability Coefficient) × (Standard Error)
$(\bar{X}_1 - \bar{X}_2) \pm z_{(1-\alpha/2)}\,\sigma_{\bar{X}_1 - \bar{X}_2}$, where $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

Example
• A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. A sample of 12 individuals with Down's syndrome yielded a mean of $\bar{X}_1$ = 4.5 mg/100 mL. A sample of 15 normal individuals of the same age and sex had a mean of $\bar{X}_2$ = 3.4 mg/100 mL. Assume that the two populations are normally distributed, with variances of 1 and 1.5 (Down's and normal, respectively). Construct the 95% confidence interval for $\mu_{\text{Down}} - \mu_{\text{Normal}}$.
$\bar{X}_1 - \bar{X}_2 = 4.5 - 3.4 = 1.1$
At the 95% confidence level, the reliability coefficient from the z-table is 1.96.
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{1/12 + 1.5/15} = 0.4282$
C.I.: $1.1 \pm 1.96(0.4282) = 1.1 \pm 0.84$, i.e. (0.26, 1.94)
What do you conclude from this confidence interval? Zero is not included in the C.I., so we can conclude, with 95% confidence, that the population means may not be equal.
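Case A can be sketched in a few lines of Python. This is a minimal illustration assuming Python 3.8+; the function name `ci_diff_known_var` is hypothetical, and the reliability coefficient comes from `NormalDist().inv_cdf` (1.95996... for 95%) instead of the rounded table value 1.96, so the last digit may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def ci_diff_known_var(xbar1, var1, n1, xbar2, var2, n2, confidence=0.95):
    """C.I. for mu1 - mu2: normal populations with known variances (case A)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # reliability coefficient
    se = sqrt(var1 / n1 + var2 / n2)                     # standard error of the difference
    diff = xbar1 - xbar2                                 # point estimate of mu1 - mu2
    return diff - z * se, diff + z * se


# Down's syndrome (n = 12, variance 1) versus normal (n = 15, variance 1.5)
lo, hi = ci_diff_known_var(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"95% C.I.: ({lo:.2f}, {hi:.2f})")
```

With the uric acid figures the interval is about (0.26, 1.94) and lies entirely above zero.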
B) Two samples drawn from two populations that are NOT known to be normally distributed, with unknown σ1 and σ2, and large samples
If we have no idea about the normality of the parent distributions or their parameters, we use the CLT to justify using the z-table to find the reliability coefficient, and we use the sample standard deviations to calculate the standard error of the difference. This is valid only for large samples.

Example: Two samples were drawn from two populations. The first sample (112 subjects) yielded a mean of 401.8 and a standard deviation of 226.4. The second sample (75 subjects) yielded a mean of 828.2 and a standard deviation of 274.9. Construct the 99% C.I. for the difference between the population means.
• We have no idea about the normality of the parent populations.
• Because the samples are large, we can apply the CLT and assume that the distribution of the difference between sample means is approximately normal.
• We can use the z-table to find the reliability coefficient, and the sample standard deviations to calculate the standard error of the difference.
C.I.: $(\bar{X}_2 - \bar{X}_1) \pm z_{(1-\alpha/2)}\,\sigma_{\bar{X}_2 - \bar{X}_1}$, with $\sigma_{\bar{X}_2 - \bar{X}_1} \approx \sqrt{s_1^2/n_1 + s_2^2/n_2}$ (justified by the large n)
$\bar{X}_2 - \bar{X}_1 = 828.2 - 401.8 = 426.4$
$\sqrt{274.9^2/75 + 226.4^2/112} = 38.2786$
C.I.: $426.4 \pm 2.58(38.2786) = 426.4 \pm 98.76$, i.e. (327.6, 525.2)
We have no idea about the normality of the parent populations, but because the sample sizes are large, we applied the CLT and assumed that the distribution of the difference between sample means is approximately normal. With this in mind, we used the z-table to find the reliability coefficient (2.58).
We also used the standard deviations of both samples to calculate the standard error of the difference between means (38.2786). The interval does not include zero, so we can conclude, with 99% confidence, that the population means may not be equal.

C) Two samples drawn from two normally distributed populations with unknown σ1 and σ2, and small samples
• Parent populations: normally distributed
• Population variances: unknown
• Sample size: small
We cannot use the z-distribution in this case. We need the t-table to find the reliability factor, and the standard deviations of the two samples to find the standard error.
Important note: how the standard error is calculated from s1, n1 and s2, n2 depends on whether the parent population variances are equal.
C.1) If they are equal: we use the t distribution.
C.2) If they are not equal: we use the t′ distribution.

C.1) Two samples drawn from two normally distributed populations with unknown but equal σ1 and σ2, and small samples
• In this case, we use the sample variances and the t-table.
• A sample variance from a larger sample is a more reliable estimate, so samples of different sizes (n) must be weighted accordingly. We therefore calculate the pooled estimate of the common variance:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
This formula pools both sample variances, each weighted by its degrees of freedom (n − 1).
• NOTE: If the two independent samples have equal sizes, $s_p^2$ is simply the arithmetic mean (simple average) of the two sample variances.
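The pooled variance formula and the NOTE on equal sample sizes can be checked with a tiny Python function. It is a sketch; `pooled_variance` is a hypothetical name, and the inputs 4.9/13 and 5.6/17 are the sample figures used in the C.1 example of this lecture.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance s_p^2 from two sample SDs and sizes.

    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    if n1 + n2 <= 2:
        raise ValueError("need n1 + n2 > 2")
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Unequal sample sizes: the larger sample gets more weight
print(round(pooled_variance(4.9, 13, 5.6, 17), 2))

# Equal sample sizes: s_p^2 reduces to the simple average of the two variances
print(pooled_variance(2.0, 10, 4.0, 10), (2.0**2 + 4.0**2) / 2)
```

The first call gives 28.21; the second line prints the same value twice, confirming that equal sample sizes reduce pooling to a plain average.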
The standard error of the estimate is:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2}$
• The 100(1 − α)% confidence interval for the difference between means is:
$(\bar{X}_1 - \bar{X}_2) \pm t_{(1-\alpha/2)}\,s_{\bar{X}_1 - \bar{X}_2} = (\bar{X}_1 - \bar{X}_2) \pm t_{(1-\alpha/2)}\sqrt{s_p^2/n_1 + s_p^2/n_2}$
We use the t-table with D.F. = n1 + n2 − 2 to find the reliability coefficient.

Example
• Two independent samples were taken from normally distributed populations A and B that have equal variances. The sample from A (n = 13) had a mean of 21 and a standard deviation of 4.9. The sample from B (n = 17) had a mean of 12.1 and a standard deviation of 5.6. Construct the 95% confidence interval for the difference between the means of populations A and B.
Since A and B are normally distributed, their variances are unknown but equal, and the samples are small, we use the t-table and the pooled estimate of the common variance.
n1 = 13, s1 = 4.9; n2 = 17, s2 = 5.6
$s_p^2 = \dfrac{(13 - 1)4.9^2 + (17 - 1)5.6^2}{13 + 17 - 2} = 28.21$
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{28.21/13 + 28.21/17} = 1.957$
D.F. = 13 + 17 − 2 = 28, so $t_{0.975} = 2.048$
C.I.: $(21 - 12.1) \pm 2.048(1.957) = 8.9 \pm 4.01$, i.e. (4.9, 12.9)
What does this mean?

C.2) Two samples drawn from two normally distributed populations with unknown and NOT equal σ1 and σ2, and small samples
• We cannot use the t-table with D.F. = n1 + n2 − 2 to find the reliability factor.
• Solution: instead of looking up the reliability factor, we compute it as a weighted average of the reliability factors for each sample.
How to compute the reliability factor:
$t'_{(1-\alpha/2)} = \dfrac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$, where $w_1 = s_1^2/n_1$ and $w_2 = s_2^2/n_2$,
$t_1 = t_{(1-\alpha/2)}$ for D.F. = n1 − 1, and $t_2 = t_{(1-\alpha/2)}$ for D.F. = n2 − 1.
100(1 − α)% confidence interval for μ1 − μ2:
$(\bar{X}_1 - \bar{X}_2) \pm t'_{(1-\alpha/2)}\sqrt{s_1^2/n_1 + s_2^2/n_2}$

Example
• Two independent samples were taken from normally distributed populations A and B. We have no valid reason to assume that the population variances of A and B are equal. The sample from A (n = 13) had a mean of 4.5 and a standard deviation of 0.3. The sample from B (n = 17) had a mean of 3.7 and a standard deviation of 1.0. Construct the 95% confidence interval for the difference between the means of populations A and B.
$w_1 = 0.3^2/13 = 0.007$, $w_2 = 1.0^2/17 = 0.06$
$t_1 = t_{0.975}$ for D.F. = 12: 2.1788
$t_2 = t_{0.975}$ for D.F. = 16: 2.1199
$t'_{0.975} = \dfrac{(0.007)(2.1788) + (0.06)(2.1199)}{0.007 + 0.06} = 2.126$
C.I.: $(4.5 - 3.7) \pm 2.126\sqrt{0.3^2/13 + 1.0^2/17} = 0.8 \pm 2.126(0.2564) = 0.8 \pm 0.545$, i.e. about (0.25, 1.35)
What do you conclude from the confidence interval above?

Should I use the z, t, or t′ table?
Summary: Estimating the Difference between Two Population Means
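To tie the cases together, here is a hypothetical helper, `ci_difference`, that computes the C.I. for μ1 − μ2 once you have decided which table applies (z for cases A and B, t with the pooled variance for C.1, t′ for C.2). It is a sketch, not a definitive implementation. Python's standard library has no Student t quantile function, so the critical values are passed in from the tables, just as in the worked examples; if SciPy is available, `scipy.stats.t.ppf(0.975, df)` would supply them instead.

```python
from math import sqrt


def ci_difference(x1, s1, n1, x2, s2, n2, method, crit1, crit2=None):
    """C.I. for mu1 - mu2 under the cases of this lecture.

    method "z":  cases A/B; crit1 = z_(1-alpha/2); s1, s2 are sigmas (A) or large-sample SDs (B)
    method "t":  case C.1; crit1 = t_(1-alpha/2) with n1 + n2 - 2 D.F. (pooled variance)
    method "t'": case C.2; crit1, crit2 = t_(1-alpha/2) with n1 - 1 and n2 - 1 D.F.
    """
    diff = x1 - x2
    if method == "z":
        se = sqrt(s1**2 / n1 + s2**2 / n2)
        rel = crit1
    elif method == "t":
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
        se = sqrt(sp2 / n1 + sp2 / n2)
        rel = crit1
    elif method == "t'":
        if crit2 is None:
            raise ValueError("t' needs critical values for both samples")
        w1, w2 = s1**2 / n1, s2**2 / n2
        rel = (w1 * crit1 + w2 * crit2) / (w1 + w2)                  # weighted reliability factor t'
        se = sqrt(w1 + w2)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return diff - rel * se, diff + rel * se


# Case B, 99%: difference taken as second sample minus first, as in the example
print(ci_difference(828.2, 274.9, 75, 401.8, 226.4, 112, "z", 2.58))
# Case C.1, 95%, D.F. = 28
print(ci_difference(21, 4.9, 13, 12.1, 5.6, 17, "t", 2.048))
# Case C.2, 95%, D.F. = 12 and 16
print(ci_difference(4.5, 0.3, 13, 3.7, 1.0, 17, "t'", 2.1788, 2.1199))
```

The three calls reproduce, up to rounding, the intervals of the case B, C.1, and C.2 examples. For case A, pass the known population standard deviations σ1 and σ2 as `s1` and `s2`.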