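7. Statistical Intervals Based on a Single Sample
Li Jie

7.1 Basic Properties of Confidence Intervals

Suppose that the parameter of interest is a population mean $\mu$ and that
1. The population distribution is normal.
2. The value of the population standard deviation $\sigma$ is known.

Example 7.1: $n = 31$, $\bar{x} = 80$, $\sigma = 2.0$. Find a confidence interval (CI) for $\mu$, that is, limits $u_1$ and $u_2$ such that $P(u_1 < \mu < u_2) = 1 - \alpha$.

Let $X_1, X_2, \ldots, X_n$, with observed values $x_1, x_2, \ldots, x_n$, be a random sample from $N(\mu, \sigma^2)$. Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Since $P(-1.96 < Z < 1.96) = .95$, manipulating the inequalities to isolate $\mu$ shows that the random interval
$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \qquad (7.4)$$
contains $\mu$ with probability .95.

DEFINITION: If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute it into (7.4) in place of $\bar{X}$, the resulting fixed interval is called a 95% confidence interval for $\mu$. This CI can be expressed either as
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) \text{ is a 95\% CI for } \mu$$
or as
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \text{ with 95\% confidence.}$$
A concise expression for the interval is $\bar{x} \pm 1.96\sigma/\sqrt{n}$, where $-$ gives the left endpoint (lower limit) and $+$ gives the right endpoint (upper limit).

Example 7.1 (continued): The quantities needed for computation of the 95% CI for average preferred height are $\sigma = 2.0$, $n = 31$, and $\bar{x} = 80.0$. The resulting interval is
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 80.0 \pm 1.96\frac{2.0}{\sqrt{31}} = 80.0 \pm .7 = (79.3, 80.7)$$
That is, we can be highly confident that $79.3 < \mu < 80.7$. This interval is relatively narrow, indicating that $\mu$ has been rather precisely estimated.

Interpreting a confidence interval: see page 281.

Other Levels of Confidence

DEFINITION: A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, where $z_{\alpha/2}$ is the standard normal critical value with upper-tail area $\alpha/2$.

Example: The production process for engine control housing units of a particular type has recently been modified. Prior to this modification, historical data had suggested that the distribution of hole diameters for bushings on the housings was normal with a standard deviation of .100 mm. It is believed that the modification has not affected the shape of the distribution or the standard deviation, but that the value of the mean diameter may have changed.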
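The known-$\sigma$ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ defined above is easy to compute directly. This is a minimal Python sketch using only the standard library: `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$ (it returns 1.95996 rather than the tabled 1.96, so the last digit may occasionally differ from a hand calculation). The function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """100(1 - alpha)% CI for a normal mean when sigma is known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    # z_{alpha/2} is the standard normal value with upper-tail area alpha/2
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Preferred-height example above: sigma = 2.0, n = 31, xbar = 80.0
low, high = z_interval(80.0, 2.0, 31)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```

Passing `confidence=0.90` uses $z_{.05} \approx 1.645$, the critical value for a 90% interval.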
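A sample of 40 housing units is selected and hole diameter is determined for each one, resulting in a sample mean diameter of 5.426 mm. Let's calculate a confidence interval for true average hole diameter using a confidence level of 90%. This requires that $100(1-\alpha) = 90$, from which $\alpha = .10$ and $z_{\alpha/2} = z_{.05} = 1.645$. The desired interval is then
$$5.426 \pm 1.645\frac{.100}{\sqrt{40}} = 5.426 \pm .026 = (5.400, 5.452)$$
With a reasonably high degree of confidence, we can say that $5.400 < \mu < 5.452$. This interval is narrow because of the small amount of variability in hole diameter ($\sigma = .100$).

Confidence Level, Precision, and Choice of Sample Size

Confidence level: $1 - \alpha$
Interval width: $u_2 - u_1$
Precision: $1/(\text{interval width})$

For fixed $n$ and $\sigma$, a higher confidence level requires a larger $z_{\alpha/2}$ and therefore gives a wider, less precise interval. An appealing strategy is to specify both the desired confidence level and the interval width and then determine the necessary sample size.

Example 7.4: Extensive monitoring of a computer time-sharing system has suggested that response time to a particular editing command is normally distributed with standard deviation 25 millisec. A new operating system has been installed, and we wish to estimate the true average response time $\mu$ for the new environment. Assuming that response times are still normally distributed with $\sigma = 25$, what sample size is necessary to ensure that the resulting 95% CI has a width of 10? The sample size $n$ must satisfy
$$10 = 2(1.96)\frac{25}{\sqrt{n}}$$
Rearranging this equation gives
$$\sqrt{n} = \frac{2(1.96)(25)}{10} = 9.80, \qquad n = (9.80)^2 = 96.04$$
Since $n$ must be an integer, a sample size of 97 is required.

The general formula for the sample size $n$ necessary to ensure an interval width $w$ is obtained from $w = 2z_{\alpha/2}\,\sigma/\sqrt{n}$ as
$$n = \left(2z_{\alpha/2}\frac{\sigma}{w}\right)^2$$
rounded up to the next integer. The half-width $1.96\sigma/\sqrt{n}$ of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level.

Deriving a Confidence Interval

Let $X_1, X_2, \ldots, X_n$ denote the sample on which the CI for a parameter $\theta$ is to be based. Suppose a random variable $h(X_1, X_2, \ldots, X_n; \theta)$ satisfying the following two properties can be found:
1. The variable depends functionally on both $X_1, X_2, \ldots, X_n$ and $\theta$.
2. The probability distribution of the variable does not depend on $\theta$ or on any other unknown parameters.

Because the distribution of $h$ is known, constants $a < b$ can be found with $P(a < h(X_1, \ldots, X_n; \theta) < b) = 1 - \alpha$. Manipulating these inequalities to isolate $\theta$ gives random limits $l(X_1, \ldots, X_n) < \theta < u(X_1, \ldots, X_n)$ that hold with probability $1 - \alpha$. (See page 284.)

Example 7.5: A theoretical model suggests that the time to breakdown of an insulating fluid between electrodes at a particular voltage has an exponential distribution with parameter $\lambda$. A random sample of $n = 10$ breakdown times yields the following sample data:
$x_1 = 41.53$, $x_2 = 18.73$, $x_3 = 2.99$, $x_4 = 30.34$, $x_5 = 12.33$, $x_6 = 117.52$, $x_7 = 73.02$, $x_8 = 223.63$, $x_9 = 4.00$, $x_{10} = 26.78$
A 95% CI for $\lambda$ and for the true average breakdown time are desired.

Let $h(X_1, X_2, \ldots, X_n; \lambda) = 2\lambda\sum X_i$. It can be shown that this random variable has a probability distribution called a chi-squared distribution with $2n$ degrees of freedom (df). Since that distribution involves no unknown parameters, $h$ satisfies properties 1 and 2.

For $n = 10$ (20 df),
$$P\left(9.591 < 2\lambda\sum X_i < 34.170\right) = .95$$
Dividing through by $2\sum X_i$ gives
$$P\left(\frac{9.591}{2\sum X_i} < \lambda < \frac{34.170}{2\sum X_i}\right) = .95$$
and taking reciprocals (which reverses the inequalities) gives, for the mean breakdown time $1/\lambda$,
$$P\left(\frac{2\sum X_i}{34.170} < \frac{1}{\lambda} < \frac{2\sum X_i}{9.591}\right) = .95$$
Here $\sum x_i = 550.87$, so the 95% CI for $\lambda$ is $(.00871, .03101)$ and the 95% CI for the true average breakdown time is
$$\left(\frac{2\sum x_i}{34.170},\ \frac{2\sum x_i}{9.591}\right) = (32.24, 114.87)$$

Exercises: page 285, Exercises 1, 2, 3, 4, 8.

7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

A Large-Sample Interval for $\mu$

Provided that $n$ is large, the CLT implies that, approximately, $\bar{X} \sim N(\mu, \sigma^2/n)$.

PROPOSITION: If $n$ is sufficiently large, the standardized variable
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has approximately a standard normal distribution. This implies that
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.

Derivation:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \text{ approximately}, \qquad \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1) \text{ approximately}$$
$$P(u_1 < \mu < u_2) = P(-u_2 < -\mu < -u_1) = P(\bar{X} - u_2 < \bar{X} - \mu < \bar{X} - u_1)$$
$$= P\left(\frac{\bar{X} - u_2}{S/\sqrt{n}} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < \frac{\bar{X} - u_1}{S/\sqrt{n}}\right) \approx \Phi\left(\frac{\bar{X} - u_1}{S/\sqrt{n}}\right) - \Phi\left(\frac{\bar{X} - u_2}{S/\sqrt{n}}\right)$$
Setting $(\bar{X} - u_1)/(S/\sqrt{n}) = z_{\alpha/2}$ and $(\bar{X} - u_2)/(S/\sqrt{n}) = -z_{\alpha/2}$ makes this probability approximately $1 - \alpha$ and gives $u_1 = \bar{X} - z_{\alpha/2}S/\sqrt{n}$ and $u_2 = \bar{X} + z_{\alpha/2}S/\sqrt{n}$.

Example 7.6: The alternating-current (AC) breakdown voltage of an insulating liquid indicates its dielectric strength. The article "Test Practices for the AC Breakdown Voltage Testing of Insulation Liquids" gave the accompanying sample observations on breakdown voltage of a particular circuit under certain conditions.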
62 50 53 57 41 53 55 61 59 64 50 53 64 62 50 68
54 55 57 50 55 50 56 55 46 55 53 54 52 47 47 55
57 48 63 57 57 55 53 59 53 52 50 55 60 50 56 58

A boxplot of the data shows a high concentration in the middle half of the data. There is a single outlier at the upper end, but this value is actually a bit closer to the median (55) than is the smallest sample observation.

[Figure 7.5: boxplot of the breakdown voltage data; horizontal axis Voltage, 40 to 70]

Summary quantities include $n = 48$, $\sum x_i = 2626$, and $\sum x_i^2 = 144{,}950$, from which $\bar{x} = 54.7$ and $s = 5.23$. The 95% confidence interval is then
$$54.7 \pm 1.96\frac{5.23}{\sqrt{48}} = 54.7 \pm 1.5 = (53.2, 56.2)$$
That is, $53.2 < \mu < 56.2$ with a confidence level of approximately 95%. The interval is reasonably narrow, indicating that we have precisely estimated $\mu$.

A General Large-Sample Confidence Interval (omitted)
$$P\left(-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
For $X \sim \text{Bin}(n, p)$ with $p$ unknown and $\hat{p} = X/n$:
$$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$

A Large-Sample Confidence Interval for a Population Proportion (omitted)

PROPOSITION: A confidence interval for a population proportion $p$ with confidence level approximately $100(1-\alpha)\%$ has
$$\text{lower confidence limit} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}$$
and
$$\text{upper confidence limit} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}$$
where $\hat{q} = 1 - \hat{p}$. These limits are the values of $p$ that solve the inequalities in the binomial probability statement above.

Example 7.8: The article "Repeatability and Reproducibility for Pass/Fail Data" reported that in $n = 48$ trials in a particular laboratory, 16 resulted in ignition of a particular type of substrate by a lighted cigarette. Let $p$ denote the long-run proportion of all such trials that would result in ignition. A point estimate for $p$ is $\hat{p} = 16/48 = .333$. A confidence interval for $p$ with a confidence level of approximately 95% is
$$\frac{.333 + \dfrac{1.96^2}{96} \pm 1.96\sqrt{\dfrac{(.333)(.667)}{48} + \dfrac{1.96^2}{9216}}}{1 + \dfrac{1.96^2}{48}} = \frac{.373 \pm .139}{1.08} = (.217, .474)$$
The traditional interval is
$$.333 \pm 1.96\sqrt{\frac{(.333)(.667)}{48}} = .333 \pm .133 = (.200, .466)$$

Equating the width of the CI for $p$ to a prespecified width $w$ gives a quadratic equation for the sample size $n$ necessary to give an interval with a desired degree of precision.
Suppressing the subscript in $z_{\alpha/2}$, the solution is
$$n = \frac{2z^2\hat{p}\hat{q} - z^2w^2 \pm \sqrt{4z^4\hat{p}\hat{q}(\hat{p}\hat{q} - w^2) + w^2z^4}}{w^2}$$
Neglecting the terms in the numerator involving $w$ gives
$$n = \frac{4z^2\hat{p}\hat{q}}{w^2}$$
This latter expression is what results from equating the width of the traditional interval to $w$.

One-Sided Confidence Intervals

PROPOSITION: A large-sample upper confidence bound for $\mu$ is
$$\mu < \bar{x} + z_\alpha\frac{s}{\sqrt{n}}$$
and a large-sample lower confidence bound for $\mu$ is
$$\mu > \bar{x} - z_\alpha\frac{s}{\sqrt{n}}$$
A one-sided confidence bound for $p$ results from replacing $z_{\alpha/2}$ by $z_\alpha$ and $\pm$ by either $+$ or $-$ in the CI formula for $p$. In each case the confidence level is approximately $100(1-\alpha)\%$.

Example 7.10: The slant shear test is the most widely accepted procedure for assessing the quality of a bond between a repair material and its concrete substrate. The article "Testing the Bond Between Repair Materials and Concrete Substrate" reported that in one particular investigation, a sample of 48 shear strength observations gave a sample mean strength of 17.17 N/mm² and a sample standard deviation of 3.28 N/mm². A lower confidence bound for true average shear strength $\mu$ with confidence level 95% is
$$17.17 - 1.645\frac{3.28}{\sqrt{48}} = 17.17 - .78 = 16.39$$
That is, with a confidence level of 95%, the value of $\mu$ lies in the interval $(16.39, \infty)$.

7.3 Intervals Based on a Normal Population Distribution

ASSUMPTION: The population of interest is normal, so that $X_1, X_2, \ldots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma$ unknown.

THEOREM: When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).

Properties of t Distributions

A t distribution is governed by only one parameter, called the number of degrees of freedom of the distribution, denoted $\nu$. Let $t_\nu$ denote the density function curve for $\nu$ df.
1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal ($z$) curve.
3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve.

[Figure 7.6: $t_\nu$ and $z$ curves (z curve, $t_{25}$ curve, $t_5$ curve, all centered at 0)]

Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.

[Figure 7.7: A pictorial definition of $t_{\alpha,\nu}$ (shaded area $\alpha$ under the $t_\nu$ curve to the right of $t_{\alpha,\nu}$)]

The One-Sample t Confidence Interval

PROPOSITION: Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$
or, more compactly, $\bar{x} \pm t_{\alpha/2,n-1}\,s/\sqrt{n}$.

An upper confidence bound for $\mu$ is
$$\bar{x} + t_{\alpha,n-1}\frac{s}{\sqrt{n}}$$
and replacing $+$ by $-$ in this latter expression gives a lower confidence bound for $\mu$, both with confidence level $100(1-\alpha)\%$.

Example 7.11: As part of a larger project to study the behavior of stressed-skin panels, a structural component being used extensively in North America, the article "Time-Dependent Bending Properties of Lumber" reported on various mechanical properties of Scotch pine lumber specimens.
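The one-sample t interval and the one-sided t bounds above can be sketched in Python. This is a minimal sketch, assuming the critical value ($t_{\alpha/2,n-1}$ for the interval, $t_{\alpha,n-1}$ for a bound) is read from a t table and passed in, because Python's standard library has no t quantile function. The names `t_interval` and `t_bound` are hypothetical; the usage line plugs in the Example 7.11 summary statistics ($\bar{x} = 14{,}532.5$, $s = 2055.67$, $n = 16$, $t_{.025,15} = 2.131$) computed from the modulus of elasticity data.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Two-sided 100(1 - alpha)% t CI: xbar +/- t_{alpha/2, n-1} * s / sqrt(n).

    Assumption: t_crit = t_{alpha/2, n-1} is supplied by the caller from a t table.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


def t_bound(xbar: float, s: float, n: int, t_crit: float, upper: bool = True) -> float:
    """One-sided 100(1 - alpha)% bound: xbar + (or -) t_{alpha, n-1} * s / sqrt(n).

    Pass t_crit = t_{alpha, n-1} (alpha, not alpha/2); upper=False gives the lower bound.
    """
    margin = t_crit * s / sqrt(n)
    return xbar + margin if upper else xbar - margin


# Example 7.11 summary statistics with 15 df
low, high = t_interval(14_532.5, 2055.67, 16, 2.131)
print(f"95% CI for mu: ({low:.1f}, {high:.1f})")
```

This reproduces the hand-computed interval (13,437.3, 15,627.7). Keeping the critical value as an argument mirrors the table-lookup workflow of the slides; with SciPy available, it could instead be obtained from a t quantile function.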
Consider the following observations on modulus of elasticity obtained 1 minute after loading in a certain configuration:

10490 16620 17300 15480 12970 17260 13400 13900
13630 13260 14370 11700 15470 17840 14070 14760

[Figure 7.8: plot of the modulus of elasticity data (vertical axis 10,000 to 18,000; horizontal axis −2 to 2)]

Hand calculation of the sample mean and standard deviation is simplified by subtracting 10,000 from each observation: $y_i = x_i - 10{,}000$. Then $\sum y_i = 72{,}520$ and $\sum y_i^2 = 392{,}083{,}800$, from which $\bar{y} = 4532.5$ and $s_y = 2055.67$. Subtracting a constant shifts the mean but leaves the standard deviation unchanged, so $\bar{x} = 14{,}532.5$ and $s_x = 2055.67$. With $t_{.025,15} = 2.131$, the 95% CI is
$$\bar{x} \pm t_{.025,15}\frac{s}{\sqrt{n}} = 14{,}532.5 \pm 2.131\frac{2055.67}{\sqrt{16}} = (13{,}437.3,\ 15{,}627.7)$$

7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population

THEOREM: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared ($\chi^2$) probability distribution with $n - 1$ df.

[Figure 7.9: Graphs of chi-squared density functions $f(x; \nu)$ for $\nu = 8, 12, 20$]

Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the measurement axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.

[Figure 7.10: $\chi^2_{\alpha,\nu}$ notation illustrated: (a) shaded area $\alpha$ under the $\chi^2_\nu$ pdf to the right of $\chi^2_{\alpha,\nu}$; (b) shaded areas of .01 to the left of $\chi^2_{.99,\nu}$ and to the right of $\chi^2_{.01,\nu}$]

A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population has lower limit
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}}$$
and upper limit
$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$
A confidence interval for $\sigma$ has lower and upper limits that are the square roots of the corresponding limits in the interval for $\sigma^2$.

Example 7.15: The accompanying data on breakdown voltage of electrically stressed circuits was read from a normal probability plot that appeared in the article "Damage of Flexible Printed Wiring Boards Associated with Lightning-Induced Voltage Surges." The straightness of the plot gave strong support to the assumption that breakdown voltage is approximately normally distributed.
1470 1510 1690 1740 1900 2000 2030 2100 2190
2200 2290 2380 2390 2480 2500 2580 2700

Let $\sigma^2$ denote the variance of the breakdown voltage distribution. The computed value of the sample variance is $s^2 = 137{,}324.3$, the point estimate of $\sigma^2$. With df $= n - 1 = 16$, a 95% CI requires $\chi^2_{.975,16} = 6.908$ and $\chi^2_{.025,16} = 28.845$. The interval is
$$\left(\frac{16(137{,}324.3)}{28.845},\ \frac{16(137{,}324.3)}{6.908}\right) = (76{,}172.3,\ 318{,}064.4)$$
Taking the square root of each endpoint yields $(276.0, 564.0)$ as the 95% CI for $\sigma$.
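The variance and standard-deviation intervals of Section 7.4 can be checked numerically. This is a minimal sketch, assuming both chi-squared critical values are read from a table and passed in (Python's standard library provides no chi-squared quantiles); `variance_interval` is a hypothetical name. Note that the larger critical value, $\chi^2_{\alpha/2,n-1}$, produces the lower limit.

```python
from math import sqrt


def variance_interval(s2: float, n: int, chi2_upper: float, chi2_lower: float) -> tuple[float, float]:
    """100(1 - alpha)% CI for sigma^2 of a normal population.

    chi2_upper: chi^2_{alpha/2, n-1}, the larger tabled value (gives the lower limit)
    chi2_lower: chi^2_{1-alpha/2, n-1}, the smaller tabled value (gives the upper limit)
    """
    numerator = (n - 1) * s2
    return numerator / chi2_upper, numerator / chi2_lower


# Example 7.15: s^2 = 137,324.3 with n = 17 (16 df)
var_low, var_high = variance_interval(137_324.3, 17, 28.845, 6.908)
# The CI for sigma takes square roots of the variance limits
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"CI for sigma^2: ({var_low:.1f}, {var_high:.1f})")
print(f"CI for sigma:   ({sd_low:.1f}, {sd_high:.1f})")
```

With the values of Example 7.15 this prints (76172.3, 318064.4) for $\sigma^2$ and (276.0, 564.0) for $\sigma$, matching the hand computation.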