Chapter 8 Fundamental Sampling Distributions and Data Descriptions

8.1 Random Sampling

Population and Samples

Population: The totality of observations with which we are concerned, whether their number be finite or infinite, constitutes what we call a population.

Def 8.1 A population consists of the totality of the observations with which we are concerned.

Def 8.2 A sample is a subset of a population.

Any sampling procedure that produces inferences that consistently overestimate or consistently underestimate some characteristic of the population is said to be biased. To eliminate any possibility of bias in the sampling procedure, it is desirable to choose a random sample, in the sense that the observations are made independently and at random.

Def 8.3 Let $X_1, X_2, \ldots, X_n$ be n independent random variables, each having the same probability distribution f(x). We then define $X_1, X_2, \ldots, X_n$ to be a random sample of size n from the population f(x) and write its joint probability distribution as

$$f(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n).$$

8.2 Some Important Statistics

Statistics are used to elicit information about the unknown population parameters.

Def 8.4 Any function of the random variables constituting a random sample is called a statistic.

Central Tendency in the Sample; The Sample Mean

The most commonly used statistics for measuring the center of a set of data are the mean, median, and mode.

Def 8.5 If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample mean is defined by the statistic

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}.$$

The Sample Variance

Def 8.6 If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample variance is defined by the statistic

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}.$$

Ex 8.1: A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.
Sol:

$$\bar{x} = \frac{12 + 15 + 17 + 20}{4} = \frac{64}{4} = 16$$

$$s^2 = \frac{\sum_{i=1}^{4} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{4} (x_i - 16)^2}{3} = \frac{(12 - 16)^2 + (15 - 16)^2 + (17 - 16)^2 + (20 - 16)^2}{3} = \frac{34}{3}$$

Theorem 8.1 (important) If $S^2$ is the variance of a random sample of size n, we may write

$$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}.$$

Proof: Expanding the square in Def 8.6,

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i^2 - 2\bar{X} X_i + \bar{X}^2)}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2}{n - 1}.$$

Substituting $\bar{X} = \sum_{i=1}^{n} X_i / n$ gives

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - 2\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) \sum_{i=1}^{n} X_i + n\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^2}{n - 1} = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}.$$

Def 8.7: The sample standard deviation, denoted by S, is the positive square root of the sample variance.

Ex 8.2: Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.

Sol: Hint: use Theorem 8.1,

$$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}.$$

$$\sum_{i=1}^{n} X_i^2 = 3^2 + 4^2 + 5^2 + 6^2 + 6^2 + 7^2 = 171$$

$$\sum_{i=1}^{n} X_i = 3 + 4 + 5 + 6 + 6 + 7 = 31, \qquad n = 6$$

$$s^2 = \frac{6 \cdot 171 - (31)^2}{6(6 - 1)} = \frac{65}{30} = \frac{13}{6}$$

Sample standard deviation: $s = \sqrt{13/6} \approx 1.47$

Exercises 8.2: 8.1, 8.3, 8.7, 8.13, 8.15

8.3 Data Displays and Graphical Methods

Box and Whisker Plot (Boxplot)

This plot encloses the interquartile range of the data in a box that has the median displayed within. The interquartile range has as its extremes the 75th percentile (upper quartile) and the 25th percentile (lower quartile).

A boxplot can give the viewer information about which observations may be outliers. If an observation's distance from the box exceeds 1.5 times the interquartile range, the observation may be labeled an outlier.

Quantile Plot

Def 8.8 A quantile of a sample, q(f), is a value for which a specified fraction f of the data values is less than or equal to q(f).
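Returning briefly to Section 8.2: the agreement between the definition of the sample variance (Def 8.6) and the computational formula of Theorem 8.1 can be checked numerically on the data of Ex 8.1 and Ex 8.2. The following is a minimal Python sketch; the function names `variance_definition` and `variance_shortcut` are illustrative choices, not part of the notes, and exact fractions are used so that the two formulas can be compared without rounding error.

```python
from fractions import Fraction
from math import sqrt


def variance_definition(xs):
    """Sample variance from Def 8.6: squared deviations summed, divided by n - 1."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Sample variance from Theorem 8.1: (n*sum(x^2) - (sum x)^2) / (n(n - 1))."""
    n = len(xs)
    return Fraction(n * sum(x * x for x in xs) - sum(xs) ** 2, n * (n - 1))


coffee = [12, 15, 17, 20]    # Ex 8.1: price increases in cents
trout = [3, 4, 5, 6, 6, 7]   # Ex 8.2: trout caught per fisherman

for name, data in (("coffee", coffee), ("trout", trout)):
    s2 = variance_shortcut(data)
    # Both formulas must give exactly the same value.
    assert s2 == variance_definition(data)
    print(name, "s^2 =", s2, "s =", round(sqrt(s2), 3))
```

The shortcut form needs only the running totals of the observations and of their squares, which is why it is convenient for hand calculation.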
8.4 Sampling Distributions

Def 8.10 The probability distribution of a statistic is called a sampling distribution.

8.5 Sampling Distribution of Means

Theorem 8.2 (Central Limit Theorem) If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},$$

as $n \to \infty$, is the standard normal distribution n(z; 0, 1).

Ex 8.6: An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Sol:

Population distribution: $\mu = 800$, $\sigma = 40$

Sampling distribution: $n = 16$, $\sigma_{\bar{X}} = \dfrac{40}{\sqrt{16}} = 10$

$$P(\bar{X} < 775) = P\left(Z < \frac{775 - 800}{10}\right) = P(Z < -2.5) = 0.0062 \quad \text{(from the table, p. 670)}$$

Inference on the Population Mean

One very important application of the central limit theorem is the determination of reasonable values of the population mean $\mu$.

Ex 8.7: An important manufacturing process produces cylindrical component parts for the automotive industry. It is important that the process produce parts having a population mean of 5.0 mm. The engineer involved conjectures that the population mean is 5.0 mm. A sample of 100 parts is produced. The population standard deviation is known to be $\sigma = 0.1$, and the sample average is $\bar{x} = 5.027$ mm. Does this sample information appear to support or refute the conjecture?

Sol:

$$P\left[\,|\bar{X} - 5| \ge 0.027\,\right] = P\left[(\bar{X} - 5) \ge 0.027\right] + P\left[(\bar{X} - 5) \le -0.027\right] = 2P\left(Z \ge \frac{0.027}{0.1 / \sqrt{100}}\right) = 2P(Z \ge 2.7) = 2(0.0035) = 0.007$$

If $\mu = 5.0$, an $\bar{x}$ that deviates from the mean by 0.027 mm or more would occur by chance in only 7 of 1000 experiments.
As a result, this experiment with $\bar{x} = 5.027$ does not give supporting evidence for the conjecture that $\mu = 5.0$.

Sampling Distribution of the Difference Between Two Averages

Theorem 8.3 If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

Hence

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)}}$$

is approximately a standard normal variable.

Ex 8.8: Eighteen specimens are painted using type A paint, and the drying time in hours is recorded for each. The same is done with type B. The population standard deviations are both known to be 1.0. Assuming that the mean drying time is equal for the two types of paint, find $P(\bar{X}_A - \bar{X}_B > 1.0)$, where $\bar{X}_A$ and $\bar{X}_B$ are the average drying times for samples of size $n_A = n_B = 18$.

Sol: Since the mean drying time is assumed equal for the two types of paint,

$$\mu_{\bar{X}_A - \bar{X}_B} = 0, \qquad \sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$

$$P(\bar{X}_A - \bar{X}_B > 1.0) = P\left(Z > \frac{1.0 - 0}{\sqrt{1/9}}\right) = P(Z > 3.0) = 1 - P(Z < 3.0) = 1 - 0.9987 = 0.0013$$

Ex 8.9: The television picture tubes of manufacturers A and B have the following lifetimes (in years):

Manufacturer   Mean lifetime   Standard deviation   Sample size
A              6.5             0.9                  36
B              6.0             0.8                  49

What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
Sol:

$$\mu_{\bar{X}_1 - \bar{X}_2} = 6.5 - 6.0 = 0.5 \quad \text{and} \quad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{0.81}{36} + \frac{0.64}{49}} = 0.189$$

$$z = \frac{1.0 - 0.5}{0.189} = 2.65$$

$$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P(Z > 2.65) = 1 - P(Z < 2.65) = 1 - 0.9960 = 0.0040$$

8.6 Sampling Distribution of $S^2$

If a random sample of size n is drawn from a normal population with mean $\mu$ and variance $\sigma^2$, and the sample variance is computed, we obtain a value of the statistic $S^2$.

Theorem 8.4 If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then the statistic

$$\chi^2 = \frac{(n - 1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.

Ex 8.10: A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still be convinced that his batteries have a standard deviation of 1 year? Assume that the battery lifetime follows a normal distribution.

Sol: Given

$$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)} = \frac{5 \cdot 48.26 - 15^2}{5 \cdot 4} = 0.815,$$

then

$$\chi^2 = \frac{(n - 1) s^2}{\sigma^2} = \frac{(5 - 1)(0.815)}{1} = 3.26.$$

This value lies within the range that contains 95% of the chi-squared values with 4 degrees of freedom (between $\chi^2_{0.975} = 0.484$ and $\chi^2_{0.025} = 11.143$), so the assumption $\sigma = 1$ is reasonable.

8.7 t-Distribution

In many experimental scenarios, knowledge of $\sigma$ is no more reasonable than knowledge of the population mean $\mu$. Often, in fact, an estimate of $\sigma$ must be supplied by the same sample information that produced the sample average $\bar{x}$.

When $\sigma$ is unknown and only $\mu$ is specified, $\sigma$ is replaced by the sample standard deviation S, giving the statistic

$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}.$$

n ≥ 30: T is approximately standard normal
n < 30: T follows a t-distribution

If the sample size is large enough (n ≥ 30), the distribution of T does not differ considerably from the standard normal distribution.

Corollary Let $X_1, X_2, \ldots, X_n$ be independent random variables that are all normal with mean $\mu$ and standard deviation $\sigma$. Let

$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} \quad \text{and} \quad S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}.$$

Then the random variable

$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

has a t-distribution with $\nu = n - 1$ degrees of freedom.

What does the t-Distribution look like?
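As a rough numerical picture of the shape, the sketch below tabulates the density of the t-distribution for a few values of $\nu$ next to the standard normal density. It assumes the standard closed form of the t density, $h(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + t^2/\nu\right)^{-(\nu+1)/2}$, which is not written out in these notes; the function name `t_pdf` and the chosen values of $\nu$ are illustrative.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom (assumed closed form)."""
    coeff = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)


std_normal = NormalDist(0, 1)

# One row per t value; columns are the t densities for nu = 2, 5, 30
# followed by the standard normal density at the same point.
print(f"{'t':>5} {'nu=2':>8} {'nu=5':>8} {'nu=30':>8} {'normal':>8}")
for t in (0.0, 1.0, 2.0, 3.0):
    row = [t_pdf(t, nu) for nu in (2, 5, 30)] + [std_normal.pdf(t)]
    print(f"{t:5.1f} " + " ".join(f"{v:8.4f}" for v in row))
```

The table illustrates that the t density is symmetric and bell-shaped like the normal curve, but lower at the center and heavier in the tails for small $\nu$, and that it moves toward the standard normal as $\nu$ grows, in line with the n ≥ 30 rule of thumb.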
Ex 8.11: Find the t-value with $\nu = 14$ degrees of freedom that leaves an area of 0.025 to the left, and therefore an area of 0.975 to the right.

Sol:

$$t_{0.975} = -t_{0.025} = -2.145$$

Ex 8.12: Find $P(-t_{0.025} < T < t_{0.05})$.

Sol:

$$P(-t_{0.025} < T < t_{0.05}) = 1 - 0.05 - 0.025 = 0.925$$

Ex 8.13: Find k such that $P(k < T < -1.761) = 0.045$ for a random sample of size 15 selected from a normal distribution, where

$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}.$$

Sol: From Table A.4 with $\nu = 14$, $t_{0.05} = 1.761$, so $-t_{0.05} = -1.761$. Let $k = -t_{\alpha}$. Then

$$0.045 = 0.05 - \alpha, \qquad \alpha = 0.005,$$

so $k = -t_{0.005} = -2.977$ and

$$P(-2.977 < T < -1.761) = 0.045.$$
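The table lookups in Ex 8.12 and Ex 8.13 can be cross-checked by integrating the t density numerically. The sketch below is one way to do this with the Python standard library only. It assumes the standard closed form of the t density (not written out in these notes) and uses composite Simpson's rule. The helper names `t_pdf` and `t_prob` are illustrative, and $\nu = 14$ is assumed for Ex 8.12 so that the tabulated values 2.145 and 1.761 apply.

```python
from math import gamma, pi, sqrt


def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom (assumed closed form)."""
    coeff = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_prob(a, b, nu, steps=2000):
    """Approximate P(a < T < b) with composite Simpson's rule; steps must be even."""
    h = (b - a) / steps
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        # Simpson weights: 4 at odd interior nodes, 2 at even interior nodes.
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, nu)
    return total * h / 3


nu = 14  # n = 15 observations, so nu = n - 1

# Ex 8.12 (with nu = 14 assumed): P(-t_0.025 < T < t_0.05)
p_812 = t_prob(-2.145, 1.761, nu)

# Ex 8.13: P(-t_0.005 < T < -t_0.05)
p_813 = t_prob(-2.977, -1.761, nu)

print(f"Ex 8.12: {p_812:.4f}")
print(f"Ex 8.13: {p_813:.4f}")
```

Because the critical values are rounded to three decimals in the table, the integrated probabilities agree with 0.925 and 0.045 only up to that rounding.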