Sampling

Random sampling
Each possible sample has an equal chance of being selected.

[Figure: a sample is drawn from the population by sampling.]

2 main types of random sampling
- sampling with replacement (infinite population)
- sampling without replacement (finite population)

In addition, sampling methods can also be classified according to the application as follows.

1. Simple random sampling (a probability sampling method)
- A simple random sample (SRS) from a finite population gives each possible sample set an equal probability of being selected.
- An SRS from an infinite population requires that all sample observations be statistically independent.

2. Stratified sampling
- A stratified sample is obtained by forming strata in the population and selecting a simple random sample from each stratum.

[Figure: Stratified sampling. The population is stratified into group 1, group 2, and group 3, and the sample is drawn from each group.]

3. Cluster sampling
- A cluster sample is obtained by selecting a set of clusters from a population on the basis of simple random sampling. The sample is formed by taking a census of each selected cluster.

4. Systematic sampling
- A systematic sample is formed by selecting one unit at random and then selecting additional units at evenly spaced intervals (unit interval or time interval) until the sample has been formed.

5. Judgment sampling
- A judgment sample is obtained by having an expert who is familiar with the population characteristics select units from the population.

6. Convenience sampling
- A convenience sample is obtained by selecting "convenient" population units.
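The probability-based methods above (simple random, stratified, cluster, and systematic sampling) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard `random` module; the function names, the toy population of 30 numbered units, and the three equal-sized groups are assumptions made for the example and do not come from the notes.

```python
import random

def simple_random_sample(population, n, rng, replace=False):
    """Draw n units so that every possible sample of size n is equally likely."""
    if replace:
        # Sampling with replacement: a unit may be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Sampling without replacement: each unit appears at most once.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum units from every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Select clusters at random, then take a census of each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(population, step, rng):
    """Pick a random start among the first `step` units, then every step-th unit."""
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical toy population: units numbered 1..30, split into three groups.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 31))
groups = {
    "group 1": population[:10],
    "group 2": population[10:20],
    "group 3": population[20:],
}

print(simple_random_sample(population, 5, rng))
print(stratified_sample(groups, 2, rng))
print(cluster_sample(groups, 1, rng))
print(systematic_sample(population, 5, rng))
```

Judgment and convenience sampling are not included because they depend on a person's choice rather than on a random mechanism.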
PARAMETERS
- Characteristics of the population
- Computed from all the individuals of the population
- Represented by Greek letters (such as $\mu$)

STATISTICS
- Characteristics of the samples
- Computed from the samples drawn from the population
- Represented by English letters (such as $\bar{x}$)

Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then, if n is sufficiently large, $\bar{x}$ has approximately a normal distribution with
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \sigma^2 / n$

The CLT tells us that the sampling distribution of $\bar{x}$ becomes increasingly close to a normal distribution as the sample size increases, and that as the sample size tends to infinity, the sampling distribution of the standardized $\bar{x}$ tends to the normal distribution. The CLT can be used with any form of population distribution that has a finite variance.

Statistical Inference
Goal - to make inferences about a population based on a subset of it
Idea
- A sample is drawn from the population.
- A sample statistic is used to draw inferences about the population parameter, θ

Statistical inference is divided into:
- Estimation (point estimation and interval estimation): a sample statistic is used to estimate the value of θ
- Hypothesis testing: hypothesize a value of θ, then use the sample information to make a decision

[Figure: Point estimator. Population (parameter θ) -> sampling process -> sample of size n $(x_1, x_2, \ldots, x_n)$ -> estimator $\hat{\theta} = g(x_1, x_2, \ldots, x_n)$]

[Figure: Interval estimator. Population (parameter θ) -> sampling process -> sample of size n $(x_1, x_2, \ldots, x_n)$ -> estimator $(\hat{\theta}_L, \hat{\theta}_U)$, where $\hat{\theta}_L = g_1(x_1, x_2, \ldots, x_n)$ and $\hat{\theta}_U = g_2(x_1, x_2, \ldots, x_n)$]

Chapter 6 Point Estimation
θ = parameter
$\hat{\theta}$ = estimator of θ

Unbiased minimum variance (UMV) criteria
Used for selecting the best estimator. Based on 2 factors:
- unbiasedness
- minimum variance
(Among all the unbiased estimators of θ, choose the one with minimum variance.)

1. Unbiasedness
An estimator $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$ for every possible value of θ (i.e.,
the average of the values of $\hat{\theta}$ computed in each of all possible samples of size n is θ).

[Figure: sampling distributions of two estimators. $\hat{\theta}_1$ is an unbiased estimator, $E(\hat{\theta}_1) = \theta$; $\hat{\theta}_2$ is biased, $E(\hat{\theta}_2) \neq \theta$.]

2. Unbiased minimum variance estimator
An estimator $\hat{\theta}$ of a population parameter θ is called an unbiased minimum variance (UMV) estimator if the expected value of $\hat{\theta}$ is θ and, among all unbiased estimators of θ, $\hat{\theta}$ has the least amount of variability.

[Figure: sampling distributions of two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$; $\hat{\theta}_1$ has the smaller spread, so $\hat{\theta}_1$ is the UMV estimator.]

Chapter 7 Interval Estimation

Interval estimator of population mean
A 100(1 - α)% confidence interval for the mean μ of a normal population when the value of σ is known is:
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

The sample size n necessary to ensure an interval length L is obtained from:
$L = 2 z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$, so $n = \left( 2 z_{\alpha/2} \cdot \frac{\sigma}{L} \right)^2$

Derivation
1. Let $x_1, \ldots, x_n$ be a random sample from a normal population having mean μ and standard deviation σ.
2. To find a confidence interval (CI) for any parameter θ, we have to look for an estimator $\hat{\theta}$ that (a) has approximately a normal distribution, (b) is unbiased, and (c) has a known standard deviation $\sigma_{\hat{\theta}}$.
In this case, $\bar{x}$ is an estimator of the population mean μ that can be used to find the confidence interval because it satisfies all the conditions:
(a) $\bar{x}$ has a normal distribution (each $X_i$ is from a normal population),
(b) $\bar{x}$ is an unbiased estimator, $\mu_{\bar{x}} = \mu$,
(c) $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
3. Transform $\bar{x}$ to its standardized value Z, because the Z value can be used conveniently to find the confidence interval due to its standard normal distribution:
$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
4. To find the confidence interval, the value of Z must lie between 2 values, a < Z < b:
$a < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < b$ ------------------------ (1)
5. For a 100(1 - α)% confidence interval, the area between a and b is 1 - α:
$P\left( a < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < b \right) = 1 - \alpha$

[Figure: standard normal curve centered at 0, with area 1 - α between a and b and area α/2 in each tail.]

Thus, the shaded area on each side is α/2. From the Z notation, $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$.
Thus, (1) becomes
$-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}$
Solving for μ, we obtain the confidence interval for the population mean μ:
$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

$z_\alpha$ Notation
$z_\alpha$ denotes the value on the measurement axis for which the area to the right of $z_\alpha$ is equal to α:
$P(Z \geq z_\alpha) = \alpha$
Similarly, $z_{\alpha/2}$ denotes the value for which the area to the right is equal to α/2.

Interpreting a confidence interval
A 95% confidence interval for μ means that, in the long run, 95% of the computed CIs will contain μ.

[Figure: confidence intervals from a long sequence of replications of an experiment (experiment 1, 2, 3, ...), drawn against the true value of μ; 95% of the intervals contain μ and 5% do not.]

Example
If the 95% CI of μ = (79.3, 80.7):
RIGHT - (79.3, 80.7) may be one of the 95% of confidence intervals that contain μ
WRONG - μ is between 79.3 and 80.7 with probability 0.95

Example 1
Extensive monitoring of a computer time-sharing system has suggested that response time to a particular editing command is normally distributed with standard deviation 25 milliseconds. A new operating system has been installed, and it is desired to estimate the true average response time for the new environment. Assuming that response times are still normally distributed with σ = 25, what sample size is necessary to ensure that the resulting 95% confidence interval has a length of (at most) 10?

Solution
95 = 100(1 - α), so α = 0.05
$n = \left( 2 z_{\alpha/2} \cdot \frac{\sigma}{L} \right)^2 = \left( 2 z_{0.025} \cdot \frac{25}{10} \right)^2 = \left( 2 (1.96) \cdot \frac{25}{10} \right)^2 = 96.04$
A sample size of 97 is required.
Note - The smaller the desired length L, the larger n must be.

Example 2
Suppose that when a signal having value μ is transmitted from location A, the value received at location B is normally distributed with mean μ and variance 4. To reduce error, suppose the same value is sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, construct a 95% CI for μ.

Solution
$\bar{x} = 81/9 = 9$, σ = 2, n = 9
100(1 - α) = 95, so α = 0.05
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 9 \pm z_{0.025} \cdot \frac{2}{\sqrt{9}} = 9 \pm 1.96 \cdot \frac{2}{\sqrt{9}} = (7.69, 10.31)$

Example 3
A machine is producing ball bearings with diameters of 0.5 inches.
Based on lengthy experience with the machine, it is known that the standard deviation of the bearing diameters is 0.005 inches. A sample of 25 ball bearings is selected, and their average diameter is found to be 0.498 inches. Determine a 99% confidence interval for the population average of ball bearing diameters.

Solution
σ = 0.005, n = 25, $\bar{x}$ = 0.498
For a 99% confidence interval, 100(1 - α) = 99, so α = 0.01
$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 0.498 \pm z_{0.005} \cdot \frac{0.005}{\sqrt{25}} = 0.498 \pm 2.575 \cdot \frac{0.005}{\sqrt{25}} = (0.4954, 0.5006)$

Interval estimator for population mean μ
- for a large sample (n ≥ 30)
- any population distribution
- σ is not known
When n is large, a 100(1 - α)% confidence interval for the mean μ of any population distribution is:
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Proof
Let $X_1, X_2, \ldots, X_n$ be a random sample from any population having mean μ and standard deviation σ. If n is large, the Central Limit Theorem implies that $\bar{x}$ has approximately a normal distribution. Thus
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
has approximately a standard normal distribution, so that
$P\left( -z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2} \right) \approx 1 - \alpha$
Therefore the confidence interval is $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$.
When n is large, s will be close to σ. Therefore we can use s to estimate σ if σ is unknown, and the confidence interval is
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Example 5
A sample of 56 research cotton samples resulted in a sample average percentage elongation of 8.17 and a sample standard deviation of 1.42. Find a 95% large-sample confidence interval for the true average percentage elongation μ.

Solution
95 = 100(1 - α), so α = 0.05
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = \bar{x} \pm z_{0.025} \cdot \frac{s}{\sqrt{n}} = 8.17 \pm 1.96 \cdot \frac{1.42}{\sqrt{56}} = (7.80, 8.54)$

Interval estimator of population mean μ
- for small samples
- normal distribution
- σ is not known
A 100(1 - α)% confidence interval for the mean μ is:
$\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$

A small-sample interval for μ
If the population random variable x is normally distributed, then we know that $\bar{x}$ will also be normally distributed.
Thus $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ is a standard normal variable.
When n is small, s is no longer likely to be close to σ. If the standard deviation σ must be estimated by s, then the standardized variable
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
will no longer be a standard normal variable.
* The sampling distribution of t is called the "Student" t distribution with n - 1 degrees of freedom.

Properties of t distributions
Let $t_\nu$ denote the density function curve for ν degrees of freedom.
1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal (Z) curve.
3. As ν increases, the spread of the $t_\nu$ curve decreases.
4. As ν → ∞, the sequence of $t_\nu$ curves approaches the standard normal curve.

[Figure: the Z curve and a $t_\nu$ curve, both centered at 0; the $t_\nu$ curve is more spread out.]

$t_{\alpha,\nu}$ Notation
Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with ν degrees of freedom to the right of $t_{\alpha,\nu}$ is α; $t_{\alpha,\nu}$ is called a t critical value.

[Figure: $t_\nu$ curve with area α to the right of $t_{\alpha,\nu}$.]

Example 6
Four determinations of the percentage of methanol in a certain solution yielded $\bar{x}$ = 8.34%, s = 0.03%. Assuming (approximate) normality of the population of determinations, find a 95% confidence interval for μ.

Solution
$\bar{x}$ = 8.34, s = 0.03, σ unknown and n small, so use t
95 = 100(1 - α), so α = 0.05
$\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} = 8.34 \pm t_{0.025,\, 4-1} \cdot \frac{0.03}{\sqrt{4}} = 8.34 \pm 3.182 \cdot \frac{0.03}{\sqrt{4}} = (8.292, 8.388)$

Example 8
The use of a small amount of carbon in producing certain steels is beneficial. Too much carbon, however, could be detrimental. Consequently, an upper confidence limit on the mean carbon content of a carbon steel is needed. Let X denote the number of pounds of carbon in a ton of carbon steel (lb/ton). A sample of size n = 15 gave a mean of X of 20 lb/ton and a standard deviation of 0.60 lb/ton. Determine a 95% upper confidence limit for the mean of X.

Solution
Only an upper limit is required, so the whole of α = 0.05 is placed in the upper tail and the critical value is $t_{\alpha,\, n-1}$ rather than $t_{\alpha/2,\, n-1}$:
$\bar{x} + t_{\alpha,\, n-1} \cdot \frac{s}{\sqrt{n}} = 20 + t_{0.05,\, 15-1} \cdot \frac{0.60}{\sqrt{15}} = 20 + 1.761 \cdot \frac{0.60}{\sqrt{15}} = 20.27$ lb/ton
We are 95% confident that the average amount of carbon does not exceed 20.27 lb/ton.
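The interval formulas for a population mean (σ known, large sample with s, and small sample with t) can be checked numerically with a short Python sketch. The function names below are assumptions chosen for illustration. The normal critical value comes from the standard library's `statistics.NormalDist`; the t critical value is passed in by hand because the standard library has no t quantile function, so it is read from a t table just as in the worked examples.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean

def z_crit(conf):
    """z_{alpha/2} for a two-sided interval with confidence level `conf`."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sd, n, conf=0.95):
    """xbar +/- z_{alpha/2} * sd / sqrt(n); sd is sigma if known, or s when n is large."""
    half = z_crit(conf) * sd / sqrt(n)
    return xbar - half, xbar + half

def sample_size(sigma, length, conf=0.95):
    """Smallest n for which the interval length 2 * z * sigma / sqrt(n) is at most `length`."""
    return ceil((2 * z_crit(conf) * sigma / length) ** 2)

def t_interval(xbar, s, n, t_crit):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n); t_crit must be read from a t table."""
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

# Example 1: sigma = 25, desired length 10, 95% confidence
print(sample_size(25, 10))                 # 97

# Example 2: nine received signal values, sigma = 2
signals = [5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5]
print(z_interval(mean(signals), 2, 9))     # approximately (7.69, 10.31)

# Example 3: ball bearings, 99% confidence
print(z_interval(0.498, 0.005, 25, 0.99))  # approximately (0.4954, 0.5006)

# Example 5: large sample, s used in place of sigma
print(z_interval(8.17, 1.42, 56))          # approximately (7.80, 8.54)

# Example 6: small sample, t_{0.025, 3} = 3.182 from the t table
print(t_interval(8.34, 0.03, 4, 3.182))    # approximately (8.292, 8.388)
```

Because `z_crit` uses the exact normal quantile (about 1.95996 and 2.57583) instead of the rounded table values 1.96 and 2.575, the printed limits agree with the hand calculations only after rounding.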
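Confidence interval for population proportion
A large-sample 100(1 - α)% confidence interval for a population proportion p is
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
where
$\hat{p} = x/n$
n = sample size
x = the observed number of successes
$\hat{q} = 1 - \hat{p}$
large sample: $n\hat{p} \geq 5$ and $n\hat{q} \geq 5$

Proof
Let p = the proportion of successes in the population (population proportion)
n = sample size
x = number of successes in the sample
x has a binomial distribution with $E(x) = np$ and $\sigma_x^2 = np(1 - p)$ (from Ch. 3). For a large sample, this distribution can be approximated by a normal distribution.
The estimator of p is $\hat{p} = \frac{x}{n} = \frac{1}{n} \cdot x$, a constant times x, where x is approximately normally distributed. Therefore $\hat{p}$ also has approximately a normal distribution.
$\hat{p}$ is an estimator that can be used to find the confidence interval because
(1) $\hat{p}$ has approximately a normal distribution
(2) $\hat{p}$ is an unbiased estimator, because
$\mu_{\hat{p}} = E(\hat{p}) = E\left(\frac{x}{n}\right) = \frac{1}{n} E(x) = \frac{1}{n} \cdot np = p$
(3) $\sigma_{\hat{p}}^2 = V(\hat{p}) = V\left(\frac{x}{n}\right) = \frac{1}{n^2} V(x) = \frac{1}{n^2} \cdot np(1 - p) = \frac{p(1 - p)}{n}$, so $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

Transform $\hat{p}$ to its standardized value Z:
$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$
For a 100(1 - α)% confidence interval,
$-z_{\alpha/2} < \frac{\hat{p} - p}{\sigma_{\hat{p}}} < z_{\alpha/2}$
Solving for p, we obtain
$\hat{p} - z_{\alpha/2} \cdot \sigma_{\hat{p}} < p < \hat{p} + z_{\alpha/2} \cdot \sigma_{\hat{p}}$
Since $\sigma_{\hat{p}}$ depends on the unknown p, it is estimated by replacing p with $\hat{p}$, which gives the confidence interval for the population proportion p:
$\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Example 9
The 1983 Tylenol poisoning episode and other similar incidents have focused attention on the desirability of packaging various commodities in a tamper-resistant manner. In a survey of consumer attitudes toward such packaging, of the 270 consumers surveyed, 189 indicated that they would be willing to pay extra for tamper-resistant packaging. Let p denote the proportion of all consumers who would pay extra for such packaging. Find a 95% confidence interval for p.

Solution
$\hat{p} = x/n = 189/270 = 0.700$
$\sigma_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.7(0.3)}{270}} = 0.0279$
For a 95% confidence interval, use $z_{0.05/2} = z_{0.025} = 1.96$:
$\hat{p} \pm z_{0.025} \cdot \sigma_{\hat{p}} = 0.700 \pm (1.96)(0.0279) = (0.645, 0.755)$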
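The large-sample interval for a proportion can be sketched the same way. The function name `proportion_interval` is an assumption for illustration, and the check on $n\hat{p}$ and $n\hat{q}$ simply encodes the large-sample condition stated above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf=0.95):
    """Large-sample CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Large-sample condition from the notes: n * p_hat >= 5 and n * q_hat >= 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(p_hat * q_hat / n)
    return p_hat - half, p_hat + half

# Example 9: 189 of 270 consumers would pay extra
print(proportion_interval(189, 270))  # approximately (0.645, 0.755)
```

Raising an error when the condition fails is one possible design choice; the notes themselves only state the condition and do not say what to do when it is not met.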