Random samples and estimation
ETM 620 - 09U

Chapter 9: Random samples & sampling distributions
  Samples and populations
  χ², t, and F distributions
Chapter 10: Parameter estimation
  Point estimation
  Standard error of a statistic
  Method of maximum likelihood
  Method of moments
  One-sample and two-sample confidence interval estimation
Foundation for understanding the next few chapters

Ch. 9: Populations and samples
Population: “a group of individual persons, objects, or items from which samples are taken for statistical measurement”
Sample: “a finite part of a statistical population whose properties are studied to gain information about the whole”
(Merriam-Webster Online Dictionary, http://www.m-w.com/, October 5, 2004)

Examples
Population
  Students pursuing graduate engineering degrees
  Cars capable of speeds in excess of 160 mph
  Potato chips produced at the Frito-Lay plant in Kathleen
  Freshwater lakes and rivers
Samples
  In general, (x1, x2, x3, …, xn) is a random sample of size n if:
    the x’s are independent random variables
    every observation is equally likely (has the same probability)

Sampling distributions
If we conduct the same experiment several times with the same sample size, the probability distribution of the resulting statistic is called a sampling distribution.
Sampling distribution of the mean: if n observations are taken from a normal population with mean μ and variance σ², then:
$$\mu_{\bar{X}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$$
$$\sigma^2_{\bar{X}} = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}$$

An important consideration …
x̄ will be different for every sample.
For example, suppose the time to complete a typical homework problem, in minutes, is known to be uniformly distributed between 5 and 25. Four people are asked to record the time it takes them to complete each of 31 different problems.
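The homework-time setup above can be explored numerically. The following is a minimal sketch, not part of the original slides: it assumes a continuous Uniform(5, 25) model and treats each problem as a sample of size n = 4. The function names (`uniform_mean_var`, `simulate_sample_means`) and the seed are illustrative choices. It compares the theoretical variance of the sample mean, σ²/n, with the variance of simulated sample means.

```python
import random
import statistics

A, B = 5.0, 25.0   # bounds of the uniform distribution in the example
N = 4              # observations averaged per problem (four people)

def uniform_mean_var(a, b):
    """Mean and variance of a continuous Uniform(a, b) distribution."""
    return (a + b) / 2, (b - a) ** 2 / 12

def simulate_sample_means(a, b, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [statistics.fmean(rng.uniform(a, b) for _ in range(n))
            for _ in range(reps)]

mu, var = uniform_mean_var(A, B)
var_of_mean = var / N          # sigma^2 / n from the sampling-distribution slide
means = simulate_sample_means(A, B, N, reps=20000)

print("population mean, variance:", mu, var)
print("theoretical variance of the sample mean:", var_of_mean)
print("simulated mean and variance of sample means:",
      statistics.fmean(means), statistics.pvariance(means))
```

The simulated values should land close to μ and σ²/n. The histogram of `means` is noticeably more bell-shaped than the flat histogram of the individual times.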
Individual data points

Problem #   Person 1   Person 2   Person 3   Person 4
 1           12.64       7.01      16.93      22.98
 2           22.69      24.17       5.29      13.15
 3           22.26       7.77       9.90       5.91
 4            5.65       8.28       9.39       5.34
 5           10.70      11.86      16.07      12.15
 6           12.44      12.11      23.21      14.32
 7           13.52      11.08      24.51      21.13
 8           24.82      10.13      24.03       6.07
 9           19.10      21.33      24.45      14.33
10           11.00      20.00      12.03      20.51
11            6.49       8.97       6.28      12.17
12           14.74      15.22      12.47      24.72
13            5.81       9.61       5.10      23.52
14            7.01      10.13      20.51      18.59
15           21.18      19.49       6.70       7.65
16           20.12      17.53       8.47      13.10
17           16.05      19.23      16.10       8.62
18           24.41      18.74      15.58      20.93
19           21.11      10.24       8.56      22.34
20            7.30       6.19      20.23      19.77
21           24.73      23.51      23.08      15.90
22           15.02      18.50      14.80       7.92
23            5.76      20.93      18.43      19.63
24           16.69       8.04      22.84      12.56
25            9.01       9.12      11.68      11.50
26           11.00      21.04      18.92      10.43
27           23.08       5.78      19.18      14.07
28           15.33      10.13      10.83      21.04
29           20.78      18.52      20.11      23.97
30           17.39      19.44      24.36      12.37
31           22.01      16.14      22.46      13.82

[Figure: Histogram - Uniform Distribution]
μ = __________________
σ² = _________________
σ = __________________

Sample means
Average of the four recorded times for each problem:

Problem #   Average
 1          14.89
 2          16.32
 3          11.46
 4           7.17
 5          12.70
 6          15.52
 7          17.56
 8          16.26
 9          19.80
10          15.89
11           8.48
12          16.79
13          11.01
14          14.06
15          13.75
16          14.81
17          15.00
18          19.91
19          15.56
20          13.37
21          21.80
22          14.06
23          16.19
24          15.03
25          10.33
26          15.35
27          15.53
28          14.33
29          20.84
30          18.39
31          18.61

[Figure: Histogram - Sample Means]
μ_x̄ = __________________
σ²_x̄ = _________________
σ_x̄ = __________________

Central Limit Theorem
Given: X̄, the mean of a random sample of size n taken from a population with mean μ and finite variance σ²,
Then, the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}, \quad n \to \infty$$
is _________________________

Central Limit Theorem
If the population is known to be normal, the sampling distribution of X̄ will follow a normal distribution.
Even when the distribution of the population is not normal, the sampling distribution of X̄ is approximately normal when n is large.
NOTE: when n is not large, we cannot assume the distribution of X̄ is normal.

Sampling distribution of S²: χ²
Given: Z1, Z2, … , Zk, independent, normally distributed random variables with mean μ = 0 and standard deviation σ = 1.
Then,
$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$
follows a χ² distribution with k degrees of freedom and probability density function
$$f(u) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, u^{(k/2)-1} e^{-u/2}, \quad u > 0 \qquad \text{(eq. 9-15, pg. 208)}$$
with mean μ = k and variance σ² = 2k.

χ² Distribution
χα² represents the χ² value above which we find an area of α, that is, for which P(χ² > χα²) = α.
In Excel, =CHIDIST(x,degrees_freedom) gives the area to the right of x.
χ² is additive, so if Y = ∑ χi² is a sum of independent χ² variables, then Y is χ² with k_Y = ∑ k_i degrees of freedom.
Sample variance:
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

Student’s t Distribution
If Z ~ N(0,1), V is a chi-square random variable with k degrees of freedom, and Z and V are independent, then
$$T = \frac{Z}{\sqrt{V/k}}$$
follows a t-distribution with k degrees of freedom.
The probability density function is
$$f(t) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{\left[(t^2/k) + 1\right]^{(k+1)/2}}, \quad -\infty < t < \infty$$

t-Distribution
Example 9-7 shows that
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
follows a t distribution. In other words, the standardized sample mean follows a t distribution with n − 1 degrees of freedom when σ is not known but is estimated by s.
In Excel, =TDIST(x,degrees_freedom,tails) gives the probability associated with getting a value above x (tails = 1) or outside ±x (tails = 2).
=TINV(probability,degrees_freedom) gives the t-value associated with a desired two-tailed probability, α.

F-Distribution
Given: S1² and S2², the variances of independent random samples of size n1 and n2 taken from normal populations with variances σ1² and σ2², respectively,
Then,
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
follows an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom.
Table V, pp 605-609 gives F-values associated with given α values.
In Excel, =FDIST(x,degrees_freedom1,degrees_freedom2) gives the upper-tail probability associated with a given x-value, while =FINV(probability,degrees_freedom1,degrees_freedom2) gives the F-value associated with a given α.

Ch. 10: Parameter estimation
Example: Say we have 5 numbers from a random sample, as follows: 19, 58, 31, 44, 43
x̄ = ____________________ is an estimate of μ
s² = _____________________ is an estimate of σ²
We want to use “good” estimators (unbiased, minimum error)
  Unbiased, i.e. E(θ̂) = θ (e.g., E(x̄) = ___, and E(S²) = __)
  Minimum error: MSE(θ̂) = E[(θ̂ − θ)²], which equals Var(θ̂) when θ̂ is unbiased

Finding good estimators
Method of maximum likelihood
  Take a random sample (x1, x2, x3, …, xn) from a distribution with density function f(x, θ)
  Likelihood function, L(θ) = f(x1,θ) ∙ f(x2,θ) ∙ f(x3,θ) ∙ ∙ ∙ f(xn,θ)
  Take the derivative with respect to θ and set it to 0. See example 10-4, pg. 222
  The resulting estimator is not always unbiased, but it can be modified to make it so.
Method of moments
  The first k moments about the origin of a distribution are
  $$\mu'_t = E(X^t) = \int_{-\infty}^{\infty} x^t f(x;\theta_1,\theta_2,\ldots,\theta_k)\,dx, \quad t = 1, 2, \ldots, k$$
  Estimators are found by equating these population moments to the corresponding sample moments and solving for the parameters.
  Can produce good estimators, but sometimes not as good as those from maximum likelihood, for example.
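To make the remark about biased maximum likelihood estimators concrete, here is a small sketch using the five sample values from the parameter-estimation example. It assumes a normal model for the data, which the slides do not state. Under that assumption the MLE of σ² divides by n, while the unbiased s² divides by n − 1. The helper `normal_log_likelihood` is an illustrative name, not something from the course text.

```python
import math
import statistics

data = [19, 58, 31, 44, 43]  # the five sample values from the slides
n = len(data)

x_bar = statistics.fmean(data)           # point estimate of mu
s2 = statistics.variance(data)           # unbiased sample variance (divides by n - 1)
sigma2_mle = statistics.pvariance(data)  # normal-model MLE of sigma^2 (divides by n)

def normal_log_likelihood(mu, sigma2, xs):
    """log L(mu, sigma2): sum of normal log-densities over the sample."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in xs)

best = normal_log_likelihood(x_bar, sigma2_mle, data)
# Nudging either parameter away from the MLE should not raise the log-likelihood.
nearby = [normal_log_likelihood(x_bar + d, sigma2_mle * f, data)
          for d in (-1, 0, 1) for f in (0.9, 1.0, 1.1)]

print("x_bar =", x_bar)
print("s^2 (unbiased) =", s2)
print("MLE of sigma^2 =", sigma2_mle)
print("MLE rescaled by n/(n-1) =", sigma2_mle * n / (n - 1))
print("MLE is at least as good as nearby points:",
      all(v <= best + 1e-12 for v in nearby))
```

Rescaling the MLE of σ² by n/(n − 1) recovers s², which is one way an estimator can be "modified" to remove its bias.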
Interval estimation
(1 – α)100% confidence interval for the unknown parameter
For some parameter θ (e.g., μ), we are looking for L and U such that
  P{L < θ < U} = 1 – α
  or _______________
  or ________________

Single sample: Estimating the mean
Given: σ is known and X̄ is the mean of a random sample of size n,
Then, the (1 – α)100% confidence interval for μ is given by
$$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

Example: mean with known variance
A random sample of size 25 is taken from a normal distribution with unknown mean and known variance of 4 (i.e., N(μ,4)). X̄ of the sample is determined to be 13.2. What is the 90% confidence interval around the mean?

What does this mean?
Measure of the precision of the estimate
Length of the interval is a function of
  confidence level
  variance
  sample size
Can vary n to decrease the length of the interval for the same confidence level:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
For our example, suppose we want an error of 0.25 or less. Then,
n = ___________________________________________

What if σ² is unknown?
If n is sufficiently large (> _______), then the large-sample confidence interval is:
$$\bar{X} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
Otherwise, must use the t-statistic …

Single sample estimate of the mean (σ unknown, n not large)
Given: σ is unknown and X̄ is the mean of a random sample of size n (where n is not large),
Then, the (1 – α)100% confidence interval for μ is given by
$$\bar{X} - t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{X} + t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right)$$

Example
A traffic engineer is concerned about the delays at an intersection near a local school. The intersection is equipped with a fully actuated (“demand”) traffic light and there have been complaints that traffic on the main street is subject to unacceptable delays. To develop a benchmark, the traffic engineer randomly samples 25 stop times (in seconds) on a weekend day.
The average of these times is found to be 13.2 seconds, and the sample variance, s², is found to be 4 seconds². Based on this data, what is the 95% confidence interval (C.I.) around the mean stop time during a weekend day?

Example (cont.)
X̄ = ______________
s = _______________
α = ________________
α/2 = _____________
t0.025,24 = _____________
__________________ < μ < ___________________

C.I. on the variance
Given that
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$
is ~ χ² with n − 1 degrees of freedom, then
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
gives the 100(1 − α)% two-sided confidence interval on the variance.

Confidence interval on a proportion
The proportion, p, in a binomial experiment may be estimated by
$$\hat{P} = \frac{X}{n}$$
where X is the number of successes in n trials.
For a sample, the point estimate of the parameter is
$$\hat{p} = \frac{x}{n}$$
The mean of the sample proportion is
$$\mu_{\hat{p}} = p$$
and its variance is
$$\sigma^2_{\hat{p}} = \frac{pq}{n}$$

C.I. for proportions
An approximate (1 − α)100% confidence interval for p is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Large-sample C.I. for p1 – p2 is:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
Interpretation: _______________________________

Example 10.17 (pg. 240)
n = 75
x = 12
p̂ = ____________
z0.025 = ________
Picture:
C.I.:
Interpretation: ____________________________________

Setting the sample size …
If the estimate for p from the initial sample seems pretty reliable, then
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1 - \hat{p})$$
e.g., for our example, if we want to be 95% confident that the error in our estimate is less than 0.05, then
n = __________________
If we’re not at all sure how to estimate p, then assume p = 0.5 and use
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25)$$

Example: comparing 2 proportions
Look at example 10-23, pg. 250
1. C.I. = (-0.07, 0.15), therefore no reason to believe there is a significant decrease in the proportion of defectives using the new process.
2. What if the interval were (+0.07, 0.15)?
3. What if the interval were (-0.9, -0.7)?

Difference in 2 means, both σ² known
Given two independent random samples, a point estimate of the difference between μ1 and μ2 is given by the statistic
$$\bar{x}_1 - \bar{x}_2$$
We can build a confidence interval for μ1 − μ2 (given σ1² and σ2² known) as follows:
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

An example
A farm equipment manufacturer wants to compare the average daily downtime of two sheet-metal stamping machines located in two different factories. Investigation of company records for 100 randomly selected days on each of the two machines gave the following results:
  x̄1 = 12 minutes    σ1² = 12
  x̄2 = 10 minutes    σ2² = 8
  n1 = n2 = 100
Construct a 95% C.I. for μ1 – μ2

Solution
α/2 = _____________
z_____ = ____________
Picture
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
__________________ < μ1 – μ2 < _________________
Interpretation:

Differences in 2 means, σ² unknown
Case 1: σ1² and σ2² unknown but equal
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Where,
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Differences in 2 means, σ² unknown
Case 2: σ1² and σ2² unknown and not equal
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Where,
$$\nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$

Example, σ² unknown
Suppose the farm equipment manufacturer was unable to gather data for 100 days. Using the data they were able to gather, they would still like to compare the downtime for the two machines. The data they gathered is as follows:
  x̄1 = 12 minutes    s1² = 12    n1 = 18
  x̄2 = 10 minutes    s2² = 8     n2 = 14
Construct a 95% C.I. for μ1 – μ2 assuming:
1. σ1² and σ2² unknown but equal
2. σ1² and σ2² unknown and not equal

Solution: Case 1
x̄1 − x̄2 = _____________
t____ , ________ = ____________
Picture
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \_\_\_\_\_\_\_\_\_\_$$
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
__________________ < μ1 – μ2 < _________________
Interpretation:

Your turn …
Solve Case 2 (assuming variances are not equal)

Paired Observations
Suppose we are evaluating observations that are not independent …
For example, suppose a teacher wants to compare results of a pretest and posttest administered to the same group of students.
Paired-observation or paired-sample test …
Example: murder rates in two consecutive years for several US cities (see attached). Construct a 90% confidence interval around the difference in consecutive years.

Solution
d̄ = ____________
tα/2, n-1 = _____________
Picture
$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \_\_\_\_\_\_\_\_\_$$
A (1 − α)100% CI for μD is:
$$\bar{d} - t_{\alpha/2,\,n-1}\left(\frac{s_d}{\sqrt{n}}\right) < \mu_D < \bar{d} + t_{\alpha/2,\,n-1}\left(\frac{s_d}{\sqrt{n}}\right)$$
__________________ < μ1 – μ2 < _________________
Interpretation:

C.I. for the ratio of two variances
If X1 and X2 are independent normal random variables with unknown and unequal means and variances, then the confidence interval on the ratio σ1²/σ2² is given by:
$$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$$
Note: for F-values not given in Table V, recall that
$$F_{1-\alpha/2,\,n_2-1,\,n_1-1} = \frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}}$$
or use =FINV(probability,degrees_freedom1,degrees_freedom2)

Example 10-22
n1 = 12, s1 = 0.85
n2 = 15, s2 = 0.98
F____ , ____ , ____ = ____________
F____ , ____ , ____ = ____________
Picture
$$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$$
__________________ < σ1²/σ2² < _________________
Interpretation:
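
The z-based intervals in these notes need only standard normal quantiles, so they can be checked with a short script. This covers the single mean with σ known, the sample size for a target error E, a single proportion, and two means with known variances. The sketch below is illustrative and not part of the course material: the function names are hypothetical, and Python's standard-library `statistics.NormalDist` stands in for the z table. The inputs are the numbers quoted in the corresponding examples.

```python
import math
from statistics import NormalDist

def z_upper(alpha):
    """z value with upper-tail area alpha under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

def mean_ci_known_sigma(x_bar, sigma, n, conf):
    """Two-sided CI for mu when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    half = z_upper((1 - conf) / 2) * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

def sample_size_for_mean(sigma, error, conf):
    """Smallest integer n satisfying n >= (z * sigma / E)^2."""
    return math.ceil((z_upper((1 - conf) / 2) * sigma / error) ** 2)

def proportion_ci(x, n, conf):
    """Approximate CI for p: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    half = z_upper((1 - conf) / 2) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def two_mean_ci_known_vars(x1, x2, var1, var2, n1, n2, conf):
    """CI for mu1 - mu2 when both population variances are known."""
    half = z_upper((1 - conf) / 2) * math.sqrt(var1 / n1 + var2 / n2)
    return (x1 - x2) - half, (x1 - x2) + half

# Known-variance example: n = 25, sigma^2 = 4, x_bar = 13.2, 90% confidence
print(mean_ci_known_sigma(13.2, 2.0, 25, 0.90))
# Sample size needed for an error of at most 0.25 in that example
print(sample_size_for_mean(2.0, 0.25, 0.90))
# Proportion example: x = 12 successes in n = 75 trials, 95% confidence
print(proportion_ci(12, 75, 0.95))
# Stamping-machine example: means 12 and 10, variances 12 and 8, n1 = n2 = 100
print(two_mean_ci_known_vars(12, 10, 12, 8, 100, 100, 0.95))
```

The t-, χ²-, and F-based intervals follow the same pattern but need quantiles that the standard library does not provide. Those values would come from the tables or the Excel functions cited in the slides.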