Sampling - Lyle School of Engineering
Systems Engineering Program, Department of Engineering Management, Information and Systems
EMIS 7370/5370 STAT 5340: Probability and Statistics for Scientists and Engineers
Sampling and Sampling Distributions
Dr. Jerrell T. Stracener, SAE Fellow
Leadership in Engineering

Population vs. Sample
Population: the totality of all possible values (measurements, counts, etc.) of a particular characteristic for a specific group of objects.
Sample: a part of a population selected according to some rule or plan.
Why sample?
- The population does not exist yet (for example, units not yet manufactured).
- Sampling and testing are destructive.

Sampling
Characteristics that distinguish one type of sample from another:
• the manner in which the sample was obtained
• the purpose for which the sample was obtained

Types of Samples
• Simple Random Sample: the sample $X_1, X_2, \ldots, X_n$ is a random sample if $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables.
Remark: each value in the population has an equal and independent chance of being included in the sample.
• Stratified Random Sample: the population is first subdivided into sub-populations, or strata, and a simple random sample is drawn from each stratum.

Types of Samples (continued)
Censored Samples
• Type I Censoring: sampling is terminated at a fixed time, $t_0$. The sample consists of $k$ times to failure plus the information that $n - k$ items survived beyond the fixed truncation time $t_0$.
• Type II Censoring: sampling is terminated upon the $k$th failure. The sample consists of $k$ times to failure plus the information that $n - k$ items survived beyond the random truncation time $t_k$.
• Progressive Censoring: the number of items on test is reduced in stages.

Types of Samples (continued)
• Systematic Random Sample: the $N$ items in the population are arranged in some order. Select an item at random from the first $K = N/n$ items, where $n$ is the sample size, and then select every $K$th item thereafter.
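The sampling schemes and censoring rules above can be sketched in Python. This is a minimal illustration, assuming the items are held in a list and using Python's `random` module; the function names, the population of 100 labelled items, and the exponential lifetimes are hypothetical choices made for the example, not part of the course material.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) of sizes[name] items from each stratum."""
    return {name: rng.sample(items, sizes[name]) for name, items in strata.items()}

def systematic_sample(population, n, rng):
    """Random start among the first K = N // n items, then every Kth item after it."""
    k = len(population) // n            # K = N/n (integer division when N is not a multiple of n)
    start = rng.randrange(k)            # index of the randomly chosen item among the first K
    return [population[start + i * k] for i in range(n)]

def type_i_censor(times, t0):
    """Type I: the test stops at the fixed time t0; returns the k observed failures and n - k survivors."""
    failures = sorted(t for t in times if t <= t0)
    return failures, len(times) - len(failures)

def type_ii_censor(times, k):
    """Type II: the test stops at the kth failure; the n - k survivors are censored at t_(k)."""
    ordered = sorted(times)
    return ordered[:k], len(times) - k, ordered[k - 1]

rng = random.Random(1)
population = list(range(1, 101))                          # hypothetical population of N = 100 items
print(systematic_sample(population, 10, rng))             # 10 items spaced K = 10 apart
print(rng.sample(population, 10))                         # simple random sample, for comparison
lifetimes = [rng.expovariate(1 / 5) for _ in range(20)]   # hypothetical lifetimes with mean 5
print(type_i_censor(lifetimes, 4.0))
print(type_ii_censor(lifetimes, 8))
```

Integer division is used for $K$ so the sketch also works when $N$ is not an exact multiple of $n$; the last selected index is then still inside the population.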
Sampling: Monte Carlo Simulation

Uniform Probability Integral Transformation
For any random variable $Y$ with probability density function $f(y)$, the variable
$F(Y) = \int_{-\infty}^{Y} f(x)\,dx$
is uniformly distributed over (0, 1); that is, $U = F(Y)$ has the probability density function
$g_F(u) = 1$ for $0 \le u \le 1$.

Uniform Probability Integral Transformation
Remark: the cumulative distribution function of any continuous random variable, evaluated at that random variable, is uniformly distributed over the interval (0, 1).

Generating Random Numbers
[Figure: density $f(y)$ and distribution function $F(y)$; a uniform value $r_i$ on the $F(y)$ axis (0 to 1.0) is mapped through $F$ to the value $y_i$.]

Generating Random Numbers
To generate a random value $y$ from a given probability density function $f(y)$ using the probability integral transformation:
1. Generate a random value $r_U$ from a uniform distribution over (0, 1).
2. Set $r_U = F(y)$.
3. Solve the resulting expression for $y$, i.e., $y = F^{-1}(r_U)$.

Generating Random Numbers with Excel
From the Tools menu, look for Data Analysis. If it is not there, you must install it (it is provided by the Analysis ToolPak add-in).
Once you select Data Analysis, a dialog window appears. Scroll down to "Random Number Generation", select it, and press "OK".
Choose the distribution you would like. Use Uniform to generate exponential or Weibull values, or Normal to generate normal or lognormal values.

Generating Random Numbers with Excel: Uniform Distribution, U(0, 1)
Select "Uniform" under the "Distribution" menu. Type "1" for Number of Variables and "10" for Number of Random Numbers, then press OK. Ten random numbers from the uniform distribution will appear on a new worksheet.

Generating Random Numbers with Excel: Normal Distribution, N(μ, σ)
Select "Normal" under the "Distribution" menu. Type "1" for Number of Variables and "10" for Number of Random Numbers. Enter the values for the mean (μ) and standard deviation (σ), then press OK.
Ten random numbers from the normal distribution will now appear on a new worksheet.

Generating Random Values from an Exponential Distribution E(θ) with Excel
First generate $n$ random values, $r_1, r_2, \ldots, r_n$, from U(0, 1): select "Uniform" under the "Distribution" menu, type "1" for Number of Variables and "10" for Number of Random Numbers, and press OK. Ten uniform random numbers will appear in column A (cells A1 - A10) of a new worksheet.

Generating Random Values from an Exponential Distribution E(θ) with Excel (continued)
Select the value of θ you would like to use; we will use θ = 5. In cell B1, type the equation $x_i = -\theta \ln(1 - r_i)$ with θ = 5 and $r_i$ taken from cell A1, i.e., =-5*LN(1-A1). With that cell selected, place the cursor over the bottom right-hand corner of the cell; when a cross appears, drag it down to B10 to copy the equation to the cells below. Cells B1 - B10 now contain $n = 10$ random values from the exponential distribution with parameter θ = 5.

Generating Random Values from a Weibull Distribution W(β, θ) with Excel
First generate $n$ random values, $r_1, r_2, \ldots, r_n$, from U(0, 1) in cells A1 - A10, exactly as for the exponential distribution.

Generating Random Values from a Weibull Distribution W(β, θ) with Excel (continued)
Select the values of β and θ you would like to use; we will use β = 20 and θ = 100. In cell B1, type the equation $x_i = \theta[-\ln(1 - r_i)]^{1/\beta}$ with β = 20, θ = 100, and $r_i$ taken from cell A1, i.e., =100*(-LN(1-A1))^(1/20). Copy the equation down to B10. Cells B1 - B10 now contain 10 random values from the Weibull distribution with parameters β = 20 and θ = 100.

Generating Random Values from a Lognormal Distribution LN(μ, σ) with Excel
First generate $n$ random values, $r_1, r_2, \ldots, r_n$, from N(0, 1). Select "Normal" under the "Distribution" menu. Type "1" for Number of Variables and "10" for Number of Random Numbers.
Enter 0 for the mean and 1 for the standard deviation, then press OK. Ten random numbers from the standard normal distribution will appear in cells A1 - A10.

Generating Random Values from a Lognormal Distribution LN(μ, σ) with Excel (continued)
Select the values of μ and σ you would like to use; we will use μ = 2 and σ = 1. In cell B1, type the equation $x_i = e^{\mu + \sigma r_i}$ with μ = 2, σ = 1, and $r_i$ taken from cell A1, i.e., =EXP(2+A1*1). Copy the equation down to B10. Cells B1 - B10 now contain 10 random values from the lognormal distribution with μ = 2 and σ = 1.

Flow Chart of the Monte Carlo Simulation Method
Input 1: the statistical distribution of each component variable.
Step 1: select a random value from each of these distributions.
Input 2: the relationship between the component variables and system performance.
Step 2: calculate the value of system performance for a system composed of components with the values obtained in Step 1.
Repeat Steps 1 and 2 n times.
Output: summarize and plot the resulting values of system performance. This provides an approximation of the distribution of system performance.

Sample Size and Error Bands
Because Monte Carlo simulation involves randomly selected values, the results are subject to statistical fluctuations.
• Any estimate will not be exact but will have an associated error band.
• The larger the number of trials in the simulation, the more precise the final results.
• We can obtain as small an error as desired by conducting a sufficient number of trials.
• In practice, the allowable error is generally specified, and this information is used to determine the required number of trials.

Example
If X ~ B(n, p) and the desired confidence level is 95%, then $1 - \alpha = 0.95$, $\alpha = 0.05$, and $Z_{1-\alpha/2} = 1.96$; suppose also that a preliminary estimate of p is $p' = 0.2$.
Then, with an allowable error of $E = 0.05$, an estimate of the required sample size is
$n = \frac{p'(1 - p')\,Z_{1-\alpha/2}^2}{E^2} = \frac{(0.2)(0.8)(1.96)^2}{(0.05)^2} \approx 246.$

Drawbacks of Monte Carlo Simulation
• There is frequently no way of determining whether any of the variables are dominant or more important than others without making repeated simulations.
• If a change is made in one variable, the entire simulation must be redone.
• The method may require developing a complex computer program.
• If a large number of trials is required, a great deal of computer time may be needed to obtain the necessary results.

Example
If the probability density function of X is
$f(x) = 2(1 - x)$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere,
find
(a) F(x)
(b) the mean
(c) the standard deviation
(d) the value of x for which P(X > x) = 0.05
(e) the probability that, if 5 values of x are randomly selected, at least 2 of them exceed 0.6
(f) parts (a) through (e) again, using Monte Carlo simulation

Example - Solution
First, plot f(x):
[Figure: $f(x) = 2(1 - x)$ on $0 \le x \le 1$, decreasing linearly from 2 at x = 0 to 0 at x = 1.]

Example - Solution
(a) The (cumulative) probability distribution function of X for $0 \le x \le 1$ is
$F(x) = P(X \le x) = \int_0^x 2(1 - y)\,dy = \left[2y - y^2\right]_0^x = 2x - x^2,$
so that
$F(x) = 0$ for $x < 0$; $F(x) = 2x - x^2$ for $0 \le x \le 1$; $F(x) = 1$ for $x > 1$.

Example - Solution
(b) The mean of X is
$E(X) = \int_0^1 x \cdot 2(1 - x)\,dx = 2\int_0^1 (x - x^2)\,dx = 2\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_0^1 = 2\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{3}.$

Example - Solution
(c) The variance of X is
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \int_0^1 x^2 \cdot 2(1 - x)\,dx - \frac{1}{9} = 2\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 - \frac{1}{9} = 2\left(\frac{1}{3} - \frac{1}{4}\right) - \frac{1}{9} = \frac{1}{6} - \frac{1}{9} = \frac{1}{18}.$
The standard deviation is
$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{1/18} \approx 0.236.$

Example - Solution
(d) The value of x such that P(X > x) = 0.05 can be determined in either of two ways. x can be obtained by solving
$P(X > x) = \int_x^1 f(y)\,dy = 0.05$
for x, or by solving F(x) = 0.95 for x:
$2x - x^2 = 0.95$, i.e., $x^2 - 2x + 0.95 = 0.$
Its roots are $x = 1 \pm \sqrt{0.05}$, i.e., x ≈ 1.2236 or x ≈ 0.7764. Since 1.2236 lies outside the range $0 \le x \le 1$, x ≈ 0.7764 is our answer.
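The sample-size estimate and the answers to parts (a) through (d) can be checked numerically. This is a minimal Python sketch, assuming the allowable error in the sample-size formula is E = 0.05; the midpoint-rule integrator and the function names are illustrative choices, not the course's method.

```python
import math

def required_trials(p_prime, z, allowable_error):
    # n = p'(1 - p') z^2 / E^2, rounded up to a whole number of trials
    return math.ceil(p_prime * (1 - p_prime) * z**2 / allowable_error**2)

def f(x):
    # Example density: f(x) = 2(1 - x) on [0, 1], 0 elsewhere
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0

def F(x):
    # Part (a): F(x) = 2x - x^2 on [0, 1]
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 2 * x - x * x

def integrate(g, a, b, steps=100_000):
    # Midpoint rule; very accurate here because the integrands are low-degree polynomials
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

mean = integrate(lambda x: x * f(x), 0, 1)                      # part (b): 1/3
variance = integrate(lambda x: x * x * f(x), 0, 1) - mean**2    # part (c): 1/18
std_dev = math.sqrt(variance)

# Part (d): roots of x^2 - 2x + 0.95 = 0 are 1 -/+ sqrt(0.05); keep the one in [0, 1]
x_95 = next(r for r in (1 - math.sqrt(0.05), 1 + math.sqrt(0.05)) if 0 <= r <= 1)

print(required_trials(0.2, 1.96, 0.05))      # 246
print(round(mean, 4), round(std_dev, 3))     # 0.3333 0.236
print(round(x_95, 4), round(F(x_95), 4))     # 0.7764 0.95
```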
If we check against our plot of f(x), this seems reasonable: the area under f(x) to the right of x is $(1 - x)^2 = (0.2236)^2 \approx 0.05$.
[Figure: f(x) on $0 \le x \le 1$ with the tail area of 0.05 to the right of x = 0.7764 marked.]

Example - Solution
(e) Let Y = the number of the 5 values that exceed 0.6, for y = 0, 1, 2, 3, 4, 5. Each value exceeds 0.6 with probability
$P(X > 0.6) = 1 - P(X \le 0.6) = 1 - F(0.6) = 1 - [2(0.6) - (0.6)^2] = 1 - 0.84 = 0.16.$
Now $Y \sim B(5, 0.16)$, so that
$P(Y \ge 2) = 1 - \sum_{y=0}^{1} b(y; 5, 0.16) = 1 - \sum_{y=0}^{1} \binom{5}{y}(0.16)^y(1 - 0.16)^{5-y} = 1 - (0.4182 + 0.3983) = 0.1835.$

Example - Solution
(f) Generate a random sample of size n, say 1,000, from f(x) using Monte Carlo simulation as follows. Since $F(x) = 2x - x^2$ for $0 \le x \le 1$, generate $r_i$ from U(0, 1) and solve
$2x_i - x_i^2 = r_i$, for i = 1, ..., 1000,
for $x_i$; the root in [0, 1] is $x_i = 1 - \sqrt{1 - r_i}$.

Example - Solution
Then estimate F(x), μ, σ and P(Y ≥ 2) as follows. F(x) is estimated at the upper end of the jth interval by the cumulative relative frequency
$\hat{F}(x) = \sum_{k=1}^{j} \frac{f_k}{1000}$, for $0 \le x \le 1$,
where $f_k$ is the frequency in the kth interval:

Interval    Frequency, f_i    f_i/1000    Cumulative, F̂(x)
0.0-0.1     196               0.196       0.196
0.1-0.2     170               0.170       0.366
0.2-0.3     136               0.136       0.502
0.3-0.4     119               0.119       0.621
0.4-0.5     103               0.103       0.724
0.5-0.6      96               0.096       0.820
0.6-0.7      78               0.078       0.898
0.7-0.8      56               0.056       0.954
0.8-0.9      35               0.035       0.989
0.9-1.0      11               0.011       1.000
0.0-1.0    1000               1.000

[Figure: relative frequency histogram of the 1,000 simulated values over the intervals 0-0.1, ..., 0.9-1.0.]
[Figure: estimated distribution function $\hat{F}(x)$ (cumulative relative frequency) over the same intervals.]

Example - Solution
$\hat{\mu} = \bar{x} = \frac{1}{1000}\sum_{i=1}^{1000} x_i = \frac{340.79}{1000} = 0.34079.$
Compare this to μ = 1/3.

Example - Solution
$\hat{\sigma} = S\sqrt{\frac{n-1}{n}}$, where
$S^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)} = \frac{1000(175.93) - 116139.34}{1000(999)} \approx 0.0599$ and $S = \sqrt{S^2} \approx 0.2446,$
so that
$\hat{\sigma} = 0.2446\sqrt{\frac{999}{1000}} \approx 0.2445.$
Compare this to σ = 0.236.

Example - Solution
$\hat{p} = \hat{P}(X > 0.6) = \frac{\text{no. of values of } x > 0.6}{\text{total number of values}} = \frac{180}{1000} = 0.18.$
Compare this to p = 0.16.

Example - Solution
$\hat{P}(Y \ge 2) = \frac{\text{no. of groups of 5 that have at least 2 values of } x > 0.6}{\text{total number of groups}} = \frac{29 + 11 + 0 + 0}{200} = 0.20.$
Compare this to P(Y ≥ 2) = 0.1835.

Example - Solution - Our Data
The first 28 of the 1,000 simulated values are listed below. The column "> 0.6" flags $x_i > 0.6$, and "no. in group > 0.6" counts the flagged values in each consecutive group of 5.

 i    r_i       x_i       > 0.6    no. in group > 0.6
 1    0.38200   0.21387   0
 2    0.10068   0.05168   0
 3    0.59648   0.36477   0
 4    0.89911   0.68236   1
 5    0.88461   0.66031   1        2
 6    0.95846   0.79620   1
 7    0.01450   0.00727   0
 8    0.40742   0.23021   0
 9    0.86325   0.63020   1
10    0.13858   0.07188   0        2
11    0.24503   0.13111   0
12    0.04547   0.02300   0
13    0.03238   0.01632   0
14    0.16413   0.08574   0
15    0.21961   0.11660   0        0
16    0.01709   0.00858   0
17    0.28504   0.15445   0
18    0.34309   0.18950   0
19    0.55364   0.33190   0
20    0.35737   0.19836   0        0
21    0.37184   0.20743   0
22    0.35560   0.19726   0
23    0.91031   0.70051   1
24    0.46602   0.26926   0
25    0.42616   0.24248   0        1
26    0.30390   0.16568   0
27    0.97571   0.84414   1
28    0.80667   0.56030   0

Remember that all 1,000 values were used for the estimates above.

Sampling Distributions

Sampling Distribution of $\bar{X}$ with Known σ
If $X_1, X_2, \ldots, X_n$ is a random sample of size n from a normal distribution with mean μ and known standard deviation σ, and if
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$
then
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$

Sampling Distributions: Example
The dollar amount per transaction, X, in the Sporting Goods Department of a store has a normal distribution with mean $75 and standard deviation $20. What is the probability that a random sample of 9 sales transactions will have an average over $85?

Sampling Distributions: Example - Solution
If X ~ N(75, 20), then $\bar{X} \sim N(75, 20/\sqrt{9})$, and
$P(\bar{X} > 85) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{85 - 75}{20/\sqrt{9}}\right) = P(Z > 1.5) = 0.0668.$

Central Limit Theorem
If $\bar{X}$ is the mean of a random sample $X_1, X_2, \ldots, X_n$ of size n from a population with mean μ and finite standard deviation σ, then as $n \to \infty$ the limiting distribution of
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
is the standard normal distribution.

Central Limit Theorem
Remark: the Central Limit Theorem provides the basis for approximating the distribution of $\bar{X}$ with a normal distribution with mean μ and standard deviation $\sigma/\sqrt{n}$. The approximation gets better as n gets larger.
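Part (f) of the example and the Excel recipes for exponential, Weibull and lognormal values all rest on the probability integral transformation. The following is a minimal Python sketch of both, assuming Python's `random` module in place of Excel's Random Number Generation tool; the seed and function names are arbitrary, and the simulated estimates will not match the slide's 1,000 values because the random numbers differ.

```python
import math
import random
import statistics

def exponential(r, theta):        # Excel: =-5*LN(1-A1) with theta = 5
    return -theta * math.log(1 - r)

def weibull(r, beta, theta):      # Excel: =100*(-LN(1-A1))^(1/20) with beta = 20, theta = 100
    return theta * (-math.log(1 - r)) ** (1 / beta)

def lognormal(z, mu, sigma):      # Excel: =EXP(2+A1*1), where z is an N(0, 1) value
    return math.exp(mu + sigma * z)

def example_x(r):
    # Example density f(x) = 2(1 - x): the root in [0, 1] of 2x - x^2 = r
    return 1 - math.sqrt(1 - r)

def simulate_part_f(n=1000, group_size=5, seed=2024):
    rng = random.Random(seed)
    xs = [example_x(rng.random()) for _ in range(n)]
    mean_hat = statistics.fmean(xs)                  # estimate of mu = 1/3
    s = statistics.stdev(xs)                         # S, with divisor n - 1
    sigma_hat = s * math.sqrt((n - 1) / n)           # sigma-hat = S * sqrt((n - 1)/n)
    p_hat = sum(x > 0.6 for x in xs) / n             # estimate of P(X > 0.6) = 0.16
    groups = [xs[i:i + group_size] for i in range(0, n, group_size)]
    pY_hat = sum(sum(x > 0.6 for x in g) >= 2 for g in groups) / len(groups)
    return mean_hat, sigma_hat, p_hat, pY_hat

# Ten values of each distribution, mirroring cells A1:A10 and B1:B10
rng = random.Random(1)
r = [rng.random() for _ in range(10)]
exp_values = [exponential(ri, 5) for ri in r]
weib_values = [weibull(ri, 20, 100) for ri in r]
logn_values = [lognormal(rng.gauss(0, 1), 2, 1) for _ in range(10)]

# Exact part (e): Y ~ B(5, 0.16), P(Y >= 2) = 1 - P(Y = 0) - P(Y = 1)
p = 1 - (2 * 0.6 - 0.6**2)
exact_pY = 1 - sum(math.comb(5, y) * p**y * (1 - p)**(5 - y) for y in range(2))

print(simulate_part_f())
print(round(exact_pY, 4))        # 0.1835
```

As a spot check, `example_x(0.38200)` gives 0.21387, the first row of the data table.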
Central Limit Theorem - Example
A manufacturing process is required to produce parts with a mean diameter of 5 mm. An engineer conjectures that the population mean is 5.0 mm, and an experiment is conducted in which 100 parts are selected randomly and measured. It is known that the population standard deviation is σ = 0.1 mm. The experiment indicates a sample average diameter of $\bar{x} = 5.027$ mm. Does this refute the engineer's conjecture?
Solution: whether the data support or refute the conjecture depends on the probability that data similar to those obtained in this experiment can readily occur when μ = 5.0. In other words, how likely is it that one can obtain $\bar{x} \ge 5.027$ with n = 100 if the mean is μ = 5.0?

Solution
The probability that we choose to compute is $P[(\bar{X} - 5) \ge 0.027]$. This is the same as asking: if the mean is 5, what is the chance that $\bar{X}$ will exceed it by as much as 0.027?
$P[(\bar{X} - 5) \ge 0.027] = P\left[\frac{\bar{X} - 5}{0.1/\sqrt{100}} \ge \frac{0.027}{0.1/\sqrt{100}}\right] = P(Z \ge 2.7) \approx 0.0035.$

Solution (continued)
Here we are simply standardizing the sample mean according to the Central Limit Theorem. Thus, by chance alone, one would obtain a sample mean at least 0.027 mm above the population mean in only about 3.5 of 1,000 experiments. Therefore the sample data do not support the engineer's conjecture.

Sampling Distribution of $\bar{X}$ with Unknown σ
Let $X_1, X_2, \ldots, X_n$ be independent random variables that have a normal distribution with mean μ and unknown standard deviation σ. Let
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$
Then the random variable
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
has a t-distribution with ν = n - 1 degrees of freedom.

Sampling Distribution of $S^2$
If $S^2$ is the variance of a random sample of size n taken from a normal population having variance $\sigma^2$, then the statistic
$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$
has a chi-squared distribution with ν = n - 1 degrees of freedom.

Example
A manufacturer of car batteries guarantees that his product will last, on average, 3 years with a standard deviation of 1 year.
If five batteries have lifetimes of 1.9, 2.4, 3.0, 3.5 and 4.2 years, should the manufacturer still be convinced that the batteries have a standard deviation of 1 year? Assume that battery lifetime follows a normal distribution.
Solution: we first find the sample variance:
$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)} = \frac{5(48.26) - (15)^2}{(5)(4)} = 0.815.$

Solution (continued)
Then
$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(4)(0.815)}{1} = 3.26$
is a value from a chi-squared distribution with 4 degrees of freedom. Since 95% of the $\chi^2$ values with 4 degrees of freedom fall between 0.484 and 11.143, the computed value with $\sigma^2 = 1$ is reasonable, and therefore the manufacturer has no reason to suspect that the standard deviation is other than 1 year.
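The three probability calculations in the sampling-distribution part of the section (the $85 average, the 5.027 mm diameter, and the battery χ² statistic) can be reproduced with Python's standard library. This is a minimal sketch: the standard normal tail uses `math.erfc`, and because the standard library has no chi-squared distribution, the check on the 0.484 and 11.143 percentiles uses a closed-form CDF that is valid only for 4 degrees of freedom.

```python
import math
import statistics

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Sporting goods example: X-bar ~ N(75, 20/sqrt(9)); P(X-bar > 85)
z_sales = (85 - 75) / (20 / math.sqrt(9))
print(round(z_sales, 2), round(normal_upper_tail(z_sales), 4))   # 1.5 0.0668

# Diameter example: sigma = 0.1, n = 100, observed x-bar = 5.027, mu = 5.0
z_diam = (5.027 - 5.0) / (0.1 / math.sqrt(100))
print(round(z_diam, 2), round(normal_upper_tail(z_diam), 4))     # 2.7 0.0035

# Battery example: chi-squared statistic (n - 1) s^2 / sigma^2 with sigma = 1
lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
s2 = statistics.variance(lifetimes)          # sample variance, divisor n - 1
chi2 = (len(lifetimes) - 1) * s2 / 1.0**2
print(round(s2, 3), round(chi2, 2))          # 0.815 3.26

def chi2_cdf_4df(x):
    # Closed form of the chi-squared CDF for exactly 4 degrees of freedom
    return 1 - math.exp(-x / 2) * (1 + x / 2)

# 0.484 and 11.143 should be (approximately) the 2.5th and 97.5th percentiles
print(round(chi2_cdf_4df(0.484), 3), round(chi2_cdf_4df(11.143), 3))  # 0.025 0.975
```

For other degrees of freedom, a library routine such as SciPy's chi-squared distribution would replace the closed-form helper.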