EML 4550: Engineering Design Methods
Probability and Statistics in Engineering Design
Class Notes
Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers and Ye
EML4550 -- 2007

Sampling Distributions

Estimation of a population (infinite or very large) based on information from a sample (a limited ensemble).

Sampling distribution of the mean $\bar{y}$

Note: we denote the population mean by $\mu$ and the population standard deviation by $\sigma$.

Central limit theorem: If $\bar{y}$ is the mean of a random sample of size $n$ taken from a population (not necessarily normal) with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the variable

$$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$

as $n \to \infty$ has a standard normal $z$-distribution. (Note: use $n \ge 30$ if the original population is not normal.)

Example

An engineering process is producing ball bearings that have an average diameter of 1 cm and a standard deviation of 0.05 cm. Find the probability that a random sample of 64 ball bearings will have an average diameter of less than 0.99 cm.

$$\bar{y} = 0.99, \qquad z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{0.99 - 1}{0.05/\sqrt{64}} = -1.6$$

$$\Pr(\bar{y} < 0.99) = \Pr(z < -1.6) = 0.0548$$

With a confidence level of 99%, what range do you expect for the average of a random sample of 36 ball bearings?

A 99% probability for a normal $z$-distribution leaves 0.5% in each tail. Due to symmetry, we look up the $z$-value for a tail area of 0.005 in the table: $z^* = -2.575$.

$$-2.575 < z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} < 2.575 \quad\Rightarrow\quad 0.979\ \text{cm} < \bar{y} < 1.021\ \text{cm}$$

Example

What is the probability that an elevator will overload when 12 randomly selected people ride at the same time? Given: the average weight of the population is $\mu = 170$ lbs, the standard deviation is $\sigma = 30$ lbs, and the capacity of the elevator is rated at 2400 lbs.

$$\bar{y} = \frac{2400}{12} = 200, \qquad z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{200 - 170}{30/\sqrt{12}} = 3.46$$

According to the central limit theorem, $\bar{y}$ behaves like a normally distributed variable:

$$\Pr(\bar{y} > 200) = \Pr(z > 3.46) = 0.00027 = 0.027\%$$

There is a very low probability that 12 people with an average weight of at least 200 lbs will ride the elevator at the same time.
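The ball-bearing and elevator calculations above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes Python 3.8 or later (for `statistics.NormalDist`), and the helper name `z_score` and the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_score(sample_mean, mu, sigma, n):
    """Standardize a sample mean: z = (ybar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / sqrt(n))


std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Ball bearings: mu = 1 cm, sigma = 0.05 cm, n = 64, threshold 0.99 cm.
z_bearing = z_score(0.99, 1.0, 0.05, 64)
p_bearing = std_normal.cdf(z_bearing)  # Pr(ybar < 0.99)

# 99% range for the mean of n = 36 bearings (0.5% in each tail).
z_star = std_normal.inv_cdf(0.995)
half_width = z_star * 0.05 / sqrt(36)
bearing_range = (1.0 - half_width, 1.0 + half_width)

# Elevator: mu = 170 lbs, sigma = 30 lbs, n = 12, capacity 2400 lbs.
z_elevator = z_score(2400 / 12, 170.0, 30.0, 12)
p_overload = 1.0 - std_normal.cdf(z_elevator)  # Pr(ybar > 200)

print(f"bearings: z = {z_bearing:.2f}, Pr(ybar < 0.99) = {p_bearing:.4f}")
print(f"99% range: {bearing_range[0]:.3f} cm to {bearing_range[1]:.3f} cm")
print(f"elevator: z = {z_elevator:.2f}, Pr(overload) = {p_overload:.5f}")
```

Computing the critical value with `inv_cdf` replaces the table lookup of $z^*$; the results should agree with the hand calculations up to the rounding of the table values.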
Note: the sample is supposed to be a random sample from the population, which excludes situations such as members of a football team, a sumo wrestler training camp, etc.

Sampling Distribution of $s^2$

Theorem: If $s^2$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^2$, then the variable

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{\sigma^2}$$

has a chi-squared distribution with $n - 1$ degrees of freedom.

$\alpha$ is the probability that the sample produces a $\chi^2$ value greater than a specific value $\chi^2_\alpha$.

Chi-squared distribution

[Chi-square table]

As indicated in the table, a $\chi^2$ value is significant if it is large for the given degrees of freedom (DOF). For example, with two DOF there is only a 5% probability (a 1 in 20 chance) that $\chi^2$ is greater than 5.99. Therefore, a value > 5.99 suggests that certain specifications are likely incorrect. For values whose probability is greater than 10%, one can usually accept the hypothesis (without strong contradiction).

Example: check hypothesis

A manufacturer claims that it produces hard drives that have an average life of 10 years with a standard deviation of 1 year. If six of these drives are examined and have lifetimes of 9.8, 10.5, 12, 8.8, 11.5, and 10.2 years, can we believe the company's claim based on these samples? Assume a normally distributed population.

$$\bar{y} = 10.47, \qquad s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = 1.343$$

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{5(1.343)}{1^2} = 6.715$$

Check the chi-square table with 5 degrees of freedom (one DOF is lost in obtaining the average). One expects most (90%, between the 5% and 95% points) of the $\chi^2$ values to fall between 1.14 and 11.07. The observed value lies inside this range, so the company's claim is plausible (at least, one does not have enough evidence to dispute it).

Example: estimate variance

A sample set of steel rods is measured to have the following lengths: 46.4, 45.8, 46.1, 46.9, 45.7, 46.0, 45.9 cm.
Assuming the entire stock has a normal distribution, estimate the mean and variance of the entire stock at a confidence level of 90%.

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{7} y_i = 46.11 \quad \text{(estimate } \mu \text{ on your own using the Student t-distribution)}$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{7} (y_i - \bar{y})^2 = 0.171$$

To have a 90% confidence level, we want a 90% probability that a sample falls within the given range. Due to symmetry, we expect 5% to fall below and 5% to exceed the specified range.

From the chi-square table, for six DOF: $\chi^2_{0.05} = 12.59$, $\chi^2_{0.95} = 1.63$. Therefore, at the 90% confidence level for $\sigma^2$,

$$\chi^2_{0.95} = 1.63 < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{0.05} = 12.59$$

Invert the inequality and multiply through by $(n-1)s^2$:

$$\frac{(6)(0.171)}{1.63} = 0.629 > \sigma^2 > \frac{(6)(0.171)}{12.59} = 0.081$$

Student t-distribution

The central limit theorem is useful when the standard deviation $\sigma$ of the population is known. However, $\sigma$ is usually not available. Therefore, an estimate of $\sigma$, the sample standard deviation $s$, must be used when estimating the population mean $\mu$. A new variable is defined to handle this situation:

$$T = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

If the sample size is large, one would expect $s \approx \sigma$, and the $T$ variable closely follows a standard normal distribution as defined in the central limit theorem. The t-distribution is important when the sample size is small ($n < 30$).

It is interesting to explore the relationship between a normal $z$ variable and the $T$ variable:

$$T = \frac{\bar{y} - \mu}{s/\sqrt{n}} = \frac{(\bar{y} - \mu)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}} = \frac{z}{\sqrt{V/(n-1)}}, \qquad V = \chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

where $V$ has the chi-squared distribution defined earlier.

[t-distribution]

Example: estimation based on small sample size

A manufacturer collects 8 samples of the time to failure, in years, of a newly designed advanced engine: 15.4, 14.7, 18.1, 16.5, 17.2, 13.5, 15.8, 18.0. It is believed that the lifetimes of all engines are normally distributed with an unknown standard deviation. Therefore, we cannot apply the central limit theorem to estimate the average lifetime of this engine; the t-distribution is used instead.
For a 90% confidence level, estimate the average lifetime of the given engine. In the t-table, use the one-tail 0.05 or two-tail 0.10 (1 - 0.9 = 0.1) column as the starting point. The DOF is 8 - 1 = 7, which gives a critical value of 1.895. That is, there is a 90% probability that the $T$ variable falls within this range:

$$-1.895 < T = \frac{\bar{y} - \mu}{s/\sqrt{n}} < 1.895$$

$$\bar{y} - 1.895\, s/\sqrt{n} < \mu < \bar{y} + 1.895\, s/\sqrt{n}; \qquad \bar{y} = 16.15, \quad s = 1.62$$

$$17.23 > \mu > 15.07 \quad \text{or} \quad \mu = 16.15 \pm 1.08 \text{ (years)}$$

with a 90% confidence level.
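The three small-sample examples in these notes (the hard-drive chi-squared check, the steel-rod variance interval, and the engine t interval) can be reproduced with Python's standard library. This is a minimal sketch, not part of the original notes. The function names are illustrative, and the critical values are copied from the table values quoted in the notes rather than computed, because the standard library has no chi-squared or t quantile function.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Critical values copied from the tables quoted in the notes (not computed).
CHI2_5DOF = (1.14, 11.07)  # chi-squared 0.95 and 0.05 points, 5 DOF
CHI2_6DOF = (1.63, 12.59)  # chi-squared 0.95 and 0.05 points, 6 DOF
T_7DOF = 1.895             # t value, one-tail 0.05, 7 DOF


def chi2_statistic(data, sigma):
    """(n - 1) s^2 / sigma^2 for a claimed population standard deviation."""
    return (len(data) - 1) * variance(data) / sigma**2


def variance_interval(data, chi2_low, chi2_high):
    """Interval for sigma^2 from the lower and upper chi-squared points."""
    scaled = (len(data) - 1) * variance(data)
    return scaled / chi2_high, scaled / chi2_low


def t_interval(data, t_crit):
    """Interval for mu when sigma is unknown and estimated by s."""
    half_width = t_crit * stdev(data) / sqrt(len(data))
    return mean(data) - half_width, mean(data) + half_width


drives = [9.8, 10.5, 12.0, 8.8, 11.5, 10.2]            # years
rods = [46.4, 45.8, 46.1, 46.9, 45.7, 46.0, 45.9]      # cm
engines = [15.4, 14.7, 18.1, 16.5, 17.2, 13.5, 15.8, 18.0]  # years

chi2_drives = chi2_statistic(drives, sigma=1.0)
claim_plausible = CHI2_5DOF[0] < chi2_drives < CHI2_5DOF[1]
var_low, var_high = variance_interval(rods, *CHI2_6DOF)
mu_low, mu_high = t_interval(engines, T_7DOF)

print(f"hard drives: chi2 = {chi2_drives:.3f}, claim plausible: {claim_plausible}")
print(f"steel rods: {var_low:.3f} < sigma^2 < {var_high:.3f}")
print(f"engines: {mu_low:.2f} < mu < {mu_high:.2f}")
```

Because the code carries the unrounded sample variance through each step, the last digit of some results can differ slightly from the hand calculations, which round $s^2$ to three decimals first.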