Chapter 4
Statistics

"There are three kinds of lies: lies, damn lies and statistics." Benjamin Disraeli
"Only if the statistics are used improperly." Tiny Grant, 1978

Statistics
Statistics is really just about being able to tell differences between analyses or methods when error or regular variation is taken into account.
Error can be in the sample (differences in a process or in a sample) or in the data collection (random error).

Let's Look at a Difference in a Manufacturing Process
Variation in the life of light bulbs. What can cause differences? Let's carry out a thought experiment.

Light Bulb Lifetime
Take a large set of light bulbs and measure the lifetimes. Our variation in life will be based on differences in the bulbs. We can measure time to much greater accuracy than needed here.

We will look at an example data set.

Let's describe this data numerically
Taking this data we can calculate a mean and a standard deviation. This data will usually present as a normal distribution.

$$\bar{x} = 845.2 \text{ hrs}, \qquad s = 94.2 \text{ hrs}$$

Mean
$$\bar{x} = \frac{\sum_i x_i}{n}$$

Standard Deviation
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$

Standard Deviation (Alternate formula)
What is gained by this formula?
$$s = \sqrt{\frac{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n(n - 1)}}$$

Variance
The variance is the standard deviation squared. For this data set it would be 8873.64 hr^2.
Why bother? When examining error, the variances of all the steps in a process are additive.
$$s_{total}^2 = \sum_i s_i^2$$
Why, then, would we not use this? The units on standard deviation are the same as our unit of measure and make physical sense.

The greater the variability of the data set, the larger the standard deviation will be.

This Normal Curve can tell us a lot.
The area under the curve is normalized, that is, made to equal 1. Now a fraction of the area can represent a probability. For example, half the sample will have a life less than the mean (area = 0.500). These areas are available in charts.

Math expression for Normal Curve
$$y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2}$$
Where x is the x axis position.
$\sigma$ is the population standard deviation.
$\mu$ is the population average.

Express the x position in terms of mean and standard deviation and it is denoted z.
$$z = \frac{x - \mu}{\sigma} \approx \frac{x - \bar{x}}{s}$$

How much area falls outside $\pm 3\sigma$?
1 - area(left) - area(right)
The area between the mean and z = 3 is 0.49865 on each side, so 1 - 0.49865 - 0.49865 = 0.0027.
Or, in our example, only about 0.3% of light bulbs have lifetimes outside $\bar{x} \pm 3s$. For our mean and standard deviation this means that 99.7% of these bulbs would last from 563 hours to 1128 hours.

How much area between ranges
Range                 Area
$\mu \pm 1\sigma$     68.3%
$\mu \pm 2\sigma$     95.5%
$\mu \pm 3\sigma$     99.7%

Applications that we will use.
If we were to change our light bulb manufacturing process in some way, then how could we tell if it improved the bulbs manufactured? Perhaps the goal is to shorten life; you can sell more that way!

How do we tell if there is a difference?
Since we know there is a spread of bulb lifetimes, we can expect that we would need to do more than just check one bulb. We would repeat the entire experiment. What statistical tools are needed for this?

[Figure: Compare two means]
[Figure: Compare two means (2)]

We have a problem.
We often do not know the population mean and standard deviation, $\mu$ and $\sigma$. Knowing them requires that we do many analyses. We work with the sample mean $\bar{x}$ and the sample standard deviation $s$.

We use the Student's t Statistic
W. S. Gosset published this work under the assumed name of Student in 1908 in Biometrika.
For a biography of this person: http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Gosset.html

The Confidence Interval
This allows us to express the results in a quantitative way.
$$\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}$$

Tools for the first lab.
Case 1 - Is my data consistent with the known or required analysis value?
Case 2 - Are two sets of data the same?
Case 3 - Are two methods equivalent, limited sample?

Case 1
$$t_{calculated} = \frac{|\bar{x} - \mu|}{s} \sqrt{n}$$
If $t_{calculated}$ is larger than the table value, then the value is not in bounds.

Case 2
How do we determine if these are different?
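$$t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$s_{pooled}$
$$s_{pooled} = \sqrt{\frac{\sum_i (x_i - \bar{x}_1)^2 + \sum_j (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$

Don't worry about 4-8a and 4-9a, which are for when the standard deviations are not equal. The F test is used to sort this out.

Case 3
Calculate the average difference $\bar{d}$ between the paired results (keep signs):
$$\bar{d} = \frac{\sum_i d_i}{n}$$
Calculate $s_d$:
$$s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}$$
Calculate t:
$$t_{calc} = \frac{|\bar{d}|}{s_d} \sqrt{n}$$
Then compare $t_{calc}$ to $t_{table}$.

Q Test
Arrange data in numerical order.
Calculate $Q_{calc}$: $Q_{calc}$ = gap/range
If $Q_{calc} > Q_{table}$ then reject the value.
Used for small data sets.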
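
Worked sketch: mean, standard deviation and variance
The mean, the two standard deviation formulas and the variance from the earlier slides can be checked numerically. The sketch below is only illustrative: the five lifetimes are made-up values (not the data set behind the 845.2 hr mean quoted in the slides), and the per-step standard deviations in step_sds are likewise hypothetical. It shows that the definition formula and the alternate formula agree, and that variances (not standard deviations) add across the steps of a process.

```python
import math

# Hypothetical bulb lifetimes in hours (illustrative only, not the slide data set).
lifetimes = [820.0, 910.0, 780.0, 870.0, 850.0]

n = len(lifetimes)
mean = sum(lifetimes) / n

# Definition formula: s = sqrt( sum (x_i - mean)^2 / (n - 1) )
s_definition = math.sqrt(sum((x - mean) ** 2 for x in lifetimes) / (n - 1))

# Alternate formula: s = sqrt( (n * sum(x_i^2) - (sum x_i)^2) / (n (n - 1)) )
# It needs only running totals of x and x^2, so the mean is not required first.
sum_x = sum(lifetimes)
sum_x2 = sum(x * x for x in lifetimes)
s_alternate = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# The variance is the standard deviation squared.
variance = s_definition ** 2

# Variances of independent steps add; take the square root at the end.
step_sds = [3.0, 4.0]  # hypothetical standard deviations of two process steps
s_total = math.sqrt(sum(sd ** 2 for sd in step_sds))

print(f"mean = {mean:.1f} hr")
print(f"s (definition) = {s_definition:.3f} hr, s (alternate) = {s_alternate:.3f} hr")
print(f"variance = {variance:.1f} hr^2")
print(f"combined s for steps {step_sds} = {s_total:.1f}")
```

One design note: the alternate formula subtracts two large, nearly equal numbers, so on real data with many digits it can lose precision compared with the definition formula.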
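
Worked sketch: areas under the normal curve
The areas quoted for the normal curve (0.49865 for z = 3, and 68.3%, 95.5%, 99.7% for 1, 2 and 3 standard deviations) can be reproduced without a chart. This is a minimal sketch that assumes the standard relationship between the normal curve and the error function, available in Python as math.erf; the function names are hypothetical helpers, and the mean and standard deviation are the values quoted in the slides.

```python
import math


def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (z = 0) and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


def fraction_within(k):
    """Fraction of the population within plus or minus k standard deviations."""
    return 2.0 * area_mean_to_z(k)


# Values quoted in the slides for the light bulb example.
xbar = 845.2  # hours
s = 94.2      # hours

# Area outside +/- 3 standard deviations: 1 - area(left) - area(right).
outside_3 = 1.0 - area_mean_to_z(3.0) - area_mean_to_z(3.0)

# Lifetime range that corresponds to xbar +/- 3s.
low = xbar - 3 * s
high = xbar + 3 * s

for k in (1, 2, 3):
    print(f"within +/- {k} sd: {100 * fraction_within(k):.1f}%")
print(f"outside +/- 3 sd: {outside_3:.4f}")
print(f"xbar +/- 3s spans {low:.1f} to {high:.1f} hours")
```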
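
Worked sketch: confidence interval and the three t test cases
The confidence interval and Cases 1 to 3 can be sketched as small functions. Everything named here is an assumption for illustration: the function names are hypothetical, the data are made up, and the table values of t (2.776, 2.306 and 3.182 for 4, 8 and 3 degrees of freedom at 95% confidence) are typed in by hand from a standard Student's t table rather than computed. Case 2 assumes the two standard deviations are comparable, as the slides do.

```python
import math
from statistics import mean, stdev  # stdev uses the n - 1 denominator


def confidence_interval(data, t_table):
    """Return (low, high) for mu = xbar +/- t * s / sqrt(n)."""
    half_width = t_table * stdev(data) / math.sqrt(len(data))
    return mean(data) - half_width, mean(data) + half_width


def t_case1(data, known_value):
    """Case 1: compare a data set with a known or required value."""
    return abs(mean(data) - known_value) / stdev(data) * math.sqrt(len(data))


def t_case2(data1, data2):
    """Case 2: compare two data sets using the pooled standard deviation."""
    n1, n2 = len(data1), len(data2)
    x1, x2 = mean(data1), mean(data2)
    ss1 = sum((x - x1) ** 2 for x in data1)
    ss2 = sum((x - x2) ** 2 for x in data2)
    s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return abs(x1 - x2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))


def t_case3(method_a, method_b):
    """Case 3: paired comparison of two methods on the same samples."""
    d = [a - b for a, b in zip(method_a, method_b)]  # keep signs
    return abs(mean(d)) / stdev(d) * math.sqrt(len(d))


# Hypothetical data (illustrative only).
old_process = [820.0, 910.0, 780.0, 870.0, 850.0]   # bulb lifetimes, hours
new_process = [900.0, 960.0, 880.0, 940.0, 920.0]   # bulb lifetimes, hours
method_a = [10.2, 11.5, 9.8, 12.1]                  # same four samples,
method_b = [10.0, 11.1, 9.9, 11.8]                  # two different methods

ci_low, ci_high = confidence_interval(old_process, t_table=2.776)  # 4 d.o.f.
t1 = t_case1(old_process, known_value=800.0)   # compare with t_table = 2.776
t2 = t_case2(old_process, new_process)         # compare with t_table = 2.306
t3 = t_case3(method_a, method_b)               # compare with t_table = 3.182

print(f"95% confidence interval: {ci_low:.1f} to {ci_high:.1f} hours")
print(f"Case 1: t = {t1:.3f}, differs from 800 hr? {t1 > 2.776}")
print(f"Case 2: t = {t2:.3f}, processes differ? {t2 > 2.306}")
print(f"Case 3: t = {t3:.3f}, methods differ? {t3 > 3.182}")
```

In each case the degrees of freedom used to look up the table value follow the denominator in the corresponding formula: n - 1 for Cases 1 and 3, and n1 + n2 - 2 for Case 2.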
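
Worked sketch: Q test
The Q test steps above can be sketched as follows. The function name q_calc and the data are hypothetical, and the table value 0.64 is the commonly tabulated 90% confidence value for five observations, entered by hand; check it against whichever Q table you are using. The sketch tests whichever end value has the larger gap to its nearest neighbour.

```python
def q_calc(values):
    """Return (suspect_value, Q) where Q = gap / range for the more extreme end."""
    data = sorted(values)  # arrange data in numerical order
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = data[1] - data[0]     # gap between lowest value and its neighbour
    gap_high = data[-1] - data[-2]  # gap between highest value and its neighbour
    if gap_high >= gap_low:
        return data[-1], gap_high / data_range
    return data[0], gap_low / data_range


# Hypothetical replicate results (illustrative only).
results = [10.1, 10.4, 10.2, 11.0, 10.3]
q_table = 0.64  # assumed table value: 90% confidence, n = 5

suspect, q = q_calc(results)
reject = q > q_table

print(f"suspect value = {suspect}, Q_calc = {q:.3f}, reject? {reject}")
```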