Lecture 2
Basic Experimental Statistics

What does this mean?
[Figure: Chemical Response vs. Run. From Snee, R.D. (1983) “Graphical Analysis of Process Variation Studies,” Journal of Quality Technology, 15, 76-88]

Are the specimens different?
[Figure: Chemical Response vs. Run, points labeled by specimen (Sp #1, Sp #2, Sp #3, Sp #5, Sp #6)]

Are the operators different?
[Figure: Chemical Response vs. Run, points labeled by operator (Operator #1, #2, #3)]

Statistics in Experimentation
• Project Planning Phase
  – What is to be measured?
  – How large is the likely variation?
  – What are the influential factors?
• Experimental Design Phase
  – Control known sources of variation
  – Estimate the size of uncontrolled variation
  – Investigate suitable models
• Statistical Analysis Phase
  – Next few weeks…

Population
• All possible items or units that determine the outcome of a well-defined experiment are collectively called a “population”.
• Examples:
  – All 30-ohm resistors produced by a certain manufacturer during a fixed time period.
  – All measurements of the fracture strength of one-inch-thick underwater welds on a steel alloy base plate that is 200 ft deep in salt water.

Process
• A repeatable series of actions that results in an observable characteristic or measurement.
• Examples:
  – The effect of aspirin on blood pressure.

Sample
• A group of observations taken from a population or a process.
• We usually take a “convenience sample” (i.e., one that is easy to obtain), but such samples can be of dubious value because they may not be representative of the variation in the population!
• Goal: a random sample

Other definitions
• Variable: a property or characteristic on which information is obtained in an experiment.
• Distribution: a tabular or graphical description of the values of a variable, using some measure of how frequently they occur in a population, process, or sample.
  (think: histogram)

Example
[Figure: histogram of Frequency (# students) vs. Total Time per Problem (min.), with bins from 15 to 120 min and “More”]
• Population or a Sample?
• Convenience Sample or Random Sample?
• Is this a Normal Distribution?

Beware the Normal Distribution
• Also known as the Gaussian distribution
• Approximately normal distributions occur in many situations, as explained by the central limit theorem. When there is reason to suspect that a large number of small effects act additively and independently, it is reasonable to assume that the observations will be approximately normal.
• Good example: laser light intensity
• Bad example: the Black–Scholes model
  – It models changes in the logarithm of exchange rates, price indices, and stock market indices. These variables behave like compound interest, not like simple interest, and so are multiplicative.
  – While the Black–Scholes model assumes normality, in reality these variables have heavy tails, as seen in stock market crashes (see The Black Swan).
• Concrete example: S&P 500 daily change, μ = +0.035% ± 0.9% (from W. Egan)
• If daily changes were normal, you would conclude that:
  – 68% of the time, changes will be smaller in magnitude than 0.9% (1σ)
  – 95% of the time, changes will be smaller than 1.8% (2σ)
  – 99.7% of the time, changes will be smaller than 2.7% (3σ)
  – 99.994% of the time, changes will be smaller than 3.6% (4σ), i.e., a larger change should occur only about once every 17,000 trading days, or roughly once every 70 years
  – A 4.5% change (5σ) should almost never happen, but it has happened multiple times in your lifetime!

Sample Mean or Average
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Sample median (x(i) denotes the i-th smallest observation):
  – M = x(q) if n is odd, where q = (n+1)/2.
  – M = [x(q) + x(q+1)]/2 if n is even, where q = n/2.
• Do not use text for equations (like I just did) in reports! It looks like crap!
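A short Python sketch can reproduce two calculations from these slides. `normal_coverage` recomputes the S&P 500 percentages under the slide's own assumption that daily changes are exactly normal (the 0.035% mean is ignored as negligible). Computed exactly, a move beyond 4σ occurs about once in 15,800 days; the slide's 17,000 comes from rounding the coverage to 99.994%. `sample_mean` and `sample_median` follow the formulas above, with the 1-based index q shifted to Python's 0-based indexing. The function names and the timing data are hypothetical, chosen only for illustration.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def sample_mean(xs: list[float]) -> float:
    """x-bar = (x1 + x2 + ... + xn) / n."""
    return sum(xs) / len(xs)


def sample_median(xs: list[float]) -> float:
    """M = x(q), q = (n+1)/2 for odd n; M = [x(q) + x(q+1)]/2, q = n/2 for even n."""
    x = sorted(xs)  # order statistics x(1) <= x(2) <= ... <= x(n)
    n = len(x)
    if n % 2 == 1:
        q = (n + 1) // 2
        return x[q - 1]  # x(q), shifted to a 0-based index
    q = n // 2
    return (x[q - 1] + x[q]) / 2  # average of x(q) and x(q+1)


# S&P 500 daily change: sigma = 0.9% (slide value); assumes exact normality.
sigma = 0.9
for k in (1, 2, 3, 4, 5):
    inside = normal_coverage(k)
    days = 1 / (1 - inside)  # expected trading days between moves larger than k*sigma
    print(f"|change| < {k * sigma:.1f}%: {100 * inside:.4f}% of days; "
          f"a larger move about once every {days:,.0f} trading days")

# Hypothetical minutes-per-problem data (not the histogram data from the slides).
times = [15, 30, 45, 45, 60, 90]
print(sample_mean(times), sample_median(times))
```

The real S&P 500 record contains many more large moves than this table predicts, which is exactly the heavy-tail warning on the slide.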
Sample Standard Deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
• In your reports, all values should be reported as $\bar{x} \pm s$ (or $\mu \pm \sigma$).
• INCLUDE UNITS!!!!

Sample Standard Error
$$s_e = \frac{s_d}{\sqrt{n}}$$
where s_d is the sample standard deviation s.
• Some people like to use standard errors because the smaller values seem to imply better results. You need to be careful with such people!

Std. Err. vs. Std. Dev. (thanks to Wikipedia)
• Standard error estimates how close your sample mean is likely to be to the population mean; it is the basis for a confidence interval on the mean.
• Standard deviation is the degree to which individuals within the sample differ from the sample mean (“spread”).
• Standard error should decrease with larger sample sizes, as the estimate of the population mean improves. Standard deviation does not shrink systematically with sample size; it estimates the spread of the population itself.

Percent Error
$$\%\ \text{error} = \frac{\text{your value} - \text{true value}}{\text{true value}} \times 100\%$$
• Note: the order of terms in the numerator can vary.

Percent Difference
$$\%\ \text{diff} = \frac{\text{your value} - \text{other value}}{\text{better value}} \times 100\%$$
• Note: there is no universally accepted choice for the value that goes in the denominator.
• If one value is believed to be closer to the true value, that value should go in the denominator.

Which to use?
• We almost always need to give an indication of how accurate our experimental measurements are. Which measure should we use?
• When should we use $\bar{x} \pm s$?
• When should we use % error or % diff?
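The formulas for s, s_e, % error, and % diff translate directly into Python. This is a minimal sketch: the function names, the resistance readings, and the use of the 30-ohm nominal value as the "true value" are all hypothetical assumptions for illustration. In practice, `statistics.stdev` from the standard library also uses the n − 1 denominator and could replace `sample_std`.

```python
import math


def sample_std(xs: list[float]) -> float:
    """Sample standard deviation s, with n - 1 in the denominator."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


def std_error(xs: list[float]) -> float:
    """Standard error of the mean: s_e = s_d / sqrt(n)."""
    return sample_std(xs) / math.sqrt(len(xs))


def percent_error(your_value: float, true_value: float) -> float:
    """(your value - true value) / true value * 100%."""
    return (your_value - true_value) / true_value * 100


def percent_diff(your_value: float, other_value: float, better_value: float) -> float:
    """(your value - other value) / better value * 100%.

    better_value is whichever of the two values is believed closer to the truth.
    """
    return (your_value - other_value) / better_value * 100


# Hypothetical readings of a nominal 30-ohm resistor, in ohms.
readings = [29.8, 30.1, 30.3, 29.9, 30.4]
xbar = sum(readings) / len(readings)
s = sample_std(readings)
print(f"R = {xbar:.2f} ± {s:.2f} ohm (n = {len(readings)}, s_e = {std_error(readings):.2f} ohm)")
print(f"% error vs. nominal 30 ohm: {percent_error(xbar, 30.0):.2f}%")
```

The printed result carries units, and s_e comes out smaller than s by a factor of √5, which is the effect the standard-error slide warns about.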