Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Statistics: Dealing With Uncertainty ACADs (08-006) Covered 1.1.1.2 1.1.1.4 3.2.3.19 3.2.3.20 Keywords Sample, normal distribution, central tendency, histogram, probability, sample, population, data sorting, standard normal distribution, Z-tables, probability. Supporting Material Statistics Dealing With Uncertainty Objectives • Describe the difference between a sample and a population • Learn to use descriptive statistics (data sorting, central tendency, etc.) • Learn how to prepare and interpret histograms • State what is meant by normal distribution and standard normal distribution. • Use Z-tables to compute probability. Statistics • “There are lies, d#$& lies, and then there’s statistics.” Mark Twain Statistics is... • a standard method for... - collecting, organizing, summarizing, presenting, and analyzing data - drawing conclusions - making decisions based upon the analyses of these data. • used extensively by engineers (e.g., quality control) Populations and Samples • Population - complete set of all of the possible instances of a particular object – e.g., the entire class • Sample - subset of the population – e.g., a team • We use samples to draw conclusions about the parent population. Why use samples? • The population may be large – all people on earth, all stars in the sky. • The population may be dangerous to observe – automobile wrecks, explosions, etc. • The population may be difficult to measure – subatomic particles. • Measurement may destroy sample – bolt strength Exercise: Sample Bias • To three significant figures, estimate the average age of the class based upon your team. • When would a team not be a representative sample of the class? Measures of Central Tendency • If you wish to describe a population (or a sample) with a single number, what do you use? – Mean - the arithmetic average – Mode - most likely (most common) value. – Median - “middle” of the data set. What is the Mean? • The mean is the sum of all data values divided by the number of values. Sample Mean 1 x n n x Where: – xis the sample mean – xi are the data points – n is the sample size i 1 i Population Mean 1 N N x i 1 i Where: – μ is the population mean – xi are the data points – N is the total number of observations in the population What is the Mode? • mode - the value that occurs the most often in discrete data (or data that have been grouped into discrete intervals) – Example, students in this class are most likely to get a grade of B. Mode continued • Example of a grade distribution with mean C, mode B 25 20 15 10 5 0 F D C B A What is the Median? • Median - for sorted data, the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points). – useful to characterize data sets with a few extreme values that would distort the mean (e.g., house price,family incomes). What Is the Range? • Range - the difference between the lowest and highest values in the set. – Example, driving time to Houston is 2 hours +/- 15 minutes. Therefore... • Minimum = 105 min • Maximum = 135 minutes • Range = 30 minutes Standard Deviation • Gives a unique and unbiased estimate of the scatter in the data. Standard Deviation • Population • Sample 1 N N 2 ( x ) i Variance = 2 i 1 Deviation n 1 2 s ( x x ) i (n 1) i 1 Variance = s2 The Subtle Difference Between and σ N versus n-1 n-1 is needed to get a better estimate of the population from the sample s. Note: for large n, the difference is trivial. A Valuable Tool • Gauss invented standard deviation circa 1700 to explain the error observed in measured star positions. • Today it is used in everything from quality control to measuring financial risk. Team Exercise • In your team’s bag of M&M candies, count – the number of candies for each color – the total number of candies in the bag • When you are done counting, have a representative from your team enter your data in Excel More Team Exercise (con’t) For each color, and the total number of candies, determine the following: maximum minimum range mean mode median standard deviation variance Individual Exercise: Histograms • Flip a coin EXACTLY ten times. Count the number of heads YOU get. • Report your result to the instructor who will post all the results on the board • Open Excel • Using the data from the entire class, create bar graphs showing the number of classmates who get one head, two heads, three heads, etc. Data Distributions • The “shape” of the data is described by its frequency histogram. • Data that behaves “normally” exhibit a “bellshaped” curve, or the “normal” distribution. • Gauss found that star position errors tended to follow a “normal” distribution. The Normal Distribution • The normal distribution is sometimes called the “Gauss” curve. 1 RF 2 1 2 x / 2 e 2 mean RF Relative Frequency x Standard Normal Distribution Define: Then z x / Area = 1.00 RF 0.5 1 2 z e 2 0.4 0.3 0.2 2 0.1 0.0 -4.0 -3.0 -2.0 -1.0 0.0 z 1.0 2.0 3.0 4.0 Some handy things to know. • 50% of the area lies on each side of the midpoint for any normal curve. • A standard normal distribution (SND) has a total area of 1.00. • “z-Tables” show the area under the standard normal distribution, and can be used to find the area between any two points on the zaxis. Using Z Tables (Appendix C, p. 624) • Question: Find the area between z= -1.0 and z= 2.0 – From table, for z = 1.0, area = 0.3413 – By symmetry, for z = -1.0, area = 0.3413 – From table, for z= 2.0, area = 0.4772 – Total area = 0.3413 + 0.4772 = 0.8185 – “Tails” area = 1.0 - 0.8185 = 0.1815 “Quick and Dirty” Estimates of and • @ (lowest + 4*mode + highest)/6 • For a standard normal curve, 99.7% of the area is contained within ± 3 from the mean. • Define “highest” = + 3 • Define “lowest” = 3 • Therefore, @ (highest - lowest)/6 Example: Drive time to Houston • Lowest = 1 h • Most likely = 2 h • Highest = 4 h (including a flat tire, etc.) – = (1+4*2+4)/6 = 2.16 (2 h 12 min) – = (4 - 1)/6= 0.5 h • This technique (Delphi) was used to plan the moon flights. Review • Central tendency – mean – mode – median • Scatter – range – variance – standard deviation • Normal Distribution