MAT 135 Introductory Statistics and Data Analysis
Adjunct Instructor Kenneth R. Martin
Lecture 7
October 12, 2016
Confidential - Kenneth R. Martin

Agenda
• Housekeeping
  – Readings
  – Exam #1 review
• Chapter 1, 14, 10, 2, & 3

Housekeeping
• Read, Chapter 1.1 – 1.4
• Read, Chapter 14.1 – 14.2
• Read, Chapter 10.1
• Read, Chapter 2
• Read, Chapter 3

Housekeeping
• Exam #1 Review

Statistics – Application to Research

Statistics
• Why collect samples?

Population and Sample
[Figure: POPULATION → Sampling Scheme → SAMPLE → Measure → Data!]
Use data from the SAMPLE to make conclusions about the POPULATION.
• It is often impractical to collect all the data from the entire population (e.g. U.S. census).
• Some test methods are destructive – we wouldn’t have any products or services left to ship to a customer!
• It is too expensive to sample the entire population.
• We don’t have to collect 100% of the population! We can use inferential statistics to make sound conclusions about the population.

Statistics – Describing the Data
• Two methods to summarize the data:
  – Graphical - Histogram
  – Analytical - Central Tendency

Statistics – Central Tendency
• A statistical measure that describes the central value around which the data are distributed; it includes the Mean, Median, and Mode.
  – However, Central Tendency does not tell us about data Variation / spread.

Statistics – Relationship of Central Tendency
*** Normal distribution: Mean = Median = Mode

Statistics – Frequency Distributions

Statistics – Various curves (Different data spreads, common means)

Statistics – Various curves (Different means, common data spreads)

Statistics – Various Normal Curves
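
As a minimal sketch of the central tendency measures covered in this lecture, the snippet below computes the mean, median, and mode with Python's standard `statistics` module. The two samples and the helper name `central_tendency` are hypothetical and are not taken from the slides; the first sample is symmetric, so the three measures coincide, while the second adds one large value to show how the mean can drift away from the median and mode.

```python
import statistics

# Hypothetical, symmetric sample (illustrative only, not from the lecture).
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]

# The same sample with one large value appended, which skews it to the right.
skewed = symmetric + [30]


def central_tendency(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


sym_mean, sym_median, sym_mode = central_tendency(symmetric)
skw_mean, skw_median, skw_mode = central_tendency(skewed)

print("symmetric:", sym_mean, sym_median, sym_mode)
print("skewed:   ", skw_mean, skw_median, skw_mode)
```

In the symmetric sample all three measures are 4; in the skewed sample the median and mode stay at 4 while the mean rises to 6.6.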

Statistics – Measures of Variability - how the data is spread from its central value
• The central tendency does not indicate any level of variability (dispersion) from the mean.
  A = {100, 200, 300, 400, 500}
  B = {50, 150, 300, 450, 550}
  C = {250, 300, 300, 300, 350}
• All three data sets have the same mean and the same median, but the variability of the data differs between them.

Statistics – Measures of Variability
• Can be values from 0 to ∞ (infinity)
  – 0 means no variability of data
  – A large value indicates lots of variability of data
• Values can never be negative
• As soon as one value in a data set differs from another, variability exists

Statistics – Measures of Variability (Dispersion) - Range
• Range (R) = Max. value – Min. value = $X_H - X_L$
• As data set size increases, the accuracy of using the range decreases.
• Limit the usage of Range to ~ 10 readings.

Statistics – Measures of Variability – Range Example
  A = {100, 200, 300, 400, 500}
  B = {50, 150, 300, 450, 550}
  C = {250, 300, 300, 300, 350}
  RA = ?  RB = ?  RC = ?

Statistics – Measures of Variability
• So what is the limitation of all three of these Range calculations?

Statistics – Measures of Variability (Dispersion) - Variance
• Variance: a measure of variability, defined as the average squared distance by which data points deviate from their mean.
• Variance calculations include all data points.

Statistics – Measures of Variability (Dispersion) - Variance
• Sum of Squares (SS): the sum (addition) of the squared deviations of values from their mean. The SS is the numerator of the variance formula.
• Variance, $\sigma^2$, for the Population: $\sigma^2 = \frac{SS}{N} = \frac{\sum (X - \mu)^2}{N}$, where μ is the population average.
• Variance, $S^2$, for a Sample: $S^2 = \frac{SS}{n - 1} = \frac{\sum (X - M)^2}{n - 1}$, where M is the sample average.
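
The range example for data sets A, B, and C can be worked through as follows. This is a sketch only: the helper name `data_range` and the `summary` dictionary are illustrative choices, and the mean and median come from Python's standard `statistics` module.

```python
import statistics

# The three data sets from the lecture.
data_sets = {
    "A": [100, 200, 300, 400, 500],
    "B": [50, 150, 300, 450, 550],
    "C": [250, 300, 300, 300, 350],
}


def data_range(values):
    """Range R = maximum value minus minimum value."""
    return max(values) - min(values)


# Collect mean, median, and range for each data set.
summary = {
    name: {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": data_range(values),
    }
    for name, values in data_sets.items()
}

for name, stats in summary.items():
    print(name, stats)
```

All three sets share a mean and median of 300, yet their ranges are 400, 500, and 100. Note that the range uses only the two extreme values of each set and ignores everything in between.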

Statistics – Variance - Example
  A = {100, 200, 300, 400, 500}
• In this case, notice that the SS of both the population and the sample will be the same
• Remember: PREMDAS
• What is σ²?
• What is S²?

Statistics – Measures of Variability (Dispersion) - Variance
• What is a big limitation of Variance?
  – What do you notice about the units of the mean and the units of Variance?

Statistics – Measures of Dispersion – Standard Deviation
• Also called the Root Mean Square deviation, it measures the spread of the data: roughly the average distance by which data deviate from their mean.
• Calculated by taking the square root of the Variance

Statistics – Measures of Dispersion – Standard Deviation
• When the data comes from the “population”, we shall use “σ” (sigma) to denote the Standard Deviation.
  – The mean value will be represented by the Greek symbol μ (mu)
  – The denominator does not have “uncertainty”, thus N
• When the data comes from a “sample”, we shall use “SD” to denote the Standard Deviation.
  – The mean value will be represented by M or X̄ (X-bar)
  – The denominator shows “uncertainty”, thus n-1

Statistics – Measures of Dispersion – Standard Deviation
• We typically want the standard deviation (variance) value to be as small as possible.
  – We typically want to minimize variability!
• Standard deviation is always a more precise measure of the data distribution than the range.
• Other formulas exist for Standard Deviation, but will not be covered.

Statistics – Standard Deviation - Example
  A = {100, 200, 300, 400, 500}
• What do we notice about the units of Standard Deviation and the units of the mean?
• The Mean and Standard Deviation are typically reported together.
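
The variance and standard deviation examples for data set A can be sketched step by step as below. The variable names are illustrative; the calculation follows the definitions in the lecture (SS in the numerator, N for a population, n-1 for a sample), and the results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

# Data set A from the lecture.
A = [100, 200, 300, 400, 500]

mean_a = sum(A) / len(A)

# Sum of Squares: squared deviations from the mean, added together.
# The SS is the same whether A is treated as a population or a sample.
ss = sum((x - mean_a) ** 2 for x in A)

# Population variance divides by N; sample variance divides by n - 1.
pop_var = ss / len(A)
samp_var = ss / (len(A) - 1)

# Standard deviation is the square root of the variance.
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# Cross-check the hand calculation against the standard library.
assert math.isclose(pop_var, statistics.pvariance(A))
assert math.isclose(samp_var, statistics.variance(A))

print(f"SS = {ss}")
print(f"population variance = {pop_var}, sigma = {pop_sd:.2f}")
print(f"sample variance     = {samp_var}, SD    = {samp_sd:.2f}")
```

For A this gives SS = 100,000, a population variance of 20,000 (σ ≈ 141.42), and a sample variance of 25,000 (SD ≈ 158.11). The variance is in squared units, while the standard deviation is back in the same units as the mean.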

Statistics – Standard Deviation - Example
  B = {50, 150, 300, 450, 550}

Statistics – Measures of Dispersion – Coefficient of Variation
• CVar – Allows a comparison of standard deviations when the units of measure are not the same

Statistics – Coefficient of Variation - Example

Statistics – Box and Whisker Plot – Boxplot
• Simple graphical tool to summarize data.
• Need to determine 5 values (five-number summary) from the data to generate a boxplot:
  1. Median (2nd Quartile)
  2. Maximum data value
  3. Minimum data value
  4. 1st Quartile (value below which 1/4 of the observations fall) [whisker end]
  5. 3rd Quartile (value below which 3/4 of the observations fall) [whisker end]

Statistics – Box and Whisker Plot – Boxplot Example
• Process aim = 9.0 minutes
• Spec = ± 1.5 minutes
• n = 125
• R = 1.7

Statistics – Box and Whisker Plot - Boxplot Example
• The box contains the median value and approximately 50% of the observations
• Whiskers extend from the box to the extreme values
• Example:
  1. Median; n = 125: Median = 63rd value = 9.8
  2. Max = 10.7
  3. Min = 9.0
  4. 1st Quartile: position 125 × 0.25 ≈ average of the 31st and 32nd values = 9.6
  5. 3rd Quartile: position 125 × 0.75 ≈ average of the 94th and 95th values = 10.0

Statistics – Box and Whisker Plot - Boxplot Example
[Boxplot: Min = 9.0, Q1 = 9.6, Q2 = 9.8, Q3 = 10.0, Max = 10.7]
• Long whiskers denote the existence of values much larger than other values.
• For this example, mean median.
• Other variants exist, e.g. whiskers ending at ± 1.5 × IQR, with all points beyond them treated as “outliers” and depicted as asterisks
• IQR = Interquartile Range

Statistics – Box and Whisker Plot - Boxplot Example

Statistics – Measures of Variability (Dispersion) - IQR
• IQR – Interquartile Range
  IQR = Q3 – Q1
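
Returning to the coefficient of variation described earlier in this lecture, the sketch below assumes the usual definition, CV = (SD / mean) × 100%, since the formula itself does not appear in the slide text. The helper name `coefficient_of_variation` and the rescaled data set are illustrative assumptions; rescaling A by 1000 stands in for a change of measurement units.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample CV in percent: (sample SD / mean) * 100 (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Data sets A and B from the lecture.
A = [100, 200, 300, 400, 500]
B = [50, 150, 300, 450, 550]

# Hypothetical: A expressed in a unit 1000 times smaller.
A_rescaled = [x * 1000 for x in A]

cv_a = coefficient_of_variation(A)
cv_b = coefficient_of_variation(B)
cv_a_rescaled = coefficient_of_variation(A_rescaled)

print(f"CV of A          = {cv_a:.1f}%")
print(f"CV of B          = {cv_b:.1f}%")
print(f"CV of rescaled A = {cv_a_rescaled:.1f}%")
```

The standard deviation of the rescaled set is 1000 times larger than that of A, but the CV is unchanged (about 52.7%), which is why the CV supports comparisons across different units. B, with the same mean and a larger SD, has a CV of about 68.7%.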

Statistics – Box and Whisker Plot - Boxplot Example
[Boxplot: Min = 9.0, Q1 = 9.6, Q2 = 9.8, Q3 = 10.0, Max = 10.7]
• For this example, IQR = ?
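
The IQR question can be worked from the five-number summary of the boxplot example. The sketch below also applies the 1.5 × IQR whisker variant mentioned in the lecture, assuming the common convention that the fences sit at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR; the slides do not spell this out. `Decimal` is used so that values such as 9.6 and 0.4 are handled exactly, and all variable names are illustrative.

```python
from decimal import Decimal

# Five-number summary from the lecture's boxplot example (minutes).
five_number = {
    "min": Decimal("9.0"),
    "q1": Decimal("9.6"),
    "median": Decimal("9.8"),
    "q3": Decimal("10.0"),
    "max": Decimal("10.7"),
}

# Interquartile range: IQR = Q3 - Q1.
iqr = five_number["q3"] - five_number["q1"]

# Assumed convention: fences 1.5 * IQR beyond each quartile.
reach = Decimal("1.5") * iqr
lower_fence = five_number["q1"] - reach
upper_fence = five_number["q3"] + reach

# Check whether the extreme values fall outside the fences.
flagged = [
    name
    for name in ("min", "max")
    if five_number[name] < lower_fence or five_number[name] > upper_fence
]

print(f"IQR = {iqr}")
print(f"fences = [{lower_fence}, {upper_fence}]")
print(f"extremes outside the fences: {flagged}")
```

Here IQR = 10.0 − 9.6 = 0.4, so the fences are 9.0 and 10.6. Under this variant the minimum of 9.0 sits exactly on the lower fence, while the maximum of 10.7 lies just beyond the upper fence and would be plotted as an outlier.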