Introduction to Summary Statistics

Statistics
• The collection, evaluation, and interpretation of data
• Statistical analysis of measurements can help verify the quality of a design or process

Summary Statistics

Central Tendency
• “Center” of a distribution
  – Mean, median, mode

Variation
• Spread of values around the center
  – Range, standard deviation, interquartile range

Distribution
• Summary of the frequency of values
  – Frequency tables, histograms, normal distribution

Mean (Central Tendency)
• The mean is the sum of the values of a set of data divided by the number of values in that data set.

$$\mu = \frac{\sum x_i}{N}$$

$\mu$ = mean value
$x_i$ = individual data value
$\sum x_i$ = summation of all data values
$N$ = number of data values in the data set

• Data set: 3 7 12 17 21 21 23 27 32 36 44
• Sum of the values = 243
• Number of values = 11

$$\mu = \frac{\sum x_i}{N} = \frac{243}{11} = 22.09$$

Mode (Central Tendency)
• Measure of central tendency
• The most frequently occurring value in a set of data is the mode
• Symbol is M

Data set (as collected): 27 17 12 7 21 44 23 3 36 32 21
Data set (ordered): 3 7 12 17 21 21 23 27 32 36 44
Mode = M = 21
• Bimodal data set: two values of equal frequency stand out
• Multimodal data set: more than two values of equal frequency stand out

Determine the mode of 48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55
Mode = 63

Determine the mode of 48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55
Mode = 63 and 59 (bimodal)

Determine the mode of 48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55
Mode = 63, 59, and 48 (multimodal)

Median (Central Tendency)
• Measure of central tendency
• The median is the value that occurs in the middle of a set of data that has been arranged in numerical order
• Symbol is $\tilde{x}$, pronounced “x-tilde”

Data set (as collected): 27 17 12 7 21 44 23 3 36 32 21

• For a data set that contains an odd number of values, the median is the middle value of the ordered set.
Data set (ordered): 3 7 12 17 21 21 23 27 32 36 44
Median = $\tilde{x}$ = 21

• For a data set that contains an even number of values, the two middle values are averaged, and the result is the median.
Data set (ordered): 3 7 12 17 21 21 23 27 31 32 36 44
Median = $\tilde{x}$ = (21 + 23) / 2 = 22

Range (Variation)
• Measure of data variation
• The range is the difference between the largest and smallest values that occur in a set of data
• Symbol is R

Data set: 3 7 12 17 21 21 23 27 32 36 44
Range = R = 44 - 3 = 41

Standard Deviation (Variation)
• Measure of data variation
• The standard deviation is a measure of the spread of data values
  – A larger standard deviation indicates a wider spread in data values

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

$\sigma$ = standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = mean
$N$ = size of population

Procedure:
1. Calculate the mean, $\mu$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the size of the population (number of data values), $N$.
5. Calculate the square root of the result.
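The measures introduced so far can be checked with a short Python sketch. It uses the data sets from the slides and follows the five-step procedure above for the population standard deviation. The function name `population_std_dev` and the variable names are illustrative choices, not part of the original material; `statistics.median` and `statistics.multimode` come from Python's standard library (`multimode` needs Python 3.8 or later).

```python
import math
import statistics

# Data set used in the mean, mode, median, and range slides.
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

mean = sum(data) / len(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the ordered data
modes = statistics.multimode(data)    # list of all most-frequent values
data_range = max(data) - min(data)    # largest value minus smallest value


def population_std_dev(values):
    """Population standard deviation, following the five steps above."""
    mu = sum(values) / len(values)                    # 1. mean
    squared_diffs = [(x - mu) ** 2 for x in values]   # 2. squared differences
    total = sum(squared_diffs)                        # 3. sum them
    variance = total / len(values)                    # 4. divide by N
    return math.sqrt(variance)                        # 5. square root


print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"mode   = {modes}")
print(f"range  = {data_range}")
print(f"sigma  = {population_std_dev(data):.2f}")

# The multimodal example: three values tie for the highest frequency.
multimodal = [48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55]
print(f"modes of the multimodal set = {sorted(statistics.multimode(multimodal))}")
```

`statistics.multimode` returns a list, so the unimodal, bimodal, and multimodal cases are all handled the same way: the length of the list tells you which case applies.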
Standard Deviation (Variation)
Calculate the standard deviation for the data array
2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

1. Calculate the mean.

$$\mu = \frac{\sum x_i}{N} = \frac{524}{11} = 47.64$$

2. Subtract the mean from each data value and square each difference, $(x_i - \mu)^2$.

$(2 - 47.64)^2 = 2083.01$
$(5 - 47.64)^2 = 1818.17$
$(48 - 47.64)^2 = 0.13$
$(49 - 47.64)^2 = 1.85$
$(55 - 47.64)^2 = 54.17$
$(58 - 47.64)^2 = 107.33$
$(59 - 47.64)^2 = 129.05$
$(60 - 47.64)^2 = 152.77$
$(62 - 47.64)^2 = 206.21$
$(63 - 47.64)^2 = 235.93$
$(63 - 47.64)^2 = 235.93$

3. Sum all squared differences.

$$\sum (x_i - \mu)^2 = 2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 + 107.33 + 129.05 + 152.77 + 206.21 + 235.93 + 235.93 = 5024.55$$

4. Divide the summation by the number of data values.

$$\frac{\sum (x_i - \mu)^2}{N} = \frac{5024.55}{11} = 456.78$$

5. Calculate the square root of the result.

$$\sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{456.78} = 21.4$$

A Note about Standard Deviation
• Two distinct calculations
  – Population standard deviation
    • The measure of the spread of data within a population.
    • Used when you have a data value for every member of the entire population of interest.
  – Sample standard deviation
    • An estimate of the spread of data within a larger population.
    • Used when you do not have a data value for every member of the entire population of interest.
    • Uses a subset (sample) of the data to generalize the results to the larger population.

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

$\sigma$ = population standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = population mean
$N$ = size of population

Sample standard deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

$s$ = sample standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\bar{x}$ = sample mean
$n$ = size of sample

Sample Standard Deviation (Variation)
Procedure:
1. Calculate the sample mean, $\bar{x}$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the number of data values minus one, $n - 1$.
5. Calculate the square root of the result.

Sample Mean (Central Tendency)

$$\bar{x} = \frac{\sum x_i}{n}$$

$\bar{x}$ = sample mean
$x_i$ = individual data value
$\sum x_i$ = summation of all data values
$n$ = number of data values in the sample

Sample Standard Deviation (Variation)
Estimate the standard deviation for a population for which the following data is a sample.
2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

1. Calculate the sample mean.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{524}{11} = 47.64$$

2. Subtract the sample mean from each data value and square the difference, $(x_i - \bar{x})^2$.

$(2 - 47.64)^2 = 2083.01$
$(5 - 47.64)^2 = 1818.17$
$(48 - 47.64)^2 = 0.13$
$(49 - 47.64)^2 = 1.85$
$(55 - 47.64)^2 = 54.17$
$(58 - 47.64)^2 = 107.33$
$(59 - 47.64)^2 = 129.05$
$(60 - 47.64)^2 = 152.77$
$(62 - 47.64)^2 = 206.21$
$(63 - 47.64)^2 = 235.93$
$(63 - 47.64)^2 = 235.93$

3. Sum all squared differences.

$$\sum (x_i - \bar{x})^2 = 2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 + 107.33 + 129.05 + 152.77 + 206.21 + 235.93 + 235.93 = 5024.55$$

4. Divide the summation by the number of sample data values minus one.

$$\frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{5024.55}{10} = 502.46$$

5. Calculate the square root of the result.

$$\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{502.46} = 22.4$$

A Note about Standard Deviation
• As $n \to N$, $s \to \sigma$.
• Given the ACT scores of every student in your class, use the population standard deviation formula to find the standard deviation of the ACT scores in the class.
• Given the ACT scores of every student in your class, use the sample standard deviation formula to estimate the standard deviation of the ACT scores of all students at your school.

Histogram (Distribution)
• A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data.
• An engineer might use a histogram to show the variation of a dimension that exists among a group of parts that are intended to be identical.
• Large sets of data are often divided into a limited number of groups. These groups are called class intervals.
  – Example class intervals: -16 to -6, -5 to 5, 6 to 16
• The number of data elements in each class interval is shown by the frequency, which is plotted along the Y-axis of the graph.
• The height of each bar in the chart indicates the number of data elements, or frequency of occurrence, within each range.

Example
Data set (as collected): 1, 7, 15, 4, 8, 8, 5, 12, 10
Data set (ordered): 1, 4, 5, 7, 8, 8, 10, 12, 15
[Figure: histogram of the example data; x-axis: class intervals 1 to 5, 6 to 10, 11 to 15; y-axis: Frequency]

[Figure: histogram "Cube Side Length"; x-axis: Length (in.), class intervals from a minimum of 0.745 in. to a maximum of 0.760 in.; y-axis: Frequency]
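The bar heights in the small histogram example above (1, 7, 15, 4, 8, 8, 5, 12, 10) can be reproduced by counting how many values fall in each class interval. The following is a minimal sketch; the helper name `class_interval_counts` is hypothetical, and the intervals are treated as inclusive at both ends, which is an assumption that suits whole-number data like this.

```python
def class_interval_counts(values, intervals):
    """Count the values that fall in each inclusive (low, high) class interval."""
    counts = {}
    for low, high in intervals:
        counts[(low, high)] = sum(1 for v in values if low <= v <= high)
    return counts


data = [1, 7, 15, 4, 8, 8, 5, 12, 10]
intervals = [(1, 5), (6, 10), (11, 15)]

counts = class_interval_counts(data, intervals)

# Print a text histogram: one '#' per data element in the interval.
for (low, high), frequency in counts.items():
    print(f"{low:>2} to {high:<2} | {'#' * frequency} ({frequency})")
```

For continuous measurements such as the cube side lengths, the interval edges would need a rule for values that land exactly on a boundary (for example, including the lower edge and excluding the upper one).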
Dot Plot (Distribution)
Data set: 0 3 -1 -3 3 2 1 0 -1 0 -1 1 2 -1 1 -2 1 2 1 0 -2 -4 0 0
[Figure: dot plot of the data set; x-axis: data values from -6 to 6; y-axis: Frequency]

Normal Distribution (Distribution)
“Is the data distribution normal?”
• Translation: Is the histogram/dot plot bell-shaped?
  – Does the greatest frequency of the data values occur at about the mean value?
  – Does the curve decrease on both sides away from the mean?
  – Is the curve symmetric about the mean?

[Figure: bell-shaped curve drawn over the dot plot, with the mean value marked; x-axis: Data Elements from -6 to 6; y-axis: Frequency]

What if things are not equal?
[Figure: Histogram Interpretation: Skewed (Non-Normal) Right]

If the data are normally distributed:
• 68% of the observations fall within 1 standard deviation of the mean.
• 95% of the observations fall within 2 standard deviations of the mean.
• 99.7% of the observations fall within 3 standard deviations of the mean.

Normal Distribution Example (Distribution)
Data from a sample of a larger population
Mean = $\bar{x}$ = 0.083 (rounded to 0.08 below)
Standard deviation = $s$ = 1.77 (sample)

68% of the observations are expected within one standard deviation of the mean:
$\bar{x} + s$ = 0.08 + 1.77 = 1.85
$\bar{x} - s$ = 0.08 - 1.77 = -1.69

95% of the observations are expected within two standard deviations of the mean ($2s$ = 3.54):
$\bar{x} + 2s$ = 0.08 + 3.54 = 3.62
$\bar{x} - 2s$ = 0.08 - 3.54 = -3.46
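The interval arithmetic in this example can be sketched in Python. The sketch assumes that the 24 dot plot values listed earlier are the sample behind the example; their mean and sample standard deviation do round to the quoted 0.083 and 1.77. The helper `share_within` is a hypothetical name. `statistics.stdev` divides by n - 1, so it matches the sample standard deviation formula.

```python
import statistics

# Dot plot data (assumed to be the sample behind the example).
data = [0, 3, -1, -3, 3, 2, 1, 0, -1, 0, -1, 1,
        2, -1, 1, -2, 1, 2, 1, 0, -2, -4, 0, 0]

x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)


def share_within(values, center, spread, k):
    """Fraction of values lying within k * spread of center (inclusive)."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


print(f"x_bar = {x_bar:.3f}, s = {s:.2f}")
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    share = share_within(data, x_bar, s, k)
    print(f"within {k}s: [{low:.2f}, {high:.2f}] holds {share:.1%} of the sample")
```

With only 24 whole-number values, the observed shares can only approximate the 68%, 95%, and 99.7% figures; the rule describes an ideal normal distribution, not any particular small sample.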