Chapter 3 Section 3: Measures of Variation

Measures of Variation
• Example 3-18: Suppose we wish to test two experimental brands of outdoor paint to see how long each will last before fading. We have six cans of each brand to test. Let’s find the mean for each brand.

Brand A (time in months)   Brand B (time in months)
10                         35
60                         45
50                         30
30                         35
40                         40
20                         25

Measures of Variation
• Brand A: (10 + 60 + 50 + 30 + 40 + 20)/6 = 210/6 = 35 months
• Brand B: (35 + 45 + 30 + 35 + 40 + 25)/6 = 210/6 = 35 months

Measures of Variation
• Even though the means are the same for both brands, the spread, or variation, is quite different. Comparing the ranges shows that Brand B is more consistent.
• Range of Brand A: 60 - 10 = 50 months
• Range of Brand B: 45 - 25 = 20 months
[Figure: plot of the Brand A and Brand B fading times, in months]

Measures of Variation
To find the population variance and standard deviation:
1. Find the mean.
2. Subtract the mean from each data value.
3. Square each result.
4. Find the sum of the squares.
5. Divide the sum by N to get the variance.
6. Take the square root of the variance to get the standard deviation.

Measures of Variation
Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Variance
• The variance is a measure of variability that uses all the data.
• The variance is based on the difference between each observation ($X$) and the mean ($\bar{X}$ for the sample and $\mu$ for the population). It is the average of the squared differences between the observations and the mean value.
For the population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
For the sample: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Standard Deviation
• The standard deviation of a data set is the square root of the variance.
• The standard deviation is measured in the same units as the data, making it easy to interpret.

Computing a Standard Deviation
For the population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
For the sample: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Shortcut or Computational Formulas for $s^2$ and $s$
$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$
$s = \sqrt{\frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}}$

Variance and Standard Deviation for Grouped Data
1. Make a table as shown and find the midpoint of each class.
2. Multiply the frequency by the midpoint.
3. Multiply the frequency by the square of the midpoint.
4. Find the sums of columns B, D, and E.
5. Substitute in the formula.
6. Take the square root to get the standard deviation.

A       B               C                D          E
Class   Frequency (f)   Midpoint (X_m)   f · X_m    f · X_m^2

Formula: $s^2 = \frac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)}$, where $n = \sum f$ is the sum of column B.

Coefficient of Variation
• The coefficient of variation, denoted CVar, is the standard deviation divided by the mean. The result is expressed as a percentage: divide the standard deviation by the mean and multiply by 100.
• For the population: $\text{CVar} = \frac{\sigma}{\mu} \cdot 100\%$
• For the sample: $\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$
• The coefficient of variation is used when you want to compare the standard deviations of two different types of variables (for example, variables measured in different units).

Range Rule of Thumb
• A rough estimate of the standard deviation is $s \approx \frac{\text{range}}{4}$.
• The range rule of thumb is only an approximation and should be used only when the distribution of the data values is unimodal and roughly symmetric.

Chebyshev’s Theorem
• Chebyshev was a Russian mathematician.
• Chebyshev’s theorem: The proportion of values from a data set that will fall within $k$ standard deviations of the mean will be at least $1 - \frac{1}{k^2}$, where $k$ is a number greater than 1 ($k$ is not necessarily an integer).
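The pieces of this section can be tied together in a short Python sketch. It applies the six-step procedure to the paint data of Example 3-18, computes the coefficient of variation, and evaluates the Chebyshev bound. The function names are illustrative, and treating the six cans of each brand as a whole population (dividing by N rather than n - 1) is an assumption of this sketch.

```python
import math


def population_variance(values):
    """Steps 1-5: mean, deviations, squares, sum of squares, divide by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std(values):
    """Step 6: the standard deviation is the square root of the variance."""
    return math.sqrt(population_variance(values))


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    mean = sum(values) / len(values)
    return population_std(values) / mean * 100


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# Fading times in months from Example 3-18.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(
        f"{name}: variance = {population_variance(data):.2f}, "
        f"std dev = {population_std(data):.2f}, "
        f"CVar = {coefficient_of_variation(data):.1f}%"
    )

print(f"At least {chebyshev_lower_bound(2):.0%} of values lie within 2 std devs")
```

Under these assumptions Brand A has a variance of about 291.67 and a standard deviation of about 17.08 months, while Brand B has about 41.67 and 6.45 months. This agrees with the range comparison: Brand B is the more consistent paint.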
Chebyshev’s Theorem
• The theorem states that at least three-fourths, or 75%, of the data values will fall within 2 standard deviations of the mean of the data set. This result is found by substituting $k = 2$ in the expression: $1 - \frac{1}{2^2} = \frac{3}{4}$.
• Furthermore, the theorem states that at least eight-ninths, or 88.89%, of the data values will fall within 3 standard deviations of the mean.

Chebyshev’s Theorem
• The theorem can be applied to any distribution regardless of its shape.
• The following examples show how to use Chebyshev’s theorem to obtain information about a data set.

Chebyshev’s Theorem
Example 1. The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.
– We do this by adding and subtracting 2 times the standard deviation.

Chebyshev’s Theorem
We are given that $\mu$ = $50,000 and that $\sigma$ = $10,000. So,
$50,000 + 2($10,000) = $50,000 + $20,000 = $70,000
and
$50,000 - 2($10,000) = $50,000 - $20,000 = $30,000
Hence at least 75% of the houses will sell for between $30,000 and $70,000.

Chebyshev’s Theorem
• A survey of local companies found that the mean amount of travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev’s theorem, find the minimum percentage of the data that will fall between $0.20 and $0.30.
Step 1 – Subtract the mean from the larger value.
$0.30 - $0.25 = $0.05
Step 2 – Divide the difference by the standard deviation to get $k$.
$k = \frac{0.05}{0.02} = 2.5$
Step 3 – Use Chebyshev’s theorem to find the percentage.
$1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 1 - \frac{1}{6.25} = 1 - 0.16 = 0.84$, or 84%

The Empirical (Normal) Rule
• Chebyshev’s theorem applies to any distribution regardless of shape.
However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.
1. Approximately 68% of the data values fall within 1 standard deviation of the mean.
2. Approximately 95% of the data values fall within 2 standard deviations of the mean.
3. Approximately 99.7% of the data values fall within 3 standard deviations of the mean.

Chapter 3 Section 4: Measures of Position

Standard Scores
• “You can’t compare apples and oranges.” But with statistics it can be done to some extent.
• Example: comparing a score on a music test with a score on an English exam. The two exams might not be equivalent in terms of:
– the number of questions
– the value of each question
– and so on

Z Score or Standard Score
• The z score uses the mean and the standard deviation.
• Definition – A z score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for the standard score is $z$.
• The z score represents the number of standard deviations that a value lies from the mean.

Z Score or Standard Score
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
For a sample: $z = \frac{X - \bar{X}}{s}$
For a population: $z = \frac{X - \mu}{\sigma}$

Percentiles
• Percentiles divide the data set into 100 equal parts.
• Percentiles are used to compare individuals’ test scores with national test scores.
• Percentiles are not to be confused with the percent grade you receive on a test.

Percentiles
• Percentiles are represented by $P_1, P_2, P_3, \ldots, P_{99}$ and divide the distribution into 100 groups.

Percentiles Example
Systolic Blood Pressure: The frequency distribution of the systolic blood pressure readings (in millimeters of mercury, mm Hg) of 200 randomly selected college students is shown here. Construct a percentile graph.
A Class boundaries   B Frequency   C Cumulative frequency   D Cumulative percent
89.5-104.5           24
104.5-119.5          62
119.5-134.5          72
134.5-149.5          26
149.5-164.5          12
164.5-179.5          4
Total                200

Percentiles Example
Step 1: Find the cumulative frequencies and place them in column C.

A Class boundaries   B Frequency   C Cumulative frequency   D Cumulative percent
89.5-104.5           24            24
104.5-119.5          62            86
119.5-134.5          72            158
134.5-149.5          26            184
149.5-164.5          12            196
164.5-179.5          4             200
Total                200

Percentiles Example
Step 2: Find the cumulative percentages and place them in column D. To do this step, use the formula
$\text{Cumulative \%} = \frac{\text{cumulative frequency}}{n} \cdot 100\%$

Percentiles Example
Step 3: Graph the data, using class boundaries for the x axis and the percentages for the y axis.

A Class boundaries   B Frequency   C Cumulative frequency   D Cumulative percent
89.5-104.5           24            24                       12
104.5-119.5          62            86                       43
119.5-134.5          72            158                      79
134.5-149.5          26            184                      92
149.5-164.5          12            196                      98
164.5-179.5          4             200                      100
Total                200

Percentile Formula
The percentile corresponding to a given value $X$ is computed using the following formula:
$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$

Percentile Example
Test Scores: A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Percentile Example
• Step 1 – Arrange the data in order from lowest to highest:
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
• Step 2 – Substitute into the formula. Since there are 6 values below 12, the solution is:
$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\text{th percentile}$

Percentile Example
Find the value corresponding to a given percentile. How do we do this?
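Before moving on, the two percentile computations worked above can be sketched in Python: the cumulative-percent column of the blood pressure table and the percentile rank of a test score. The function names are illustrative. The class frequencies are taken to be 24, 62, 72, 26, 12, 4, the values consistent with the cumulative-frequency column and the total of 200; that reading is an assumption of this sketch.

```python
from itertools import accumulate


def cumulative_percents(frequencies):
    """Column D: each cumulative frequency divided by n, times 100."""
    total = sum(frequencies)
    return [cum * 100 / total for cum in accumulate(frequencies)]


def percentile_rank(values, x):
    """(number of values below x + 0.5) / total number of values * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


# Systolic blood pressure example (assumed class frequencies, n = 200).
frequencies = [24, 62, 72, 26, 12, 4]
print(cumulative_percents(frequencies))

# Test score example: percentile rank of a score of 12.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))
```

The first call reproduces the column D values 12, 43, 79, 92, 98, 100, and the second gives 65, matching the 65th percentile found above. Note that `percentile_rank` does not need the data to be sorted, since it only counts the values below `x`.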
Percentile Example
• Using the values from the previous example: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Step 1: Arrange the data in order from lowest to highest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2: Compute $c = \frac{n \cdot p}{100}$, where $n$ = total number of values and $p$ = percentile.

• Time for work: this will be due next week, but you will have the option to turn it in today.
• Also, if you have the worksheet from yesterday, you may turn that in.
• Turn in all work to the box.
• Complete Problems 10-22 from Chapter 3 Section 3 of the book, pp. 153-154.

Quartiles and Deciles
• Quartiles, similar to percentiles, divide a data set into four groups, separated by $Q_1$, $Q_2$, $Q_3$. Note that $Q_1$ is the same as the 25th percentile.
• How do you find the data values that correspond to $Q_1$, $Q_2$, $Q_3$?

Quartiles and Deciles
1. Arrange the data in order from lowest to highest.
2. Find the median of the data values. This is the value for $Q_2$.
3. Find the median of the data values that fall below $Q_2$. This is the value for $Q_1$.
4. Find the median of the data values that fall above $Q_2$. This is the value for $Q_3$.

Quartiles and Deciles
• Example: Find $Q_1$, $Q_2$, and $Q_3$ for the data set 15, 13, 6, 5, 12, 50, 22, 18.
1. Arrange the data in order: 5, 6, 12, 13, 15, 18, 22, 50
2. Find the median. It lies between 13 and 15, so $Q_2 = \frac{13 + 15}{2} = 14$.
3. Find the medians below and above $Q_2$.
Below: 5, 6, 12, 13, with median $\frac{6 + 12}{2} = 9$
Above: 15, 18, 22, 50, with median $\frac{18 + 22}{2} = 20$
Thus $Q_1 = 9$, $Q_2 = 14$, and $Q_3 = 20$.

Quartiles and Deciles
• Interquartile range – This is defined as the difference between $Q_3$ and $Q_1$, that is, $\text{IQR} = Q_3 - Q_1$, and is the range of the middle 50% of the data.

Quartiles and Deciles
• Deciles – Just like percentiles and quartiles, deciles divide a data set into groups, in this case 10 of them, separated by $D_1, D_2, \ldots, D_9$.
On page 151 there is a summary table.

Quartiles and Deciles
• Outliers – An outlier is an extremely high or low value when compared with the rest of the data values.
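The four-step quartile procedure and the interquartile range from this section can be sketched in Python as follows. The function name `quartiles` is illustrative. When the number of values is odd, this sketch leaves the median itself out of both halves, which is one reading of "values that fall below/above Q2" and is an assumption here.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Needs at least two values so that both halves are non-empty.
    """
    data = sorted(values)                  # Step 1: arrange in order
    n = len(data)
    half = n // 2
    lower = data[:half]                    # values below Q2
    # For odd n the middle value is Q2 itself and is excluded (assumption).
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


data = [15, 13, 6, 5, 12, 50, 22, 18]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                              # range of the middle 50% of the data
print(q1, q2, q3, iqr)
```

For the example data set this gives Q1 = 9, Q2 = 14, Q3 = 20, as found by hand, and an interquartile range of 11.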
Chapter 3 Section 4: Exploratory Data Analysis

Exploratory Data Analysis
• Exploratory data analysis is used to examine data to find out what information can be discovered about the data, such as the center and the spread.

Exploratory Data Analysis
The five-number summary and boxplots
1. The lowest value of the data set (i.e., the minimum)
2. $Q_1$
3. The median
4. $Q_3$
5. The highest value of the data set (i.e., the maximum)
These values are called the five-number summary.

Exploratory Data Analysis
• Boxplot – A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to $Q_1$, drawing a horizontal line from $Q_3$ to the maximum data value, and drawing a box whose vertical sides pass through $Q_1$ and $Q_3$, with a vertical line inside the box passing through the median, or $Q_2$.

Exploratory Data Analysis
• How to construct a boxplot
1. Find the five-number summary for the data values.
2. Draw a horizontal axis with a scale that includes the maximum and minimum data values.
3. Draw a box whose vertical sides go through $Q_1$ and $Q_3$, and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.

Exploratory Data Analysis
Information obtained from a boxplot
1. Position of the median:
a) If the median is near the center of the box, the distribution is approximately symmetric.
b) If the median falls to the left of the center of the box, the distribution is positively skewed.
c) If the median falls to the right of the center of the box, the distribution is negatively skewed.
2. Length of the lines:
a) If the lines are about the same length, the distribution is approximately symmetric.
b) If the right line is longer than the left line, the distribution is positively skewed.
c) If the left line is longer than the right line, the distribution is negatively skewed.

Exploratory Data Analysis
• Resistant statistics – These statistics are less affected by outliers. Examples are the median and the interquartile range.
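As a rough illustration of the five-number summary and the two boxplot checks listed above, here is a minimal Python sketch using the data set 15, 13, 6, 5, 12, 50, 22, 18 from the quartiles example. The function names and the `tolerance` parameter, which decides what counts as "near the center" or "about the same length", are hypothetical choices made for this sketch and do not come from the slides.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) via the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the median itself is left out of both halves (assumption).
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def compare_lengths(left, right, slack):
    """Classify the shape from a left-hand and a right-hand length."""
    if abs(left - right) <= slack:
        return "approximately symmetric"
    return "positively skewed" if right > left else "negatively skewed"


def read_boxplot(summary, tolerance=0.1):
    """Apply the median-position check and the line-length check.

    tolerance is a hypothetical cutoff, expressed as a fraction of the
    box width (first check) or of the full range (second check).
    """
    low, q1, med, q3, high = summary
    by_median = compare_lengths(med - q1, q3 - med, tolerance * (q3 - q1))
    by_lines = compare_lengths(q1 - low, high - q3, tolerance * (high - low))
    return by_median, by_lines


data = [15, 13, 6, 5, 12, 50, 22, 18]
summary = five_number_summary(data)
print(summary)
print(read_boxplot(summary))
```

For this data set the summary is 5, 9, 14, 20, 50. The median sits close to the center of the box, but the right line (20 to 50) is much longer than the left line (5 to 9), so the second check points to a positively skewed distribution. The two checks can disagree, as they do here, and the value 50 is what stretches the right line.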