Chapter 3: Displaying and Summarizing Quantitative Data

Dot Plots

Definition: A dot plot is the representation of a set of data over a number line. The number of dots over a number represents the relative quantity of the value.

Example: The following are the test scores for a particular high school student in their math class over the course of an academic year.

64 71 83 92 73 56 87 95 85 83 92 92 74 76 84 91 83 85 95

Solution

[Figure: dot plot, "Grades of a High School Student"; horizontal axis: Grades (50 to 100); vertical axis: Frequency (1 to 3)]

Pros and Cons

What's Good
- Gives a good idea of distribution
- Preserves all of the data points

What's Not Good
- Tedious to plot
- Can be hard to read
- Not practical for large data sets

Distributions

Definition: A distribution is a representation of data vs. frequency. It shows all possible values and how often they occur.

Now we want to concern ourselves with the analysis of the graphs. We can analyze these in a much more constructive way than we could with the graphs of categorical variables. Here we are analyzing the distribution represented by the graph.
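Since a distribution pairs each value with how often it occurs, the test scores from the dot plot example can be tallied directly. The following is a minimal Python sketch, not part of the original slides; the names `frequency_distribution` and `text_dot_plot` are illustrative assumptions.

```python
from collections import Counter

# Test scores from the dot plot example.
scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]


def frequency_distribution(data):
    """Return {value: count}, ordered by value."""
    return dict(sorted(Counter(data).items()))


def text_dot_plot(data):
    """Build a sideways dot plot: one row per distinct value, one mark per observation."""
    dist = frequency_distribution(data)
    return "\n".join(f"{value:>3} | {'*' * count}" for value, count in dist.items())


distribution = frequency_distribution(scores)
print(text_dot_plot(scores))
```

Each row of the printed output corresponds to one column of dots in the hand-drawn plot, so the tallest stacks are the longest rows.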
Distribution Analysis

1. Center: Which class contains the central element(s)
2. Shape: Number of peaks, skewness
3. Spread: Range = max − min

In our example, we can see a couple of things:

Range: Highest value − lowest value. Here, the range would be 95 − 56 = 39.

Center: The central value(s) is the center. It could be a value or a class, depending on the type of graph. Here, the center is the 10th value, since there are 19 data points in the set. The value we seek is 84.

Shape: How many peaks are there? Is it roughly in the middle or to one side? Here we have one peak, so we would say the distribution is unimodal. That peak is to the right, so the tail stretches out to the left. We would say this graph is left skewed.
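The range and center found above can be checked with a few lines of Python. This is an illustrative sketch only; `center_values` is a hypothetical helper name, and it returns two values when the number of observations is even.

```python
# Test scores from the dot plot example.
scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]


def data_range(data):
    """Spread: highest value minus lowest value."""
    return max(data) - min(data)


def center_values(data):
    """Return the middle value (odd n) or the two middle values (even n) of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return [ordered[mid]]
    return [ordered[mid - 1], ordered[mid]]


print(data_range(scores))     # spread of the test scores
print(center_values(scores))  # the 10th of the 19 ordered values
```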
Stem-and-Leaf Plots

Similarities to Dot Plots
- Gives idea of distribution
- Preserves data
- Not practical for large data sets

Differences from Dot Plots
- Used for quantitative variables
- Easier to read actual data elements
- Can be used for comparisons of two data sets

Stem-and-Leaf Plot Example

Example: Using the same data set as we did for the dot plot, construct a stem-and-leaf plot.

The first thing we need to do is order the data elements.

56 64 71 73 74 76 83 83 83 84 85 85 87 91 92 92 92 95 95

Grades for a High School Student

9 |
8 |
7 |
6 |
5 |

These would be the stems for our plot.

Note: Repetition is extremely important. A value that occurs more than once gets one leaf for each occurrence.

Grades for a High School Student

9 | 1 2 2 2 5 5
8 | 3 3 3 4 5 5 7
7 | 1 3 4 6
6 | 4
5 | 6

Here we get the exact same answer for the range and the center, although we only give the class in which the center lies, so we would say that the center is in the 80's. We get that the shape is again unimodal and skewed left. It may look different, but since it represents the same distribution, we expect similar answers.
Notice that the leaves on the right are essentially in columns; this is what allows us to quickly see which classes have more elements.

More Stem-and-Leaf Plots

What if we had a 3-digit number? Suppose the student got a 100 on the next exam.

Grades for a High School Student

10 | 0
 9 | 1 2 2 2 5 5
 8 | 3 3 3 4 5 5 7
 7 | 1 3 4 6
 6 | 4
 5 | 6

Stem-and-Leaf Plots for Comparisons

Example: Suppose we wanted to compare the careers of Babe Ruth and Mark McGwire in terms of their yearly home run totals to determine which player was the more consistent long ball hitter. Make a back-to-back stem-and-leaf plot to make this determination.

Ruth: 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
McGwire: 49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29

We set up the graph with the stems 7, 6, 5, 4, 3, 2, 1, 0 down the middle, one set of data increasing out to the right and the other increasing out to the left. This way we have a side-by-side comparison of the data sets.

Ruth v. McGwire

         Ruth |   | McGwire
              | 7 | 0
            0 | 6 | 5
        9 4 4 | 5 | 2 8
9 7 6 6 6 1 1 | 4 | 2 9
          5 4 | 3 | 2 2 3 9 9
          5 2 | 2 | 2 9
              | 1 |
              | 0 | 9 9

Who is more consistent and why?
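The construction of a back-to-back stem-and-leaf plot can be sketched in Python as follows. This is an illustration under stated assumptions rather than anything from the slides: the data are non-negative integers, the stem is everything but the ones digit, and the function names `stem_and_leaf` and `back_to_back` are hypothetical.

```python
from collections import defaultdict

ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
mcgwire = [49, 32, 33, 39, 22, 42, 9, 9, 39, 52, 58, 70, 65, 32, 29]


def stem_and_leaf(data):
    """Map each stem (tens and above) to its ordered leaves (ones digits).

    Repeated values are kept, one leaf per occurrence.
    """
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)


def back_to_back(left, right):
    """Render two data sets around a shared column of stems, largest stem on top."""
    left_rows, right_rows = stem_and_leaf(left), stem_and_leaf(right)
    top = max(max(left_rows), max(right_rows))
    bottom = min(min(left_rows), min(right_rows))
    lines = []
    for stem in range(top, bottom - 1, -1):
        # Left-hand leaves are reversed so they increase moving away from the stem.
        left_leaves = " ".join(str(d) for d in reversed(left_rows.get(stem, [])))
        right_leaves = " ".join(str(d) for d in right_rows.get(stem, []))
        lines.append(f"{left_leaves:>15} | {stem} | {right_leaves}")
    return "\n".join(lines)


print(back_to_back(ruth, mcgwire))
```

Stems with no leaves on either side are still printed, which keeps gaps in the data visible.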
Histograms

- Used for quantitative variables
- Tracks frequency and shows distribution
- Does not preserve individual values
- Good for a large number of values
- Bars must be vertical and must touch

Example: For our test scores example, construct a histogram and analyze the distribution.

It is easier if the values are in order as we will be grouping them into classes.

56 64 71 73 74 76 83 83 83 84 85 85 87 91 92 92 92 95 95

We first want to create a frequency table. This is a collection of non-overlapping classes and the frequency of observation in each of those classes. We need to determine the following, in this order:

Number of classes
The rule of thumb with the number of classes is to use the square root of the number of observations in the data set.

$\sqrt{19} \approx 4.36$

So, we can use 4 or 5 classes. I tend to go up to the next integer to be sure I have enough classes. So we will use 5 for our graph.

Size of each class
We want them to be the same width so that the taller classes will be known to have the most elements. If not, then we have to find the area of each rectangle to determine relative size. To find the size, we divide the 'range' by the number of classes.

$\text{size} = \frac{95 - 56 + 1}{5} = \frac{40}{5} = 8$

Here the quotient is already a whole number. When it is not, we could use the decimal value for the class width, or we can go up to the next integer.
While we may have extra room if we round up, that is better than not having enough of a range in the classes to cover all of the data. We will use 8.

Endpoints of each class
We start the smallest class with a left endpoint of 56, since that was our minimum. Then, to find the next left endpoint, add 8 to 56. Continue in this manner until we have 5 classes. This gives left endpoints of 56, 64, 72, 80 and 88.

Then, we subtract 1 from each left endpoint to find the right endpoint of the previous class. This gives the classes 56-63, 64-71, 72-79, 80-87 and 88-95.

Finally, we count how many elements go in each class.

Grade Range   Frequency
56-63         1
64-71         2
72-79         3
80-87         7
88-95         6

[Figure: histogram, "Grades of a High School Student"; horizontal axis: Grades (class boundaries 56, 64, 72, 80, 88, 96); vertical axis: Frequency (2 to 8)]

We see the same range and shape. Here, we'd have no choice but to give the class only for the center as we would lose the ability to see individual values.

Using The Calculator

We can make some graphs on the TI-series graphing calculator. One of the options we have is to make a histogram.

The advantages to using technology are that we don't have to make frequency tables or figure out how many classes we need, etc.

We do have to keep in mind, however, that the number of classes may be different than when we make the graph by hand. We are using more approximations when we work by hand than when we use technology.
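For comparison with the calculator, the by-hand frequency-table procedure for the test scores (number of classes, class width, endpoints, counts) can also be sketched in Python. This is a rough illustration that assumes integer data and always rounds up; the function name `frequency_table` is hypothetical and not part of the original material.

```python
import math

# Test scores from the histogram example.
scores = [64, 71, 83, 92, 73, 56, 87, 95, 85, 83,
          92, 92, 74, 76, 84, 91, 83, 85, 95]


def frequency_table(data, num_classes=None):
    """Return a list of (left endpoint, right endpoint, frequency) tuples.

    Assumes integer data. The number of classes defaults to the square root
    of the number of observations, rounded up to the next integer.
    """
    lo, hi = min(data), max(data)
    if num_classes is None:
        num_classes = math.ceil(math.sqrt(len(data)))
    # Class width: (max - min + 1) / number of classes, rounded up.
    width = math.ceil((hi - lo + 1) / num_classes)
    table = []
    for k in range(num_classes):
        left = lo + k * width
        right = left + width - 1  # one less than the next left endpoint
        count = sum(1 for x in data if left <= x <= right)
        table.append((left, right, count))
    return table


table = frequency_table(scores)
for left, right, count in table:
    print(f"{left}-{right}: {count}")
```

Rounding the width up guarantees that the classes cover the maximum value, at the cost of some unused room in the last class.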
This difference in the number of classes is acceptable as long as the method we use is valid.

How To Make Histograms On The TI

1. In the STAT menu, select EDIT
2. Input all of the data in the same column
3. Press 2nd and then MODE to quit to a blank screen
4. Press 2nd and Y= to get into the STATPLOT menu
5. Make sure all of the plots are off (if need be, use option 4)
6. Press ENTER on one of the plots
7. Turn the plot ON with the ENTER key
8. Select the histogram, which is the third graph in the top row
9. Make sure the XList is correct and then press ZOOM and then 9, which is the option for ZOOMSTAT

If you want to get data from the graph, like the endpoints for classes or the frequency for a class, you can press TRACE and then use the arrows to scroll around.

Histograms

Example: The EPA lists most sports cars in its "two-seater" category. The table below gives the city mileage in miles per gallon. Make and analyze a histogram for the city mileage.

Model             Mileage     Model         Mileage
Acura NSX         17          Insight       57
Audi Quattro      20          S2000         20
Audi Roadster     22          Lamborghini   9
BMW M Coupe       17          Mazda         22
BMW Z3 Coupe      19          SL500         16
BMW Z3 Roadster   20          SL600         13
BMW Z8            13          SLK230        23
Corvette          18          SLK 320       20
Prowler           18          911           15
Ferrari 360       11          Boxster       19
Thunderbird       17          MR2           25

There are 22 cars, and $4 < \sqrt{22} < 5$, so we would use 4 or 5 classes; here I will choose 5. The size of each class would be

$\frac{57 - 9 + 1}{5} = \frac{49}{5} = 9.8$

So we will use 10.
Mileage   Frequency
9-18      11
19-28     10
29-38     0
39-48     0
49-58     1

[Figure: histogram, "MPG for Sports Cars"; horizontal axis: MPG (class boundaries 9, 19, 29, 39, 49, 59); vertical axis: Frequency (3 to 12)]

Center: Boundary between the first two classes
Range: 58 − 9 = 49 (read from the class limits; the data themselves give 57 − 9 = 48)
Shape: Unimodal, skewed right

Central Tendency

We will use three methods of measuring central tendency:

1. mean
2. median
3. mode

Example: Find the mean, median and mode for the following data set.

[Stem-and-leaf plot of 16 observations with stems 11 down to 0; the row alignment was lost. Leaves as extracted: 0 4 0 0 0 5 4 0 0 5 0 0 0 1 2 5]

Solution

Mean $\bar{x}$: This is the arithmetic center.

$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$

This is just a fancy way of saying to add the 16 values together and divide by 16. When we do, we get

$\bar{x} = 43.5$

Median M: This is the geometric center. To find it, we line up all of the values in order and find the middle one. If there is an odd number of observations, then the median is the one in the middle. If there is an even number of observations, the median is the mean of the two 'middle' values. Here, we have

$M = \frac{32 + 35}{2} = 33.5$

Mode: This is the value(s) that occur most often, unless all values occur the same number of times, in which case there is no mode.
Here, mode = 30.

The Relationship Between Mean and Median

[Figure: the relationship between mean and median]

This picture indicates a serious drawback to using means: outliers. The median is what we call resistant; an extreme value does not affect the median. The mean, however, is not resistant.

When We Use Mean v. Median

1. If the distribution is symmetric, then mean = median, and we use the mean
2. If there are outliers or strong skewness, we use the median

Using the Mean

Example: Suppose you got an 84, 72 and 78 on your first 3 exams and wanted to know what grade you needed to get on the fourth exam to have at least an 80 average.

We want an average of 80 for the 4 grades. So, we need to solve for x in

$\frac{84 + 72 + 78 + x}{4} = \frac{234 + x}{4} = 80$

So, we get

$\frac{234 + x}{4} = 80 \Rightarrow 234 + x = 320 \Rightarrow x = 86$

Another Mean Example

Example: Suppose you had a 75 average through 4 tests and got an 85 on the 5th test. What is your average now?

If we have a 75 average through 4 exams, then we have accumulated 75 × 4 = 300 points.
So, if we wanted to know the average with this 5th grade, we'd have

$\bar{x} = \frac{300 + 85}{5} = \frac{385}{5} = 77$

Yet Another Mean Example

Example: Suppose you had a group of 11 people and the average age was 27. If one of those people left, the average age of the remaining 10 was 29. What is the age of the person who left?

Total age of the 11 people: 11 × 27 = 297
Total age of the 10 people: 10 × 29 = 290
The difference is 297 − 290 = 7, so the person who left was 7 years old.

Means From Frequency Tables

Example: Find the mean of the following values.

Age         21   22   23   24   25
Frequency   5    8    4    1    2

We first count the total number of observations, which is 20. Then ...

$\bar{x} = \frac{21 \cdot 5 + 22 \cdot 8 + 23 \cdot 4 + 24 \cdot 1 + 25 \cdot 2}{20} = \frac{447}{20} = 22.35$

Box Plots and the 5-Number Summary

When dealing with the median, we measure variation with the 5-number summary. These 5 numbers indicate the maximum and minimum, the median and the quartiles.

In order to find the five-number summary, we first line the data elements up in order. Then we find the minimum and maximum, and then the median.

minimum             smallest value of the set
maximum             largest value of the set
median              central value(s) of the set
first quartile Q1   median of all values smaller than the median
third quartile Q3   median of all values larger than the median

5-Number Summary Example

Example: Find the 5-number summary for the data from the first example.

[Stem-and-leaf plot of 16 observations with stems 11 down to 0; the row alignment was lost. Leaves as extracted: 0 4 0 0 0 5 4 0 0 5 0 0 0 1 2 5]

Since the values are already in order, we only need to calculate the values.
minimum   Q1   Median   Q3     maximum
4         30   33.5     52.5   110

Teddy Ballgame Example

Ted Williams' yearly RBI totals:
145, 113, 120, 137, 123, 114, 127, 159, 97, 126, 3, 34, 89, 83, 82, 87, 85, 43, 72

Find the 5-number summary for this set of data.

What do we do first? We put the values in order:
3, 34, 43, 72, 82, 83, 85, 87, 89, 97, 113, 114, 120, 123, 126, 127, 137, 145, 159.

Then ... with 19 values, the median is the 10th value, Q1 is the median of the 9 values below it, and Q3 is the median of the 9 values above it.

Solution

Minimum   Q1   Median   Q3    Maximum
3         82   97       126   159

Box-and-Whisker Plots

How can we visually represent this summary of the data? We use box plots, or box-and-whisker plots.

[Figure: box-and-whisker plot, "Ted Williams' RBI Totals"; vertical axis: RBIs (20 to 160); category label: Teddy Ballgame]
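The five-number summary for the Ted Williams data can be checked with a short Python sketch. It follows the rule used in these notes (Q1 and Q3 are the medians of the halves below and above the median position), and the names `median` and `five_number_summary` are illustrative. Calculators and statistical software sometimes use other quartile conventions, so their Q1 and Q3 can differ slightly on some data sets.

```python
rbi = [145, 113, 120, 137, 123, 114, 127, 159, 97, 126,
       3, 34, 89, 83, 82, 87, 85, 43, 72]


def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    The halves are split by position; for an odd number of observations
    the middle observation belongs to neither half.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the median position
    upper = ordered[half + (n % 2):]  # observations above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(five_number_summary(rbi))
```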
Using Technology

The box plot is another graph that we can construct using the TI-series graphing calculator. We do everything the same as when constructing a histogram until we reach the point where we choose the type of graph. There are two options for box plots.

1. Second row, first graph shows outliers (we will get to those soon)
2. Second row, second graph does not show outliers

We again use ZOOM and 9 to produce the graph.

Getting Statistics

We can also find the statistics we need using the calculator relatively easily.

1. Input the data in the usual way
2. Press 2nd and MODE to quit to a blank screen
3. Press STAT, scroll to CALC, and select 1-Var Stats
4. You will see 1-Var Stats on the screen; now select which list the data is in by pressing 2nd and then the appropriate number 1-6, followed by the ENTER key

On this screen are some statistics we need, $\bar{x}$ and Sx, and if we scroll down, we will see the 5-number summary.
The Geometric View

- From the minimum to Q1 is the bottom 25% of the observations
- From Q1 to Q3 is the middle 50% of the observations
- From Q3 to the maximum is the top 25% of the observations

We can look at this in other ways too:

- The top half lies above the median
- The top 75% lies above Q1
- The bottom 75% lies below Q3

Box-and-Whisker Plot Example

Example: Construct a box-and-whisker plot for the data from the first example.

minimum   Q1   Median   Q3     maximum
4         30   33.5     52.5   110

Solution

[Figure: box-and-whisker plot, "Some Data Set"; vertical axis: Values (25 to 125); category label: Set 1]

Analysis of Box-and-Whisker Plots

We can also look at the distribution like we did with histograms, but in a limited way as we cannot really tell how many peaks there are.
But we can look at the spread and center (directly from the table) and we can look at the skewness.

What is the range here? 110 − 4 = 106

What do we know about the distribution? It is skewed right. Further, the right endpoint seems to be pretty far away, so we may think it is an outlier. But how do we determine analytically whether it is?

IQR Criterion

Definition: The IQR Criterion is an analytic way for us to determine if data points are outliers based on a 5-number summary.
To determine outliers, we use Q1 − 1.5·IQR and Q3 + 1.5·IQR to give us endpoints of the acceptable data range, where IQR is the Interquartile Range and IQR = Q3 − Q1. These new endpoints are sometimes referred to as fences.

Using the IQR Criterion

So, basically what we are doing is saying that any values no further away from the middle 50% than 1.5 times the range of the middle 50% are acceptable. Anything outside that range is an outlier.

Example: Are there any outliers in the previous data set?

First we find the IQR, which is Q3 − Q1 = 52.5 − 30 = 22.5, and then we consider the new endpoints (fences).

Q1 − 1.5·IQR = 30 − 1.5(22.5) = 30 − 33.75 = −3.75
Q3 + 1.5·IQR = 52.5 + 1.5(22.5) = 52.5 + 33.75 = 86.25

Since 110 is larger than this upper threshold, we would say it is an outlier.
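The fence calculation above can be expressed as a small Python sketch. The names `iqr_fences` and `outliers` and the parameter `k` are illustrative assumptions; only the minimum and maximum of the data set are checked here, because those are the observations the example reads from the 5-number summary.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Five-number summary from the example: 4, 30, 33.5, 52.5, 110.
low_fence, high_fence = iqr_fences(30, 52.5)
print(low_fence, high_fence)
print(outliers([4, 110], 30, 52.5))  # check the minimum and the maximum
```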
Standard Deviation

The standard deviation measures the variation in data by measuring the distance of the observations from the mean. It tells us roughly how far we can expect a typical observation to be from the mean.

Absolute Deviation

\[ \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]

Standard Deviation

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]

Standard Deviation and Variance

The variance, which won't have a lot of use for our purposes, is the square of the standard deviation:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

Finding the Standard Deviation

Example: Find the standard deviation of the daily caloric intake for a person over the course of a week.

{1792, 1666, 1362, 1614, 1460, 1867, 1439}

First we find the mean.

\[ \bar{x} = \frac{11200}{7} = 1600 \]

Then, we need to find the difference between each of these values and the mean, square each difference, and then sum the squares.
xi      (xi − x̄)^2          square      contribution
1792    (1792 − 1600)^2     192^2       36864
1666    (1666 − 1600)^2     66^2        4356
1362    (1362 − 1600)^2     (−238)^2    56644
1614    (1614 − 1600)^2     14^2        196
1460    (1460 − 1600)^2     (−140)^2    19600
1867    (1867 − 1600)^2     267^2       71289
1439    (1439 − 1600)^2     (−161)^2    25921
sum                                     214870

Next, we divide by 6, which is n − 1.

\[ s^2 = \frac{214870}{6} \approx 35811.67 \]

s^2 is the variance. Now we take the square root.

\[ s = \sqrt{35811.67} \approx 189.24 \]

So a typical day's caloric intake is approximately 189 calories from the mean. Notice that we only care about magnitude and not whether we are above or below the mean.

Mean and Standard Deviation

So what can we do with mean and standard deviation? We can use them to relate individuals within our data set to the distribution of the sample. This is related to probability.
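As a check on the caloric-intake example, the same steps (mean, squared differences, divide by n − 1, square root) can be sketched in Python. The variable names are assumptions made for this illustration; `statistics.stdev` from the Python standard library is used only as a cross-check, since it also divides by n − 1.

```python
import math
import statistics

# Daily caloric intake over one week, from the example
calories = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(calories)
mean = sum(calories) / n                             # 11200 / 7

# Squared difference between each value and the mean
squared_devs = [(x - mean) ** 2 for x in calories]
total = sum(squared_devs)

variance = total / (n - 1)                           # divide by 6
s = math.sqrt(variance)                              # standard deviation

print(mean)                 # 1600.0
print(total)                # 214870.0
print(round(variance, 2))   # 35811.67
print(round(s, 2))          # 189.24

# Cross-check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(calories)))   # True
```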
The total area underneath a distribution curve is always 1, so the area under the curve over a region is the same as the proportion of observations falling in that region. We will see this better when we get to Normal distributions.

Uniform Distributions

For these, all values have the same probability of occurring, so the shape is that of a rectangle.

[Figure: Random Number Between 0 and 2. The density f(x) is a rectangle of height 1/2 over 0 ≤ x ≤ 2.]

Example: If we have a uniform distribution for a random number to be chosen between 0 and 2, what is the probability that the number selected is between 0.5 and 1.1?

What is the area of a rectangle? length × width.

What are our dimensions? The height is 0.5 and the width is 1.1 − 0.5 = 0.6, so the area is 0.5 × 0.6 = 0.3.

So, there is a 30% chance that the number randomly selected falls in this region.
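The rectangle-area reasoning can be sketched as a small Python function. The name `uniform_probability` and its parameters are hypothetical, introduced only for this illustration; the interval 0 to 2 and the region 0.5 to 1.1 come from the example.

```python
def uniform_probability(a, b, low=0.0, high=2.0):
    """P(a < X < b) for X uniform on [low, high], as a rectangle area."""
    # Keep only the part of the region that lies inside [low, high]
    a, b = max(a, low), min(b, high)
    if b <= a:
        return 0.0
    height = 1.0 / (high - low)   # total area must be 1, so height = 1/2 here
    width = b - a
    return height * width


p = uniform_probability(0.5, 1.1)
print(round(p, 2))   # 0.3
```

The result is rounded for display because 1.1 − 0.5 is not represented exactly in floating point.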