MAT 142 Ch15 Note Park

15.1 Graphs and Charts

I. Representing Categorical or Discrete Data

The following data set consists of m&m colors in a bag.

RED BLUE YELLOW BLUE RED GREEN BLUE BROWN RED YELLOW
RED RED RED RED BLUE GREEN BLUE YELLOW BROWN RED
BLUE RED BLUE YELLOW YELLOW BLUE BROWN YELLOW BROWN RED

A. Frequency & Relative Frequency Table

Color       Frequency       Relative Frequency
RED
BLUE
BROWN
GREEN
YELLOW

Which color do you have the most of? What is the relative frequency of that color?

B. Bar Graph

*Note: The bars do not touch. The bars are centered on the category or discrete numerical value.

C. Dot Plot (Line Plot)

D. Circle Graph

To make a circle graph:
1) Know the fraction of the circular region allocated to each value of the variable.
2) Because there are 360° in a circle, we can draw a sector with an angle size of 360° × (fraction of the value).

*Circle graphs often, though not always, represent percents of the total rather than frequencies.

E. Pictograph

*Note: equally spaced intervals and key assigning weight to a picture

Potential dangers of pictographs... What is wrong with each of the following pictographs? How could we “fix” them?

II. Representing Numerical Data

Record the height (in inches) of your classmates below:

A. Stem-and-leaf plot

Stem (tens-digit)       Leaf (ones-digit)

*Note: order leaves from smallest to largest to make an ordered stem-and-leaf plot

B. Histogram

While there is some flexibility involved in determining the details of a histogram, some general guidelines should be followed:
- Use a reasonable number of bins (usually 4 to 7 bins is recommended)
- Every data value must fall into exactly one bin, which implies
  - bin boundaries must be defined precisely
  - your bins must cover the entire range of your data set
- Bins must have equal widths
- When a histogram is drawn, its bars can have no gaps between adjacent bins

Let’s construct a histogram based on the data set we got!
1. Order the data!
2. Range
3. Width
4. Class-count table
5. Histogram

EX 1 The following histogram represents fuel economy of several cars sold in the United States and Europe. What descriptive properties of this data set does this histogram make clear?

Certain properties of data sets are described with specific terminology.

DEF 1 An _________________________ is a data value that does not follow the overall pattern of the data set.

A data set which contains a high outlier, resulting in a histogram which will have a “tail” on the right, is called _____________________________________________.

A data set which contains a low outlier, resulting in a histogram which will have a “tail” on the left, is called _____________________________________________.

Histograms and other graphical representations of data allow us to see certain properties of an entire data set. In some cases, though, we prefer precise and concise numerical summaries: one or two numerical values which capture the essential properties of a data set.

Two essential properties we will focus on are the ____________________________ and __________________________________ of a data set.

The center of a data set can be thought of as its “average”: the one number which is the best representation of all of the numbers in the data set. Two common measures of the center of a data set are the ____________________________ and the ___________________________________.

EX 2 Find the mean and median of the values 2, 13, 16, 4, 4, 19, 22, 19.

EX 3 Find the mean and median of the following data set, which represents patient wait times at an emergency room, measured in minutes.

11 16 18 21 24 24 25 29 31 33 38 41 42 91 94

For the above data set, is the mean or the median a better representation of the center of the set? Why?

Note that the ________________________ is strongly affected by __________________________, while the __________________ is not.
Specifically, the mean is __________________________________ ________________________________ outliers. Thus, comparing the values of the mean and median can be an indication of skewness.

EX 4 State the mean and median of each of the three data sets below.
A: 75, 75, 75, 75, 75
B: 65, 70, 75, 80, 85
C: 45, 60, 75, 90, 105

The data sets from EX 4 demonstrate that describing a data set by simply stating its center doesn’t always give a complete picture of the properties of the set.

In addition to describing the center of a data set, it is often helpful to also measure its __________________________, that is, the extent to which the data varies. Two common measures of spread we will focus on are standard deviation and the five-number summary.

The purpose of calculating the standard deviation of a data set is to find how far the data values are from the mean “on average.” This calculation entails several steps, but keeping this purpose in mind throughout the process will help clarify the steps required.

EX 5 Calculate the standard deviation of the following data set.

1.8 2.2 2.4 2.4 3.4 3.4 4

Since standard deviation measures how far data are from the mean, the first step will be to find the mean.

EX 5 (Continued) The next several steps can be efficiently organized in the following table:

Data values       Subtract the mean       Square

Once we have completed the table, there are still steps to finish.

If the data values 2.8, 2.8, 2.9, and 2.9 were added to our data set, would we expect the new set to have a higher or lower standard deviation than we just calculated? Why?

EX 6 Calculate the standard deviation of the following data set.

40 42 43 43 48 48 49 51

Data values       Subtract the mean       Square

EX 7 Without performing any calculations, identify which of the following data sets would have the largest standard deviation. Which would have the smallest?
A: 34, 38, 41, 46
B: 1.2, 16.75, 17.2, 34.6, 48.2
C: 116, 119, 119, 120, 121, 122, 122, 122, 123

In earlier examples, we noted that if our data set has ___________________________, we prefer to use the ____________________________ to describe the center of the data set. In such cases, the standard deviation, which is based on the mean of the data set, may be an ineffective measure of the spread of the data. Thus, in some circumstances, such as when our data set contains outliers, we prefer to describe the spread of the data using the five-number summary.

As its name suggests, the five-number summary is a list of five key numbers which divide the ordered data set into four parts of roughly equal size. By identifying these key numbers, we can gauge the variation of the data across the range of the entire set: approximately 25% of the data values in a set fall between each pair of consecutive numbers in the five-number summary.

DEF 2 The five-number summary consists of the following five numerical values derived from the data set, listed in increasing order:

Minimum :: Lower Quartile :: Median :: Upper Quartile :: Maximum

The lower quartile of a data set (also called the first quartile, denoted Q1) is the median of the data values which fall below the overall median of the entire data set.

The upper quartile of a data set (also called the third quartile, denoted Q3) is the median of the data values which fall above the overall median of the entire data set.

EX 8 Consider the following stem-and-leaf plot based on the first test in one instructor’s MAT 142 course:

3 4 5 6 7 8 9 9 0 0 1 0 0 0 0 0 3 0 0 1 8 1 3 2 1 1 8 1 3 3 1 9 2 5 3 2 2 6 9 2 3 8 9 8 8 8 9 8 9 8 9 9

State the five-number summary for this data set.

If the instructor is feeling generous and decides to give the top 25% of students an A, what test score becomes the cut-off for an A?

EX 9 Soil ecologists measured levels of a specific nutrient in 28 samples collected from two different sites.
The data from the two sites are represented by the following five-number summaries.

Site A: 2.6 3.2 3.8 4.2 4.8
Site B: 0.01 3.7 3.9 4.3 4.8

Label the following descriptions with A or B, based on which site best fits the property described.

____ has an outlier
____ has the more symmetric distribution
____ has the larger median
____ would have the larger standard deviation
____ has the larger third quartile
____ has the larger mean

A convenient visual representation of a five-number summary is a boxplot.

EX 10 On the axes below, construct two separate boxplots for the two five-number summaries given in EX 9.

A
B
0     0.8     1.6     2.4     3.2     4.0     4.8

Based on the above boxplots, what percentage of samples from site A had a nutrient level between 3.2 and 4.2? _______

How many samples from site B had a nutrient level higher than 4.3? _______

How many samples from site B had a nutrient level between 0.01 and 3.7? _______

Within what range of nutrient levels did the middle 50% of the samples from site B fall? ____________________
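
A five-number summary like those used for the boxplots above can also be computed by software when the raw data values are available. The following is a minimal Python sketch, assuming the quartile rule from DEF 2 (each quartile is the median of the values on one side of the overall median). The function name five_number_summary and the sample list are hypothetical, chosen only for illustration. The sketch splits the ordered data by position, which is an assumption about how to treat values that tie with the median.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: the quartiles follow the rule in these notes, i.e. the
    lower quartile is the median of the values below the overall median
    and the upper quartile is the median of the values above it. The
    ordered data is split by position.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # values above it; skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical sample for illustration only; it is not taken from the notes.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12]
low, q1, med, q3, high = five_number_summary(sample)
print("five-number summary:", (low, q1, med, q3, high))
print("the middle 50% of the data lies between", q1, "and", q3)
```

Other software may use a different quartile convention, so its quartiles can differ slightly from those found by hand with the rule above.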