Math 167 Chapter 5

Chapter 5 - Exploring Data: Distributions

In this chapter, we will learn about different distributions to represent and analyze data.

Definitions

1. Individuals are objects described by a set of data. They may be people, animals, or things.
2. A variable is any characteristic of an individual. It can take different values for different individuals. A variable can be categorized as
   a) Numeric or Quantitative: e.g. grades, height, age, or
   b) Categorical or Qualitative: e.g. email, name, gender, color
3. The distribution of a variable tells us what values the variable takes and how often it takes these values.
4. Histogram: a graph of the distribution of outcomes (often divided into classes) for a single numeric variable. The height of each bar is the number of observations in that class. All classes (bars) should have the same width, and each observation must fall into exactly one class.
5. Outlier: A deviation from the rest of the data. An individual that falls outside the overall pattern.
6. Symmetry of a Distribution:
   a) Left-skewed: the longer tail of the distribution is on the left.
   b) Right-skewed: the longer tail of the distribution is on the right.
   c) Symmetric: GUESS!!
7. Mean of a Distribution: The average. It is computed as $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, the sum of the $n$ observations divided by $n$.
8. Median: Arrange all observations in increasing order. If the number of observations is odd, the median M is the center observation. If the number of observations is even, the median M is the average of the two center observations.
9. Mode: The most frequently occurring value in the set of observations.
10. Range: largest observation - smallest observation.
11. Standard Deviation: a measure of the typical amount by which observed data values deviate from the mean.
12. Variance: The square of the standard deviation.
13. Quartiles: Arrange the observations in increasing order.
   a) The first quartile is the median of the lower half.
   b) The third quartile is the median of the upper half.
   c) The second quartile is the median.
   d) The inter-quartile range is the difference between the third and first quartiles.
14. Normal Distribution: A normal distribution is completely determined by its mean and standard deviation. Fun Facts:
   a) Mean = Median = Mode
   b) The first quartile is located about 0.67 standard deviations below the mean; the third quartile is located about 0.67 standard deviations above the mean.
   c) The 68-95-99.7 rule states that
      i) About 68% of the observations fall within 1 SD of the mean.
      ii) About 95% of the observations fall within 2 SD of the mean.
      iii) About 99.7% of the observations fall within 3 SD of the mean.
   d) The curve is known as a bell curve, and it never touches the x-axis.

[Figure: Normal distribution is an approximation of a histogram]
[Figure: Comparison of symmetry]
[Figure: How does a change in mean and/or standard deviation affect the normal curve?]

Example 1. The list below gives the number of dogs owned by families in a particular neighborhood.
0, 1, 1, 2, 3, 5, 8
   a) Draw a histogram representing the data.
   b) Find the mean, mode, and median of the data.
   c) What is the range for this data?
   d) What are the first, second, and third quartiles? What is the inter-quartile range?
   e) Find the standard deviation.

Example 2. Given the set of observations below, find the missing observation so that the median is 5.
8, 5, 10, 3, ?

Example 3. Below are the ages of 15 students in a college class:
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28
   a) What is the mean age?
   b) What is the median age?
   c) What is the mode?
   d) What is the inter-quartile range?
   e) Can you identify any possible outliers?
   f) What is the range for this data?

Example 4. The following data are the grades of the students in a class. Divide the grades into reasonable classes, and make a histogram with those classes.
86, 86, 85, 83, 83, 82, 82, 81, 81, 80, 79, 77, 77, 77, 76, 76, 75, 74, 74, 73, 72, 72, 72, 69, 69, 69, 67, 65, 61, 58, 51

Example 5.
The scores on an honors exam from a class of 17 students are given below:
32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100
   a) Find the mean, mode, and median of the scores.
   b) What is the minimum score, the maximum score, and the range of the data?
   c) What are the first and third quartiles? What is the inter-quartile range?

Example 6. The scores of students on a standardized test form a normal distribution with mean 300 and standard deviation 40. If 2000 students took the test, find the number of students who scored above 380.

Example 7. The length of tape on a roll of a certain type of tape is normally distributed with a mean of 25 meters and a standard deviation of 50 centimeters.
   a) What is the range of lengths that includes 99.7% of the rolls?
   b) What percentage of rolls are longer than 26 meters?
   c) What lengths bracket the middle 50% of the rolls of tape?

Example 8. Heights of women are approximately normally distributed with a mean of 64.5 inches and a standard deviation of 2.5 inches.
   a) What percentage of women have heights between 59.5 and 69.5 inches?
   b) What is the height range for the middle 50% of the women?

The Five-Number Summary and Boxplots

The five-number summary of a distribution consists of five numbers (SURPRISE!!!):
Minimum, Q1, M, Q3, Maximum
A boxplot is a graph of the five-number summary.

Example 9. Use the data from Example 3 and sketch the corresponding boxplot.
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28

Example 10. Use the data from Example 5 and sketch the corresponding boxplot.
32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100

Example 11. Display the following data in a stemplot.
32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100
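
The quartile and five-number-summary definitions in this chapter can also be checked with a short program. The sketch below is only an illustration and is not part of the course material. It uses Python's standard statistics module, the function names quartiles and five_number_summary are hypothetical, and it assumes that when the number of observations is odd the median itself is left out of both halves (the definitions above do not say either way; other textbooks and software use slightly different rules).

```python
from statistics import mean, median, multimode


def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves rule.

    Assumption: for an odd number of observations, the middle
    observation is excluded from both the lower and upper half.
    Needs at least two observations.
    """
    xs = sorted(data)          # arrange observations in increasing order
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # observations below the median position
    upper = xs[n - half:]      # observations above the median position
    return median(lower), median(xs), median(upper)


def five_number_summary(data):
    """Return (Minimum, Q1, M, Q3, Maximum)."""
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)


# Honors exam scores from Example 5
scores = [32, 71, 72, 77, 77, 83, 84, 85, 87,
          89, 90, 92, 95, 96, 98, 99, 100]

summary = five_number_summary(scores)
print("Five-number summary:", summary)   # (32, 77.0, 87, 95.5, 100)
print("Inter-quartile range:", summary[3] - summary[1])
print("Range:", max(scores) - min(scores))
print("Mean:", round(mean(scores), 2))
print("Mode(s):", multimode(scores))
```

The five numbers printed here are exactly the values a boxplot displays: the box runs from Q1 to Q3 with a line at M, and the whiskers extend to the minimum and maximum.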