Probability and Statistics – Mrs. Leahy

Unit 2: Averages and Variation

Day 1: Mode, Median, Mean, Trimmed Mean, Distribution of Averages

“While the individual man is an insolvable puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to.”
-- Arthur Conan Doyle, The Sign of Four

Measures of Central Tendency: Mode, Median, and Mean

What does it mean to be average?
The average price of gold is $920 per ounce.
My car averages 28 miles per gallon.
The average shoe size for women is a size 8.
The average test score was an 85%.

An average is a way to describe an ______________ data set using only _______________________.

Sometimes we will be working with a population and sometimes with a sample.
A population is ______ possible cases in the situation we are studying.
A sample is a ________ set of cases from the population used to represent the entire population.

Example:
Population: Every carton of orange juice manufactured by Tropicana this year.
Sample: Ten cartons of Tropicana orange juice randomly selected from a grocery store shelf.

Most Commonly Used “Averages”: Mode, Median, and Mean

MODE
The mode of a set of data is the number that occurs ___________ frequently. It is possible to have more than one mode. It is possible to have no mode.

Example 1: Find the mode of this data: {1, 7, 8, 4, 4, 4, 6, 3, 8, 7}

Example 2: Sixteen students are asked how many college math classes they have completed. Their responses are shown at the right. What is the mode?

MEDIAN
The median of a set of data is the number that is exactly in the _______________ of a set of ordered values. Sometimes the median is called the “central value.”

To find the median:
1. Order the data from smallest to largest.
2. For an odd number of data values: Median = middle value
3.
For an even number of data values: Median = “average” of the two middle values (sum the two middle values, then ÷ 2)

Example 3a: Find the median of this data: {10, 5, 1, 6, 10, 2, 5, 7, 3}

Example 3b: Find the median of this data: {1, 7, 8, 4, 4, 4, 6, 3, 8, 7}

Example 4: Belleview College must make a report to the budget committee about the average credit hour load a full-time student carries. A 12-credit-hour load is the minimum requirement for full-time status. For the same tuition, students may take up to 20 credit hours. A random sample of 40 students yielded the following information (in credit hours):

17 12 15 18 12 17 12 14 14 16
15 16 17 15 16 17 13 14 12 15
16 12 18 19 18 12 20 12 20 13
19 13 13 17 12 12 12 14 15 15

a) Organize the data from smallest to largest number of credit hours.
b) What is the mode?
c) What is the median?
d) If the budget committee is going to fund the school based on the average student credit hour load, which of these two averages should be reported to the committee?

Mean and Trimmed Mean

Mean
The mean of a set of data is the arithmetic ________________ of all data values.

Symbols:
∑ = summation, “the sum of the following”
n = the number of values in the sample
N = the number of values in the population

Mean of a sample: $\bar{x} = \frac{\sum x}{n}$ (“x-bar”)
Mean of a population: $\mu = \frac{\sum x}{N}$ (“mew”)

Example 5: Find the mean of the following data set: {3, 8, 5, 4, 8, 4, 10}

Example 6: You are asked to answer the following question: “How many hours of television do most people watch each week?” You conduct a survey where participants are asked “How many hours of television did you watch last week?” A random sample of the results is displayed below:

1 3 5 5 5 7 7 10 10 42

What should you say? Would your answer change if you “trimmed off” the highest and lowest answers?

Example 7: To graduate, Steve must have at least a C in history.
He did fairly well on the first four tests; however, he failed the last one. Here are his scores:

73 80 69 72 35

a) Find the mean score and determine if Steve will get a C or better (70%+).
b) What is the median?
c) What if Steve’s school allows him to drop the highest and lowest scores? That is, what if he is allowed to trim 20% of his scores off each end? What is the mean of his remaining scores? Does trimming change the median?
d) Which mean is closer to the median?

Trimmed Mean: The mean of the data values left after “trimming” a specified percentage of the smallest and largest data values from a data set. Common trims are 5% or 10%.

Example 8: A sample of 20 colleges showed class sizes for intro courses to be:

14 20 20 20 20 23 25 30 30 30
35 35 35 40 40 42 50 50 80 80

a) What is the mean for the entire sample?
b) Compute a 5% trimmed mean for the sample.
c) Which mean is closer to the median?

Distributions of Averages
Mean = Median = Mode (or very nearly)
Mean < Median < Mode
Mean > Median > Mode

Weighted Averages (Weighted Mean)
In this class you were told the following information about grade weighting:

Homework 17%
Tests 68%
Final Exam 15%

You currently have a 100% on homework and think you got an 85% on the Unit 1 test. What is your current grade in the class?

This type of mean is called a “weighted average” or “weighted mean” because some values are considered more important than others.

Example 9: Suppose your midterm test score is 80, your project score is 85, and your final exam score is 98. Suppose the weights are 30% for midterm, 30% for projects, and 40% for final. If the minimum average for an A is 90%, will you earn an A?

Example 10: Suppose you enter a comedy script writing contest and are given a score of 1 to 10 (10 being the best) in categories of humor, originality, and presentation.
Humor is given a weight of 5, originality is given a weight of 3, and presentation is given a weight of 2. You receive an 8 in humor, a 5 in originality, and a 7 in presentation. What is your final average score?

Weighted GROUPED Data (by classes)
Example 10: Find the weighted (group) average for the following data.
a)
b)

For a group of data:
Use x = midpoint (median) of class
Use w = frequency of class

Day 2: Range, Variance, Standard Deviation

Any set of measurements has two important properties:
1. The central/typical/average value (Day 1)
2. The SPREAD about that value. Spread tells us how far from the center the data ranges.

Example: You survey 2 groups of 50 students asking them to report their weight.
Group 1: Mean weight 145 lbs
Group 2: Mean weight 145 lbs

Today, we will be talking about standard deviation, a measurement of spread about the _______________.
On Day 4, we will be talking about interquartile range, a measurement of spread about the ________________.

Measures of Variation

Range
range = largest value – smallest value

Example 1: A large bakery regularly orders cartons of Maine blueberries. The average weight of the cartons is supposed to be 22 ounces. Random samples of cartons from two suppliers were weighed. The weights in ounces per carton were:

Supplier I: 17 22 22 22 27
Supplier II: 17 19 20 27 27

a) What is the range of each set of data?
b) What is the mean of each set of data?
c) The bakery uses 1 carton of blueberries per blueberry muffin recipe. Which supplier should they choose?

Range unfortunately does not tell us how much the other values vary from one another or from the mean.

Variance and Standard Deviation of a Sample (Ungrouped Data: no classes)
Symbols/Explanations:
$x$ = a data value or outcome
$n$ = the sample size
$\bar{x}$ = the mean of a sample
$x - \bar{x}$ = the difference between what actually happened and what you expected to happen; “the deviation”
$\sum (x - \bar{x})^2$ = the sum of squares

DEFINING FORMULAS
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2}$

Example 2: Use the data from Example 1 to find the sample variance ($s^2$) and sample standard deviation ($s$).

Supplier I: 17 22 22 22 27
$\bar{x}$ =
Table columns: $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ =
$s$ =

Supplier II: 17 19 20 27 27
$\bar{x}$ =
Table columns: $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ =
$s$ =

Step 1: Compute the mean $\bar{x}$.
Step 2: List out your data values (the x’s).
Step 3: Find how far off each data value is from the mean: $x - \bar{x}$
Step 4: Square this difference: $(x - \bar{x})^2$
Step 5: Find the sum of the squares: $\sum (x - \bar{x})^2$
Step 6: Substitute this value into your formulas.

Example 3: Big Blossom Greenhouse measured a sample of rose blooms for diameters in inches.

2 3 3 8 10 10

Compute the sample variance and the sample standard deviation.

Variance and Standard Deviation of a POPULATION
In most statistics applications, we work with a random sample of data rather than the entire population. If you have the data for a population, you can determine the population mean, population variance, and population standard deviation.

Sample:
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2}$

Population:
$N$ = size of the population
$\mu$ = the mean of a population
Population Mean: $\mu = \frac{\sum x}{N}$
Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$

Example 4: For the population of five values {1, 4, 4, 3, 5}, find the population variance and population standard deviation.
Table columns: $x$ | $x - \mu$ | $(x - \mu)^2$

Day 3: Variance, Standard Deviation, Grouped/Class Data
and don’t you wish there was an easier formula to use….
The formulas we used yesterday were called “defining formulas.” You can get the same answers for sample variance and sample standard deviation by using the “computational” formulas.

COMPUTATIONAL FORMULAS
Sample Variance (Computational): $s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$
Sample Standard Deviation (Computational): $s = \sqrt{s^2}$

Benefits: You don’t have to find the mean first. You don’t have to find the difference between the mean and each data value. Fewer steps!

Example 1: For the sample {1, 3, 2, 6}, find the standard deviation, s.

Example 2: A study examining the health risks of smoking measured cholesterol levels of people who had smoked for at least 25 years. For the sample below, find the sample variance and the sample standard deviation using the computational formulas.

Example 3: American League baseball teams play their games with the designated hitter rule, meaning that pitchers do not bat. The League believes that replacing the pitcher, typically a weak hitter, with another player in the batting order produces more runs and generates more interest among fans. Following are the average number of runs scored in the American League and National League stadiums for the first half of the 2001 season. Find the mean, sample variance, and sample standard deviation for each League’s set of data.

Standard Deviation for Grouped Data (Classes)
To find the standard deviation for a set of grouped (class) data, use:

$x$ = midpoint of a class
$f$ = frequency of a class
$n$ = sum of the frequencies

Sample mean for grouped data: $\bar{x} = \frac{\sum x f}{n}$ (a weighted ave.)
Sample standard deviation for grouped data: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{\sum x^2 f - (\sum x f)^2 / n}{n - 1}}$

Example 4: Find the sample standard deviation for the following grouped data. Use $s = \sqrt{\frac{\sum x^2 f - (\sum x f)^2 / n}{n - 1}}$

Example 5: Find the sample standard deviation for the following grouped data.
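The Day 3 formulas can be checked against each other with a short Python sketch. It is only an illustration, not part of the worksheet: the function names are made up here, the first sample is the one from Day 3 Example 1, and the class midpoints and frequencies are hypothetical values invented for the demonstration.

```python
import math


def sample_variance_defining(data):
    """Defining formula: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return sum_of_squares / (n - 1)


def sample_variance_computational(data):
    """Computational formula: (sum of x^2 - (sum of x)^2 / n), divided by n - 1."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def grouped_sample_std(midpoints, frequencies):
    """Computational formula for grouped data: x = class midpoint, f = class frequency."""
    n = sum(frequencies)
    sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
    return math.sqrt((sum_x2f - sum_xf ** 2 / n) / (n - 1))


# Sample from Day 3, Example 1.
sample = [1, 3, 2, 6]
s2_defining = sample_variance_defining(sample)
s2_computational = sample_variance_computational(sample)
print("variance (defining):     ", s2_defining)
print("variance (computational):", s2_computational)
print("standard deviation:      ", math.sqrt(s2_computational))

# Hypothetical grouped data, invented for illustration only.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print("grouped standard deviation:", grouped_sample_std(midpoints, frequencies))
```

Both variance functions should print the same value for any sample, which is the point of the computational formula: it reaches the same answer without first computing the mean and each deviation.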
Coefficient of Variation
The Coefficient of Variation expresses standard deviation as a percentage of the sample or population mean. This allows us to compare data from different populations that may use different units of measurement.

CV = (standard deviation ÷ mean) × 100

Example 6: Mrs. Leahy and Mrs. Whitham decide to compare the heights of the students in their classes. Mrs. Leahy’s class had a mean height of 67 inches, with a standard deviation of 2.13 inches. Mrs. Whitham’s class had an average height of 165 cm with a standard deviation of 5 cm. Use the coefficient of variation to compare the two classes.

Chebyshev’s Theorem
Data within:
2 standard deviations of the mean: $\bar{x} \pm 2s$
3 standard deviations of the mean: $\bar{x} \pm 3s$
4 standard deviations of the mean: $\bar{x} \pm 4s$

Example 8: For a sample with mean $\bar{x} = 5$ and a standard deviation $s = 1.5$:
a) Find an interval A to B such that at least 75% of the data will lie within this interval.
b) Find an interval A to B such that at least 88.9% of the data fall within this interval.

Outliers: Occur at _________ standard deviations from the mean.

Day 4: Percentiles, 5-Number Summaries

I took my son to the doctor and was told that he was in the “85th percentile for height and the 56th percentile for weight.”
You took a standardized test and received notice that you scored in the “90th percentile.”
On the website for the college you want to attend, you see they are accepting applications from students in the “75th percentile of their graduating class.”

WHAT DOES THIS MEAN?

Percentiles
There are 100 percentiles. If P = the Pth Percentile then P% of the data is _____ to P and (100 – P)% of the data is _____ to P.

Example 1: You took the English achievement test to obtain college credit in freshman English by examination.
a) If your score is at the 89th percentile, what percentage of scores are at or below yours?
b) What percentage of scores are higher than yours?
c) If the scores ranged from 1 to 100 and your raw score is 95, does this necessarily mean that your score is at the 95th percentile?

Quartiles/Interquartile Range
Quartiles divide the data into ____________.
Q1 first quartile = _______ percentile
Q2 second quartile = _______ percentile and is also the ___________ of the data
Q3 third quartile = ________ percentile

The Interquartile Range (IQR) is the difference between Q3 and Q1. The IQR gives you the range of values in the middle ______ of the data set.
The Interquartile Range is a measure of SPREAD about the MEDIAN of a set of data.

Example:
Data Set 1: Median = 6, IQR = 4
Data Set 2: Median = 6, IQR = 10

Example 2: For the sample {10, 14, 11, 19, 15, 21, 21, 16, 20}
a) Find the Quartiles
b) Find the Interquartile Range

Example 3: For the sample {42, 77, 19, 53, 95, 34, 94, 86}
a) Find the Quartiles
b) Find the Interquartile Range

Box-and-Whisker Plots
The smallest value, the largest value, and the quartiles Q1, Q2, Q3 give us a very useful _________________ summary of the data and their spread.
A box-and-whisker plot is a graphical representation of these values.
Outliers occur when any value is more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3.

PROCEDURE: How to make a box-and-whisker plot
1. Draw a horizontal (or vertical) scale to include the lowest and highest values.
2. Above (or to the right) of the scale, draw a box from Q1 to Q3.
3. Include a solid line through the box at the median level.
4. Check for outliers and draw them in as individual points.
5. Draw horizontal (or vertical) lines (called whiskers) from Q1 to the lowest value and from Q3 to the highest value.

Example 4: For the sample {42, 77, 19, 53, 95, 34, 94, 86}
a) Find the five-number summary.
b) Make a box-and-whisker plot.
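The five-number summary and the 1.5 × IQR outlier check can be sketched in Python as follows. This is a sketch under a stated assumption: quartiles are taken as the medians of the lower and upper halves of the ordered data, leaving the overall median out of both halves when the number of values is odd. Other quartile conventions exist and can give slightly different answers. The function names are invented for this illustration, and the sample is the one from Example 4.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median excluded from both halves when n is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def find_outliers(data):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Sample from Example 4.
sample = [42, 77, 19, 53, 95, 34, 94, 86]
print("five-number summary:", five_number_summary(sample))
print("outliers:", find_outliers(sample))
```

The outlier check runs before the whiskers are drawn, matching step 4 of the procedure: any value returned by `find_outliers` would be plotted as an individual point.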
Example 6: Consider the data {1, 2, 3, 3, 5, 6, 7, 7, 10, 20}
a) Find the five-number summary for the data.
b) Draw a box-and-whisker plot.

Example 7: Three classes (A, B, C) took the same test.

Example 8:
a) What is the range of the sugar content of these cereals?
b) Describe the shape of the distribution of the:
histogram:
adult box-and-whisker plot:
children box-and-whisker plot:
c) Are all children’s cereals higher in sugar than adult cereals?
d) Which group varies more in sugar content?

Example 9: