Download Part A

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Data mining wikipedia , lookup

Time series wikipedia , lookup

Transcript
Section 11 – Quantitative Data in One Variable The following Mathematics Florida Standards will be covered in this section: MAFS.912.S-­‐‑ID.1.1 Represent data with plots on the real number line (dot plots, histograms, and box plots). MAFS.912.S-­‐‑ID.1.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets. MAFS.912.S-­‐‑ID.1.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers). Videos in this Section Video 1: Dot Plots and Histograms Video 2: Box Plots Video 3: Measures of Center and Shapes of Distributions Video 4: Measuring Spread Video 5: Outliers in Data Sets Video 6: Comparing Distributions www.AlgebraNation.com 1 Section 11 – Video 1 Dot Plots and Histograms Statistics is the science of collecting, organizing, and analyzing data. Two major classifications of data: Ø Quantitative (_______________-­‐‑ based on measurements) Ø Categorical (_____________-­‐‑ based on “qualities” such as color, taste, or texture, rather than measurements). Identify whether the following variables are quantitative or categorical. _______________________________ Age _______________________________ Favorite subject _______________________________ Area code _______________________________ Distance a football is thrown www.AlgebraNation.com 2 Let’s look at a few types of data displays for one variable quantitative data using the following sample: 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7 Ø Dot Plot -­‐‑ Each value is represented by a dot above a number line. Ø Histogram – Data frequency is measured over intervals. Ø Box Plot – Graphically displays a summary of the data. www.AlgebraNation.com 3 Dot Plot Ø Each data value is represented with a dot above the number line. Ø Shows ______________________of data values. Ø Dot plots are often used with smaller sets of data. Ø Always include the title and an appropriate scale on the number line for the dot plot. What is frequency? Let’s Practice! The time, rounded to the nearest hour, that 26 students spent on their smartphones on a given day was recorded as follows: 0, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12 Create a dot plot of the data above. www.AlgebraNation.com 4 Histogram Ø A bar style display showing frequency of data over _________________, rather than displaying each individual data value. Ø Each interval length must be the same. Ø Histograms are often used for larger sets of data. Ø Always title the graph and label both axes. Ø Choose the appropriate scale on the 𝑦-­‐‑axis and the appropriate intervals on the 𝑥-­‐‑axis. What is an interval? Recall our smartphone data: 0, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12 Construct a histogram to represent the data. www.AlgebraNation.com 5 Try It! Students in Mrs. Ferrante’s class were surveyed about the number of pets they have at home. Their responses were recorded below: 0, 4, 2, 2, 3, 4, 8, 1, 0, 1, 2, 2, 3, 0, 3, 1, 1, 2 Construct a dot plot of the data. Would it be beneficial to construct a histogram of the data? What observations can you make about the shape of the distribution? Are there any values that seem not to fit? www.AlgebraNation.com 6 Try It! Mr. Pappal uses the following grading scale: 92 − 100 (A), 83 − 91 (B), 74 − 82 (C), 65 − 73 (D). The grades for his 19 students are as follows: 92, 99, 83, 91, 96, 88, 87, 83, 90, 75, 82, 74, 77, 77, 81, 76, 79, 66, 73 Should Mr. Pappal represent the data with a dot plot or a histogram? Explain your choice. Complete the frequency table below. Grades Frequency 92 − 100 (A) 83 − 91 (B) 74 − 82 (C) 65 − 73 (D) Construct the representation that you chose. www.AlgebraNation.com 7 We often have to decide whether we should use a dot plot or a histogram to represent the data. Select the sets of data where it would be more practical to use a histogram than a dot plot. o Average daily weather temperatures for Orlando over a year o Daily weather temperatures for Orlando over a month o Height of players on a high school football team o Height of high school football players statewide o Time, rounded to the nearest second, needed to run a 100-­‐‑meter race for 25 randomly selected athletes In general, when would it be more practical to use a dot plot? In general, when would it be more practical to use a histogram? www.AlgebraNation.com 8 BEAT THE TEST! 1. The cafeteria at Just Dance Academy offers items at seven different prices. The manager recorded the price each time an item was sold over a period of one week. $2.50 $2.00 $4.50 $3.00 $3.00 $3.50 $4.00 $2.50 $3.00 $4.00 $3.00 $3.00 $2.50 $1.50 $3.00 $2.00 $3.50 $3.50 $3.50 $3.50 $4.00 $4.00 $3.00 $2.50 Part A: Which graphical display would be most appropriate to represent the data? Justify your answer. Part B: Construct the graphical display you chose in Part A. www.AlgebraNation.com 9 2. Last year, the Mighty Rams boy’s basketball team had a great season and won 75% of their 20 games. The total points scored by the team for each of the 20 games are listed below in the order the games were played: 76, 55, 76, 64, 46, 91, 65, 46, 45, 53, 56, 53, 57, 67, 62, 64, 67, 52, 58, 62 Part A: Complete the frequency table below. You do not need to use all rows in the table. Points scored Frequency Part B: Construct a histogram of the data. www.AlgebraNation.com 10 Section 11 – Video 2 Box Plots Box Plot -­‐‑ uses ______________ to display the five-­‐‑number summary. Ø The five-­‐‑number summary of a data set consists of the minimum, lower quartile, median, upper quartile, and maximum values. What is a quartile? Consider the following data set with an even number of data: 6, 2, 1, 4, 7, 3, 8, 5 The minimum value of the data set is ________. The maximum value of the data set is _______. The median is the number in the middle when the data is ordered from least to greatest. The median of the data set is _______. The lower quartile of the data set is __________. The upper quartile of the data set is ___________. www.AlgebraNation.com 11 Use the five-­‐‑number summary to represent the data with a box plot. Some observations from our boxplot: Ø The lowest 50% of data values are from ______ to ______. Ø The highest 50% of data values are from _____ to ______. Ø The middle 50% (the box area) represents the values from ________ through ________. o The middle 50% is also known as the IQR (interquartile range). Ø The lower quartile represents the lower 25% of the data (_____ percentile). Ø The upper quartile represents the first 75% of the data (______ percentile). Ø 75% of the values are above __________________________. Ø 25% of the values are above __________________________. Ø The median of the lower half of the data is _______________. Ø The median of the upper half of the data is _______________. www.AlgebraNation.com 12 Suppose the last number in our data set was changed to 20. The new data set is: 1, 2, 3, 4, 5, 6, 7, 20 What observations can you make about our new five-­‐‑number summary compared to the five-­‐‑number summary for the previous data set? Minimum: Maximum: Median: Lower Quartile: Upper Quartile: Construct the new box plot on the previous problem’s graph. Why isn’t the new box plot symmetric at the median? www.AlgebraNation.com 13 Consider the following data set with an odd number of data values: 3, 7, 10, 11, 15, 18, 21 The minimum value of the data set is ________. The maximum value of the data set is _______. The median of the data set is _______. The lower quartile of the data set is __________. The upper quartile of the data set is __________. Use the five-­‐‑number summary to construct a box plot. www.AlgebraNation.com 14 Try It! The time, rounded to the nearest hour, that 26 students spent on their smartphones on a given day was recorded as follows: 0, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12 Construct a box plot to represent the data. Label the minimum, maximum, lower quartile, upper quartile, and median. The bottom 25% of students spent at most _________ hours on their smartphone. www.AlgebraNation.com 15 BEAT THE TEST! 1. Mrs. Bridgewater recorded the number of text messages 10 different students sent in one day and constructed the box plot below for the data. Part A: Use the following vocabulary to label the box plot. You will not use all of the words on the list. A. Average B. Lower Quartile C. Maximum D. Mean E. Median F. Minimum G. Upper Quartile Text Messages Sent In One Day Part B: The 50th percentile of the data set is _________. 2 and 20. Part C: Half of the data values are between 8 and 12. 8 and 14. 10 and 12. Part D: 75% of students send fewer than per day. www.AlgebraNation.com 12 13 14 15 text messages 16 Section 11 – Video 3 Measures of Center and Shapes of Distributions Data displays can be used to describe the following elements of a data set’s distribution: Ø Center Ø Shape Ø Spread There are 3 common measures of center. Ø Mean – the average of the data values Ø Median – the middle value of the ordered data set Ø Mode – the most frequently occurring value(s) www.AlgebraNation.com 17 Mr. Gray gave a 10-­‐‑question algebra test on a regular school day with no special activities. The scores were as follows: 60, 60, 70, 70, 70, 80, 80, 80, 80, 90, 90, 90, 100, 100 The dot plot for the data is as follows: Looking at the dot plot, what do you think is the value of the median? What is the value of the mean? Why is it important to know where the center is? The shape of a dot plot also gives important information about a data set’s shape, or distribution. The data in the previous dot plot follows a normal distribution. What do you notice about the shape of a normal distribution? www.AlgebraNation.com 18 Mr. Gray then gave a 10-­‐‑question algebra test the day after a basketball game against the school’s rival. The scores were as follows: 60, 60, 60, 70, 70, 70, 70, 70, 80, 80, 80, 90, 90, 100 The mean of this data set is ______ and the median is ______. Which measure is a more appropriate measure of center, the mean or the median? Does this data set have a normal distribution? Why or why not? The shape of this distribution is ______________________________. www.AlgebraNation.com 19 Mr. Gray then gave a 10-­‐‑question algebra test the day after a mid-­‐‑
week early release day. The scores were as follows: 60, 70, 70, 80, 80, 80, 90, 90, 90, 90, 90, 100, 100, 100 Which value do you think will be smaller, the mean or the median? The dot plot for the data is as follows: Which measure is a more appropriate measure of center, the mean or the median? The shape of this distribution is ______________________________. Complete the table below: Data Shape Best Measure of Center Normal Skewed www.AlgebraNation.com 20 Try It! At three different locations, 15 people were surveyed about the average number of text messages they send each day. Case 1: The survey took place in a high school. The data is as follows: 5, 10, 10, 15, 15, 15, 20, 20, 20, 20, 20, 20, 25, 25, 25 Find the mean and the median of the data set. The data is ™ normally distributed. ™ skewed right. ™ skewed left. Which is the more appropriate measure of center to use for this data set – the mean or the median? www.AlgebraNation.com 21 Case 2: The same survey was administered outside a shopping center. The data is as follows: 5, 5, 10, 10, 10, 15, 15, 15, 15, 15, 20, 20, 20, 25, 25 Find the mean and the median of the data set. What is the shape of the data? The best measure of center to describe this data set is the www.AlgebraNation.com ™ mean. ™ median. 22 Case 3: The survey took place at a retirement center. The data is as follows: 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 15, 15, 15 Find the mean and the median of the data set. The data is ™ normally distributed. ™ skewed right. ™ skewed left. Which is the more appropriate measure of center to use for this data set – the mean or the median? www.AlgebraNation.com 23 BEAT THE TEST! 1. Mr. Logan surveyed his junior and senior students about the time they spent studying math in one day, rounded to the nearest five minutes. Then, he tabulated the results and created a dot plot displaying the data for both groups. Part A: The value of the larger median for the two groups is ___________. Part B: The value of the larger mean for the two groups is ___________. Part C: Using one to two sentences, describe the difference between the number of hours juniors and seniors studied by comparing the center and shapes for the groups. www.AlgebraNation.com 24 2. Which of the following would you predict to be normally distributed? Check all that apply. o A dot plot with a peak at the center. o Data with a median far greater than the mean. o Data with the same mean and median. o A dot plot with most data values to the right of the peak. o The heights of women in the United States. www.AlgebraNation.com 25 Section 11 – Video 4 Measuring Spread A meteorologist recorded the average weekly weather temperatures over a 13-­‐‑week period. A meteorologist in a different state also recorded the average weekly weather temperatures over a 13-­‐‑week period. What are the differences between the spreads of the two data sets? www.AlgebraNation.com 26 There are two primary ways to measure the spread of data: Ø Interquartile Range (IQR) = ______________________________ o The IQR is typically used to describe the spread of skewed data. Consider the following data set: 5, 5, 6, 7, 8, 8, 8, 9, 10, 12, 12 What are the upper and lower quartiles of the data? Calculate the interquartile range of the data. Why do you think IQR is used to measure spread in skewed data? www.AlgebraNation.com 27 Ø Standard Deviation is the typical distance of the data values from the mean. The larger the standard deviation, the ________________ the individual values are from the mean. o Typically used for _______________________________ Consider the dot plots below. A. B. Which would have a larger standard deviation? Explain your answer. www.AlgebraNation.com 28 Assume that we have a data set so large that we aren’t given a list of all the values. We are told the data follows a normal distribution with a mean of 16 and standard deviation of 4. Label the distribution below with the values using the mean and standard deviation. Suppose one of the data observation values is 20. An observation of 20 is __________ standard deviation(s) ______________ the mean. Suppose one of the data observation values is 8. An observation of 8 is ____________ standard deviation(s) ______________ the mean. Suppose an observation is 1.5 standard deviations above the mean. The value of that observation is ___________. www.AlgebraNation.com 29 We can use the empirical rule to understand where the data lies. Empirical Rule: Ø Approximately 68% of values are within 1 standard deviation of the mean. Ø Approximately 95% of values are within 2 standard deviations of the mean. Ø Approximately 99.7% of values are within 3 standard deviations of the mean. www.AlgebraNation.com 30 Try It! Suppose the amount of water a machine dispenses into plastic bottles has a normal distribution with a mean of 16.2 ounces and a standard deviation of 0.1 ounces. Use the mean and standard deviation to label the distribution. The middle 95% of bottles contain between _______ and ________ ounces of water. Approximately 68% of bottles have between ________ and __________ ounces of water. What percent of bottles contain more than 16.4 ounces of water? What is the probability that a randomly selected bottle contains less than 16.3 ounces of water? What percent of bottles contain between 16.1 and 16.4 ounces of water? www.AlgebraNation.com 31 BEAT THE TEST! 1. SAT mathematics scores for a particular year are approximately normally distributed with a mean of 510 and a standard deviation of 80. Part A: What is the probability that a randomly selected score is greater than 590? Part B: What is the probability that a randomly selected score is greater than 670? Part C: What percent of students score between 350 and 670? Part D: A student who scores a 750 is in the ___________ percentile. www.AlgebraNation.com 32 2. The data from a survey of the ages of people in a CrossFit class was skewed to the right. Part A: The appropriate measure of center to describe the data distribution is the ™ mean. ™ median. The ™ interquartile range is the appropriate measure to ™ standard deviation describe the spread. Part B: The boxplot below represents the data. Calculate the appropriate measure of spread. www.AlgebraNation.com 33 Section 11 – Video 5 Outliers in Data Sets A survey about the average number of text messages sent per day was conducted at a retirement home: 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 15, 15, 15 The mean for this data set is 8.7 and the median is 10. Grandma Gadget, who is up-­‐‑to-­‐‑date on the latest technology and loves to text her 25 grandchildren, was added to the sample. She sends an average of 85 text messages per day. The new data set is: 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 15, 15, 15, 85 Which measure of center will be most affected by adding Grandma Gadget – the mean or the median? Justify your answer. Would the outlier have a greater effect on standard deviation or interquartile range? Justify your answer. Grandma Gadget’s data point is called an outlier. An outlier is an _______________value in a data set that is very distant from the others. www.AlgebraNation.com 34 Try It! The students in Mrs. Gomez’s class were surveyed about the number of text messages they send per day. The data is as follows: 0, 24, 26, 28, 28, 30, 33, 35, 35, 36, 38, 39, 42, 42, 45, 50 What value would you predict to be an outlier? How does the outlier affect the mean? How does the outlier affect the median? Which measure of center would best describe the data – the mean or the median? How does the outlier affect the standard deviation? How does the outlier affect the interquartile range? Which measure of spread would best describe the data – the standard deviation or the interquartile range? www.AlgebraNation.com 35 BEAT THE TEST! 1. After a long day at Disney World, a group of students were asked how many times they each rode Space Mountain. The values are as follows: 4, 3, 10, 1, 2, 2, 4, 3, 5, 3, 4, 5, 4, 5 Part A: Are there any outliers in the data set above? Explain. Part B: The outlier causes the ™ mean to be greater than the ™ median ™ mean. ™ median. Part C: If the outlier was changed to 5, the interquartile range would and the ™ increase ™ decrease ™ stay the same standard deviation would ™ increase. ™ decrease. ™ stay the same. www.AlgebraNation.com 36 2. The revenues of 15 companies are given below. Revenue (in billions) $476 Walmart $451 Royal Dutch Shell $425 China National Petroleum Corp. $420 Exxon Mobil Corp. $411 Sinopec Group $327 Samsung $311 Saudi Aramco $303 Mike’s Prime Rib $290 State Grid Corp. $254 Total Inc. $240 Volkswagen Group $222 Toyota $220 Chevron $214 Glencoe Xstrata $46 BP Company Part A: Which company is an outlier? Part B: What does being an outlier mean in this particular case study? www.AlgebraNation.com 37 Section 11 – Video 6 Comparing Distributions Major The following box plots represent the starting salaries (in thousands of dollars) of 12 recent business graduates, 12 recent engineering graduates, and 12 recent psychology graduates. Business Engineering Psychology Salary Describe the shape of each major’s data distribution. Business: Engineering: Psychology: Which distribution has the largest median? The largest IQR? There is at least one outlier in the ____________________ data set(s). www.AlgebraNation.com 38 BEAT THE TEST! 1. The Bozeman Bucks and Tate Aggies cross-­‐‑country teams ran an obstacle course. The times for each team are summarized below. Bozeman Obstacle Course Times 4:25 5:21 5:54 6:48 4:43 5:31 6:08 6:53 4:49 5: 32 6:20 7:16 5:02 5:37 6:26 7:23 5:12 5:52 6:33 8:05 Tate Obstacle Course Times Which statements are true about the data for the Bozeman Bucks and the Tate Aggies? Select all that apply. o The median time of the Bozeman Bucks is less than the median time of the Tate Aggies. o The median time of the Bozeman Bucks is greater than the median time of the Tate Aggies. o The interquartile range of the Bozeman Bucks is less than the interquartile range of the Tate Aggies. o The interquartile range for the Bozeman Bucks is equal to the interquartile range for the Tate Aggies. o The data for the Bozeman Bucks is skewed to the left. o The data for the Tate Aggies includes an outlier. www.AlgebraNation.com 39 2. The dot plot below compares the arrival times of 30 flights for each of two airlines. • Negative numbers represent the number of minutes the flight arrived before its scheduled time. • Positive numbers represent the number of minutes the flight arrived after its scheduled time. • Zero indicates the flight arrived at its scheduled time. Based on this data, from which airline will you choose to buy your ticket? Use your knowledge of center and spread to justify your choice. www.AlgebraNation.com 40