Download Statistical Reasoning Student Notes

CHAPTER 5: STATISTICAL REASONING

1. Exploring Data – pg. 238-240
   Assignment: pg. 239-240 #1-3
2. Frequency Tables, Histograms, and Frequency Polygons – pg. 241-253
   Assignment: pg. 249-253 #1-7, 11
3. Standard Deviation – pg. 254-265
   Assignment: pg. 261-265 #1-9, 11-13
4. Mid-Unit Review – pg. 266-268
   Assignment: pg. 267-268 #1-6
5. The Normal Distribution – pg. 269-282
   Assignment: pg. 279-282 #1-5, 7-13
6. Z-scores – pg. 283-294
   Assignment: pg. 292-294 #1-10, 12, 13, 16, 18, 21
7. Confidence Intervals – pg. 295-304
   Assignment: pg. 302-304 #1-6, 9, 10
8. Worksheet Review
9. Chapter Quiz
10. Chapter Review – pg. 306-310
    Assignment: pg. 308-310 #1-12
11. Chapter Exam

LESSON 1: EXPLORING DATA

Learning Outcome: Learn to explore the similarities and differences between two sets of data.

When comparing two or more sets of data, which statistical devices could you use to compare them? (Draw on previous experiences with stats.) For each statistical device you name, can you offer a definition?

Mean:
Median:
Mode:
Range:
Outlier:

With a partner, complete Getting Started on pages 236-237 of your textbook.

Ex. Frank needs a new battery for his car. He is trying to decide between two different brands. Both brands are the same price. He obtains data for the lifespan, in years, of 30 batteries of each brand, as shown below.

Measured Lifespans of 30 Car Batteries (years)

Brand X:
5.1 7.3 6.9 4.7 5.0 6.2 6.4 5.5 5.7 6.1
6.8 6.0 4.8 4.1 5.2 8.1 6.3 7.5 5.0 4.6
5.7 8.2 3.3 3.1 4.3 5.9 6.6 5.8 6.4 5.7

Brand Y:
5.4 6.3 4.8 5.9 5.5 4.7 6.0 4.5 6.6 6.0
5.0 6.5 5.8 5.4 5.1 5.7 6.8 5.6 4.9 6.1
4.9 5.7 6.2 7.0 5.8 6.8 5.9 5.3 5.6 5.9

How can you compare the data to help Frank decide which brand of battery to buy?

Describe how the data in each set is distributed.

Describe any similarities and differences between the two sets of data.

Explain why the mean and median do not fully describe the difference between these two brands of batteries.

Consider the range, which is one measure of dispersion for data.
Explain what additional information can be learned from the range of the data.

Is the mode useful to compare in this situation? Explain.

Suppose that one battery included in the set of data for Brand Y is defective, and its lifespan is 0.5 years instead of 5.9 years. Discuss how this would or would not affect Frank’s decision.

Dispersion: A measure that varies by the spread among the data in a set; dispersion has a value of zero if all the data in a set is identical, and it increases in value as the data becomes more spread out.

Assignment: pg. 239-240 #1-3

LESSON 2: FREQUENCY TABLES, HISTOGRAMS, AND FREQUENCY POLYGONS

Learning Outcome: Learn to create frequency tables and graphs from a set of data.

If you inherited a hockey card collection, how could you organize a catalogue of the cards to see how many of each player you have?

Frequency and Frequency Tables

The frequency of a particular data value is the number of times the data value occurs. For example, if four students have a score of 80 in mathematics, then the score of 80 is said to have a frequency of 4. The frequency of a data value is often represented by f.

A frequency table is constructed by arranging collected data values in ascending order of magnitude with their corresponding frequencies.

Ex. The marks awarded for an assignment set for a Year 8 class of 20 students were as follows:

6 7 5 7 7 8 7 6 9 7
4 10 6 8 8 9 5 6 4 8

Present this information in a frequency table.

Construct a table with three columns. The first column shows what is being arranged in ascending order (i.e. the marks). The lowest mark is 4. So, start from 4 in the first column as shown below.

Go through the list of marks. The first mark in the list is 6, so put a tally mark against 6 in the second column. The second mark in the list is 7, so put a tally mark against 7 in the second column. The third mark in the list is 5, so put a tally mark against 5 in the second column.
Count the number of tally marks for each mark and write it in the third column.

With the given tally chart, we can create a histogram of the situation using our calculator:

Histogram: Graph of a frequency distribution, in which equal intervals of values are marked on a horizontal axis and the frequencies associated with these intervals are indicated by the areas of the rectangles drawn for these intervals.

STAT – 1: Edit opens the table editor.
List the marks in 𝐿1 and the frequency of each mark in 𝐿2.
Use 2nd, STAT PLOT to access the plot setup screen.
Select 1, turn the plot ON, and select the histogram graph type.
Xlist: L1
Freq: L2

In general: We use the following steps to construct a frequency table:

Step 1: Construct a table with three columns. Then in the first column, write down all of the data values in ascending order of magnitude.

Step 2: To complete the second column, go through the list of data values and place one tally mark at the appropriate place in the second column for every data value. When the fifth tally is reached for a mark, draw a horizontal line through the first four tally marks as shown for 7 in the above frequency table. We continue this process until all data values in the list are tallied.

Step 3: Count the number of tally marks for each data value and write it in the third column.

Class Intervals (or Groups)

When the set of data values is spread out, it is difficult to set up a frequency table for every data value, as there will be too many rows in the table. So we group the data into class intervals (or groups) to help us organize, interpret, and analyze the data.

Ideally, we should have between five and ten rows in a frequency table. Bear this in mind when deciding the size of the class interval (or group).

Each group starts at a data value that is a multiple of the group size. For example, if the size of the group is 5, then the groups should start at 5, 10, 15, 20 etc.
Likewise, if the size of the group is 10, then the groups should start at 10, 20, 30, 40 etc.

The frequency of a group (or class interval) is the number of data values that fall in the range specified by that group (or class interval).

Ex. The number of calls from motorists per day for roadside service was recorded for the month of December 2003. The results were as follows:

Set up a frequency table for this set of data values.

To construct a frequency table, we proceed as follows:

Choose an appropriate size for the class intervals, keeping in mind that you want between 5 and 10 intervals.

Step 1: Construct a table with three columns, and then write the data groups or class intervals in the first column. The size of each group is 40. So, the groups will start at 0, 40, 80, 120, 160 and 200 to include all of the data. Note that in fact we need 6 groups (1 more than we first thought).

Step 2: Go through the list of data values. For the first data value in the list, 28, place a tally mark against the group 0-39 in the second column. For the second data value in the list, 122, place a tally mark against the group 120-159 in the second column. For the third data value in the list, 217, place a tally mark against the group 200-239 in the second column.

Step 3: Count the number of tally marks for each group and write it in the third column. The finished frequency table is as follows:

With our tally chart completed, create a histogram without the use of a calculator.

Roadside Service Calls

Using a Frequency Polygon:

Using our previous information: We will need to add one more component to our tally chart in order to graph the situation. We need to add a midpoint value for each of our class intervals:

Class Interval   Midpoint   Frequency
0-40             20         1
40-80            60         5
80-120           100        12
120-160          140        8
160-200          180        4
200-240          220        1

We calculate each midpoint by adding the boundaries of the interval and dividing by 2. The chart now gives us values we can input into our calculator.
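The tallying and midpoint steps can also be checked away from the calculator. The following is a minimal Python sketch and not part of the original notes: the function names frequency_table and midpoints are illustrative, the marks are the Year 8 marks from the earlier example, and the interval boundaries are read from the class-interval table for the roadside service calls.

```python
from collections import Counter

# Year 8 assignment marks from the earlier frequency-table example.
marks = [6, 7, 5, 7, 7, 8, 7, 6, 9, 7, 4, 10, 6, 8, 8, 9, 5, 6, 4, 8]

# Boundaries of the roadside-service class intervals (group size 40).
boundaries = [0, 40, 80, 120, 160, 200, 240]


def frequency_table(values):
    """Return (value, frequency) pairs in ascending order of value."""
    return sorted(Counter(values).items())


def midpoints(edges):
    """Add each pair of neighbouring boundaries and divide by 2."""
    return [(low + high) / 2 for low, high in zip(edges, edges[1:])]


table = frequency_table(marks)
mids = midpoints(boundaries)

# Print one row per mark: the mark, a tally made of strokes, and the count.
for mark, freq in table:
    print(f"{mark:>4}  {'|' * freq:<6} {freq}")

print(mids)
```

The mark 7 is the only one tallied five times, which matches the note about striking through the first four tally marks for 7.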
List 1: Midpoint
List 2: Frequency

Recall how to input information into lists on our calculator. In our stat plot, choose the line graph instead of the histogram.

We may need to change our window in order to see the graph:
X: [0, 250, 50]
Y: [0, 15, 5]

Frequency Polygon: The graph of a frequency distribution, produced by joining the midpoints of the intervals using straight lines.

Assignment: pg. 249-253 #1-7, 11

LESSON 3: STANDARD DEVIATION

Learning Outcome: Learn to determine the standard deviation for sets of data, and use it to solve problems and make decisions.

Celebrity Guessing Game

Suppose that you are in some course and have just received your grade on an exam. It is natural to ask how the rest of the class did on the exam so that you can put your grade in some context. Knowing the mean or median tells you the "center" or "middle" of the grades, but it would also be helpful to know some measure of the spread or variation in the grades.

Let’s look at a small example. Suppose three classes of 5 students each write the same exam and the grades are:

Class 1   Class 2   Class 3
82        82        67
78        82        66
70        82        66
58        42        66
42        42        65

Find the mean for each class:

What do we notice? Does the mean describe the differences in each set of data?

What is the range in each class? Does this help with picking the more consistent class?

Each of these classes has a mean, 𝑥̅, of 66, and yet there is great difference in the variation of the grades in each class. One measure of the variation is the range, which is the difference between the highest and lowest grades. In this example the range for the first two classes is 82 - 42 = 40, while the range for the third class is 67 - 65 = 2. The range is not a very good measure of variation here, as classes 1 and 2 have the same range yet their variation seems to be quite different.
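The mean and range for the three classes can be verified with a short Python sketch. This is an illustration, not part of the original notes: the summarize helper is a made-up name, and it assumes that the population standard deviation (statistics.pstdev) is the 𝜎 this lesson goes on to define, which is the value a calculator reports as 𝜎 in 1-Var Stats.

```python
from statistics import mean, pstdev

# Exam grades for the three classes, read down the columns of the table.
grades = {
    "Class 1": [82, 78, 70, 58, 42],
    "Class 2": [82, 82, 82, 42, 42],
    "Class 3": [67, 66, 66, 66, 65],
}


def summarize(values):
    """Return the mean, range, and population standard deviation."""
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "sigma": pstdev(values),  # assumption: population sigma, not sample s
    }


summary = {name: summarize(values) for name, values in grades.items()}

for name, stats in summary.items():
    print(
        f"{name}: mean = {stats['mean']}, "
        f"range = {stats['range']}, sigma = {stats['sigma']:.2f}"
    )
```

All three classes share a mean of 66, and classes 1 and 2 share a range of 40, so only the standard deviation separates them.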
One way to see this variation is to notice that in class 3 all the grades are very close to the mean, in class 1 some of the grades are close to the mean and some are far away, and in class 2 all of the grades are a long way from the mean. It is this concept that leads to the definition of the standard deviation.

Standard Deviation: A measure of the dispersion or scatter of data values in relation to the mean; a low standard deviation indicates that most data values are close to the mean, and a high standard deviation indicates that most data values are scattered farther from the mean.

We can find standard deviation using our calculators.

What is the standard deviation (𝜎) of each class in the chart above?

On our calculators, enter each class's grades into a list. Once entered, choose STAT: CALC: 1: 1-Var Stats, then enter the list you want evaluated:

Class 1: 𝜎 = 14.5
Class 2: 𝜎 = 19.6
Class 3: 𝜎 = 0.63

According to the standard deviations calculated, which class is the most consistent? Which class is the most inconsistent?

Ex. Brendan was wondering about the accuracy of the mass measurements given on two cartons that contained sunflower seeds. He decided to measure the masses of the 20 bags in the two cartons. One carton contained 227 g bags, and the other contained 454 g bags.

Masses of 227 g Bags (g)
228 220 233 227 230 227 221 229 224 235
224 231 226 232 218 218 229 232 236 223

Masses of 454 g Bags (g)
458 445 457 458 452 457 445 452 463 455
451 460 455 453 456 459 451 455 456 450

How can measures of dispersion be used to determine if the accuracy of measurement is the same for both bag sizes?

227 g bags:          454 g bags:
Range:               Range:
Mean =               Mean =
𝜎 =                  𝜎 =

The accuracy of measurement is not the same for both sizes of bag.

Ex. Angela conducted a survey to determine the number of hours per week that Grade 11 males in her school play video games. She determined that the mean was 12.84 h, with a standard deviation of 2.16 h.
Janessa conducted a similar survey of Grade 11 females. She organized her results in this frequency table. Compare the results of the two surveys.

Gaming Hours per Week for Grade 11 Females

Hours    Frequency
3-5      7
5-7      11
7-9      16
9-11     19
11-13    12
13-15    5

How would we have to modify the table to enter the results into our calculator? Find the midpoint of each interval of hours.

Gaming Hours per Week for Grade 11 Females

Hours    Midpoint   Frequency
3-5      4          7
5-7      6          11
7-9      8          16
9-11     10         19
11-13    12         12
13-15    14         5

Enter values into calculator:

Gaming hours per week for Grade 11 females: 𝑥̅ =        𝜎 =
Gaming hours per week for Grade 11 males: 𝑥̅ =        𝜎 =

Comparing the two sets of data: The standard deviation for the females is higher than the standard deviation for the males. Therefore, the females’ times vary more from their mean of about 9 h. The standard deviation for the males is lower. Therefore, their data is more consistent, even though their mean is higher.

Assignment: pg. 261-265 #1-9, 11-13

Mid-Unit Review – pg. 266-268
Assignment: pg. 267-268 #1-6

LESSON 4: THE NORMAL DISTRIBUTION

Learning Outcome: Learn to determine the properties of a normal distribution, and compare normally distributed data.

Many games require dice. For example, the game of Yahtzee requires five dice. What shape is the data distribution for the sum of the numbers rolled with dice, using various numbers of dice?

If we use two dice, find the number of ways the two dice can add to each sum given in the chart below:

Sum                2  3  4  5  6  7  8  9  10  11  12
Ways to find sum

Create a histogram using the above information.

Use the information to calculate the mean of the frequency table.

Mean =

Which phrase best describes the distribution of the data?
A: There are more data above the mean than below it
B: There are more data below the mean than above it
C: The data are symmetrically distributed about the mean

With a partner, roll two dice 50 times. Record the sum for each roll in a frequency distribution table.
Then draw a graph to represent the distribution of the data. Comment on the distribution of the data.

Sum                2  3  4  5  6  7  8  9  10  11  12
Ways to find sum

What would happen to the graph if we combined the data from the entire class? How would the graph look?

Make a conjecture about what the graph would look like if you rolled the two dice 50 000 times.

Normal Curve: A symmetrical curve that represents the normal distribution; also called the bell curve.

Normal Distribution: Data that, when graphed as a histogram or a frequency polygon, results in a unimodal symmetric distribution about the mean.

The Standard Normal Distribution Curve

In order to compare normal curves and to solve probability problems involving normal distributions, we convert the normal distribution curve given in a problem into the standard normal distribution curve.

The diagram shows the approximate area under the standard normal distribution curve sub-divided into regions of width equal to one standard deviation. The percentage of the area under the curve in each region is indicated.

• 50% of the data is above the mean
• 68.26% of the data is within one standard deviation of the mean
• 95.44% of the data is within two standard deviations of the mean
• 99.74% of the data is within three standard deviations of the mean
• Total area under the curve is 1 or 100%

Ex. A nurse records the number of hours an infant sleeps during a day. He then records the data on a normal distribution curve shown below. The values shown on the horizontal axis differ by one standard deviation.

[Figure: normal distribution curve, horizontal axis "Number of Hours Slept" labelled A, B, 10, 12, 14, C, D. Regions from left to right: 0.13%, 2.15%, 13.59%, 34.13%, 34.13%, 13.59%, 2.15%, 0.13%.]

a. What is the mean of the data?

b. What is the standard deviation?

c. What are the values for A, B, C, and D?

d. What percentage of the time, to the nearest hundredth, does the infant sleep between 8 and 16 hours in a day?

Ex. Shirley wants to buy a new cellphone.
She researches the cellphone she is considering and finds the following data on its longevity, in years.

2.0 2.4 3.3 1.7 2.5 3.7 2.0 2.3 2.9 2.2
2.3 2.7 2.5 2.7 1.9 2.4 2.6 2.7 2.8 2.5
1.7 1.1 3.1 3.2 3.1 2.9 2.9 3.0 2.1 2.6
2.6 2.2 2.7 1.8 2.4 2.5 2.4 2.3 2.5 2.6
3.2 2.1 3.4 2.2 2.7 1.9 2.9 2.6 2.7 2.8

a. Does the data approximate a normal distribution? (Plot the data on a graphing calculator and find the mean and standard deviation.)

The histogram looks symmetrical about the mean.

Mean =        , standard deviation =        , median =

Using the information from our data, draw a normal distribution. Draw curve for students, label mean, and values of each increment. u + sd = 3…

b. If Shirley purchases this cellphone, what is the likelihood that it will last for more than three years?

Draw the histogram, label the mean and standard deviations.

𝜇 + 1𝜎 = 2.526 + 0.482 = 3.008

So three years is about one standard deviation above the mean. The data below one standard deviation above the mean makes up 50% + 34% = 84%.

For the cellphones that lasted more than three years: 100% - 84% = 16%.

Normal curves can vary in two main ways: the mean determines the location of the centre of the curve on the horizontal axis, and the standard deviation determines the width and height of the curve.

Assignment: pg. 279-282 #1-5, 7-13

LESSON 5: Z-SCORES

Learning Outcome: Learn to use z-scores to compare data, make predictions, and solve problems.

What are z-scores?

A common statistical way of standardizing data on one scale so that a comparison can take place is to use a z-score. The z-score is like a common yardstick for all types of data. Each z-score corresponds to a point in a normal distribution and as such is sometimes called a normal deviate, since a z-score describes how much a point deviates from a mean or specification point.

A z-score is a standardized value that indicates the number of standard deviations of a data value above or below the mean.
Formula for calculating z-scores:

z = (x − 𝜇) / 𝜎

z = the z-score, x = the particular data value, 𝜇 = mean, 𝜎 = standard deviation

Given a single data point, how many standard deviations is it from the mean?

How do we find how many standard deviations from the mean the line is?

Ex. Tony’s midterm marks are shown below, together with the class mean and the standard deviation for each subject. By calculating z-scores, determine in which subject Tony performed best relative to the class.

Subject     Tony’s Mark   Mean Mark   Standard Deviation
Math        74            68          12
Chemistry   79            73          14
Physics     68            66          11

Z-Score Tables

The z-score table gives the area to the left of a particular z value. This area to the left of z is denoted by A(z).

Properties of z-scores

• A z-score for a data value describes the number of standard deviations above or below the mean.
• A negative z-score indicates that the data value is below the mean and is shown to the left of the mean on the standard normal curve.
• A positive z-score indicates that the data value is above the mean and is shown to the right of the mean on the standard normal curve.
• The z-score table gives:
  • the area under the curve to the left of the z-score, or
  • the percentage of data to the left of the z-score, or
  • the probability that a randomly chosen data value is to the left of the z-score.
• The mean, median, and mode have a z-score of zero.

Ex. Use the z-score table to calculate:

a. A(-3)
Locate –3.0 on the left-hand side, then locate 0.00 on the top, and read the entry where the row and column meet: 0.0013.

b. A(1)
A(1) =

Ex. IQ tests are sometimes used to measure a person’s intellectual capacity at a particular time. IQ scores are normally distributed, with a mean of 100 and a standard deviation of 15. If a person scores 119 on an IQ test, how does this score compare with the scores of the general population?

Sketch the situation as a normal distribution curve.

The z-score is z = (119 − 100) / 15 ≈ 1.27, and the z-score table gives A(1.27) = 0.8980. This means that an IQ score of 119 is greater than 89.80% of IQ scores in the general population.
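The two z-score examples can be worked through in a short Python sketch. This is an illustration, not part of the original notes: the z_score helper is a made-up name, and the area to the left of a value is taken from the standard library's NormalDist rather than from a printed z-score table, so the last digits may differ slightly from table values.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mu) / sigma


# (Tony's mark, class mean, standard deviation) for each subject.
tony = {
    "Math": (74, 68, 12),
    "Chemistry": (79, 73, 14),
    "Physics": (68, 66, 11),
}

z_scores = {subject: z_score(*values) for subject, values in tony.items()}
best_subject = max(z_scores, key=z_scores.get)

for subject, z in z_scores.items():
    print(f"{subject}: z = {z:.2f}")
print("Best relative to the class:", best_subject)

# IQ example: mean 100, standard deviation 15, score 119.
iq_z = z_score(119, 100, 15)
iq_area = NormalDist(mu=100, sigma=15).cdf(119)  # area to the left of 119
print(f"IQ 119: z = {iq_z:.2f}, area to the left = {iq_area:.4f}")
```

Tony's raw mark is highest in Chemistry, but his z-score is highest in Math, which is why the comparison is made on standardized values.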
Solving the same question using technology:

normalcdf(

The command “normalcdf(” can be used to calculate the normal distribution probability between two data values, or to the left or right of a data value, for a specified mean and standard deviation.

Accessing normalcdf:
1. Access the distribution menu DISTR by pressing 2nd then VARS
2. Select “normalcdf(”, the second selection, and press enter
3. Use the following to determine the area between two data values: normalcdf(lower bound, upper bound, mean, standard deviation)
4. Press enter to determine the area

To calculate to the left of a data value, replace the lower bound with −1 × 10^99 or 0.
To calculate to the right of a data value, replace the upper bound with 1 × 10^99.

The answers obtained will not be exactly the same as those obtained from tables, due to the increased accuracy provided by the calculator.

normalcdf(0, 119, 100, 15) = 0.8973

Ex. Running shoes lose their shock-absorption after a mean distance of 640 km, with a standard deviation of 160 km. Zack is an elite runner and wants to replace his shoes after 25% of their natural life. At what distance should he replace his shoes?

Sketch the situation, shading the 25% of the area at the left of the curve:

We need to find the z-score that represents 25% or 0.25.

Solving the previous question using technology:

invNorm(

Given the area to the left of a data value, the command “invNorm(” can be used to calculate the data value. The mean and standard deviation must be given.

Steps:
1. Access the distribution menu DISTR by pressing 2nd then VARS
2. Select “invNorm(”, the third choice, and then press enter
3. Enter data values: invNorm(area, mean, standard deviation)
4. Press enter

Using invNorm to solve our previous question:

invNorm(0.25, 640, 160) = 532.08

Assignment: pg. 292-294 #1-10, 12, 13, 16, 18, 21

LESSON 6: CONFIDENCE INTERVALS

Learning Outcome: Learn to use the normal distribution to solve problems that involve confidence intervals.
If a light bulb company wants to test the number of hours that a bulb will burn before failing, is it logical to test every bulb? If not, propose a method that the company could use to determine the longevity of its light bulbs.

Ex. A telephone survey of 600 randomly selected people was conducted in an urban area. The survey determined that 76% of people from 18 to 34 years of age have a social networking account. The results are accurate within plus or minus 4 percentage points, 19 times out of 20.

How can this result be interpreted, if the total population of 18 to 34 year olds is 92 500?

Calculate the range of people that have a social networking account, and determine the certainty of the results.

The margin of error is ±4%, so the confidence interval is 76% ± 4%.
The range of results is 72% to 80%.
The confidence level of the survey is 95% (19 out of 20).

Confidence interval for population:
92 500 × 0.76 = 70 300
92 500 × 0.04 = 3700
70 300 ± 3700, so the population interval is 66 600 to 74 000.

Ex. To meet regulation standards, baseballs must have a mass from 142.0 g to 149.0 g. A manufacturing company has set its production equipment to create baseballs that have a mean mass of 145.0 g.

To ensure that the production equipment continues to operate as expected, the quality control engineer takes a random sample of baseballs each day and measures their mass to determine their mean mass. If the mean mass of the random sample is 144.7 g to 145.3 g, then the production equipment is running correctly. If the mean mass of the sample is outside the acceptable level, the production equipment is shut down and adjusted. The quality control engineer refers to the chart shown when conducting random sampling.

Confidence Level      99%   95%   90%
Sample Size Needed    110   65    45

a. What is the confidence interval and margin of error the engineer is using for quality control tests?

b. Interpret the table.
• Confidence level 99%: needs to measure 110 baseballs to be confident 99 out of 100 times.
• Confidence level 95%: needs to measure 65 baseballs to be confident 95 out of 100 times.
• Confidence level 90%: needs to measure 45 baseballs to be confident 90 out of 100 times.

c. What is the relationship between confidence level and sample size?
For a constant margin of error, as the confidence level increases, the size of the sample needed to attain that confidence level increases. To have greater confidence that the baseballs meet quality standards, the engineer must use a larger sample.

Ex. A poll was conducted to ask voters the following question: If an election were held today, whom would you vote for? The results indicated that 53% would vote for Smith and 47% would vote for Jones. The results were stated as being accurate within 3.8 percentage points, 19 times out of 20. Who will win the election?

For Smith: 53% ± 3.8%: Confidence interval:
For Jones: 47% ± 3.8%: Confidence interval:

The two confidence intervals overlap from

If the poll is accurate, Smith is more likely to ______. However, there is a chance that Jones will win, since the confidence intervals overlap by _____ of the votes.

Need to know:
• A confidence interval is expressed as the survey or poll result, plus or minus the margin of error.
• The margin of error increases as the confidence level increases (with a constant sample size). The sample size that is needed also increases as the confidence level increases (with a constant margin of error).
• The sample size affects the margin of error. A larger sample results in a smaller margin of error, assuming that the same confidence level is required. For example:
  • A sample of 1000 is considered to be accurate to within ±3.1%, 19 times out of 20.
  • A sample of 2000 is considered to be accurate to within ±2.2%, 19 times out of 20.
  • A sample of 3000 is considered to be accurate to within ±1.8%, 19 times out of 20.

Assignment: pg. 302-304 #1-6, 9, 10

Chapter Quiz

Chapter Review – pg. 306-310
Assignment: pg. 308-310 #1-12

Chapter Exam
