MAT 110 WORKSHOP
Created by Michael Brown, Haden McDonald & Myra Bentley for use by the Center for Academic Support

UNIT 3: STATISTICS

Introduction

Definitions
Mean: The average of a set of data.
Median: The middle number in an ordered list. If there are two middle numbers, the median is the average of those two.
Mode: The number(s) that appear most frequently in a data set.
Range: The difference between the largest and smallest values.
Standard Deviation: A measure of how far, on average, the data points are from the mean.
Normal Distribution: A very common distribution that describes many real-life values. The symmetric bell curve.
Z-Score: The number of standard deviations a value is from the mean.
Confidence Interval: A range that is 'likely' to contain the actual (population) mean. Usually reported together with a margin of error.
Margin of Error: Half the width of a confidence interval; the amount added to and subtracted from the sample mean to form the interval.

Example
• Calculate the mode, mean, and median of the following data:
• 12, 11, 6, 24, 11, 9, 15, 11

Example
In order from smallest to largest: 6, 9, 11, 11, 11, 12, 15, 24
• Mode = 11 because it appears 3 times while all the other numbers appear only once.
• Mean = 99/8 = 12.375 because 6 + 9 + 11 + 11 + 11 + 12 + 15 + 24 = 99 and there are 8 numbers.
• Median = 11 because the two middle numbers of the ordered list (the 4th and 5th) are both 11.

Frequency Tables
A frequency table shows the total for each category or group of data.
• Example: 25 viewers evaluated the latest episode of CSI. The possible evaluations are: (E)xcellent, (A)bove average, a(V)erage, (B)elow average, (P)oor. After the show, the 25 evaluations were as follows:
A, V, V, B, P, E, A, E, V, V, A, E, P, B, V, V, A, A, A, E, B, V, A, B, V
Construct a frequency table and a relative frequency table for this list of evaluations.
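The two examples above (mode, mean, and median of eight values, and the tally of the 25 evaluations) can be cross-checked with a short Python sketch. This is only an illustration using the standard library; the function names `summarize` and `frequency_table` are made up for this sketch and are not part of the workshop.

```python
from collections import Counter
from statistics import mean, median, multimode

# Data from the mode/mean/median example.
values = [12, 11, 6, 24, 11, 9, 15, 11]

# The 25 evaluations from the frequency table example.
evaluations = "A V V B P E A E V V A E P B V V A A A E B V A B V".split()


def summarize(data):
    """Return (list of modes, mean, median) for a list of numbers."""
    return multimode(data), mean(data), median(data)


def frequency_table(items):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(items)
    total = len(items)
    return {category: (count, count / total) for category, count in counts.items()}


print(summarize(values))
table = frequency_table(evaluations)
for category in "EAVBP":
    frequency, relative = table[category]
    print(category, frequency, f"{relative:.2f}")
```

The relative frequency is simply each count divided by the total number of evaluations, so the relative frequencies always add up to 1.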
Frequency Tables
• Solution: Grouped by category, the 25 evaluations are:
E, E, E, E, A, A, A, A, A, A, A, V, V, V, V, V, V, V, V, B, B, B, B, P, P

Evaluation   Frequency   Relative Frequency
E            4           4/25 = 0.16
A            7           7/25 = 0.28
V            8           8/25 = 0.32
B            4           4/25 = 0.16
P            2           2/25 = 0.08
Total        25          1.00

Representing Data Visually
The relative frequencies can be displayed as a bar graph.
[Figure: bar graph of the relative frequencies]

Frequency Tables
• Example: Suppose 40 health care workers take an AIDS awareness test and earn the following scores:
79, 62, 87, 84, 53, 76, 67, 73, 82, 68, 82, 79, 61, 51, 66, 77, 78, 66, 86, 70, 76, 64, 87, 82, 61, 59, 77, 88, 80, 58, 56, 64, 83, 71, 74, 79, 67, 79, 84, 68
Construct a frequency table and a relative frequency table for these data.
[Solution table not reproduced.]

Representing Data Visually
A variable quantity that cannot take on arbitrary values is called discrete. Other quantities, called continuous variables, can take on arbitrary values. The number of children in a family is an example of a discrete variable. Weight is an example of a continuous variable.
We use a special type of bar graph called a histogram to graph a frequency distribution when we are dealing with a continuous variable quantity, or with a variable quantity that is discrete but has a very large number of different possible values.

Representing Data Visually
• Example: A clinic has data on the weight lost by its clients over the past 6 months. Draw a histogram for the relative frequency distribution for these data.
[Data table not reproduced.]
• Solution: We first find the relative frequency distribution. We then draw the histogram exactly like a bar graph, except that we do not allow spaces between the bars.
[Relative frequency table and histogram not reproduced.]

Stem and Leaf Display
• Example: The following are the number of home runs hit by the home run champions in the National League for the years 1975 to 1989 and for 1993 to 2007.
1975–1989: 38, 38, 52, 40, 48, 48, 31, 37, 40, 36, 37, 37, 49, 39, 47
1993–2007: 46, 43, 40, 47, 49, 70, 65, 50, 73, 49, 47, 48, 51, 58, 50
Compare these home run records using a stem-and-leaf display.

Stem and Leaf Display
• Solution: In constructing a stem-and-leaf display, we view each number as having two parts. The left digit is considered the stem and the right digit the leaf. For example, 38 has a stem of 3 and a leaf of 8.

1975 to 1989
3 | 1 6 7 7 7 8 8 9
4 | 0 0 7 8 8 9
5 | 2

1993 to 2007
4 | 0 3 6 7 7 8 9 9
5 | 0 0 1 8
6 | 5
7 | 0 3

Stem and Leaf Display
We can compare these data by placing the two displays side by side. Some call this a back-to-back stem-and-leaf display.

   1975 to 1989 | Stem | 1993 to 2007
9 8 8 7 7 7 6 1 |  3   |
    9 8 8 7 0 0 |  4   | 0 3 6 7 7 8 9 9
              2 |  5   | 0 0 1 8
                |  6   | 5
                |  7   | 0 3

It is clear that the home run champions generally hit more home runs from 1993 to 2007 than from 1975 to 1989.

The Mean and the Median
We use the Greek letter Σ (capital sigma) to indicate a sum. For example, we write the sum of the data values 7, 2, 9, 4, and 10 as Σx = 7 + 2 + 9 + 4 + 10. We represent the mean of a sample of a population by x̄ (read as "x bar"), and we use the Greek letter μ (lowercase mu) to represent the mean of the whole population. For a sample of n values, x̄ = Σx / n.

The Mean and the Median
• Example: A car company has been studying its safety record at a factory and found that the number of accidents over the past 5 years was 25, 23, 27, 22, and 26. Find the mean annual number of accidents for this 5-year period.
• Solution: We add the number of accidents and divide by 5:
x̄ = (25 + 23 + 27 + 22 + 26) / 5 = 123 / 5 = 24.6

The Mean and the Median
• Example: The water temperature at a point downstream from a plant for the last 30 days is summarized in a table. What is the mean temperature for this distribution?
[Table not reproduced.]
• Solution: A third column is added to the table that contains the products of the raw scores and their frequencies. The mean is the sum of these products divided by the total frequency, 30.
[Computed value not reproduced.]

The Mean and the Median
Listed are the yearly earnings of some celebrities.
[Table of earnings not reproduced.]
• Example:
a) What is the mean of the earnings of the celebrities on this list?
b) Is this mean an accurate measure of the "average" earnings for these celebrities?

The Mean and the Median
• Solution (a): Summing the salaries and dividing by 10 gives us the mean.
[Computed value not reproduced.]
• Solution (b): Eight of the celebrities have earnings below the mean, whereas only two have earnings above the mean. The mean in this example does not give an accurate sense of what is "average" in this set of data because it was unduly influenced by the highest earnings.

The Mean and the Median
• Example: The table lists the ages at inauguration of the presidents who assumed office between 1901 and 1993. Find the median age for this distribution.
• Solution: We first arrange the ages in order to get
42, 43, 46, 51, 51, 51, 52, 54, 55, 55, 56, 56, 60, 61, 61, 64, 69.
There are 17 ages. The middle age is the ninth, which is 55.

The Mean and the Median
• Example: Fifty 32-ounce quarts of a particular brand of milk were purchased and the actual volume determined. The results of this survey are reported in a table. What is the median for this distribution?
[Table not reproduced.]
• Solution: Because the 50 scores are in increasing order, the two middle scores are in positions 25 and 26. We see that 29 ounces is in position 25 and 30 ounces is in position 26. The median for this distribution is (29 + 30) / 2 = 29.5 ounces.

The Five Number Summary
• Example: Consider the list of ages of the presidents from a previous example:
42, 43, 46, 51, 51, 51, 52, 54, 55, 55, 56, 56, 60, 61, 61, 64, 69.
Find the following for this data set:
a) the lower and upper halves
b) the first and third quartiles
c) the five-number summary

The Five Number Summary
• Solution: Finding the median, we can identify the lower and upper halves.
(a): The median is 55 (the ninth age). The lower half is 42, 43, 46, 51, 51, 51, 52, 54 and the upper half is 55, 56, 56, 60, 61, 61, 64, 69.
(b): The median of the lower half is (51 + 51) / 2 = 51, so the first quartile is 51. The median of the upper half is (60 + 61) / 2 = 60.5, so the third quartile is 60.5.
(c): The five-number summary is 42, 51, 55, 60.5, 69.
We represent the five-number summary by a graph called a box-and-whisker plot.
[Figure: box-and-whisker plot]

Example
• Find the five-number summary for the following 10 values:
• 40, 37, 32, 28, 27, 24, 22, 34, 19, 36
• Find the minimum:
• Find Q1:
• Find the median:
• Find Q3:
• Find the maximum:

Example
• In order: 19, 22, 24, 27, 28, 32, 34, 36, 37, 40
• Minimum: 19 because it is the smallest number.
• Q1: 24 because it is the median of the lower half (19, 22, 24, 27, 28).
• Median: 30 because the two middle numbers are 28 and 32, and (28 + 32) / 2 = 30.
• Q3: 36 because it is the median of the upper half (32, 34, 36, 37, 40).
• Maximum: 40 because it is the largest number.

The Mode
• Example: Find the mode for each data set.
a) 5, 5, 68, 69, 70
b) 3, 3, 3, 2, 1, 4, 4, 9, 9, 9
c) 98, 99, 100, 101, 102
d) 2, 3, 4, 2, 3, 4, 5
• Solution:
a) The mode is 5.
b) There are two modes: 3 and 9.
In c) and d) there is no mode: in c) every value appears once, and in d) three values (2, 3, and 4) tie for most frequent.

Comparing Measures of Central Tendency
• Example: Assume that you are negotiating the contract for your union. You have gathered annual wage data and found that three workers earn $30,000, five workers earn $32,000, three workers earn $44,000, and one worker earns $50,000. In your negotiations, which measure of central tendency should you emphasize?

Comparing Measures of Central Tendency
• Solution:
Mode: $32,000
Median: $32,000 (the 6th and 7th of the 12 ordered salaries are both $32,000)
Mean: (3 × 30,000 + 5 × 32,000 + 3 × 44,000 + 50,000) / 12 = 432,000 / 12 = $36,000
To make the salaries appear as low as possible, you would want to use the mode and median.

The Range of a Data Set
[Slide content not reproduced.]

Standard Deviation
• Example: A company has hired six interns. After 4 months, their work records show the following number of work days missed for each worker:
0, 2, 1, 4, 2, 3
Find the standard deviation of this data set.
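The five-number summary and the standard deviation can both be sketched in a few lines of Python. This is a hedged illustration, not the workshop's own method: it assumes quartiles are the medians of the lower and upper halves (leaving the middle value out of both halves when the count is odd, as in the presidents example) and it assumes the sample standard deviation, which divides by n - 1. The function names are made up for this sketch.

```python
from math import sqrt
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves;
    for an odd count the middle value belongs to neither half.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def sample_std_dev(data):
    """Assumption: sample formula, sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))


ten_values = [40, 37, 32, 28, 27, 24, 22, 34, 19, 36]
days_missed = [0, 2, 1, 4, 2, 3]

print(five_number_summary(ten_values))
print(round(sample_std_dev(days_missed), 2))
```

Other textbooks and software packages use slightly different quartile rules, so results from a calculator or spreadsheet may differ a little from the halves method used here.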
• Solution:
Mean: x̄ = (0 + 2 + 1 + 4 + 2 + 3) / 6 = 12 / 6 = 2

Standard Deviation
We calculate the squares of the deviations of the data values from the mean: 4, 0, 1, 4, 0, 1. Their sum is 10.
Standard Deviation: using the sample formula (dividing by n - 1), s = sqrt(10 / 5) = sqrt(2) ≈ 1.41

Standard Deviation
• Example: The following are the closing prices for a stock for the past 20 trading sessions:
37, 39, 39, 40, 40, 38, 38, 39, 40, 41, 41, 39, 41, 42, 42, 44, 39, 40, 40, 41
What is the standard deviation for this data set?
• Solution:
Mean: x̄ = 800 / 20 = 40 (the sum of the closing prices is 800)

Standard Deviation
We create a table with values that will facilitate computing the standard deviation.

Price x   Frequency f   (x - 40)^2   (x - 40)^2 · f
37        1             9            9
38        2             4            8
39        5             1            5
40        5             0            0
41        4             1            4
42        2             4            8
44        1             16           16
Total     20                         50

Standard Deviation: s = sqrt(50 / 19) ≈ 1.62

Comparing Standard Deviations
[Figure: three distributions with increasing spread]
All three distributions have a mean and median of 5; however, as the spread of the distribution increases, so does the standard deviation.

The Normal Distribution
The normal distribution describes many real-life data sets. A histogram of such data gives an idea of the shape of a normal distribution.
[Figure: histogram with a bell-shaped outline]

The Normal Distribution
We represent the mean by μ and the standard deviation by σ.

The Normal Distribution
• Example: Suppose that the distribution of scores of 1,000 students who take a standardized intelligence test is a normal distribution. If the distribution's mean is 450 and its standard deviation is 25,
a) how many scores do we expect to fall between 425 and 475?
b) how many scores do we expect to fall above 500?

The Normal Distribution
• Solution (a): 425 and 475 are each 1 standard deviation from the mean. Approximately 68% of the scores lie within 1 standard deviation of the mean. We expect about 0.68 × 1,000 = 680 scores to be in the range 425 to 475.

The Normal Distribution
• Solution (b): 500 is 2 standard deviations above the mean. About 95% of the scores lie within 2 standard deviations of the mean, so about 5% lie more than 2 standard deviations above or below it. By symmetry, we expect 0.05 ÷ 2 = 0.025 of the scores to be above 500. Multiplying by 1,000, we can expect about 0.025 × 1,000 = 25 scores to be above 500.
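The 68% and 95% figures used above are rounded rules of thumb. As a rough cross-check, the sketch below uses `statistics.NormalDist` from the Python standard library to compute the same expected counts from the exact normal curve; the helper names are made up for this illustration.

```python
from statistics import NormalDist

# Distribution from the example: mean 450, standard deviation 25.
scores = NormalDist(mu=450, sigma=25)
n_students = 1000


def expected_count_between(dist, low, high, n):
    """Expected number of the n values that fall between low and high."""
    return (dist.cdf(high) - dist.cdf(low)) * n


def expected_count_above(dist, cutoff, n):
    """Expected number of the n values that fall above cutoff."""
    return (1 - dist.cdf(cutoff)) * n


print(expected_count_between(scores, 425, 475, n_students))
print(expected_count_above(scores, 500, n_students))
```

The exact values come out close to, but not exactly equal to, the 680 and 25 obtained from the rounded percentages, which is why the solutions say "about".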
Quartile Problem
• The scores of students on an exam are normally distributed with a mean of 516 and a standard deviation of 36.
• A) What is the first quartile score for this exam?
• B) What is the third quartile score for this exam?

Quartile Problem
• Each quartile has 25% of the data between it and the mean, so we look up an area of 0.25 in the table. The closest entry gives z = 0.67, so the quartiles are at z = -0.67 and z = +0.67.
• A) x = -0.67 × 36 + 516, so the first quartile is at 491.88.
• B) x = 0.67 × 36 + 516, so the third quartile is at 540.12.

z-Scores
The standard normal distribution has a mean of 0 and a standard deviation of 1. There are tables that give the area under this curve between the mean and a number called a z-score. A z-score represents the number of standard deviations a data value is from the mean.
For example, for a normal distribution with mean 450 and standard deviation 25, the value 500 is 2 standard deviations above the mean; that is, the value 500 corresponds to a z-score of 2.

z-Scores
[Table: portion of a table giving the area under the standard normal curve between the mean and a z-score]

z-Scores
• Example: Use a table to find the percentage of the data (area under the curve) that lie in the following regions for a standard normal distribution:
a) between z = 0 and z = 1.3
b) between z = 1.5 and z = 2.1
c) between z = 0 and z = -1.83

z-Scores
• Solution (a): We want the area under the curve between z = 0 and z = 1.3. Using a table, we find that A is 0.403 when z = 1.30. We expect 40.3% of the data to fall between 0 and 1.3 standard deviations above the mean.

z-Scores
• Solution (b): We want the area under the curve between z = 1.5 and z = 2.1. We first find the area from z = 0 to z = 2.1 and then subtract the area from z = 0 to z = 1.5. Using a table, we get A = 0.482 when z = 2.1, and A = 0.433 when z = 1.5.
The area for part (b) is therefore 0.482 - 0.433 = 0.049, or 4.9%.

z-Scores
• Solution (c): Due to the symmetry of the normal distribution, the area between z = 0 and z = -1.83 is the same as the area between z = 0 and z = 1.83. Using a table, we see that A = 0.466 when z = 1.83. Therefore, 46.6% of the data values lie between 0 and -1.83.

Converting Raw Scores to z-Scores
A raw score x from a normal distribution with mean μ and standard deviation σ corresponds to the z-score z = (x - μ) / σ.

Converting Raw Scores to z-Scores
• Example: Suppose the mean of a normal distribution is 20 and its standard deviation is 3.
a) Find the z-score corresponding to the raw score 25.
b) Find the z-score corresponding to the raw score 16.

Converting Raw Scores to z-Scores
• Solution (a): We have μ = 20, σ = 3, and x = 25. We compute z = (25 - 20) / 3 = 5/3 ≈ 1.67.
• Solution (b): We have μ = 20, σ = 3, and x = 16. We compute z = (16 - 20) / 3 = -4/3 ≈ -1.33.

Applications
• Example: Suppose you take a standardized test. Assume that the distribution of scores is normal and you received a score of 72 on the test, which had a mean of 65 and a standard deviation of 4. What percentage of those who took this test had a score below yours?
• Solution: We first find the z-score that corresponds to 72: z = (72 - 65) / 4 = 1.75.

Applications
Using a table, we have that A = 0.460 when z = 1.75. The normal curve is symmetric, so another 50% of the scores fall below the mean. So, 50% + 46% = 96% of the scores are below 72.

Applications
• Example: Consider the following information:
1911: Ty Cobb hit .420. Mean average was .266 with standard deviation .0371.
1941: Ted Williams hit .406. Mean average was .267 with standard deviation .0326.
1980: George Brett hit .390. Mean average was .261 with standard deviation .0317.
Assuming normal distributions, use z-scores to determine which of the three batters was ranked the highest in relationship to his contemporaries.
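The conversion from raw scores to z-scores, and from a z-score to a percentage below, can be sketched as follows. This is an illustrative Python snippet rather than part of the workshop; `z_score` and `percent_below` are made-up helper names, and `NormalDist().cdf` stands in for the printed table (it returns the whole area to the left, not just the area between the mean and z).

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def percent_below(x, mean, std_dev):
    """Percentage of a normal distribution that lies below x."""
    return 100 * NormalDist().cdf(z_score(x, mean, std_dev))


# (batting average, league mean, league standard deviation) from the example.
batters = {
    "Ty Cobb": (0.420, 0.266, 0.0371),
    "Ted Williams": (0.406, 0.267, 0.0326),
    "George Brett": (0.390, 0.261, 0.0317),
}

# Rank the batters by how many standard deviations above the mean they hit.
ranked = sorted(batters, key=lambda name: z_score(*batters[name]), reverse=True)

print(round(z_score(25, 20, 3), 2), round(z_score(16, 20, 3), 2))
print(round(percent_below(72, 65, 4), 1))
for name in ranked:
    print(name, round(z_score(*batters[name]), 2))
```

Comparing z-scores instead of raw averages puts each batter on the same scale, because each z-score is measured relative to that season's own mean and spread.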
Applications
• Solution:
Ty Cobb's average of .420 corresponded to a z-score of (.420 - .266) / .0371 ≈ 4.15.
Ted Williams's average of .406 corresponded to a z-score of (.406 - .267) / .0326 ≈ 4.26.
George Brett's average of .390 corresponded to a z-score of (.390 - .261) / .0317 ≈ 4.07.
Compared with his contemporaries, Ted Williams ranks as the best hitter.

Applications
• Example: A manufacturer plans to offer a warranty on an electronic device. Quality control engineers found that the device has a mean time to failure of 3,000 hours with a standard deviation of 500 hours. Assume that the typical purchaser will use the device for 4 hours per day. If the manufacturer does not want more than 5% to be returned as defective within the warranty period, how long should the warranty period be to guarantee this?

Applications
• Solution: We need to find a z-score such that at least 95% of the area is beyond (to the right of) this point. This score is to the left of the mean and is negative. By symmetry, we can instead find the positive z-score such that 95% of the area is below it.

Applications
50% of the entire area lies below the mean, so our problem reduces to finding a z-score greater than 0 such that 45% of the area lies between the mean and that z-score. If A = 0.450, the corresponding z-score is 1.64. 95% of the area underneath the standard normal curve falls below z = 1.64. By symmetry, 95% of the values lie above -1.64. Since z = (x - μ) / σ, we obtain
-1.64 = (x - 3,000) / 500

Applications
Solving the equation for x, we get x = 3,000 - 1.64 × 500 = 2,180 hours. Owners use the device about 4 hours per day, so we divide 2,180 by 4 to get 545 days. This is approximately 18 months if we use 31 days per month. The warranty should be for roughly 18 months.
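The warranty calculation above works backward from an area to a raw score. A minimal Python sketch of that inverse lookup is shown below, assuming `statistics.NormalDist.inv_cdf` as a stand-in for reading the table in reverse; `warranty_hours` is a made-up name for this illustration.

```python
from statistics import NormalDist


def warranty_hours(mean_hours, std_dev_hours, max_return_rate):
    """Hours of use by which only max_return_rate of devices have failed."""
    lifetime = NormalDist(mu=mean_hours, sigma=std_dev_hours)
    return lifetime.inv_cdf(max_return_rate)


hours = warranty_hours(3000, 500, 0.05)
days = hours / 4  # the typical owner uses the device 4 hours per day

print(round(hours), round(days))
```

Because the software uses a more precise z-score than the two-decimal table value of 1.64, the result differs by a few hours from the 2,180 found above; the conclusion of roughly 18 months is unchanged.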
Right and Left Z-Score
• Find the z-score such that:
• A) The area under the standard normal curve to its left is 0.518
• B) The area under the standard normal curve to its left is 0.8167
• C) The area under the standard normal curve to its right is 0.2879
• D) The area under the standard normal curve to its right is 0.3573

Right and Left Z-Score
• A) 0.518 - 0.5 = 0.018. This falls between the table entries for z = 0.04 (0.0160) and z = 0.05 (0.0199), so z ≈ 0.045.
• B) 0.8167 - 0.5 = 0.3167. The closest table entry is 0.3159, so z = 0.90.
• C) 0.5 - 0.2879 = 0.2121. The closest table entry is 0.2123, so z = 0.56.
• D) 0.5 - 0.3573 = 0.1427. The closest table entry is 0.1443, so z = 0.37.

Practice Problem
• The lengths of skateboards in a skateshop are normally distributed with a mean of 30.9 in and a standard deviation of 1 in. A figure shows the distribution of the lengths of the skateboards with a shaded region. Calculate the shaded area under the curve.
[Figure not reproduced.]
• Express your answer in decimal form with at least two decimal places of accuracy.

Practice Problem
• The shaded area to the right of the mean is 0.475, but we must use the z-score formula to find the shaded area to the left of the mean.
• z = (30.23 - 30.9) / 1 = -0.67
• The area between the mean and a z-score of -0.67 is 0.2486.
• So the total area is 0.2486 + 0.475 = 0.7236.

Confidence Intervals
A level C confidence interval is a range that is C% likely to contain the population mean, based on a sample mean (a 95% confidence interval based on sample data would be 95% likely to contain the mean of the population that the sample came from). The formula for the lower and upper bounds of a confidence interval is:
x̄ ± z* · σ / sqrt(n)
where the term on the left (x̄) is the sample average, and the term on the right (z* · σ / sqrt(n)) is referred to as the margin of error. Here z* is the critical z-score for the chosen confidence level, σ is the standard deviation, and n is the sample size.

Confidence Intervals
• Example: Suppose that the distribution of scores of 100 students who take a standardized intelligence test is a normal distribution. If the sample mean is 90 and the standard deviation is 10, what is a 95% confidence interval for the population mean?
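Two ideas from the preceding slides, reading a z-score from a left or right area and building a confidence interval from the formula above, can be sketched in Python. This is an illustration under stated assumptions, not the workshop's procedure: it uses `statistics.NormalDist().inv_cdf` instead of a printed table, and all function names are made up for the sketch.

```python
from math import sqrt
from statistics import NormalDist

STANDARD = NormalDist()  # mean 0, standard deviation 1


def z_from_left_area(area):
    """z-score with the given area under the curve to its left."""
    return STANDARD.inv_cdf(area)


def z_from_right_area(area):
    """z-score with the given area under the curve to its right."""
    return STANDARD.inv_cdf(1 - area)


def critical_z(confidence):
    """z* such that `confidence` of the area lies between -z* and +z*."""
    return STANDARD.inv_cdf(0.5 + confidence / 2)


def confidence_interval(sample_mean, std_dev, n, confidence):
    """Bounds sample_mean -/+ z* * std_dev / sqrt(n)."""
    margin = critical_z(confidence) * std_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin


print(round(z_from_left_area(0.8167), 2), round(z_from_right_area(0.2879), 2))
low, high = confidence_interval(90, 10, 100, 0.95)
print(round(low, 2), round(high, 2))
```

Software returns z-scores to many decimal places, so answers can differ in the second decimal from those read off a table by choosing the closest entry.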
• Solution: Here, the z-score is related to an area equal to half the confidence level, i.e. z is related to .95 / 2 = .475. Locating this area in a z-score table yields z = 1.96.
The left end of the interval is: 90 - 1.96 * 10 / sqrt(100) = 88.04
The right end of the interval is: 90 + 1.96 * 10 / sqrt(100) = 91.96
So the 95% confidence interval is (88.04, 91.96), OR we are 95% confident the population mean is between 88.04 and 91.96.

Critical Values Z*
• To find the critical Z* value, first write the confidence level in decimal form.
• After that, divide the confidence level by 2 to get the area.
• Use your chart to find the area closest to your value; the matching z-score is your critical Z*.

Critical Z* Problem
• Find the critical z* for a level 51% confidence interval.

Critical Z* Solution
• 51 / 100 = .51
• .51 / 2 = .255
• Find the closest value to .255 in the table (which is .2549).
• Your critical Z* value is 0.69.

Representing Data Exercises
Create a frequency and relative frequency table for the following set of numbers.
78 70 92 81 75
90 74 99 76 79
80 76 78 86 96
Calculate the mean, median, mode, and standard deviation of the data.

Representing Data Solutions
Create a frequency and relative frequency table for the following set of numbers. Calculate the mean, median, mode, and standard deviation of the data.

Range    Frequency                     Relative Frequency
70-74    2 (70, 74)                    2/15 = .1333
75-79    6 (75, 76, 76, 78, 78, 79)    6/15 = .4
80-84    2 (80, 81)                    2/15 = .1333
85-89    1 (86)                        1/15 = .0667
90-94    2 (90, 92)                    2/15 = .1333
95-100   2 (96, 99)                    2/15 = .1333

Mean = 82
Median = 79
Mode = 76, 78
Std. Dev. = 8.62

Normal Distribution Exercises
Suppose 200 students took a test, and their scores were approximately normally distributed. The mean of the test scores was 82 and the standard deviation was 9.
(a) How many students got at least a 73?
(b) How many students got more than 95?
(c) What would a 95% confidence interval for this population be?

Normal Distribution Solutions
200 students took a test.
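The mean was 82 and the std. dev. was 9.
(a) How many students got at least a 73?
(b) More than 95?
(c) What would a 95% confidence interval for this population be?

(a) 73 is 1 standard deviation below the mean (z = -1), so the area is .34 + .50 = .84 or 84% (168 students).
(b) z = (95 - 82) / 9 ≈ 1.44, which has table area .425, so the area above is .5 - .425 = .075 or 7.5% (15 students).
(c) 82 ± 1.96 * 9 / sqrt(200) gives (80.75, 83.25).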
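The three answers above can be cross-checked with a short Python sketch. It is only an illustration, assuming `statistics.NormalDist` in place of the z-table and a sample size of 200 for the confidence interval; the variable names are made up.

```python
from math import sqrt
from statistics import NormalDist

# Test scores from the exercise: mean 82, standard deviation 9, 200 students.
test_scores = NormalDist(mu=82, sigma=9)
n_students = 200

# (a) and (b): proportion above a cutoff, times the number of students.
at_least_73 = (1 - test_scores.cdf(73)) * n_students
more_than_95 = (1 - test_scores.cdf(95)) * n_students

# (c): 95% confidence interval, mean -/+ z* * sigma / sqrt(n).
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * 9 / sqrt(n_students)
interval = (82 - margin, 82 + margin)

print(round(at_least_73), round(more_than_95))
print(round(interval[0], 2), round(interval[1], 2))
```

For a continuous distribution, "at least 73" and "more than 73" give the same area, which is why a single cutoff calculation covers both wordings.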