Unit 2: Data Analysis

Normal Distributions

Objectives:
B.1 To describe and illustrate normal and skewed distributions using real-world examples.
B.2 To calculate the standard deviation of a set of data using the formula for a population: $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
B.3 To use the standard deviation to interpret data represented by a normal distribution.

Measures of Central Tendency

Statistics is the branch of mathematics concerned with manipulating groups of numerical facts so as to present significant information about the subject or source of the data. As members of the information age, we are subjected to statistics on a daily basis. Graphs of stock prices, probabilities of developing cancer, and even your average in math class are all examples of statistical analysis.

Lists of numbers, such as test scores, are often represented on a graph such as a histogram or a frequency distribution. For example, the heights in centimeters of the 12 players on a basketball team are given in the table below:

Height (cm)   Frequency
175           1
180           2
181           2
184           3
185           2
188           1
191           1

[Figures: a histogram and a frequency distribution graph, each titled "High School Basketball Team", with Height (cm) on the horizontal axis and Frequency on the vertical axis.]

Each of the diagrams above gives the same information. The vertical axis on both graphs lists the number of times that each measurement appears in the table.

Three measures of central tendency may be used:
1. The Mean is the average of all of the numbers. Here it is 183.2 cm.
2. The Median is the middle number in the list. Here it is 184 cm.
3. The Mode is the number that appears most often in the list. Here it is 184 cm.

It is also useful to compare the spread of a set of data:
The Range is the difference between the lowest number (here 175) and the highest number (here 191). The range of the basketball players' heights is 191 - 175 = 16 cm.
The Deviation From the Mean is the difference between an individual data point and the mean.

Graphing the distribution of data.

In a normal distribution of data, the frequency distribution graph forms a bell-shaped curve.

[Figure: bell curve with Frequency on the vertical axis and Domain on the horizontal axis; the Mean, Median and Mode are marked at the center.]

Notice that the mean, median and mode all occur at the center of the bell curve. In a Skewed distribution, the mean, median and mode no longer coincide at the center.

A distribution that is Skewed to the Right, or Positively Skewed, has a mean that is higher in value than the median. This distribution is characterized by extreme values to the right. For example, a graph of the heights of university athletes would be skewed to the right by the heights of the basketball players.

A distribution that is Skewed to the Left, or Negatively Skewed, has a mean that is lower in value than the median. This distribution is characterized by extreme values to the left. A graph of the marks in math class might be skewed to the left if one or two students did not attend any classes.

[Figures: a positively skewed and a negatively skewed frequency curve, each with Frequency on the vertical axis and Domain on the horizontal axis; the Mode, Median and Mean are marked on each.]

In our example of the high school basketball team, we can see that the data is slightly skewed to the left because the mean (183.2 cm) is less than the median (184 cm). The data is skewed this way because one student whose height is 175 cm is considerably shorter than the rest of the students.

Finding the Mean, Median and Mode.

The mean can be found by adding all of the data points and dividing by the number of data points. This can be written in sigma ($\Sigma$) notation as:

$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

where $\bar{x}$ is the average or mean of all x, and $x_i$ is the ith data point of n points in total. $\sum_{i=1}^{n} x_i$ means $(x_1 + x_2 + x_3 + \dots + x_n)$, or the sum of all values of x.

The Median is found by listing the data and finding the middle term when arranged in order of size.
If there is an even number of data points, the average of the two middle values is calculated.
Example: for the data $(x_1, x_2, x_3, x_4, x_5)$, $x_3$ would be the median. For the data $(x_1, x_2, x_3, x_4, x_5, x_6)$, $\dfrac{x_3 + x_4}{2}$ would be the median.

The Mode is found by selecting the value which appears most often in the data. If no value appears more often than the others, then there is no mode.

Example: At an independent testing agency, the noise levels were measured from the operator's seat of several different makes of self-propelled swathers. The raw data is as follows; all measurements are in decibels:
92, 88, 84, 84, 90, 90, 87, 89, 87, 91, 95, 90, 87, 89, 90, 81
a. Display this data in the form of a histogram.
b. Calculate the mean, median, and mode for this data.
c. Is this a normal distribution or is the data skewed?

Solution:
a. [Figure: histogram titled "Swather Noise Levels", with Noise (dB) from 80 to 98 on the horizontal axis and Frequency from 1 to 4 on the vertical axis.]
b. Mean: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{1}{16}(92 + 88 + \dots + 90 + 81) = \dfrac{1414}{16} = 88.375$
Median: (81, 84, 84, 87, 87, 87, 88, 89, 89, 90, 90, 90, 90, 91, 92, 95), so the median is $\dfrac{89 + 89}{2} = 89$
Mode: 90
c. Since the mean is less than the median, this data is skewed to the left, or negatively skewed.

TI-82 Calculator:
To enter data into the lists, press [STAT] 1 (choose Edit). Enter each data point into the list and press [ENTER] to confirm. Data can also be entered using the curly brackets {} and [STO]; make sure the terms are separated by commas.
To clear a list from the edit screen, press the up arrow until the list name (L1, L2, …) is highlighted, then press [CLEAR] and [ENTER].
To calculate the mean or median for a list, return to the main screen, press [2nd]-[STAT] [►] (Math menu) and choose 3:mean( or 4:median( from the list. Enter the list name ([2nd]-1 to 6) and press [ENTER].
To sort a list, press [2nd]-[STAT] and choose 1:SortA(. Enter the list name ([2nd]-1 to 6) and press [ENTER].

Example: For our swather example, press [STAT] 1 and then enter the data into L1. Press [ENTER] after each value.
Alternate method: Enter {92,88,84,…,81}[STO] L1. Curly brackets are [2nd][(] and [2nd][)], and L1 is [2nd][1].
To find the mean, press [2nd][STAT][►] to get to the List Math menu. Choose 3:mean( and press [ENTER], then indicate which list by pressing [2nd][1] to get L1. The result should be 88.375.
To find the median, repeat the last sequence, choosing 4:median( instead of 3:mean(. The result should be 89.
To find the mode, sort the list by pressing [2nd][STAT][1] to choose SortA(, then indicate which list by pressing [2nd][1] to get L1. You can then examine the list by pressing [STAT][1]. 90 is the number with the most entries.

Practice Questions 1: Measures of Central Tendency

1. In a Math 30B class, the marks were as follows:
64, 68, 73, 47, 58, 76, 73, 82, 66, 55, 62, 71, 59, 62, 79, 86, 73, 65, 96, 68, 75, 78, 61, 74
Find the range, mean, median and mode for this data. Is this a normal distribution or is it skewed?

2. The fifth hole in the Pleasant Valley Mini Golf course is an exceptionally tricky one. On one long weekend the attendant, a very bored mathematics student, collected the following scores from the players on hole number 5.
Saturday: 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11
Sunday: 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10
Monday: 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7
a. Draw a frequency distribution graph for each of these days.
b. Find the range, mean, median and mode for each day.
c. Are these normal distributions or skewed distributions?
d. What inferences might be drawn from this data?

3. An RCMP officer clocking speeds in a radar trap records the following speeds in km/h:
55, 45, 40, 50, 52, 56, 58, 48, 50, 30, 35, 62, 70, 52, 53, 36, 55, 52, 60
a. Find the range, mean, median and mode for this data.
b. Is this data in a normal distribution or skewed? Why?

4. The table below lists the percentage change in employment by industry in Saskatchewan from 1990 to 1996. This data was gathered by the Statistics Canada Labour Force Survey, 1997.

Change In Employment by Industry, Saskatchewan, 1990 to 1996
Industry                      Percent Change
Manufacturing                 +4.3%
Other Primary Industries      +4.1%
Health and Social Services    +3.9%
Trade                         +2.4%
Transport, and Utilities      +1.9%
Accommodation & Food          +1.6%
Business Services             +1.6%
Education                     +1.4%
Logging and Forestry          +1.3%
Other Services                +0.8%
Finance, Insurance, Realty    -0.2%
Construction                  -2.4%

a. Find the range, mean, median and mode for this data.
b. Which measure of central tendency would you use to describe this data if you were
i) Premier of the province? Why?
ii) Leader of the opposition? Why?

5. The percentage of water content for some common foods is given in the table below.

Percentage of Water Content For Some Common Foods
Food                        Percent Water
Cucumber                    95%
Tomato (raw)                94%
Celery                      94%
Milk (skim)                 91%
Orange                      89%
Milk (whole)                87%
Apple                       84%
Banana                      76%
Egg (raw)                   74%
Spaghetti (cooked)          64%
Cheese (cheddar)            37%
Bread (white)               35%
Bread (white toasted)       24%
Bacon (broiled, drained)    8%
Crackers (saltine)          4%
Lard                        0%

a. State the range, mean, median and mode for this data.
b. Which measure of central tendency would be best to use in a nutritional information brochure? Why?

6. The mean age of 25 students in a class is 17.2. When the 31-year-old teacher enters the room, what is the mean age of the people in the room?

7. A marathon runner traveling at a steady pace notices that the number of runners who pass her is the same as the number she has passed. Is her pace the mean, median or mode of the speed of the runners?

Variance and Standard Deviation

In a collection of numbers it may be useful to obtain a measure of the Deviation from the mean score.
The average deviation can be found by finding the sum of the deviations of the values and dividing by the number of values. In summation (sigma) notation:

$\text{Average Deviation} = \dfrac{\sum (x - \bar{x})}{n}$

For our basketball team described on page 21,

$\text{Average Deviation} = \dfrac{(175 - 183.2) + (180 - 183.2) + (180 - 183.2) + \dots + (191 - 183.2)}{12}$
$= \dfrac{(-8.2) + (-3.2) + (-3.2) + (-2.2) + (-2.2) + 0.8 + 0.8 + 0.8 + 1.8 + 1.8 + 4.8 + 7.8}{12} = \dfrac{-0.4}{12} \approx -0.03$

For any set, this number will be very close to zero because the deviations above the mean cancel the deviations below the mean. (It is exactly zero when the unrounded mean is used; the small value here comes from rounding the mean to 183.2.) For that reason, we find it more useful to find the Variance, the average of the squares of the deviations:

$\text{Variance} = \dfrac{\sum (x - \bar{x})^2}{n}$
$= \dfrac{(-8.2)^2 + (-3.2)^2 + (-3.2)^2 + (-2.2)^2 + (-2.2)^2 + 0.8^2 + 0.8^2 + 0.8^2 + 1.8^2 + 1.8^2 + 4.8^2 + 7.8^2}{12} = \dfrac{189.68}{12} \approx 15.81$

Notice that the effect of squaring each difference is to produce a positive result.

The Standard Deviation (S.D. or $\sigma$) is the positive square root of the variance:

$\text{S.D.} = \sigma = \sqrt{\text{Variance}} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{15.81} \approx 3.98$

Example: Two Math 30B classes scored the following marks on an exam:
Class A: 63, 73, 77, 44, 76, 89, 56, 23, 81, 52, 67, 60, 84, 65, 73, 57, 66, 85, 75, 73, 72
Class B: 63, 78, 55, 76, 81, 66, 92, 83, 78, 77, 81, 73, 81, 72, 51, 54, 62, 62, 63, 77
a. Find the mean, median and mode for each class.
b. Find the range for each class.
c. Find the standard deviation for each class.
d. Using your knowledge of statistics, comment on the class performance.

Solution: To set up the standard deviation it is best to use a table. In the tables below, Deviation is $x - \bar{x}$ and Variation is $(x - \bar{x})^2$.

Class A
n         Mark (x)   Deviation   Variation
1         23         -44.19      1952.76
2         44         -23.19      537.78
3         52         -15.19      230.74
4         56         -11.19      125.22
5         57         -10.19      103.84
6         60         -7.19       51.70
7         63         -4.19       17.56
8         65         -2.19       4.80
9         66         -1.19       1.42
10        67         -0.19       0.04
11        72         4.81        23.14
12        73         5.81        33.76
13        73         5.81        33.76
14        73         5.81        33.76
15        75         7.81        61.00
16        76         8.81        77.62
17        77         9.81        96.24
18        81         13.81       190.72
19        84         16.81       282.58
20        85         17.81       317.20
21        89         21.81       475.68
Total     1411       0           4651.24
Average   67.19      0           221.49

Class B
n         Mark (x)   Deviation   Variation
1         51         -20.25      410.06
2         54         -17.25      297.56
3         55         -16.25      264.06
4         62         -9.25       85.56
5         62         -9.25       85.56
6         63         -8.25       68.06
7         63         -8.25       68.06
8         66         -5.25       27.56
9         72         0.75        0.56
10        73         1.75        3.06
11        76         4.75        22.56
12        77         5.75        33.06
13        77         5.75        33.06
14        78         6.75        45.56
15        78         6.75        45.56
16        81         9.75        95.06
17        81         9.75        95.06
18        81         9.75        95.06
19        83         11.75       138.06
20        92         20.75       430.56
Total     1425       0           2343.75
Average   71.25      0           117.19

Class A: $\text{Mean} = \dfrac{\sum x}{n} = \dfrac{1411}{21} = 67.19$, Median = 72, Mode = 73, Range = 89 - 23 = 66
Class B: $\text{Mean} = \dfrac{\sum x}{n} = \dfrac{1425}{20} = 71.25$, Median = $\dfrac{73 + 76}{2} = 74.5$, Mode = 81, Range = 92 - 51 = 41

a. Class A: $\bar{x}$ = 67.19, Median = 72, Mode = 73
Class B: $\bar{x}$ = 71.25, Median = 74.5, Mode = 81
b. Class A: Range = 66, Class B: Range = 41
c. Class A: $\text{S.D.} = \sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{4651.24}{21}} \approx 14.88$
Class B: $\text{S.D.} = \sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{2343.75}{20}} \approx 10.83$
d. Class A shows a wider range of abilities than class B. Either the students in class A are inconsistent in their work or attendance, or the teacher of class B is better at conveying the material. In both classes the data is negatively skewed because a few students did poorly.

[Figures: frequency graphs titled "Class A Marks" and "Class B Marks", each with Mark (20 to 100) on the horizontal axis and Frequency on the vertical axis.]

TI-82 Calculator:
To find the variance or standard deviation on the calculator, enter the list of data as described on page 24. Press [STAT][►] to access the Calc menu. Select 1:1-Var Stats.
Enter the name of the list that you wish to obtain stats for (L1 to L6) and press [ENTER]. The calculator will then display:
$\bar{x}$ = the mean for the list,
$\sum x$ = the sum of all data points,
$\sum x^2$ = the sum of the squares of the data points,
Sx = the standard deviation for a sample,
$\sigma x$ = the standard deviation for a population (the one we want),
n = the number of data points,
minX = the minimum value,
Q1 = the first quartile,
Med = the median,
Q3 = the third quartile, and
maxX = the maximum value.
You will have to use the up and down arrows to see all of the information.

Practice Questions 2: Standard Deviation

1. Boards from a rail shipment are selected at random and measured. The following measures (in meters) are obtained:
4.01, 3.96, 4.05, 3.92, 3.95, 3.98, 4.08, 4.03, 4.03, 3.98
To pass inspection, the boards' average length must be within 1 cm of 4.0 m, and the standard deviation must not exceed 0.048 m. Does this shipment pass inspection?

2. The number of hours on different days that a machine is in operation is given below:
12 h, 18 h, 15 h, 6 h, 4 h, 17 h, 10 h, 13 h, 10 h, 7 h, 16 h, 11 h, 4 h
a. Calculate the mean.
b. Find the standard deviation.

3. A commuter recorded the number of minutes spent waiting for a bus on each working day for two weeks: 2, 4, 13, 5, 8, 5, 7, 11, 7, 8
a. Calculate the mean.
b. Find the standard deviation.

4. Because of water shortages during a drought in a Canadian city, watering of lawns and gardens was permitted only from 06:00 to 09:00 and 19:00 to 22:00. The peak consumption was recorded (as a percentage of total capacity) each watering period for a week in July:
88, 95, 65, 94, 67, 94, 75, 93, 77, 100, 85, 100, 85, 100
a. Calculate the mean.
b. Find the standard deviation.
c. Water pressure problems occur when 95% of capacity is reached. On what percentage of days did this happen?

5. Determine the standard deviation for the following sets of data.
a.
Value       3    4    5    6    7
Frequency   2    8    9    6    3
b.
Value       8    10   12   16   25
Frequency   2    3    5    4    1

Standard Deviation and Normal Distribution

When data with a normal distribution is plotted on a graph in a frequency distribution, it forms the familiar Bell Curve with the mean located at the highest point of the curve. In the following graph of a normal distribution, the graph is divided into standard deviations on either side of the mean.

[Figure: "Normal Distribution", with Relative Frequency on the vertical axis and the horizontal axis marked from -3 SD to +3 SD about $\bar{x}$. On each side of the mean the regions are labeled 34.13%, 13.59% and 2.15%; brackets show 1SD = 68.26%, 2SD = 95.44%, 3SD = 99.74%.]

This graph has the properties that:
• 68.26% of the data is within 1 standard deviation of the mean.
 - The area between the mean and 1 S.D. will hold 1/2 of 68.26%, or 34.13%, of the data.
 - The area between the mean and -1 S.D. will also hold 34.13% of the data.
• 95.44% of the data is within 2 standard deviations of the mean.
 - The area between 1 S.D. and 2 S.D. will hold 13.59% of the data.
 - The area between -1 S.D. and -2 S.D. will hold 13.59% of the data.
• 99.74% of the data is located within 3 standard deviations of the mean.
 - The area between 2 S.D. and 3 S.D. will hold 2.15% of the data.
 - The area between -2 S.D. and -3 S.D. will hold 2.15% of the data.

Example: Mr. Krusties Cookies are randomly sampled to see how many chocolate chips each cookie contains. According to the findings, the mean number of chips per cookie is 7.3 with a standard deviation of 2.3. If we assume that the entire sample falls within a normal distribution, find:
a. The percentage of cookies with more than 7.3 chips.
b. The percentage of cookies with fewer than 5 chips.
c. The percentage of cookies with more than 2.7 chips and fewer than 11.9 chips.

Solution:
a. Because the normal distribution is symmetrical about the mean, exactly 50% (or 1/2) of the cookies will have more than 7.3 chips.
b. Those cookies with fewer than 5 chips are more than 1 S.D. less than the mean (7.3 - 2.3 = 5). Since 50% of the cookies are greater than the mean and 34.13% of the cookies are between the mean and -1 S.D., 100% - (50% + 34.13%) = 15.87% of the cookies have fewer than 5 chips.
c. The cookies with more than 2.7 chips and fewer than 11.9 chips are those within 2 S.D. of the mean (7.3 ± 4.6). That would be 95.44%.

Example: In a recent provincial exam, the marks for 30B Math had an average of 62% with a standard deviation of 6%. If 2500 students wrote the exam:
a. How many students scored above 62%?
b. How many students failed the exam (scored below 50%)?
c. How many students scored higher than 74%?

Solution: To calculate a percentage of a number, convert the percentage to a decimal and multiply.
a. If 62% is the mean score, then half, or 50%, of the students scored higher. 50% of 2500 = 0.50 × 2500 = 1250. 1250 students scored above 62%.
b. 50% is 2 S.D. below the mean score, so the percentage of students below that would be 100% - (13.59% + 34.13% + 50%) = 2.28%. 2.28% of 2500 = 0.0228 × 2500 = 57 students. 57 students failed the exam.
c. 74% is 2 S.D. above the mean score, so the percentage of students above that would be 100% - (13.59% + 34.13% + 50%) = 2.28%. 2.28% of 2500 = 0.0228 × 2500 = 57 students. 57 students scored above 74%. NB. This is the same answer as for b.

Practice Questions 3: Standard Deviation and Normal Distributions

1. The mean score on a math exam was 65 and the standard deviation was 10. If the data is normally distributed:
a. What percentage of students scored between 55 and 75?
b. What percentage of students scored between 45 and 85?
c. If 60 students wrote the exam, how many scored between 55 and 75?

2. Consumer testing has shown that the life of a hair dryer under daily use averages 6.5 years. The data is normally distributed with a standard deviation of 1.5 years.
If a retail store sells 5000 of the hair dryers with a 2-year guarantee, how many will they have to replace?

3. The mean mass of game fish in Lake Magalloway is determined to be 2.5 kg with a standard deviation of 0.75 kg. There are about 1000 fish in the lake.
a. How many are between 1 kg and 4 kg?
b. If fish with a mass of less than 1.75 kg must be released, how many of the fish in the lake must be thrown back?

4. An I.Q. test was given to all members of the armed forces. The results were normally distributed with a mean of 110 and a standard deviation of 15.
a. What percentage of scores were above 125?
b. What percentage of scores were below 80?
c. If 75 000 personnel took the test, how many scored above 140?
d. How many scored between 110 and 125?

5. An egg producer has determined that the mass of eggs produced by his chickens averages 150 g with a standard deviation of 12.5 g. In his daily production of about 1440 eggs:
a. How many have a mass of more than 175 g?
b. How many have a mass between 137.5 g and 162.5 g?

6. The time required to register for university is found to average 33 minutes with a S.D. of 6 minutes. What percentage of registrations will last:
a. more than 45 minutes?
b. less than 27 minutes?
c. between 21 and 39 minutes?

Z-Scores and Data Analysis

Objectives:
B.4 To define and calculate z-scores using the formula $z = \dfrac{x - \bar{x}}{\sigma}$ or $z = \dfrac{x - \bar{x}}{\text{S.D.}}$
B.5 To be able to use z-scores as an aid in interpreting data.
B.6 To solve real-world problems using statistical inference.

Notes:
In order to compare different scores within one set of data, or to compare scores from different sets of data with different means and standard deviations, we can use z-scores. For any value (x) in a set of data, the z-score (z) can be determined by subtracting the mean ($\bar{x}$) from the value (x) and dividing by the standard deviation ($\sigma$ or S.D.):

$z = \dfrac{x - \bar{x}}{\sigma}$ or $z = \dfrac{x - \bar{x}}{\text{S.D.}}$

The z-score gives the distance from the mean as a multiple of the standard deviation.
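The z-score formula can also be checked numerically. The following is a minimal Python sketch and not part of the original course material: the function names `population_sd` and `z_score` are illustrative, and the data are the basketball team heights from the frequency table earlier in this unit.

```python
import math

def population_sd(data):
    """Population standard deviation: sqrt(sum((x - mean)^2) / n)."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Heights (cm) of the 12 basketball players, expanded from the frequency table.
heights = [175, 180, 180, 181, 181, 184, 184, 184, 185, 185, 188, 191]
mean = sum(heights) / len(heights)
sd = population_sd(heights)

print(round(mean, 1), round(sd, 2))       # mean and population S.D.
print(round(z_score(175, mean, sd), 2))   # shortest player: negative z-score
print(round(z_score(191, mean, sd), 2))   # tallest player: positive z-score
```

Because the sketch uses the unrounded mean, its results can differ slightly in the last digit from hand calculations that round the mean to 183.2 first.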
Example: 16 students entered a free throw competition. Each competitor attempted 20 shots. Their scores were as follows:
6, 9, 7, 8, 10, 15, 7, 11, 9, 14, 12, 11, 13, 10, 11, 7
Determine the z-score for Ryan, who scored 7, and for Janet, who scored 15.

Solution:
Calculate the mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{6 + 9 + \dots + 7}{16} = \dfrac{160}{16} = 10$
Find the standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{(6 - 10)^2 + (9 - 10)^2 + \dots + (7 - 10)^2}{16}} = \sqrt{\dfrac{106}{16}} \approx 2.57$
Ryan's z-score: $z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{7 - 10}{2.57} = \dfrac{-3}{2.57} \approx -1.17$
Janet's z-score: $z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{15 - 10}{2.57} = \dfrac{5}{2.57} \approx 1.95$

From the example above we can see that Ryan was 1.17 standard deviations below the average and Janet was 1.95 standard deviations above the average.

We can use z-scores and the table of areas under the standard normal distribution curve to calculate percentiles or probabilities. Table 2.1 lists the fraction of scores between the mean ($\bar{x}$) and the score that is z standard deviations from the mean.

For example, for a person with a z-score of 1.95, we look in Table 2.1 on the row labeled 1.9 and in the column labeled 0.05 to find the value 0.4744. The score is above the mean, with 47.44% of all scores lying between it and the mean, so the score is better than 50% + 47.44% = 97.44% of scores.

Table 2.1: Area Under Normal Distribution Curve.
(Row label = z to one decimal place; column label = second decimal place of z.)

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0754
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2258  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2518  0.2549
0.7   0.2580  0.2612  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2996  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389

1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4356  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4574  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767

2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986

3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2   0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3   0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4   0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
3.5   0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998
3.6   0.4998  0.4998  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.7   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.8   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.9   0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000

Z-score questions come in two varieties.

In the first type we are asked to find out which score is better, compared to the mean, in two data sets.

Example: The annual daily mean temperature in Saskatoon is 3.5°C with a standard deviation of 6.75°C. The annual daily mean temperature in Regina is 3.1°C with a standard deviation of 10.6°C. One spring day, the temperature is 11°C in Saskatoon and 12°C in Regina. Which city experienced a better than average day with respect to the mean temperature?

Solution: Calculate the z-score for each city:
Saskatoon: $z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{11 - 3.5}{6.75} = \dfrac{7.5}{6.75} \approx 1.11$
Regina: $z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{12 - 3.1}{10.6} = \dfrac{8.9}{10.6} \approx 0.84$
The temperature in Regina was only 0.84 standard deviations above the mean, while Saskatoon was 1.11 standard deviations above the mean. Relative to its own mean, Saskatoon was having the better day.
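The Saskatoon and Regina comparison can be reproduced in a few lines of Python. This is only a sketch: the `z_score` helper and the dictionary layout are illustrative choices, not part of the course material.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

# (temperature on the day, annual daily mean, standard deviation), in degrees C,
# taken from the Saskatoon/Regina example.
cities = {
    "Saskatoon": (11, 3.5, 6.75),
    "Regina": (12, 3.1, 10.6),
}

scores = {name: z_score(*values) for name, values in cities.items()}
for name, z in scores.items():
    print(f"{name}: z = {z:.2f}")

# The larger z-score marks the day that was warmer relative to that city's own climate.
print("Better day:", max(scores, key=scores.get))
```

Although Regina was the warmer city in absolute terms, its larger standard deviation makes a 12°C day less unusual there, which is why the z-scores rather than the raw temperatures are compared.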
In the second type we may be asked to use the table of the area under the normal distribution curve to calculate the probability of a certain score. The curriculum guide is unclear about whether or not this kind of question should be included.

Example 1: Determine the probability of each of the following events using z-scores.
a. A z-score greater than z = -2.1
b. A z-score between z = -1.3 and z = 1.3

Solution:
a. From Table 2.1, we see that the area between z = -2.1 and the mean is 0.4821. This corresponds to a 48.21% probability of falling between -2.1 S.D. and the mean.
[Figure: normal curve shaded to the right of z = -2.1.]
The probability is, therefore, 48.21% + 50% = 98.21%. Remember that the area above the mean equals exactly half, or 50%, of the scores.
b. From Table 2.1 we see that the area between z = 1.3 and the mean is 0.4032. The area between z = -1.3 and the mean will also be 0.4032.
[Figure: normal curve shaded between z = -1.3 and z = 1.3.]
The probability is 40.32% + 40.32% = 80.64%.

Example 2: The manufacturer of watch batteries uses random testing to determine the life of the batteries. They discovered that the batteries would last 2.3 years on average with a standard deviation of 0.79 years. What is the probability that a consumer gets a battery that dies in less than 1 year?

Solution: The z-score for a battery that lasts one year is
$z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{1 - 2.3}{0.79} = \dfrac{-1.3}{0.79} \approx -1.65$
From Table 2.1 we see that the area between z = -1.65 and the mean is 0.4505. The probability of falling below z = -1.65 is therefore 50% - 45.05% = 4.95%.
[Figure: normal curve shaded to the left of z = -1.65.]

Practice Questions 4: z-scores and Standard Distributions

1. On a recent Biology 30 final exam, the provincial average was 68.3% with a standard deviation of 11.3%. Calculate the z-scores for the following students:
a. Mark, who earned 81%
b. Freddie, who earned 54%
c. Leslie, who scored 71%
d. Annette, who scored 92%

2. Calculate the percentile for Freddie and Annette from question 1. (Percentile is the percentage of students who scored the same or worse.)

3. In the last season the LittleRock Rockers Hockey team scored the following number of goals per game:
3, 7, 6, 2, 9, 5, 11, 3, 5, 6, 6, 9, 2, 1, 0, 6, 5, 7, 4, 5, 2, 3, 7, 6, 2, 1
a. Calculate the range, mean, median and mode for this data.
b. Calculate the standard deviation for this data.
c. Find the z-scores for the playoff games in which they scored 6 goals, 2 goals and 1 goal.
d. What was the z-score for their best game?

4. For the first three units of Mathematics B30, Elaine's marks were 81%, 69%, and 73% respectively. The mean and standard deviation of the unit marks in her class were:
Unit I: $\bar{x}$ = 75%, $\sigma$ = 8%
Unit II: $\bar{x}$ = 63%, $\sigma$ = 7%
Unit III: $\bar{x}$ = 74%, $\sigma$ = 8%
Compared to the rest of the students taking the course, in which unit did Elaine do the best?

5. Find the probability of the following:
a. a z-score of more than 2.1
b. a z-score of more than -1.5
c. a z-score between -2.5 and +1.3

6. Consumer reports show that one brand of compact cars is able to travel an average of 120 000 km before brake service is required. If the standard deviation is 15 000 km, what is the probability that a car will need brake servicing before 100 000 km?

7. If standardized testing shows that Canadians have an average I.Q. of 100 with a standard deviation of 15, how many Canadians (out of 30 000 000) would be likely to have an I.Q. higher than 150?

8. In a recent study, a science student tested a new type of plant food on geranium plants. He collected the following data:
Group A growth: 8.1 cm, 7.3 cm, 2.5 cm, 9.3 cm, 7.1 cm, 7.5 cm, 8.0 cm, 7.6 cm, 6.9 cm, 7.5 cm.
Group B growth: 4.3 cm, 5.2 cm, 6.5 cm, 4.9 cm, 5.5 cm, 5.9 cm, 6.8 cm, 4.0 cm, 9.0 cm, 5.7 cm, 6.2 cm.
a. Calculate the mean, median, and range for each data set.
b. Are the data skewed?
c. Find the standard deviation for each data set.
d. Calculate the z-score for the greatest and least growth in each data set.
e. At the end of the study, the student found one plant with no label.
If its growth was 6.0 cm, which data set is it most likely to belong to? (lowest z-score) f. If Group A used the new plant food, what conclusions can the student make? -37- 8 10 1 2 3 4 5 6 8 2: Data Analysis 10 Unit64212 Unit 2 Solutions Practice Questions 1: page 24 1. Range = 49, Mean = 69.63, Median = 69.5, no mode, This is very close to normal but skewed slightly right. 2. a. Saturday Frequency 6 5 4 3 2 1 5 4 3 2 1 2 4 6 8 10 12 Score 3. 4. 5. 6. Sunday Frequency Monday Frequency 6 4 2 2 4 6 8 10 Score 2 4 6 8 10 12 Score b. Sat: Range = 9, Mean = 7.35, Median = 8, no Mode c. Skewed - left Sun: Range = 8, Mean = 8.3, Median = 6, Mode = 6 Skewed - right Mon: Range = 6, Mean = 4.2, Median = 4, Mode = 5 Not skewed (much) d. Varies. a. Range = 40, Mean = 50.47, Median = 52, Mode = 52 b. Skewed very slightly to the left because the mean is less than the median. a. Range = 19.6%, Mean = 0.415%, Median = 1.6%, Mode = 1.6% b. i) Premier prefers Median or mode because it sounds more prosperous. ii) Opposition prefers Mean because it sounds worse. a. Range = 95%, Mean = 56.7%, Median = 74%, no Mode b. Varies Mean is 17.73 years. 7.She is the median because she is the middle runner. Practice Questions 2: Page 28 1.  = 0.047, Yes the shipment passes. 2. a. Mean = 11 h b.  = 4.57 h 3. a. Mean = 7 min b.  = 3.10 min 4. a. Mean = 87% b.  = 11.47% c. Reached 100% on 4/7 or 42% of days. 5. a.  = 1.10 b.  = 4.18 Practice Questions 3: Page 30 1. a. 68.26% b. 95.44% c. 41 students 2. 0.13% of 5000 = 6.5dryers replaced. 3. a. 95.4% = 954 fish b. 15.87% = 159 fish 4. a. 15.87% b. 2.28% c. 2.28% of 75000 = 1,710 d. 34.13% of 75 000 = 25,598 5. a. 2.28% of 1440 = 33 eggs b. 68.26% of 1440 = 983 6. a. 2.28% b. 15.87% c. 81.85% Practice Questions 4: Page 34 1. a. 1.5929 b. -1.2655 c. 0.2389 d. 2.0973 2. Freddie 10 th Annette 98 th 3. a. Range = 11, Mean = 4.73, Median = 5, Mode = 6 b.  = 2.70 c. z1 = 0.47, z2 = -1.01, z3 = -1.38 d. 2.32 4. 
Unit I: z = 0.75, Unit II: z = 0.86, Unit III: z = -0.13. She did best on Unit II.
5. a. 1.8% b. 93.32% c. 92.4%
6. z = -1.3, 9.68%
7. z = 3.33, 0.05% = 15,000 people
8. a. Group A: Mean = 7.25 cm, Median = 7.5 cm, Range = 6.8 cm
Group B: Mean = 5.81 cm, Median = 5.7 cm, Range = 5 cm
b. Group A skewed slightly left. Group B skewed very slightly right.
c. Group A: σ = 1.62, Group B: σ = 1.30
d. Group A: Most z = 1.27, Least z = -2.93
Group B: Most z = 2.45, Least z = -1.39
e. zA = -0.77, zB = 0.146. The plant probably belongs to Group B.
f. The student can conclude that the new plant food is effective.

Unit 2: Review

1. Statistics is the branch of mathematics concerned with manipulating groups of numerical facts so as to present significant information about the subject or source of the data.

2. Lists of data may be represented in tables, histograms, or frequency distributions.
[Figure: the same data drawn as a histogram and as a frequency distribution, with Frequency on the vertical axis and Domain on the horizontal axis]
In each case, the vertical axis represents the number of scores at each value in the domain.

3. Measures of central tendency include:
Mean - the mathematical average of the data.
Median - the middle value in the data when listed in order. If there is an even number of data points, the median is the average of the two middle values.
Mode - the value that appears most often.

4. In a Normal Distribution of data, the shape of a frequency distribution graph is a bell-shaped curve. The mean, median and mode are all located at the center or highest point of the curve.

5. In a Skewed distribution, the mean and median have different values. Skewed to the right or Positively Skewed data has the mean greater than the median. Skewed to the left or Negatively Skewed data has the mean less than the median.

6. Measures of deviation include:
Range - the difference between the maximum and minimum values.
Variance - the average of the squares of the deviations from the mean:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
where $x$ = the individual values, $\bar{x}$ = the mean, and $n$ = the number of values.
Standard Deviation (S.D. or σ) - the square root of the variance:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
where $x$ = the individual values, $\bar{x}$ = the mean, and $n$ = the number of values.

7. The area under the curve of a Normal Distribution represents the probability of a score falling in that area of the data. It is distributed so that exactly 50% of the data falls above the mean and 50% below the mean. If the domain is divided into standard deviations on either side of the mean, the percentage of data falling in each section is as follows:
• 68.26% of the data is within 1 standard deviation of the mean.
-The area between the mean and 1 S.D. will hold 1/2 of 68.26% or 34.13% of the data.
-The area between the mean and -1 S.D. will also hold 34.13% of the data.
• 95.44% of the data is within 2 standard deviations of the mean.
-The area between 1 S.D. and 2 S.D. will hold 13.59% of the data.
-The area between -1 S.D. and -2 S.D. will hold 13.59% of the data.
• 99.74% of the data is located within 3 standard deviations of the mean.
-The area between 2 S.D. and 3 S.D. will hold 2.15% of the data.
-The area between -2 S.D. and -3 S.D. will hold 2.15% of the data.
This figure will be provided for every exam.
[Figure: Normal Distribution. Relative frequency plotted from -3 SD to +3 SD about the mean $\bar{x}$, with 34.13%, 13.59% and 2.15% marked on each side; 1 SD = 68.26%, 2 SD = 95.44%, 3 SD = 99.74%]

8. Data from different sets can be compared using z-scores:
$$z = \frac{x - \bar{x}}{\sigma}$$
where $x$ = the individual value, $\bar{x}$ = the mean, and $\sigma$ = the standard deviation.
Z-scores represent the number of standard deviations that a value falls away from the mean.
A table of areas under the normal distribution curve (Table 2.1, Page 32) can be used to find the probability, or the percentage as a decimal, of the scores that lie between a particular value and the mean. If z-scores are to be used to find probabilities, a table will be provided.

N.B. There is a second method for calculating the standard deviation that is not used in this course. The standard deviation for a sample is:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Unit 2 Review Questions

1. A television manufacturer advertises that the mean life of picture tubes in new TV sets is 10,000 h with a standard deviation of 1000 h. A local hotel has purchased 200 of the TV sets. If we assume a normal distribution:
a. What percentage of the picture tubes should last more than 11,000 h?
b. What percentage of the picture tubes should last less than 8,000 h?
c. How many TVs should need to be repaired or replaced in the first 7,000 h?

2. A machine used to package candies in 90 g packages is thought to be faulty. A sample of 10 packages is randomly selected and their actual masses in grams are:
86, 91, 89, 88, 92, 90, 93, 90, 91, 90
a. Find the range, mean, median and mode.
b. Calculate the standard deviation.
c. If the machine is supposed to be within a standard deviation of 1.3 g, does this one need repairs?

3. Scores on an I.Q. test are normally distributed with a mean of 100 and a standard deviation of 15. How many students in a group of 750 would you expect to have an I.Q. higher than 130?

4. The life of a household blender is normally distributed with a mean of 7 years and a standard deviation of 1.5 years. What is the probability that a blender will last:
a. longer than 8.5 years?
b. less than 5.5 years?
c. between 4 and 5.5 years?

5. The ages of stunt persons are normally distributed with a mean of 28 years and a standard deviation of 1.5 years.
a. What is the probability that a stunt person is over 31 years old?
b. Nine-fingered Mary is 39 years old. What is her z-score?
c.
What is the probability that Mary will still be a stunt person in 5 years?

6. The mean height of North American women is 162 cm with a standard deviation of 4 cm. In a group of 500 randomly chosen women, how many would you expect to be over 166 cm tall?

7. The mean reaction time taken to apply the brakes on a car is 0.75 s with a standard deviation of 0.05 s.
a. Calculate Edna’s z-score if her reaction time is 0.65 s.
b. What percentage of the population is faster than Edna?
c. Find the probability that a person has a reaction time between 0.70 s and 0.85 s.

8. Calculate the standard deviation for the following sets of data.
a. 20, 24, 27, 29, 25, 21, 22
b. 6, 8, 10, 12, 9, 7, 7

9. The mean waiting time at a bank is 5 min with a standard deviation of 1.25 min.
a. What is the z-score for a person who waits 2.5 min?
b. What is the probability that a customer will wait between 3.75 min and 6.25 min?

10. The total points scored in the Hyper Bowl final in each of the last five years were 54, 50, 64, 35, and 47. Calculate the standard deviation.

11. A company claims that their aerosol cans each contain 175 g of spray disinfectant. A can will not spray properly if it is more than 15 g over full, and the customer is cheated if the can is not full. A random sample of cans from one shipment was tested and found to have the following weights:
175, 175, 179, 175, 175, 159, 180, 175, 177, 180.
a. Find the standard deviation for the sample.
b. If 95% of the shipment is supposed to be within 2 standard deviations of the mean, should this one be accepted?

Unit 2 Review Solutions

1. a. 16% b. 2.5% c. 0.26 TVs (none)
2. a. Mean = 90 g b. σ = 2 g c. Yes
3. 17 students
4. a. 15.87% = 0.1587 b. 15.87% = 0.1587 c. 13.59% = 0.1359
5. 2.28% = 0.0228
6. 80
7. a. z = -2 b. 2.28% c. 81.85% = 0.8185
8. a. σ = 3.3 b. σ = 2.1
9. a. z = -2 b. 68.26% = 0.6826
10. σ = 10.56
11. a. σ = 6.0 g b. No.
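
The formulas summarized in the review can be checked with a short Python sketch. This is only an illustration: the function names are made up for this example, and `math.erf` from the Python standard library stands in for the printed table of areas, so results may differ from table values in the last digit. The data come from Review Questions 9 and 10.

```python
import math

def mean(values):
    # Sum of the data points divided by the number of points.
    return sum(values) / len(values)

def population_sd(values):
    # Course formula: square root of the average squared deviation (divide by n).
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def sample_sd(values):
    # The "second method" from the N.B.: divide by n - 1 instead of n.
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def z_score(x, m, sd):
    # Number of standard deviations that x lies away from the mean.
    return (x - m) / sd

def area_below(z):
    # Fraction of a normal distribution lying below a given z-score.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Review Question 10: total points in the last five finals.
points = [54, 50, 64, 35, 47]
print(round(population_sd(points), 2))  # 9.44
print(round(sample_sd(points), 2))      # 10.56

# Review Question 9: mean wait 5 min, standard deviation 1.25 min.
print(z_score(2.5, 5, 1.25))            # -2.0
low = z_score(3.75, 5, 1.25)
high = z_score(6.25, 5, 1.25)
print(round(area_below(high) - area_below(low), 4))  # 0.6827
```

For the Question 10 scores, dividing by n gives about 9.44 and dividing by n - 1 gives about 10.56; the printed solution matches the second value. The last result, 0.6827, is the area within one standard deviation of the mean, which the course table lists as 68.26%.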
