NZ-SIMSS 2008 Parallel NCEA Scholarship Statistics and Modelling Exam Answers

All questions are based on activities associated with a company called PlayGames.

(1) (a) PlayGames needs to estimate, to within one minute and with 95% confidence, the mean time per player to finish the game OBE, in which players have to stop a monster called OBE from enslaving a country. It has been noted that the standard deviation of any group of players is about 3 minutes. Find the minimum sample size for this estimate.

This condition is met if the margin of error of a 95% confidence interval for the mean time is at most 1 minute. [1]
Hence 1.96 × 3 ÷ √n ≤ 1. [2]
Solving gives √n ≥ 5.88, so n ≥ 34.57 and the minimum sample size is n = 35. [3]

(b) PlayGames operates a parlour which holds a maximum of 50 players. Usually, if all of these players are playing OBE, the mean time to play the game to its conclusion is 6.8 minutes. A sample of the size calculated in (a) had a mean of 5.7 minutes. Is this what might be expected? Give reasons.

The 95% confidence interval has end points 5.7 ± 1.96 × 3 ÷ √35, giving the interval (4.71, 6.69). [1]
As 6.8 does not lie in this interval, a mean of 5.7 minutes is not what might be expected; we can be 95% confident that the sample of 35 players was more skillful than average. [2]

(c) PlayGames wants to compare the mean times of customers who come in on weekday evenings with those of customers who come in over the weekend. Explain how this might be done.

Random samples of size greater than 30 are selected from the customers who come during the week and from the customers who come at the weekend. [1]
Let these sample sizes be n1 and n2 respectively. For each sample the sample means m1 and m2 are calculated, as are the sample standard deviations s1 and s2. From these statistics a 95% confidence interval for the difference in means is constructed, with end points m1 − m2 ± 1.96√(s1²/n1 + s2²/n2). [2]
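The calculations in (1)(a) and (1)(b), and the interval for a difference in means used in (1)(c), can be checked with a short Python sketch. It assumes the large-sample z-interval with z = 1.96 used in these answers; the function names are hypothetical, and the weekday and weekend figures passed to `diff_means_ci` are made-up placeholders, because the question supplies no data for (c).

```python
import math

Z_95 = 1.96  # two-sided 95% critical value used in these answers


def min_sample_size(sigma: float, margin: float, z: float = Z_95) -> int:
    """Smallest whole n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)


def mean_ci(mean: float, sigma: float, n: int, z: float = Z_95) -> tuple:
    """Large-sample interval mean ± z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


def diff_means_ci(m1, s1, n1, m2, s2, n2, z=Z_95):
    """Interval m1 - m2 ± z * sqrt(s1²/n1 + s2²/n2) for a difference in means."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = m1 - m2
    return diff - z * se, diff + z * se


n = min_sample_size(sigma=3, margin=1)
low, high = mean_ci(5.7, 3, n)
print(n, round(low, 2), round(high, 2))  # 35 4.71 6.69
print("6.8 inside interval:", low <= 6.8 <= high)

# Hypothetical weekday (1) and weekend (2) samples, for illustration only.
lo_d, hi_d = diff_means_ci(6.1, 2.9, 40, 5.4, 3.2, 45)
print("significant difference:", not (lo_d <= 0 <= hi_d))
```

Here `min_sample_size` rounds (1.96 × 3)² ≈ 34.57 up to 35, and the interval for (b) excludes 6.8.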
If 0 lies in the interval there is no significant difference between the mean times; otherwise there is. [3]

© Rory Barrett 2008

(2) (a) A customer satisfaction survey using a cluster sample was carried out. Discuss this and give an advantage and a disadvantage.

The customers are divided into groups (clusters), each presumed to have the same characteristics as the population. [1]
One or more of these groups are then chosen at random, and sampling is done entirely within the chosen group(s). [2]
An advantage is that it is often more convenient to sample within only one or a few groups rather than across the population as a whole. [3]
A disadvantage is that the presumption that the groups share the population's characteristics may be wrong, in which case the sample will be biased. [4]

(b) A survey of 50 customers gave an 80% satisfaction rate with the games offered. Use this to find a 99% confidence interval for the proportion of all customers who are satisfied with the games offered.

The confidence interval has end points 0.8 ± 2.576 × √(0.8 × 0.2 ÷ 50), which gives 0.6543 < π < 0.9457, where π is the population proportion, i.e. (0.6543, 0.9457). [1]

(c) Many of the customers in the survey hate the game SBA, in which a giant spider eats teenagers, because they are frightened of spiders. If 1000 customers were asked whether they hated SBA, what is the probability that more than 940 would hate SBA, if the probability that a customer hates SBA were the maximum value suggested by the 99% confidence interval in (2)(b) above?

Let X be the number of people in 1000 who hate SBA. Then X has a B(1000, 0.9457) distribution. [1]
We need P(X > 940) = P(X ≥ 941). Attempting to find this directly on a CFX9750 calculator leads to an MA Error, so a normal approximation is used, with μX = 1000 × 0.9457 = 945.7 and σX = √(1000 × 0.9457 × (1 − 0.9457)) = 7.166. [2]
Let X* be a normal approximation to X with mean 945.7 and standard deviation 7.166. Using this, with a correction for continuity, P(X > 940) ≈ P(X* > 940.5) = 0.766. [3]

(3) PlayGames records the number of customers over a thirty-day period. The table below shows some of this data.

Day   Number of customers   Deseasonalised number of customers
1      57                    –
2      43                    –
3      42                    –
4      31                    87.8
5      73                    84.0
6     183                    81.1
7     105                    34.5
8      64                    87.8
9      52                    89.7
10     49                    89.8
11     29                    85.8
12     80                    91.0
13    188                    86.1
14    191                   120.5
15     63                    86.8
16     44                    81.7
17     41                    81.8
18     30                    86.8
19     81                    92.0
20    195                    93.1
21    181                   110.5
22     66                    89.8
23     56                    93.7
24     55                    95.8
25     24                    80.8
26     78                    89.0
27    201                    99.1
28    178                    –
29     70                    –
30     60                    –

There is also a graph below which shows the actual data and the deseasonalised data.

[Graph: "Customers", original and deseasonalised numbers of customers (vertical axis) against days (horizontal axis)]

(a) Write a short essay describing the customer numbers over this period. In this essay, explain how the deseasonalised values were obtained.

There is a slight increasing trend over the period surveyed, particularly at weekends. [1]
Customer numbers peak over the weekends and are lowest on Thursdays. The deseasonalised data shows this as well. [2]
The deseasonalised data is obtained by first finding the centred moving means (CMM) of order 7. The 'Rough', the difference between the raw data and the CMM, is then averaged for each day of the week to find the 'Average Seasonal Effects'. [3]
The deseasonalised data is found by subtracting the 'Average Seasonal Effects' from the raw data. [4]

(b) A trend line is fitted to the deseasonalised data, with equation y = 0.7x + 77. The table below shows the average 'seasonal effects' for each day of the week.
Day of week   Average 'seasonal effect'
1             −24
2             −38
3             −41
4             −57
5             −11
6             102
7              71

Use these to predict the customer numbers on day 47 if the features and trend shown continue.

Day 47 is 'Day 5' of the weekly cycle (47 = 6 × 7 + 5), so its average seasonal effect of −11 is used. The prediction is 0.7(47) + 77 − 11 = 98.9 ≈ 99. [2]

(c) PlayGames will lease new premises if the average daily number of customers exceeds 150. When is this likely to occur, and what would be the prediction of the actual number of customers on that day?

The average daily number of customers is represented by the trend equation, so 0.7x + 77 = 150 gives x ≈ 104.3; the trend first exceeds 150 on day 105. [1]
Day 105 is a Sunday ('Day 7'), so the average seasonal effect is 71 and the prediction is 0.7 × 105 + 77 + 71 = 221.5, so expect about 221 or 222 people. [2]

(4) The manager of PlayGames is interested in how long it took experienced players of SBA and OBE to win points in these games. He got a group of 20 very experienced players and recorded how many points the members of the group won each minute. No player needed more than ten minutes to complete either game. The data he acquired is shown in the table below. Each game has 10 points which have to be won in order to finish.

Minute   Points won in this minute
         OBE    SBA
1          3     13
2         24     25
3         41     19
4         61     23
5         33     30
6         20     18
7         11     22
8          5     18
9          2     20
10         0     12

(a) He summarized this by finding, for each game, the mean and the standard deviation of the number of minutes taken to win each point. Find the parameters he found and write down what they show.

The required parameters are shown in the table below. [1]

Parameter                   OBE            SBA
Sample mean                 4.19 minutes   5.34 minutes
Sample standard deviation   1.58 minutes   2.65 minutes

The values obtained in the table indicate that, for experienced players, SBA is a more difficult game than OBE.
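The summary values in the table above can be reproduced from the frequency table with a short Python sketch. It treats every point as won at the whole-minute value of the minute in which it was won (the reading that gives the means 4.19 and 5.34) and uses the n − 1 divisor for the sample standard deviation; the helper name `grouped_mean_sd` is hypothetical.

```python
import math

# Points won in each minute by the group of 20 players (table in question (4)).
minutes = list(range(1, 11))
obe = [3, 24, 41, 61, 33, 20, 11, 5, 2, 0]
sba = [13, 25, 19, 23, 30, 18, 22, 18, 20, 12]


def grouped_mean_sd(values, freqs):
    """Mean and sample (n - 1 divisor) standard deviation of a frequency table."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))
    return mean, math.sqrt(ss / (n - 1))


for name, freqs in (("OBE", obe), ("SBA", sba)):
    mean, sd = grouped_mean_sd(minutes, freqs)
    print(f"{name}: mean {mean:.2f} min, sd {sd:.2f} min")
```

Both games total 200 points (20 players × 10 points), which is a quick check that the table was read correctly.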
It takes longer, on average, to win points in SBA than in OBE, as shown by the means. [2]
SBA has a higher standard deviation than OBE, indicating that there is more variation in the time required to win points in SBA than in OBE. [3]

(b) He also drew scatter plots of the number of points against minutes for each game. These are shown below, together with the fitted quadratic regression curves, their equations and the associated coefficients of determination (R²). Write an essay about what these diagrams show.

[Scatter plot: OBE, points won in each minute against minute, with trend curve y = −1.5303x² + 13.658x + 3.8, R² = 0.5705]
[Scatter plot: SBA, points won in each minute against minute, with trend curve y = −0.4697x² + 4.7788x + 11.8, R² = 0.4958]

The diagrams show scatter plots of the number of points won in each minute (the first, second, third, ... minute) against the minute number. The most points were won in the fourth minute for OBE and in the fifth minute for SBA.
The number of points won was smaller at the ends of the interval 1 ≤ time ≤ 10 than in the middle of the interval for both games. [1]
The data for both games has been given a trend curve which is a quadratic function, and the fit is measured by the coefficient of determination R². The fit is better for OBE (R² = 0.5705) than for SBA (R² = 0.4958). [2]
The proposed model for OBE is unsatisfactory for two reasons. Firstly, in the first four minutes of playing OBE there seems to be a linear relationship between points won per minute and minute, and thereafter a decay. [3]
Secondly, the proposed model gives a negative value at 10 minutes (about −12.7 points), which is impossible.
Although there is considerable scatter about the proposed model for SBA, it is not obvious from the graph that the model is flawed. [4]

(c) How many points would you expect a group of 50 experienced players of OBE to win in the third minute of play?
The original group of 20 players won 41 points, about 40, in the third minute. As there are 50 players rather than 20, we would expect 50 × 40 ÷ 20 = 100 points on average. [1]

(5) PlayGames introduces a new game called CIE. CIE provides far more challenge, and most players are keen on it. The table below shows some information about the games PlayGames offers.

Game   Percentage of customers   Probability of a win within
       playing this game         ten minutes of play starting
OBE    28%                       2p
SBA     7%                       4p
CIE    65%                       p

(a) Calculate the probability that at least 14 of the next 30 customers play SBA.

The number of players in 30 who play SBA has a B(30, 0.07) distribution. [1]
Let X be the number of players in 30 who play SBA. We require P(X ≥ 14) = 1 − P(X ≤ 13) ≈ 3.4 × 10⁻⁹, which is effectively zero. [2]

(b) If the probability that a randomly selected player wins within ten minutes, irrespective of the game played, is 0.35, find p.

The probability that a randomly selected player wins within ten minutes is 0.28 × 2p + 0.07 × 4p + 0.65 × p = 1.49p. [1]
This is equal to 0.35. [2]
Solving 1.49p = 0.35 gives p = 0.2349. [3]

(c) Find the probability that, if a customer wins within ten minutes, they were playing OBE.

The required probability is P(OBE | win within 10 minutes) [1]
= 0.28 × 2 × 0.2349 ÷ 0.35 [2]
= 0.3758 [3]

(6) In a new games parlour PlayGames had only two types of games machine, X and Y; every machine was one or the other. An X machine cost $1000 and a Y machine cost $800. The maximum number of machines was 50, and PlayGames allocated $48000 for the purchase of machines. Let x and y be the numbers of X and Y machines. For 0 ≤ x ≤ 20, y ≤ 40.
For 20 < x ≤ 40, y ≤ 20.

(a) If the ratio of revenue between an X machine and a Y machine is 12:10, find the number of each type which should be purchased for maximum revenue.

The constraints are:
1000x + 800y ≤ 48000
x + y ≤ 50 [1]
for 0 ≤ x ≤ 20, y ≤ 40
for 20 < x ≤ 40, y ≤ 20
The diagram shows the feasible region. [2]
Revenue is proportional to 12x + 10y. Along x + y = 50 this equals 500 + 2x, so x should be as large as possible; x + y = 50 meets 1000x + 800y = 48000 at (40, 10). Checking the corners (10, 40), (20, 30), (30, 20) and (40, 10) gives 520, 540, 560 and 580, so we need 40 X and 10 Y machines. [3]

(b) Due to changing economic conditions, the amount of money allocated for machine purchase was reduced; however, all other constraints remained the same, as did the revenue ratio for X and Y. By how much was the budget for machine purchase reduced if the number of X machines for maximum revenue is now 20, with the revenue being as large as possible?

In this case we now have 20 X and 30 Y machines. [1]
Thus the machine budget is 1000(20) + 800(30) = $44000, so the budget has been reduced by $4000. [2]

(c) As a result of still worsening economic conditions, cheaper premises have to be leased, and in the new premises the total number of machines is determined by floor space. Each X machine uses 1 m² of space and each Y machine uses 2 m², and a total of 70 m² is available for machines. If the ratio of revenue between an X machine and a Y machine is now 0.5:1, find how many machines of type X and of type Y there should be.

The new constraint which replaces x + y ≤ 50 is x + 2y ≤ 70. [1]
The feasible region is shown below. [2]
Revenue is proportional to 0.5x + y = 0.5(x + 2y), so it is maximised by any feasible point on the line x + 2y = 70, where y = 35 − x/2. On this line the rule y ≤ 20 for 20 < x ≤ 40 excludes 20 < x < 30, and, assuming the reduced budget of $44000 from (b) still applies, 1000x + 800(35 − x/2) ≤ 44000 gives x ≤ 26.7. Hence any combination of X and Y machines satisfying x + 2y = 70 with 0 ≤ x ≤ 20 (x even, so that y is a whole number) gives maximum revenue. [3]
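The three linear-programming answers in question (6) can be cross-checked by brute force, enumerating every whole-number mix of machines. This Python sketch assumes that the constraint signs are "≤" (the answer (40, 10) uses exactly $48000 and 50 machines), that x never exceeds 40, and, for (c), that the $44000 budget from (b) still applies; the 0.5:1 revenue ratio is scaled to the equivalent 1:2. The function names are illustrative only.

```python
from typing import Callable, Iterator, List, Tuple

Point = Tuple[int, int]


def feasible_points(budget: int, capacity: Callable[[int, int], bool]) -> Iterator[Point]:
    """Whole-number (x, y) allowed by the budget, a capacity rule and the piecewise y-limits."""
    for x in range(0, 41):               # the piecewise rule only covers 0 <= x <= 40
        y_limit = 40 if x <= 20 else 20  # y <= 40 for x <= 20; y <= 20 for 20 < x <= 40
        for y in range(0, y_limit + 1):
            if 1000 * x + 800 * y <= budget and capacity(x, y):
                yield x, y


def best_mix(budget: int, capacity: Callable[[int, int], bool],
             wx: int, wy: int) -> Tuple[int, List[Point]]:
    """Largest value of wx*x + wy*y and every feasible point that attains it."""
    points = list(feasible_points(budget, capacity))
    top = max(wx * x + wy * y for x, y in points)
    return top, [(x, y) for x, y in points if wx * x + wy * y == top]


def count_rule(x: int, y: int) -> bool:
    return x + y <= 50  # at most 50 machines, parts (a) and (b)


def floor_rule(x: int, y: int) -> bool:
    return x + 2 * y <= 70  # 70 m² of floor space, part (c)


print(best_mix(48000, count_rule, 12, 10))  # (580, [(40, 10)])
print(best_mix(44000, count_rule, 12, 10))  # (540, [(20, 30)])
print(best_mix(44000, floor_rule, 1, 2))    # (c): ties along x + 2y = 70
```

For (c) the tied optima are (0, 35), (2, 34), ..., (20, 25), which is why the answer gives a range of x rather than a single mix.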