IB Higher Level Statistics Option with the TI83/TI84 family of calculators
Tony Halsey
www.mathspot.co.uk

Contents

Chapter 1. Probability Distributions
A. Discrete Probability Distributions
Discrete Uniform
Binomial
Geometric
Poisson
Hypergeometric
Negative Binomial
B. Continuous Probability Distributions
Continuous Uniform
Normal
Exponential
Exercise 1
Chapter 2. Expectation Algebra
Exercise 2
Chapter 3. The Central Limit Theorem
Exercise 3
Chapter 4. The Proportion of Success in a large sample and Significance Testing
Exercise 4A
Further Notes and terminology
Exercise 4B
Chapter 5. Confidence Intervals
Exercise 5A
Exercise 5B
Chapter 6. The chi-squared goodness of fit test
More on Degrees of freedom
Exercise 6
Chapter 7. The chi-squared test for independence
Exercise 7
Practice Paper 1
Practice Paper 2
Answers
Markscheme for Practice Paper 1
Markscheme for Practice Paper 2
Appendix B - Further Notes on the use of TI calculators

Chapter 1. Probability Distributions

There are many probability distributions, some of which are named because they occur frequently. Those that are needed for the HL course are listed below. A summary of them is found in Appendix A.

A. Discrete Probability Distributions

Discrete Uniform

A random variable whose outcomes are all equally likely follows a Discrete Uniform distribution.
The parameter for the Discrete Uniform distribution is n, the number of equally likely scores.
The notation used is $DU(n)$

Example question
A number is chosen at random from the natural numbers from 1 to 50.
(a) What is the probability that a number over 30 is chosen?
(b) What is the expected value of the number chosen and what is the variance of numbers chosen?
Solution
X is the number chosen, $X \sim DU(50)$
(a) $p(X > 30) = \frac{20}{50} = 0.4$
(b) $E(X) = \frac{51}{2} = 25.5$ (Formula given: $E(X) = \frac{n+1}{2}$)
$\mathrm{Var}(X) = \frac{2499}{12} = 208.25$ (Formula given: $\mathrm{Var}(X) = \frac{n^2 - 1}{12}$)

© Tony Halsey, 2007. Unregistered Copy

Binomial

The number of successes in a fixed number of trials, where the probability of success on each trial is equal, follows a Binomial distribution.
The parameters for the Binomial distribution are n, the number of trials, and p, the probability of success on each trial.
The notation used is $B(n, p)$

Example question
A marksman fires at a target 10 times, with probability of hitting the target 9/10.
(a) What is the probability that he hits the target exactly 8 times?
(b) What is the probability that he hits the target more than 6 times?
(c) What is the expected value of the number of hits and what is the variance of the number of hits?

Solution
X is the number of hits, $X \sim B(10, 0.9)$
(a) $p(X = 8) = 0.194$ (from TI calculator)
(b) $p(X > 6) = 1 - p(X \le 6) = 0.987$ (from TI calculator)
(c) $E(X) = 10 \times 0.9 = 9$ (Formula given: $E(X) = np$)
$\mathrm{Var}(X) = 10 \times 0.9 \times 0.1 = 0.9$ (Formula given: $\mathrm{Var}(X) = npq$)

Geometric

The number of trials needed to achieve the first success (counting the successful trial itself), when the probability of success on each trial is the same, follows a Geometric distribution.
The parameter for the Geometric distribution is p, the probability of success on each trial.
The notation used is $Geo(p)$

Example question
A marksman fires at a target with probability of hitting the target 9/10.
What is the probability that he hits the target for the first time on his second shot?
What is the probability that he takes at least 5 shots to hit the target for the first time?
What is the expected value of the number of shots he needs to hit the target, and what is the variance of that number of shots?
Solution
X is the number of shots up to and including the first hit, $X \sim Geo(0.9)$
$p(X = 2) = 0.09$ (from TI calculator)
$p(X \ge 5) = 1 - p(X \le 4) = 0.0001$ (from TI calculator)
$E(X) = \frac{10}{9} = 1.11$ (Formula given: $E(X) = \frac{1}{p}$)
$\mathrm{Var}(X) = \frac{0.1}{0.9^2} = 0.123$ (Formula given: $\mathrm{Var}(X) = \frac{q}{p^2}$)

Poisson

The number of successes in an unknown number of trials, where the probability of success on each trial is equal, follows a Poisson distribution.
The parameter for the Poisson distribution is m, the mean number of successes.
The notation used is $Po(m)$

Example question
It is known that on average the number of people visiting a shop in an hour is 17.
What is the probability that exactly 30 people visit the shop in a two hour period?
What is the probability that fewer than 50 people visit the shop in the first three hours of opening?
What is the expected number of people visiting the shop in an 8 hour day, and what is the variance of the number of people visiting?

Solution
X is the number of people visiting in 2 hours, $X \sim Po(34)$
$p(X = 30) = 0.0568$ (from TI calculator)
Y is the number of people visiting in 3 hours, $Y \sim Po(51)$
$p(Y < 50) = p(Y \le 49) = 0.426$ (from TI calculator)
Z is the number of people visiting in 8 hours, $Z \sim Po(136)$
$E(Z) = 136$ (Formula given: $E(X) = m$)
$\mathrm{Var}(Z) = 136$ (Formula given: $\mathrm{Var}(X) = m$)

Hypergeometric

The number of "successes" when taking a sample, without replacement, from a larger set, follows a Hypergeometric distribution.
The parameters for the Hypergeometric distribution are n, the size of the sample, N, the size of the larger set, and M, the number of successes in the larger set.
The notation used is $Hyp(n, M, N)$

Example question
There are 20 students studying HL maths, of which 14 are girls. If 8 students from the HL classes are chosen at random to visit the maths department of a particular university, what is the probability that exactly 3 girls are chosen?
What is the expected number of girls chosen in the group and what is the variance of the number of girls?
Solution
X is the number of girls in the group, $X \sim Hyp(8, 14, 20)$
$p(X = 3) = \frac{\binom{14}{3}\binom{6}{5}}{\binom{20}{8}} = \frac{364 \times 6}{125970} = 0.0173$ (Formula given: $p(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$)
$E(X) = 8 \times \frac{14}{20} = 5.6$ (Formula given: $E(X) = np$, where $p = \frac{M}{N}$)
$\mathrm{Var}(X) = 8 \times \frac{14}{20} \times \frac{6}{20} \times \frac{12}{19} = \frac{504}{475} = 1.06$ (Formula given: $\mathrm{Var}(X) = npq\,\frac{N-n}{N-1}$)

Negative Binomial

The number of trials needed to achieve r successes, when the probability of success on each trial is the same, follows a Negative Binomial distribution.
The parameters for the Negative Binomial distribution are r, the number of successes, and p, the probability of success on each trial.
The notation used is $NB(r, p)$

Note 1: $NB(1, p) = Geo(p)$, i.e. a Negative Binomial distribution with $r = 1$ is a Geometric distribution.
Note 2: The probability that the number of trials needed for r successes is x can also be found by finding the probability of $r - 1$ successes in $x - 1$ trials (which is Binomial) multiplied by p, the probability of the rth success on the xth trial.

Example question
A marksman fires at a target with probability of hitting the target 9/10.
What is the probability that he hits the target for the third time on his eighth shot?
What is the expected value of the number of shots he needs to hit the target for the fifth time, and what is the variance?

Solution
X is the number of shots he needs to hit the target for the third time, $X \sim NB(3, 0.9)$
$p(X = 8) = 0.000153$ (from TI calculator or use the formula given)
Y is the number of shots he needs to hit the target for the fifth time, $Y \sim NB(5, 0.9)$
$E(Y) = \frac{50}{9} = 5.56$ (Formula given: $E(X) = \frac{r}{p}$)
$\mathrm{Var}(Y) = \frac{5 \times 0.1}{0.9^2} = 0.617$ (Formula given: $\mathrm{Var}(X) = \frac{rq}{p^2}$)

B. Continuous Probability Distributions

Continuous Uniform

A value chosen from a certain range, where all outcomes are equally likely, follows a Continuous Uniform distribution.
The parameters for the Continuous Uniform distribution are a and b, the lower and upper bounds for the possible outcomes.
The notation used is $U(a, b)$

Example question
A number is chosen at random from the real numbers between 0 and 50.
What is the probability that a number over 30 is chosen?
What is the expected value of the number chosen and what is the variance of numbers chosen?

Solution
X is the number chosen, $X \sim U(0, 50)$
$p(X > 30) = \frac{20}{50} = 0.4$
$E(X) = \frac{50}{2} = 25$ (Formula given: $E(X) = \frac{a+b}{2}$)
$\mathrm{Var}(X) = \frac{2500}{12} = 208$ (Formula given: $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$)

Normal

The standard, bell shaped distribution!
The parameters for the Normal distribution are $\mu$ and $\sigma^2$, the mean and variance of the distribution.
The notation used is $N(\mu, \sigma^2)$.
Also the STANDARDISED Normal distribution (with mean 0 and variance 1) can be found by the standardising formula $Z = \frac{X - \mu}{\sigma}$, and we can use the notation $p(Z < z) = \Phi(z)$

Example question 1
The lengths of worms are normally distributed with mean 7.2cm and standard deviation 2.3cm.
Find the probability that a worm chosen at random is longer than 8cm.

Solution
X is the length of the worm, $X \sim N(7.2, 2.3^2)$
$p(X > 8) = 0.364$ (from TI calculator; note infinity entered as a "big" number)

Example question 2
Another species of worm also has normally distributed lengths; 40% of worms are longer than 8cm and 20% are shorter than 5cm.
Find the mean and standard deviation of the lengths of these worms.

Solution
Y is the length of the worm, $Y \sim N(\mu, \sigma^2)$
$p(Y < 8) = 0.6 \Rightarrow \Phi\left(\frac{8 - \mu}{\sigma}\right) = 0.6 \Rightarrow \frac{8 - \mu}{\sigma} = \Phi^{-1}(0.6) = 0.253347$
$p(Y < 5) = 0.2 \Rightarrow \Phi\left(\frac{5 - \mu}{\sigma}\right) = 0.2 \Rightarrow \frac{5 - \mu}{\sigma} = \Phi^{-1}(0.2) = -0.841621$ (from TI calculator)
So $8 - \mu = 0.253347\sigma$ and $5 - \mu = -0.841621\sigma$
Solving (by hand or on GDC) we get $\mu = 7.31$ and $\sigma = 2.74$

Exponential

Most typically, the time taken before a success follows an Exponential distribution.
The Exponential distribution can be thought of as the continuous form of the Geometric distribution.
The parameter for the Exponential distribution is $\lambda$, the mean number of successes per time period.
The notation used is $Exp(\lambda)$

Example question
It is known that on average the number of people visiting a shop in an hour is 18.
If there is no-one in the shop at present, find the probability that it is more than 10 minutes before someone comes into the shop.
Find the expected amount of time before someone comes into the shop and the variance of that time.

Solution
X is the amount of time (in minutes) before someone comes into the shop
$X \sim Exp(0.3)$ (as $\lambda = \frac{18}{60} = 0.3$ people visiting each minute)
$p(X > 10) = \int_{10}^{\infty} 0.3e^{-0.3x}\,dx = 0.0498$ (from the formula given)
$E(X) = \frac{10}{3}$ (Formula given: $E(X) = \frac{1}{\lambda}$)
$\mathrm{Var}(X) = \frac{100}{9}$ (Formula given: $\mathrm{Var}(X) = \frac{1}{\lambda^2}$)

Note 1: If using the TI calculator to obtain an approximation, care must be taken in choosing a large value to use as infinity. See examples.
Note 2: The question could have been answered taking the amount of time in hours, so $Y \sim Exp(18)$, and finding $p\left(Y > \frac{1}{6}\right)$
Note 3: Knowing that the formula for the cumulative distribution function is $p(X \le x) = 1 - e^{-\lambda x}$ can simplify such questions.

Exercise 1

1. During a national election 3 votes in every 5 were for the winner, President X. The voters cast their votes independently. Find the probability that in a random sample of 5 voters exactly 3 voted for President X.

2. Hotel kitchens serve chocolate soufflé on average one day in every sixteen. Assuming each day is equally likely to serve it,
(a) what is the probability that hotel residents will have to wait more than 10 days before they serve chocolate soufflé again?
(b) what is the probability that the residents will have chocolate soufflé for the fourth time on the 15th day?

3. A surfer knows from experience that he catches an average of 8 good waves in half an hour of surfing. If he spends 2 hours surfing one afternoon, what will be the probability that he catches more than 20 good waves?

4. Students sitting a maths test are known to have their finishing times Normally distributed.
If 18% of students take more than 60 minutes to finish the test, and 5% of students finish within 45 minutes, what are the mean and standard deviation of the finishing times of students?

5. At a parent-teacher conference, parents arrive to talk to a particular teacher at an average rate of 6.4 parents every hour.
(a) If there is no-one talking to a teacher at a particular instant, what is the probability that there will be no-one talking to that teacher for the next ten minutes?
(b) What is the probability that a teacher has to wait between 10 and 15 minutes for the first parent to arrive?
One in twenty parents bring a gift for the teacher.
(c) What is the probability that the teacher will still not have a gift after seeing 25 parents?
(d) What is the probability that on seeing his 30th parent, he will receive his third gift?

6. A bag contains 5 blue balls and 7 red balls.
(a) If a ball is chosen from the bag four times, replacing the ball each time, what is the probability that a red ball is chosen
(i) exactly three times
(ii) at least twice
(b) If a ball is chosen from the bag six times, replacing the ball each time, what is the expected number of red balls?
(c) If four balls are taken from the bag, without replacing any ball, what is the probability that:
(i) exactly three red balls are taken
(ii) at least two red balls are taken
(d) If six balls are taken from the bag, without replacing any ball, what is the expected number of red balls?

7. Poppies are randomly distributed around a field so that the mean number of poppies per hectare is $\lambda$. Students find that they are twice as likely to find exactly ten poppies as to find exactly nine poppies in a hectare of field. Calculate the value of $\lambda$.

8. A man looks at his watch at a random time between midday and 1300. What is the probability that the angle between the hands of the watch is less than 60°?

Chapter 2.
Expectation Algebra

We can combine probability distributions using certain rules. First let us consider an example.

Three coins are tossed and the number of heads is recorded.
X is the number of heads, $X \sim B(3, 0.5)$
So the probabilities can be calculated:

x          0     1     2     3
p(X = x)   1/8   3/8   3/8   1/8

We can calculate the expected value and variance of X to be
$E(X) = np = 3 \times \frac{1}{2} = 1.5$, and $\mathrm{Var}(X) = npq = 3 \times \frac{1}{2} \times \frac{1}{2} = 0.75$

In addition two dice are thrown and the number of sixes recorded.
Y is the number of sixes, $Y \sim B\left(2, \frac{1}{6}\right)$
So again the probabilities can be calculated:

y          0       1       2
p(Y = y)   25/36   10/36   1/36

And we can calculate the expected value and variance to be $E(Y) = \frac{1}{3}$ and $\mathrm{Var}(Y) = \frac{5}{18}$

Now let us imagine that we are concerned with the total number of heads and sixes obtained in the two experiments. This total number of heads and sixes will have a new probability distribution.
We can let Z be the total number of heads and sixes.
We can see that $Z = X + Y$ and we can find the probability distribution for Z.

$p(Z = 0) = p(X = 0 \cap Y = 0) = \frac{1}{8} \times \frac{25}{36} = \frac{25}{288}$
$p(Z = 1) = p(X = 1 \cap Y = 0) + p(X = 0 \cap Y = 1) = \frac{3}{8} \times \frac{25}{36} + \frac{1}{8} \times \frac{10}{36} = \frac{85}{288}$
etc.

Finally arriving at this distribution:

z = x + y   0        1        2         3        4        5
p(Z = z)    25/288   85/288   106/288   58/288   13/288   1/288

We can now calculate the expected value and variance of Z using the TI calculator.
So we calculate that $E(Z) = \frac{11}{6}$ and $\mathrm{Var}(Z) = \frac{37}{36}$

At this stage we may notice that $E(Z) = E(X) + E(Y)$ and $\mathrm{Var}(Z) = \mathrm{Var}(X) + \mathrm{Var}(Y)$

It is possible to show that this is generally true, that is
$E(X + Y) = E(X) + E(Y)$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
(the variance result requires X and Y to be independent, as the coins and dice are here),
but in fact these are specific examples of the more general identities:
$E(aX \pm bY) = aE(X) \pm bE(Y)$ and $\mathrm{Var}(aX \pm bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$
(Note: These identities are in the formula book)

We can then use these identities to solve problems involving a combination of distributions.
Frequently in such problems we also use the idea that a linear combination of independent Normal distributions is also Normal, but this will not be proven.
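The coin-and-dice calculation above can also be checked away from the calculator. The following is only a sketch in Python (the original notes use the TI calculator, not Python), and the helper names `binomial_pmf`, `distribution_of_sum` and `mean_and_variance` are made up for this illustration. It builds the distribution of Z = X + Y by multiplying probabilities of independent outcomes, using exact fractions.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return {k: p(X = k)} for X ~ B(n, p); p should be a Fraction."""
    q = 1 - p
    return {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}


def distribution_of_sum(pmf_x, pmf_y):
    """Distribution of X + Y, assuming X and Y are independent."""
    total = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            total[x + y] = total.get(x + y, Fraction(0)) + px * py
    return total


def mean_and_variance(pmf):
    """Return (E, Var) of a discrete distribution given as {value: prob}."""
    mean = sum(k * pr for k, pr in pmf.items())
    variance = sum(k * k * pr for k, pr in pmf.items()) - mean**2
    return mean, variance


pmf_x = binomial_pmf(3, Fraction(1, 2))   # heads in three coin tosses
pmf_y = binomial_pmf(2, Fraction(1, 6))   # sixes in two dice throws
pmf_z = distribution_of_sum(pmf_x, pmf_y)

for name, pmf in (("X", pmf_x), ("Y", pmf_y), ("Z", pmf_z)):
    mean, variance = mean_and_variance(pmf)
    print(name, "mean =", mean, "variance =", variance)
```

The printed fractions should agree with the values in the text: the mean and variance of Z come out as 11/6 and 37/36, which are the sums of the corresponding values for X and Y.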
Example question 1
The weights of apples are normally distributed with mean 380g and standard deviation 45g and the weights of bananas are normally distributed with mean 240g and standard deviation 25g.
A bag contains three apples and five bananas. What is the probability that the bag weighs more than 2.5kg?

Solution
If we let the total weight of the bag be W, then
$W = A_1 + A_2 + A_3 + B_1 + B_2 + B_3 + B_4 + B_5$
where $A_k \sim N(380, 45^2)$ and $B_k \sim N(240, 25^2)$
$E(W) = E(A_1 + A_2 + A_3 + B_1 + B_2 + B_3 + B_4 + B_5)$
$= E(A_1) + E(A_2) + E(A_3) + E(B_1) + E(B_2) + E(B_3) + E(B_4) + E(B_5)$
$= 3 \times 380 + 5 \times 240 = 2340$
Also
$\mathrm{Var}(W) = \mathrm{Var}(A_1 + A_2 + A_3 + B_1 + B_2 + B_3 + B_4 + B_5)$
$= \mathrm{Var}(A_1) + \mathrm{Var}(A_2) + \dots + \mathrm{Var}(B_4) + \mathrm{Var}(B_5)$
$= 3 \times 45^2 + 5 \times 25^2 = 9200$
So $W \sim N(2340, 9200)$
Thus we can calculate $p(W > 2500) = 0.0476$ (from TI calculator)

Example question 2
The weights of apples are normally distributed with mean 380g and standard deviation 45g and the weights of bananas are normally distributed with mean 240g and standard deviation 25g.
Apples cost 3 pesos per gram and bananas cost 5 pesos per gram.
If I buy an apple and a banana at random, what is the probability I spend more than 2500 pesos?

Solution
If we let the total cost be C, then
$C = 3A + 5B$
where $A \sim N(380, 45^2)$ and $B \sim N(240, 25^2)$
$E(C) = E(3A + 5B) = 3E(A) + 5E(B) = 3 \times 380 + 5 \times 240 = 2340$
Also
$\mathrm{Var}(C) = \mathrm{Var}(3A + 5B) = 9\mathrm{Var}(A) + 25\mathrm{Var}(B) = 9 \times 45^2 + 25 \times 25^2 = 33850$
So $C \sim N(2340, 33850)$
Thus we can calculate $p(C > 2500) = 0.192$ (from TI calculator)

We can note the difference between the last two examples. In example 1, there were DIFFERENT apples and bananas, each with its own distribution, and in example 2, there was just ONE apple and ONE banana and the two distributions were multiplied by a value.

Example question 3
The weights of apples are normally distributed with mean 380g and standard deviation 45g and the weights of bananas are normally distributed with mean 240g and standard deviation 25g.
If I buy an apple and a banana at random, what is the probability that the apple is more than twice as heavy as the banana?

Solution
If we create a new distribution
$D = A - 2B$
where $A \sim N(380, 45^2)$ and $B \sim N(240, 25^2)$
$E(D) = E(A - 2B) = E(A) - 2E(B) = 380 - 2 \times 240 = -100$
Also
$\mathrm{Var}(D) = \mathrm{Var}(A - 2B) = \mathrm{Var}(A) + 4\mathrm{Var}(B) = 45^2 + 4 \times 25^2 = 4525$
So $D \sim N(-100, 4525)$
The apple is more than twice as heavy as the banana when $D > 0$, so we calculate $p(D > 0) = 0.0686$ (from TI calculator)

Exercise 2

1. The weights of exercise books are normally distributed with mean 240g and standard deviation 75g. The weights of text books are normally distributed with mean 680g and standard deviation 110g. A student's bag contains three exercise books and one text book. What is the probability that the total weight of books in the student's bag is less than 1.5kg?

2. The weights of cars are normally distributed with mean 1.2 tonnes and standard deviation 0.45 tonnes. The weights of trucks are normally distributed with mean 6.2 tonnes and standard deviation 0.88 tonnes. A ferry has bookings for 24 cars and 5 trucks.
(a) What is the probability that the total weight for the ferry is over 60 tonnes?
(b) What is the probability that the total weight of cars is greater than the total weight of trucks?

3. Lengths of wood are normally distributed with mean length 5m and standard deviation 0.15m. Two lengths of 2m are to be cut from the original and, due to inaccuracies of the cutting process, their lengths are each normally distributed with mean 2m and standard deviation 0.05m. Find the probability that the remaining piece of wood is over 1.1m.

4. The weights of bags of apples are normally distributed with mean 2kg and standard deviation 0.35kg, and the weights of bags of potatoes are normally distributed with mean 5kg and standard deviation 0.86kg. If I buy a bag of apples, priced at 4.8 Swiss Francs per kg, and a bag of potatoes, priced at 2.3 Swiss Francs per kg, what is the probability that I spend more than 20 Swiss Francs?

5.
The duration of a guest speaker's speech is known from experience to be Normally distributed with mean 9 minutes and standard deviation 4.2 minutes, and the duration of the headmaster's speech is known to be Normally distributed with mean 5 minutes and standard deviation 1.7 minutes. At a Christmas concert, two guest speakers and the headmaster are to speak.
What is the probability that the total length of the three speeches is greater than 30 minutes?
What is the probability that the two guest speakers (together) talk for more than three times the length of time that the headmaster speaks?

6. Buttercups are randomly distributed in a field so that on average there are 8 buttercups per square metre of field. Daisies are randomly distributed in the same field so that on average there are 15 daisies per square metre. What is the probability that there are more than 25 flowers in a given square metre of field? (You may assume there are no other types of flowers and that the sum of two independent Poisson distributions gives another Poisson distribution.)

Chapter 3. The Central Limit Theorem.

When making inferences about a POPULATION we usually have to do so from the data provided by a SAMPLE.
For example, if I want to estimate the number of children per family in Switzerland, I may take a sample of 50 families to make my estimate.
Suppose the mean of my sample is 2.6 children per family.
I can ESTIMATE the mean of the population (all families in Switzerland) to be 2.6 children per family.
If someone else did the same thing, they would very likely find a different estimate for the mean number of children per family.
If many people did the same thing, we would have many different sample means.

The CENTRAL LIMIT THEOREM states that the distribution of the means of samples of size n is approximately Normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$ (or variance $\frac{\sigma^2}{n}$), where $\mu$ and $\sigma$ are the mean and standard deviation of the population.
So $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

This result is exact if the sample is from a Normally distributed population. If the population is not Normally distributed, the Central Limit Theorem still applies as an approximation: the larger the value of n, the better the approximation.

We can see that as n increases, the standard deviation of the sample mean decreases. That means that the larger the sample, the more likely it is that the mean of the sample will be close to the mean of the population.

Example question
On a certain beach, turtles go each year to lay their eggs. The number of eggs in a nest is known to be normally distributed with mean 114.78 and standard deviation 22.4.
(a) Calculate the probability that the mean number of eggs in a sample of 10 nests is greater than 120.
(b) If a sample of 5 nests is taken, calculate the value of a such that the probability that the mean of the sample is within a of the population mean is 90%.

Solution
(a) $\bar{X} \sim N\left(114.78, \frac{22.4^2}{10}\right)$
$p(\bar{X} > 120) = 0.231$ (from TI calculator)
(b) $\bar{X} \sim N\left(114.78, \frac{22.4^2}{5}\right)$
$p(114.78 - a < \bar{X} < 114.78 + a) = 0.9$
so $p(\bar{X} < 114.78 + a) = 0.95$
$p\left(Z < \frac{a}{22.4/\sqrt{5}}\right) = 0.95$
$\frac{a}{22.4/\sqrt{5}} = 1.645$ (from TI calculator)
$a = 16.5$

Exercise 3

1. The heights of trees in a forest are thought to be normally distributed with mean 5.2m and standard deviation 1.3m.
(a) Find the probability that the mean height of a sample of 20 trees is greater than 6m.
(b) Find the probability that the mean height of a sample of 100 trees is less than 5m.
(c) Find the probability that the mean height of a sample of 50 trees is between 5m and 5.5m.
(d) Find the value of a for which the probability that the mean height of a sample of 10 trees is within a of 5.2m is 0.9.

2. The mean percentage for students, in an end of year exam, is known to be 67.5, with a standard deviation of 18.2.
(a) What is the probability that the mean percentage of a random sample of 30 students is less than 60%?
(b) What is the probability that the mean percentage of a random sample of 50 students is more than 70%?
(c) Explain why it would not be reasonable to calculate the probability that the mean percentage of a sample of 5 students is between 60% and 70%.

Chapter 4. The Proportion of Success in a large sample and Significance Testing

Example question 1
An egg manufacturer claims that eggs delivered to a supermarket have no more than 5% broken. On a certain day, of 1000 eggs delivered to the supermarket, 80 are found to be broken. What is the probability of this happening, given that the manufacturer's claim is correct? Comment on your answer.

Solution METHOD 1 - EXACT
X is the number of broken eggs in the delivery
$X \sim B(1000, 0.05)$ (according to the manufacturer's claim)
$p(X \ge 80) = 0.0000349$ (from TI calculator)
It seems unlikely that the manufacturer's claim is correct.

There are other approximate methods for analysing this situation using the central limit theorem.

Solution METHOD 2 - USING THE CLT
X is the number of broken eggs in the delivery
$X \sim B(1000, 0.05)$ so $E(X) = 50$ and $\mathrm{Var}(X) = 47.5$
Now if $\hat{p} = \frac{X}{n}$ is the proportion of broken eggs in the sample, then
$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}np = p = 0.05$
and $\mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{1}{n^2}npq = \frac{pq}{n} = 0.0000475$
So $\hat{p} \sim N\left(p, \frac{pq}{n}\right)$, or in this case $\hat{p} \sim N(0.05, 0.0000475)$
(Note that this is an approximation, assuming n is large)
Now $p(\hat{p} \ge 0.08) = 0.00000672$ (from TI calculator)
It seems unlikely that the manufacturer's claim is correct.
Note that whilst the conclusion may be the same, the probability is quite different. This is a result of the approximation.

Solution METHOD 3 - USING THE 1-PropZTest on the TI calculator
The test gives $p(\hat{p} \ge 0.08) = 0.00000672$, and it seems unlikely that the manufacturer's claim is correct.

Example question 2
A drug company claims that a new drug cures 75% of patients suffering from a certain disease.
However, a medical committee believes that less than 75% are cured. To test the drug company's claim, a trial is carried out in which 100 patients suffering from the disease are given the new drug. It is found that 68 of these patients are cured. Does the medical committee have the evidence it requires to refute the drug company's claim?

Solution
METHOD 1 - EXACT
X is the number of patients cured
If 75% of patients are cured then $X \sim B(100, 0.75)$
$p(X \le 68) = 0.0693$ (from TI calculator)

METHOD 2 - Approximation with CLT
$\hat{p} \sim N\left(p, \frac{pq}{n}\right)$, or $\hat{p} \sim N(0.75, 0.001875)$ (assuming 75% of patients are cured)
$p(\hat{p} \le 0.68) = 0.0530$

At this stage we are a little uncertain whether such results are sufficient evidence or not. Exactly what constitutes "sufficient evidence"? How sure do we need to be? Would we have enough evidence for a lawsuit?
Clearly we need to develop a more formal approach to testing a hypothesis.

First we write the hypothesis in a formal way.
We say that the NULL HYPOTHESIS, or $H_0$, is that the company's claim is correct, i.e. that 75% of patients are cured.
Also the ALTERNATIVE HYPOTHESIS, or $H_1$, is that LESS than 75% are cured.
Note that in this case we are not interested in whether MORE than 75% are cured.
This is called a 1-TAILED TEST.
A 2-TAILED TEST would be where the alternative is that OTHER THAN 75% are cured.

We write:
$H_0$: 75% of patients are cured by the drug
$H_1$: Less than 75% of patients are cured by the drug
OR
$H_0$: $p = 0.75$
$H_1$: $p < 0.75$

Next we decide on a SIGNIFICANCE LEVEL.
If we need to be very sure (e.g. to take out a lawsuit) then we need a low significance level. A higher level can be used if it is not so important.
Note that you will usually be given the significance level. Usually it is 10%, 5%, or 1%.

Now we can use the calculated probability $p(X \le 68) = 0.0693$
So if we test at the 10% level (probably not sure enough for a lawsuit)
then as 6.93% < 10%, we would reject $H_0$ in favour of $H_1$. The company's claim does not seem reasonable.
On the other hand, if we test at the 1% level
then as 6.93% > 1%, we say there is insufficient evidence to reject $H_0$ and we accept the company's claim that 75% of patients are cured by the drug.

Example question 3
The proportion of defective items produced by a machine is thought to be 0.05. In a random sample of 200 items, 15 were found to be faulty. Test at the 5% level of significance to see if the machine is producing more defective items than thought.

Solution
$H_0$: The machine is producing 5% defective items
$H_1$: The machine is producing more than 5% defective items
From TI calculator, $p(\hat{p} \ge 0.075) = 0.0523$
5.23% > 5% so there is insufficient evidence to reject $H_0$. We accept the claim that the proportion of defective items produced by the machine is 0.05.

In the previous examples the standard deviation was known, as the distribution of the number of patients etc. was Binomial.
In other cases, the standard deviation may be known (from other sources) or estimated from the sample.
These cases give rise to different types of hypothesis testing.

Example question 4
A sample of 10 crates of items is taken from a factory which states that the mean number of defective items per crate is 12. The sample mean of the ten crates is found to be 13. Knowing that the standard deviation of the number of defective items is 1.6, test the hypothesis that the mean is more than 12:
(a) at the 10% level of significance.
(b) at the 1% level of significance.

Solution
$H_0$: The mean number of defective items per crate is 12.
$H_1$: The mean number of defective items per crate is more than 12.
From the CLT, $\bar{X} \sim N\left(12, \frac{1.6^2}{10}\right)$
$p(\bar{X} \ge 13) = 0.0241$ (from TI calculator)
Or, using the TI calculator's built-in test directly, $p(\bar{X} \ge 13) = 0.0241$
(a) 2.41% < 10% so we reject $H_0$ in favour of $H_1$. We think that the mean number of defective items is more than 12.
(b) 2.41% > 1% so there is insufficient evidence to reject $H_0$. We accept that the mean number of defective items per crate is 12.
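The p-value in Example question 4 can be reproduced without a TI calculator. The sketch below is an illustration in Python rather than the method used in these notes. The function name `upper_tail_p_value` is hypothetical, and the Normal tail probability is obtained from the standard library's complementary error function via $1 - \Phi(z) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{z}{\sqrt{2}}\right)$.

```python
from math import erfc, sqrt


def upper_tail_p_value(sample_mean, mu0, sigma, n):
    """p(sample mean >= observed value) under H0, population sd known."""
    standard_error = sigma / sqrt(n)          # sd of the sample mean (CLT)
    z = (sample_mean - mu0) / standard_error  # standardised test statistic
    return 0.5 * erfc(z / sqrt(2))            # equals 1 - Phi(z)


# Values from Example question 4: claimed mean 12, sd 1.6, 10 crates, mean 13
p_value = upper_tail_p_value(sample_mean=13, mu0=12, sigma=1.6, n=10)
print("p-value =", round(p_value, 4))

for level in (0.10, 0.01):
    if p_value < level:
        print(f"At the {level:.0%} level: reject H0 in favour of H1")
    else:
        print(f"At the {level:.0%} level: insufficient evidence to reject H0")
```

The same function applies to any 1-tailed test of a mean with known standard deviation where the alternative hypothesis is "greater than"; for a "less than" alternative the lower tail would be needed instead.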
Now if the standard deviation of the population is NOT known, and has to be estimated from the sample, then the distribution is altered. Rather than being Normal, the standardised test statistic follows a Student-t distribution.
The standardised Student-t distribution has parameter $\nu$, known as the degrees of freedom.
The degrees of freedom is generally (for the purposes of the course) equal to $n - 1$.

Example question 5
A sample of 10 crates of items is taken from a factory which states that the mean number of defective items per crate is 12. The sample mean of the ten crates is found to be 13 and the estimate of the standard deviation of the number of defective items is 1.6. Test the hypothesis that the mean is more than 12:
(a) at the 10% level of significance
(b) at the 1% level of significance

Solution
$H_0$: $\mu = 12$ and $H_1$: $\mu > 12$
The standardised test statistic is $T = \frac{\bar{X} - \mu}{S_{n-1}/\sqrt{10}} \sim t(9)$
Note there are 9 degrees of freedom as n = 10
so $p(\bar{X} \ge 13) = p\left(T \ge \frac{13 - 12}{1.6/\sqrt{10}}\right) = p\left(T \ge \frac{\sqrt{10}}{1.6}\right) = 0.0398$ (from TI calculator)
Or, using the TI calculator's built-in test directly, $p(\bar{X} \ge 13) = 0.0398$
(a) 3.98% < 10% so we reject $H_0$ in favour of $H_1$. We think that the mean number of defective items is more than 12.
(b) 3.98% > 1% so there is insufficient evidence to reject $H_0$. We accept that the mean number of defective items per crate is 12.

Exercise 4A

1. A drug company claims that a new drug cures 80% of patients suffering from a certain disease. To test the drug company's claim, a trial is carried out in which 50 patients suffering from the disease are given the new drug. It is found that 32 of these patients are cured.
(a) State suitable hypotheses.
(b) Find the p-value of your test.
(c) State the conclusions reached if the test is conducted at the 10% significance level and justify your answer.

2. The proportion of defective items produced by a machine is thought to be 0.05. In a random sample of 200 items, 15 were found to be faulty.
Is the machine producing more defective items than thought? Test your hypotheses at the 5% level of significance.

3. It is thought by the government that 60% of the population will vote "yes" in an upcoming referendum. A test sample of 100 people is taken and 48 state that they will vote "yes". Should the government modify its thinking? Test your hypotheses at the 10% level of significance.

4. It is known that the standard deviation of ages of people entering a club is 5.6 years. The club claims that the mean age of clients is 22. A sample of 8 clients is taken and their mean age is found to be 19.5. Is there sufficient evidence to suggest that the mean age of clients is less than that stated? Test your hypothesis at the 5% significance level.

5. A post office claims that the mean waiting time is 3.2 minutes. Three people entering the post office at random wait 4.1 minutes, 3.5 minutes and 3.7 minutes. Test at the 10% level the post office's claim, knowing that the standard deviation of waiting times is 1.4 minutes.

6. A club claims that the mean age of clients is 22. A sample of 8 clients is taken and their mean age is found to be 19.5 and the estimate of the standard deviation of ages of people entering the club is 5.6 years. Is there sufficient evidence to suggest that the mean age of clients is less than that stated? Test your hypothesis at the 5% significance level.

7. A post office claims that the mean waiting time is 3.2 minutes. Three people entering the post office at random wait 4.1 minutes, 3.5 minutes and 3.7 minutes. Test at the 10% level the post office's claim.

8. In 200 tosses of a coin, 108 tails and 92 heads were observed. Conduct a 2-tailed test, at the 1% significance level, to test if it is a fair coin.

9. A snack food company uses a machine to package pretzels in packets labelled 454g. It is known from experience that the standard deviation of the weights of packets is 8.4g.
A sample of 50 packets was taken to see if the machine was working properly and the sample mean weight was found to be 451.22g. Conduct an appropriate 2-tailed test, at the 10% significance level, to see if the machine was working properly.

Further Notes and terminology
As the notation for the probabilities can become complicated (especially for 2-tailed tests), we often simply write p-value = 0.0398 instead of $p(\bar{X} \ge 13) = 0.0398$ or $p(\hat{p} \ge 0.13) = 0.0398$.

The standard deviation of the distribution of sample means, i.e. $\frac{\sigma}{\sqrt{n}}$, is often referred to as the STANDARD ERROR.

Incorrectly rejecting the null hypothesis is known as a Type I error.
Incorrectly accepting the null hypothesis is known as a Type II error.

In all cases, there is a value of the STANDARDISED test statistic at which we cease to accept the null hypothesis and start to reject it. This point is known as the CRITICAL VALUE. This value can be found using the tables provided, or using the calculator.

The set of all values of the standardised test statistic for which the null hypothesis is rejected is known as the CRITICAL REGION.

For Example
In Example question 5 the standardised test statistic was
$$T = \frac{\bar{X} - \mu}{S_{n-1}/\sqrt{10}} \sim t(9)$$
The calculated value was $\frac{\sqrt{10}}{1.6} = 1.976$.
The critical value is the value beyond which we would reject the null hypothesis, that is, the value of $t$ for which $p(T \ge t) = 0.1$ for part (a) or $p(T \ge t) = 0.01$ for part (b).
Using SOLVER on the TI calculator we get the values of t to be 1.383 and 2.821 respectively, or we can read these values from the tables of critical values of the Student-t distribution.
The critical region for part (a) is all values greater than 1.383, so since 1.976 is in that critical region, we would reject H0.

Example question 6
Letters passing through a sorting office are thought to have a mean weight of 45g.
A sample of 20 letters from the sorting office is used to test the hypothesis that the mean weight of a letter is greater than 45g, and it is concluded that, at the 5% level of significance, the mean weight is in fact greater than 45g.
(a) Given that the estimate of the population variance is 27, find the minimum mean weight of the sample.
(b) If the mean weight of the sample was in fact 47.8g, what is the probability of making a Type I error?

Solution
(a) H0: $\mu = 45$, H1: $\mu > 45$
Critical value = 1.729 (from tables or SOLVER)
$$1.729 = \frac{\bar{X} - 45}{\sqrt{27}/\sqrt{20}} \quad \text{so} \quad \bar{X} = 47.01$$
The minimum mean weight of the sample must have been 47.01g.
(b) From TI calculator, $p(\bar{X} \ge 47.8) = 0.0131$
So the probability of a Type I error is 0.0131.

Example question 7
It is believed that 40% of men drink more than 20 units of alcohol per week. A sample of 20 men was taken and 11 of them were found to drink more than 20 units of alcohol per week.
(a) Test, at the 1% level of significance, the hypothesis that more than 40% of men drink more than 20 units of alcohol per week.
(b) It is subsequently found that 48% of men drink more than 20 units of alcohol per week. With the hypotheses in part (a), what was the probability of making a Type II error taking a sample of 20 men?

Solution
(a) H0: $p = 0.4$, H1: $p > 0.4$
$$\hat{p} \sim N\left(p, \frac{pq}{n}\right) = N(0.4, 0.012)$$
p-value = 0.0855 (from TI calculator)
8.55% > 1% so there is insufficient evidence to reject H0. We accept that 40% of men drink more than 20 units of alcohol per week.
(b) We accept H0 if $\hat{p} < x$, where $p(\hat{p} \ge x) = 0.01$, so $x = 0.655$ (from TI calculator).
We accept H0 if no more than 13 in the sample drink more than 20 units.
Now if the number of men in a sample of 20 who drink more than 20 units of alcohol is X, then $X \sim B(20, 0.48)$.
Hence the probability of a Type II error is $p(X \le 13) = 0.960$ (from TI calculator).

Exercise 4B
1. The standard deviation of weights of a certain breed of bird's eggs is known to be 12g.
A sample of 25 bird's eggs is taken to ascertain if the mean weight has fallen from its previous value of 198g.
(a) If it is concluded that the weights have in fact dropped, according to a test at the 5% significance level, what was the maximum possible mean weight of the sample?
(b) If, in fact, the mean of the sample was 192.8g, what was the probability of making a Type I error?
2. A student wishes to find out if his internet connection is satisfactory for downloading music. In 250 downloaded tracks, 116 downloads were broken at some point.
(a) Test the claim, at the 5% level of significance, that the proportion of broken downloads is less than 50%.
(b) Given that, in fact, two-fifths of all downloads are broken, calculate the probability of a Type II error.
(c) Explain what is meant by a Type II error in this context.
3. A random sample of 8 of a certain type of snake is taken to test the hypothesis that the mean length is greater than the previously believed value of 135cm. The mean length of snake in the sample was found to be 142cm, resulting in the conclusion that the mean length of snake is in fact 135cm if testing at the 10% level of significance.
(a) Calculate the minimum possible standard deviation of the lengths of this type of snake if it is known not to have changed from a previous study.
(b) Calculate the minimum possible standard deviation of the lengths of this type of snake if it has been estimated from the sample.
4. It is suggested that the mean number of offspring in a litter of a certain type of mouse is 18.3.
(a) A sample of three mice is taken and they had litters of 12, 17 and 20. Test at the 10% level of significance to see if the size of litter is in fact less than 18.3.
(b) It is later found, in a more comprehensive study, that the mean size of litter is in fact 16.8. What type of error has been made in part (a)?
5.
A survey is conducted to find out if more than three quarters of householders own a dishwasher and of 50 householders surveyed, 38 own a dishwasher. If the test was conducted at the 5% level of significance and it is known that 85% of householders own a dishwasher, what is the probability that a Type II error was made?

Chapter 5. Confidence Intervals
A confidence interval is an interval which contains the quantity being estimated (such as the mean) with a given probability. For example, if a 95% confidence interval is given, then there is a 95% probability that the mean lies in that interval.

Example question 1
On a given busy day, 1000 eggs are delivered to a supermarket and 80 are found to be broken. Find the 95% confidence interval for the mean percentage of broken eggs.

Solution
METHOD 1
As before, $\hat{p} \sim N\left(p, \frac{pq}{n}\right)$
We can estimate p as 0.08, so $\hat{p} \sim N(0.08, 0.0000736)$
Now if $p(0.08 - a \le \hat{p} \le 0.08 + a) = 0.95$ then $p(\hat{p} \le 0.08 + a) = 0.975$, so $a = 0.0168$
And the confidence interval is [0.0632, 0.0968]

METHOD 2
From the formula book. For a 95% confidence interval, we need to take $z = \Phi^{-1}(0.975) = 1.96$ (on the TI calculator).
So the confidence interval is
$$\left[0.08 - 1.96\sqrt{\frac{0.08 \times 0.92}{1000}},\ 0.08 + 1.96\sqrt{\frac{0.08 \times 0.92}{1000}}\right] = [0.0632, 0.0968]$$

METHOD 3
Using TI calculator

Example 2
A sample of 10 crates of items is taken from a factory and the mean of the ten crates is found to be 13. Knowing that the standard deviation of the number of defective items is 1.6, find a 95% confidence interval for the mean number of defective items in a crate.

Solution
METHOD 1
$$\bar{X} \sim N\left(13, \frac{1.6^2}{10}\right)$$
$p(13 - a \le \bar{X} \le 13 + a) = 0.95$, so $p(\bar{X} \le 13 + a) = 0.975$ and $a = 0.99167$ (from TI calculator)
Hence the confidence interval is [12.01, 13.99]

METHOD 2
From the formula book. For a 95% confidence interval, we need to take $z = 1.96$.
So the confidence interval is
$$\left[13 - 1.96 \times \frac{1.6}{\sqrt{10}},\ 13 + 1.96 \times \frac{1.6}{\sqrt{10}}\right] = [12.01, 13.99]$$

METHOD 3
From TI calculator
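The formula-book method used in the two examples above can also be checked away from the calculator. The following is only a sketch in Python using the standard library; the function names `z_interval_proportion` and `z_interval_mean` are invented for this illustration and are not part of the course or of the TI calculator.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(sample_mean, sigma, n, level):
    """Confidence interval for a mean when the population sigma is known."""
    # For a 95% interval we need the z-value with 0.975 of the area below it.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def z_interval_proportion(successes, n, level):
    """Approximate confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Example question 1: 80 broken eggs out of 1000, 95% interval
egg_low, egg_high = z_interval_proportion(80, 1000, 0.95)
print(round(egg_low, 4), round(egg_high, 4))      # prints 0.0632 0.0968

# Example 2: sample mean 13, known sigma 1.6, n = 10, 95% interval
crate_low, crate_high = z_interval_mean(13, 1.6, 10, 0.95)
print(round(crate_low, 2), round(crate_high, 2))  # prints 12.01 13.99
```

Both intervals agree with the worked solutions. The sketch only applies when the normal distribution is appropriate, that is, for a proportion from a large sample or for a mean with known population standard deviation.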
Again if the standard deviation of the population is NOT known, and has to be estimated from the sample, then the distribution is altered to that of a Student-t distribution.

Example question 3
A sample of 10 crates of items is taken from a factory and the sample mean of the ten crates is found to be 13 and the estimate of the standard deviation of the number of defective items is 1.6. Find a 90% confidence interval for the mean number of defective items.

Solution
METHOD 1
$$T = \frac{\bar{X} - 13}{1.6/\sqrt{10}} \sim t(9)$$
and $p(13 - a \le \bar{X} \le 13 + a) = 0.90$, so $p(\bar{X} \le 13 + a) = 0.95$, that is
$$p\left(T \le \frac{a\sqrt{10}}{1.6}\right) = 0.95 \quad \text{so} \quad a = 0.927 \quad \text{(from TI calculator)}$$
Or read the critical value from the Student-t table with 9 degrees of freedom to be 1.833. Hence
$$\frac{a\sqrt{10}}{1.6} = 1.833 \quad \text{so} \quad a = 0.927$$
Hence the confidence interval is [12.1, 13.9]

METHOD 2
From the formula book. For a 90% confidence interval, we need to take $t = 1.833$ (from tables or from SOLVER).
So the confidence interval is
$$\left[13 - 1.833 \times \frac{1.6}{\sqrt{10}},\ 13 + 1.833 \times \frac{1.6}{\sqrt{10}}\right] = [12.1, 13.9]$$

METHOD 3
From TI calculator

Exercise 5A
I. To test a drug company's claim, a trial is carried out in which 100 patients suffering from a disease are given a new drug. It is found that 68 of these patients are cured. Find a 90% confidence interval for the proportion of patients cured by the drug.
1. Of ten class 2 students, chosen at random, 5 are thinking of taking higher level maths. Calculate a 95% confidence interval for the percentage of class 2 students thinking of taking higher level maths. Comment on your answer.
2. In order to estimate the age of teachers' cars, a sample of 25 teachers was taken and the ages of their cars are given in the table below:

Age (years)   0-1   1-2   2-3   3-4   4-5   5-6   6-8   8-10   10-20
Frequency      5     8     4     2     2     1     1     1      1

Estimate a 90% confidence interval for the mean age of teachers' cars.
3. 50 telephone calls from an office were monitored and the length of calls measured.
The mean length of call of those 50 was 4.8 minutes. A more comprehensive survey done earlier found the standard deviation to be 1.8 minutes, and it is assumed not to have changed. Find a 90% confidence interval for the mean length of call from the office.
4. A sample of 72 of a certain type of beetle was taken and the sum of their lengths was found to be 1227mm. If the sum of the squares of their lengths was found to be 21023, find a 99% confidence interval for the mean length of beetle.

Exercise 5B
1. A student measures the heights of 10 trees to obtain an estimate for the mean height of trees in a forest. He correctly calculates the 95% confidence interval to be [2.56, 4.12] using a Student-t distribution. What are his estimates for the mean and standard deviation of the heights of trees in the forest?
2. A statistician knows that the standard deviation of the number of particles in a gas chamber is 145.2. On analysing 20 gas chambers he calculates a confidence interval for the mean number of gas particles in a chamber to be [842, 988]. What is the degree of confidence of his interval?
3. A statistician states that the proportion of votes for the Liberated Party in the next election will be between 22% and 34%. If he made that prediction on the basis of a sample of 50 voters, with what degree of confidence does he make his prediction?
4. A sample of 20 crates is taken to calculate the 90% confidence interval of the mean weight of a crate when the standard deviation is unknown. Find $S_{n-1}$, the estimate of the population standard deviation, if the confidence interval is found to be [8742, 9122].
5. A sample of 60 is used to calculate a 90% confidence interval of a population with known variance. If the confidence interval calculated is [78.6, 84.8], calculate the value of the population variance.
6.
A student correctly calculates, from a sample of 10 houses, the 99% confidence interval for the mean height of buildings in the Gstaad area to be [8.84, 10.22]. If the standard deviation of heights of buildings was given as known to be $\sigma$:
(a) calculate the student's estimate of $\mu$, the mean height of buildings in Gstaad.
(b) calculate the value of $\sigma$ used.

Chapter 6. The chi-squared goodness of fit test.

Example question 1
A dice is thrown 60 times and the following frequencies are obtained:

Score      | 1   2   3   4   5   6
Frequency  | 12  13  10  11  11  3

Is there evidence to suggest that the dice is biased?

How can we calculate the probability of results as extreme as these in order to test a hypothesis? We use the fact that the quantity
$$\sum \frac{(\text{Observed Frequency} - \text{Expected Frequency})^2}{\text{Expected Frequency}}$$
approximately follows a $\chi^2$ probability distribution.
Note that if any of the expected frequencies are less than 5, then the chi-squared distribution may not be a reasonable approximation.

Solution
H0: the dice is fair, i.e. X ~ DU(6) where X is the score
H1: the dice is biased, i.e. the distribution is not uniform
We can then make a table of observed and expected frequencies:

Score                | 1   2   3   4   5   6
Observed frequency   | 12  13  10  11  11  3
Expected frequency   | 10  10  10  10  10  10

So
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e} = 6.4$$
The degrees of freedom are 5 as there are six classes (just as with the Student-t distribution). We say $\nu = 5$, so testing at the 5% significance level,
$$p(\chi^2 \ge 6.4) = 0.269$$
27% > 5% so there is insufficient evidence to reject H0. We accept that the dice is not biased.

On a TI84 with a recent operating system (it can be updated through TI Connect), you can use the chi-squared GOF test ...but not on a TI83!

More on Degrees of freedom
The number of degrees of freedom is not always n - 1.
In fact the degrees of freedom is found by:
$\nu$ = no. of classes - no. of restrictions - no. of estimated parameters
Often $\nu = n - 1$ as there is a restriction on the total frequency (as in the previous example).
However, frequently we also have estimated parameters.
For example, if our hypothesis is that data fits an exponential distribution and we estimate the parameter from the data, then $\nu = n - 1 - 1$.
If our hypothesis is that data fits a normal distribution and we estimate the mean and standard deviation from the data, then $\nu = n - 1 - 2$.
On the other hand, if our hypothesis is that data fits a normal distribution with a certain mean but we estimate the standard deviation from the data, then $\nu = n - 1 - 1$.

Example question 2
Data is collected as below:

x           0    1    2    3    4    5
frequency   10   16   12   7    3    2

Test at the 5% significance level to see if the data is from a Poisson distribution.

Solution
H0: X ~ Po($\lambda$)
H1: X is not Poisson
Estimate $\lambda$ from the data to be 1.66 (from TI calculator)
$\nu = 6 - 1 - 1 = 4$
We calculate the expected frequencies to be 9.51, 15.78, 13.10, 7.25, 3.01, 1.36
Note that to find the last expected frequency we need to find the probability of $X \ge 5$.
Now we notice that the last two expected frequencies are less than 5. Furthermore, grouping the two will still not give an expected frequency of at least 5, so the last three classes will need to be grouped.
Regrouping the table we obtain

x     0      1       2      ≥3
fo    10     16      12     12
fe    9.51   15.78   13.1   11.61

So we change $\nu = 4 - 1 - 1 = 2$
$\chi^2_{calc} = 0.134$
$p(\chi^2 \ge 0.134) = 0.935$
As 94% > 5% there is insufficient evidence to reject H0. We accept that the data follows a Poisson distribution.

Exercise 6
1. Five coins were tossed 1024 times and the number of heads, x, was observed. The frequencies for x are shown in the table below.

x           0    1     2     3     4     5
frequency   42   148   346   291   168   29

Test the hypothesis that the coins were all fair at the 5% level of significance.
2.
An urn contains a very large number of marbles of four different colours: red, orange, yellow and green. A sample of 120 marbles from the urn revealed 22 red, 49 orange, 38 yellow and 13 green marbles. It is suggested that the urn contains equal numbers of each coloured marble. Test the hypothesis at the 10% level of significance.
3. The following table shows the number of days during a 120 day period on which x accidents occurred.

x                0    1    2    3   4
Number of days   56   37   21   3   3

Test at the 5% level of significance to see if the number of accidents follows a Poisson distribution with parameter 0.9.
4. The ability of adults to estimate the size of an angle was part of the research into the use of mathematics. People were shown an angle of 72° and asked to estimate its size. The results were:

Angle (degrees)   56-60   61-65   66-70   71-75   76-80   81-85   86-90
Frequency         11      23      58      51      32      17      1

It is claimed that the estimates are normally distributed with mean 72°. Test this claim at the 1% level of significance.
5. It is suggested that there are 32 juvenile voles and 5 adult voles in a colony. Each evening over a ten week period, five voles are caught and the number of adult voles in the sample is noted.

Number of adult voles   0    1    2    3   4   5
Number of samples       14   23   22   6   4   1

Test at the 1% significance level that the original suggestion is correct.
6. A coin is tossed until the third head is obtained, 40 times. The number of throws needed is recorded below:

Number of throws   3   4   5   6   7   8   9   10
Frequency          5   7   8   6   5   4   3   2

Test at the 5% level of significance if the coin is fair.

Chapter 7. The chi-squared test for independence

Example question 1
100 tennis players are studied to see if their style is dependent on their nationality. The results were as follows:

               Eastern Europe   Western Europe   Americas
Serve/Volley   8                15               5
Baseline       11               38               23

Is there evidence to suggest that tennis style is dependent on nationality?
To answer this question, again we use the fact that the quantity
$$\sum \frac{(\text{Observed Frequency} - \text{Expected Frequency})^2}{\text{Expected Frequency}}$$
approximately follows a $\chi^2$ probability distribution.

Solution
METHOD 1
First we need to calculate the expected frequencies. To do this we assume the totals must remain the same.
We recognise that 19% of the total tennis players are from Eastern Europe and that 28 players have a serve/volley game. Thus if the style of game is independent of nationality, we would expect 19% of the 28 serve/volley players to be from Eastern Europe.
Following the same procedure we obtain the following table of expected values.

               Eastern Europe       Western Europe       Americas             Total
Serve/Volley   28×19/100 = 5.32     28×53/100 = 14.84    28×28/100 = 7.84     28
Baseline       72×19/100 = 13.68    72×53/100 = 38.16    72×28/100 = 20.16    72
Total          19                   53                   28

H0: Style is independent of nationality
H1: Style depends on nationality
$\nu$ = (rows - 1)(cols - 1) = 2
$\chi^2_{calc} = 3.306$
$p(\chi^2 \ge 3.306) = 0.191$
Testing at the 10% significance level, 19% > 10% so there is insufficient evidence to reject H0. We accept that tennis style is independent of nationality.

METHOD 2 - Calculator
Put the observed frequencies into matrix A.
H0: Style is independent of nationality
H1: Style depends on nationality
$\chi^2_{calc} = 3.306$ (with 2 degrees of freedom)
$p(\chi^2 \ge 3.306) = 0.191$
(and expected frequencies are in matrix B if required)
Testing at the 10% significance level, 19% > 10% so there is insufficient evidence to reject H0. We accept that tennis style is independent of nationality.

Exercise 7
1. A horse breeder has gathered the following data about the weight of mares and the weight of their foals:

             Mare
Foal      Heavy   Light
Heavy     36      27
Light     22      35

(a) State the hypotheses to test if the foal's weight is dependent on the weight of the mother.
(b) Calculate the value of the test statistic for the $\chi^2$ test.
(c) State the critical value if we test at the 5% level of significance.
(d) Conduct the test, and clearly state your conclusions, with justification.
2. The data produced below shows the results of a surgical procedure designed to improve the functioning of certain joints which have become impaired by disease. For each of five hospitals, counts are given of the numbers of patients which fall into different post-operative categories.

Hospital                          A    B    C    D     E
No improvement                    13   5    8    21    43
Partial functional restoration    18   10   36   56    29
Complete functional restoration   16   16   35   51    10
Total                             47   31   79   128   82

In order to conclude if different hospitals have differing effects it is proposed to conduct a chi-squared test for significance at the 1% level.
(a) State the null and alternate hypotheses.
(b) The following is a table for the expected values. Calculate the values of a, b and c.
Hospital No improvement Partial functional restoration Complete functional restoration A B C D E 27.7 16.1 10.1 16.4 c a 32.3 45.6 43.8 35.4 34.0 b 44.3 39.8 12.2
(c) Show that the number of degrees of freedom is 8.
(d) Write down the critical value of chi-squared at the 1% level of significance.
(e) The calculated value of chi-squared is 56.7. Do you accept H0? Explain your answer.
3. The following table gives the results of a survey into the amount of time spent studying maths and the final IB Grade of a group of students.

Final Grade                       7   6    5    4    3   2   1
More than 8 hours per week        1   3    1    2    0   0   0
Between 5 and 8 hours per week    4   10   15   16   4   2   1
Less than 5 hours per week        4   5    8    12   7   3   2

Test the hypothesis, at the 1% level of significance, that the IB grade depends on the amount of time studying.
4. 50 students were asked the percentage of their pocket money that was spent on clothes.
The results are illustrated in the contingency table below:

Percentage of pocket money spent on clothes   >80%   60%-80%   40%-60%   20%-40%   <20%
Male                                          6      6         5         5         3
Female                                        9      9         6         1         0

Test the hypothesis, at the 5% level of significance, that the proportion of pocket money spent on clothes is independent of gender.
5. The number of students obtaining different IB grades in two different schools were noted below:

Grade      7   6    5    4    3   2   1
School A   3   11   17   16   8   2   2
School B   1   8    12   18   4   1   0

Test at the 10% level of significance if the grades are independent of the school.

Practice Paper 1
1. A poll suggests that 85% of people favour a subsidy of public transport. The government disputes this claim and conducts its own survey, finding that 14 of the 20 people surveyed favoured a subsidy of public transport. Conduct a 2-tailed test, at the 5% level of significance, to see if the government is justified in its claim. If the government claims that less than 85% favour the subsidy, explain the difference to the test conducted, and state the conclusions reached. [8 marks]
2. A mathematical studies project set out to see if there was any difference in the way boys and girls chose their sporting activities. Given a choice of tennis, basketball, and swimming as the sporting choices, 11 girls and 6 boys chose tennis, 3 girls and 12 boys chose basketball, and 11 girls and 9 boys chose swimming.
(a) Assuming that boys and girls are equally likely to choose any of the sports, calculate the expected number of girls choosing basketball.
A chi-squared test for independence was conducted at the 10% level of significance.
(b) State the critical value for the test.
(c) Conduct the test, and state the conclusions reached. [10 marks]
3. Beer is sold in bottles labelled 330cm³ or cans labelled 500cm³.
It is found that the quantity of beer in the bottles is normally distributed with mean 332cm³ and standard deviation 22cm³, and in the cans with mean 507cm³ and standard deviation 31cm³.
(a) In a fridge there are three bottles of beer and two cans. What is the probability that there is more than two litres of beer in the fridge?
(b) What is the probability that the three bottles contain more beer than the two cans? [12 marks]
4. Data is collected on the number of fish caught in competitions throughout the year.

Number of fish caught   0   1    2    3    4    5    6    7   8   9
Frequency               9   22   39   46   35   28   14   3   3   1

Test the hypothesis that the data follows a Poisson distribution, at the 5% level of significance. [13 marks]
5. Breakfast cereal is packaged in boxes labelled 500g. A random sample of 5 boxes is taken, and their content weights are 487g, 492g, 502g, 479g, 493g.
(a) Find a 90% confidence interval for the mean weight of cereal.
(b) The manufacturers claim that the mean weight of cereal is 500g.
(i) Testing the hypothesis, at the 5% level of significance, that the mean weight is less than 500g, explain how the critical value relates to the answer found in (a).
(ii) Complete the test, and write down the conclusion reached.
(c) If the standard deviation is known to be $\sigma$, and a 95% confidence interval is correctly calculated to be [482.01, 499.49], find the value of $\sigma$. [17 marks]

Practice Paper 2
1. Batches of 1000 blank CDs are sampled from four manufacturers and the numbers of faulty disks are recorded below.

Manufacturer             Brand A   Brand B   Brand C   Brand D
Number of faulty disks   8         17        6         7

(a) Assuming the number of faulty disks is independent of the manufacturer, calculate the expected frequencies of faulty disks for each brand and also the expected frequencies of non-faulty disks.
(b) Conduct a $\chi^2$ test at the 5% level of significance to decide if the number of faulty disks is independent of the manufacturer. [9 marks]
2.
In a study of bees, the times between visits to a particular flower are shown in the table below:

Time between visits   0-1   1-2   2-3   3-5   5-10   10-20
Frequency             20    15    10    14    21     20

At the 10% level of significance, test to see if the time between visits follows an exponential distribution with parameter 0.2. [12 marks]
3. In a reservoir during the winter months, the amount the water rises per week due to rainfall is known to be normally distributed with mean 4.1cm and standard deviation 0.8cm. In addition, the amount the water drops in a week due to usage and evaporation is normally distributed with mean 3.8cm and standard deviation 1.6cm.
(a) What is the probability that in a week, the water level in the reservoir rises overall?
(b) What is the probability that in a 4 week period, the water level will drop overall by more than 5cm? [12 marks]
4. Previous records show that in a particular ski resort, the proportion of clear days in the season is 73%. This year 53 days out of the season of 84 days had clear skies. A student tests, at the 1% level of significance, the hypothesis that the proportion of clear days has dropped.
(a) State the hypotheses and the conclusion reached, showing your working clearly.
(b) If the proportion of clear days has in fact fallen to 68%, calculate the probability the student would have made a Type II error. [13 marks]
5. In a competition, the scores of four competitors were 73, x, y, 80. A 90% confidence interval of [73.3, 80.7] is calculated for the mean score in the competition, using the sample of the four competitors.
(a) Find the value of $\bar{x}$, the mean score of the sample, and hence express y in terms of x.
(b) Show that $s_{n-1}$, the estimate of the population standard deviation, is 3.14.
(c) Use your answers to parts (a) and (b) to calculate the values of x and y, where x < y. [14 marks]

Answers
Answers and calculator displays have been given here, NOT solutions.

Exercise 1
1. 0.3456
2.
(a) 0.5244 (b) 0.00273
3. 0.984
4. Mean = 54.6. Standard Deviation = 5.86
5. (a) 0.344 (b) 0.142 (c) 0.277 (d) 0.0127
6. (a) (i) 0.331 (ii) 0.801 (b) 3.5 (c) (i) 35/99 (ii) 28/33
7. 20
8. 3/11

Exercise 2
1. 0.722
2. (a) 0.473 (b) 0.228
3. 0.273
4. 0.664
5. 0.129, 0.649
6. 0.292

Exercise 3
1. (a) 0.00296 (b) 0.0620 (c) 0.810 (d) 0.676
2. (a) 0.0120 (b) 0.166 (c) The sample size is too small and the population is not known to be normally distributed

Exercise 4A
1. (a) H0: The drug cures 80% of patients, H1: The drug cures less than 80% of patients
(b) 0.00234
(c) 0.2% < 10% so we reject H0 in favour of H1. We conclude that less than 80% of patients are cured by the drug.
2. H0: 5% of items are defective, H1: More than 5% of items are defective
p-value = 0.0524
5.24% > 5% so there is insufficient evidence to reject H0. We accept that just 5% of items are defective.
3. H0: 60% will vote yes, H1: < 60% will vote yes
p-value = 0.00715
0.7% < 10% so we reject H0 in favour of H1. We conclude that less than 60% will vote yes.
4. H0: mean age is 22, H1: mean age is < 22
p-value = 0.103
10.3% > 5% so there is insufficient evidence to reject H0. We conclude that the mean age is 22.
5. H0: mean waiting time is 3.2 minutes, H1: mean waiting time is > 3.2 minutes
p-value = 0.242
24.2% > 10% so there is insufficient evidence to reject H0. We conclude that the mean waiting time is 3.2 minutes.
6. H0: mean age is 22, H1: mean age is < 22
p-value = 0.124
12.4% > 5% so there is insufficient evidence to reject H0. We conclude that the mean age is 22.
7. H0: mean waiting time is 3.2 minutes, H1: mean waiting time is > 3.2 minutes
p-value = 0.0424
4.24% < 10% so we reject H0 in favour of H1. We conclude that the mean waiting time is more than 3.2 minutes.
8. H0: the coin is fair, H1: the coin is biased
p-value = 0.258
25.8% > 1% so there is insufficient evidence to reject H0. We conclude that the coin is fair.
9.
H0: the mean is 454g, H1: the mean is not 454g
p-value = 0.0193
1.93% < 10% so we reject H0 in favour of H1. We conclude that the machine is not working properly.

Exercise 4B
1. (a) 194g (b) 0.0151
2. (a) H0: 50% of downloads are broken, H1: < 50% of downloads are broken
p-value = 0.127
12.7% > 5% so there is insufficient evidence to reject H0. We conclude that 50% of downloads are broken.
(b) 0.0607 (using CLT approximation)
(c) A Type II error is accepting H0, that 50% of downloads are broken, when, in fact, less than 50% of downloads are broken.
3. (a) 15.4 (b) 14.0
4. (a) H0: mean size of litter is 18.3, H1: mean size of litter is < 18.3
p-value = 0.244
24.4% > 10% so there is insufficient evidence to reject H0. We conclude that the mean size of litter is 18.3.
(b) Type II
5. 0.506 (using CLT approximation)

Exercise 5A
1. [0.603, 0.757]
2. [0.190, 0.810] Not very useful as sample size too small
3. [1.96, 4.20]
4. [4.38, 5.22]
5. [16.6, 17.4]

Exercise 5B
1. $\bar{x} = 3.34$, $S_{n-1} = 1.09$
2. A 97.8% degree of confidence
3. A 65.5% degree of confidence
4. $S_{n-1} = 491$
5. $\sigma^2 = 213$
6. (a) 9.53 (b) 0.847

Exercise 6
1. H0: The coins are all fair, H1: The coins are not all fair
p-value = 0.0925
9.25% > 5% so there is insufficient evidence to reject H0. We accept that the coins are all fair.
2. H0: The urn contains equal numbers of each colour of marble
H1: The urn contains different numbers of each colour of marble
p-value = 0.00000985
0.00000985 < 0.1 so we reject H0 in favour of H1. We conclude that the urn contains different numbers of each colour of marble.
3. H0: The number of accidents follows Po(0.9)
H1: The number of accidents does not follow Po(0.9)
p-value = 0.467
0.467 > 0.05 so there is insufficient evidence to reject H0. We accept that the number of accidents follows Po(0.9).
4.
H0: The estimate follows $N(72, \sigma^2)$
H1: The estimate doesn't follow $N(72, \sigma^2)$
p-value = 0.278
0.278 > 0.01 so there is insufficient evidence to reject H0. We accept that the estimate follows $N(72, \sigma^2)$.
5. H0: The number of voles follows Hyp(5,5,37)
H1: The number of voles doesn't follow Hyp(5,5,37)
p-value = $5.5 \times 10^{-11}$
p < 0.01 so we reject H0 in favour of H1. The number of voles doesn't follow Hyp(5,5,37).
6. H0: The number of throws follows NB(3,1/2)
H1: The number of throws doesn't follow NB(3,1/2)
p-value = 0.999
p > 0.05 so there is insufficient evidence to reject H0. The number of throws follows NB(3,1/2), or the coin is fair.

Exercise 7
1. (a) H0: The foal's weight is independent of the mare's weight
H1: The foal's weight is dependent on the mare's weight
(b) 4.12
(c) 3.84
(d) p-value = 0.0423, 0.0423 < 0.05 so we reject H0 in favour of H1. We conclude that the foal's weight is dependent on the mare's weight.
2. (a) H0: The results are independent of the hospital
H1: The results differ according to hospital
(b) a = 19.1, b = 10.8, c = 20.1
(c) (5-1)(3-1) = 8
(d) 20.09
(e) No, 56.7 > 20.09 so we reject H0. (or p-value < 0.01 so we reject H0.)
3. H0: The grade is independent of the time studying
H1: The grade depends on the time studying
p-value = 0.164
0.164 > 0.01 so there is insufficient evidence to reject H0. We accept that the grade is independent of the time spent studying (which, of course, is not true!)
4. H0: The percentage of pocket money spent on clothes is independent of gender
H1: The percentage of pocket money spent on clothes depends on the gender
p-value = 0.223
22.3% > 5% so there is insufficient evidence to reject H0. We accept that spending is independent of gender.
5. H0: IB results are independent of the school
H1: IB results are dependent on the school
p-value = 0.419
As 41.9% > 10%, there is insufficient evidence to reject H0.
We accept that the IB results are independent of the school.

Markscheme for Practice Paper 1
1. H0: 85% of people favour a subsidy of public transport   A1
H1: Either more than 85% or less than 85% of people favour a subsidy   A1
EITHER
p-value = 0.0603   A4
6.03% > 5% so there is insufficient evidence to reject H0.   A1R1
We accept that 85% of people favour a subsidy of public transport.
OR
X is the number in favour, $X \sim B(20, 0.85)$   M1
$p(X \le 14) = 0.0673$   M1A1
so for a 2-tailed test, p-value = 0.135   A1
13.5% > 5% so there is insufficient evidence to reject H0.   A1R1
We accept that 85% of people favour a subsidy of public transport.

2. (a) $\frac{15 \times 25}{52} = 7.21$   A1
(b) 4.61   A3
(c) H0: Choice of sport is independent of gender   A1
H1: Choice of sport depends on gender   A1
EITHER
p-value = 0.0301   A2
3.01% < 10% so we reject H0 in favour of H1.   A1R1
We conclude that the choice of sport depends on gender.
OR
$\chi^2_{calc} = 7.00$   A2
7.00 > 4.61 so we reject H0 in favour of H1.   A1R1
We conclude that the choice of sport depends on gender.

3. (a) $T = B_1 + B_2 + B_3 + C_1 + C_2$ where $B_k \sim N(332, 22^2)$ and $C_k \sim N(507, 31^2)$   M1
$E(T) = 3 \times 332 + 2 \times 507 = 2010$   M1
$Var(T) = 3 \times 22^2 + 2 \times 31^2 = 3374$   M1
so $T \sim N(2010, 3374)$   A1
$p(T > 2000) = 0.568$   M1A1
(b) $D = B_1 + B_2 + B_3 - C_1 - C_2$   M1
$E(D) = 3 \times 332 - 2 \times 507 = -18$   M1
$Var(D) = 3 \times 22^2 + 2 \times 31^2 = 3374$   M1
so $D \sim N(-18, 3374)$   A1
$p(D > 0) = 0.378$   M1A1

4. H0: Number of fish caught follows $Po(\lambda)$   A1
H1: Number of fish caught doesn't follow $Po(\lambda)$   A1
Estimate of $\lambda$ is the mean of the sample = 3.28   A1
Grouping the last three classes we obtain   M2A2

Number of fish caught   0      1      2      3      4      5      6      ≥7
Observed frequency      9      22     39     46     35     28     14     7
Expected frequency      7.53   24.7   40.5   44.3   36.3   23.8   13     9.93

$\chi^2_{calc} = 2.43$   M1A1
$p(\chi^2 > 2.43) = 0.932$   M1A1
93% > 5% so there is insufficient evidence to reject H0.   A1R1
We accept that the data follows a Poisson distribution.

5.
(a) $\bar{x} = 490.6$, $s_{n-1} = 8.44$   A1A1
$t = 2.132$   A1
Confidence interval is $\left[490.6 - 2.132 \times \frac{8.44}{\sqrt{5}},\ 490.6 + 2.132 \times \frac{8.44}{\sqrt{5}}\right] = [482.6, 498.6]$   M1A1
(b) (i) It will be 482.01 500 as a 90% CI has 5% either side so a 95% 1-tail test will have 8.44 5 the same value.   R1
(ii) H0: Mean weight is 500 g   A1
H1: Mean weight is less than 500 g   A1
EITHER
p-value = 0.0338   A3
3.38% < 5% so we reject H0 in favour of H1.   R1
We conclude that the mean weight is less than 500 g.
OR
Test statistic = $\frac{490.6 - 500}{8.44 / \sqrt{5}} = -2.49$   M1A2
-2.49 < -2.132 so we reject H0 in favour of H1.   R1
We conclude that the mean weight is less than 500 g.
(c) Mean is 490.6.   A1
$z \times \frac{\sigma}{\sqrt{5}} = 8.89$ and $z = 1.960$   M1 A1
so $\sigma = \frac{8.89 \times \sqrt{5}}{1.960} = 10.1$   M1A1

Markscheme for Practice Paper 2
1. (a) All expected frequencies of faulty disks are 38/4 = 9.5   A1
All expected frequencies of non-faulty disks are 1000 - 9.5 = 990.5   A1
(b) H0: The number of faulty disks is independent of the manufacturer   A1
H1: The number of faulty disks differs according to manufacturer   A1
EITHER
p-value = 0.04238   A3
4.2% < 5% so we reject H0 in favour of H1; the number of faulty disks differs according to manufacturer.   A1R1
OR
$\chi^2_{calc} = 8.183$ with 3 degrees of freedom   A2
Critical value = 7.814   A1
8.183 > 7.814 so we reject H0 in favour of H1; the number of faulty disks differs according to manufacturer.   A1R1

2. H0: The time between visits follows an exponential distribution with parameter 0.2   A1
H1: The time between visits does not follow an exponential distribution with parameter 0.2   A1
Expected frequencies are   M2A

Time between visits   0-1     1-2     2-3     3-5     5-10    >10
Expected frequency    18.13   14.84   12.15   18.09   23.25   13.53

$\chi^2_{calc} = 4.811$ with 4 degrees of freedom   A2
EITHER
p-value = 0.4393   M1A1
44% > 10% so there is insufficient evidence to reject H0. We accept that the time between visits follows an exponential distribution with parameter 0.2.   A1R1
OR
Critical value = 7.779   M1A1
4.811 < 7.779 so there is insufficient evidence to reject H0.
We accept that the time between visits follows an exponential distribution with parameter 0.2.   A1R1

3. $R \sim N(4.1, 0.8^2)$, $F \sim N(3.8, 1.6^2)$
(a) If the overall change in level is W, then $W = R - F$   M1
$E(W) = 4.1 - 3.8 = 0.3$   M1
$Var(W) = 0.8^2 + 1.6^2 = 3.2$   M1A1
So $W \sim N(0.3, 3.2)$   M1
$p(W > 0) = 0.567$   A1
(b) If the overall change in level is T, then $T = R_1 + \dots + R_4 - F_1 - \dots - F_4$   M1
$E(T) = 4 \times 4.1 - 4 \times 3.8 = 1.2$   M1
$Var(T) = 4 \times 0.8^2 + 4 \times 1.6^2 = 12.8$   M1
So $T \sim N(1.2, 12.8)$   M1
$p(T < -5) = 0.0415$   M1A1

4. (a) H0: The proportion of clear skies is 73%   A1
H1: The proportion of clear skies is less than 73%   A1
EITHER
p-value = 0.0204   A3
2% > 1% so there is insufficient evidence to reject H0. We accept that the proportion of clear skies is 73%.   A1R1
OR
X is the number of clear days, $X \sim B(84, 0.73)$   M1
$p(X \le 53) = 0.0300$   M1A1
3% > 1% so there is insufficient evidence to reject H0. We accept that the proportion of clear skies is 73%.   A1R1
(b) For accepting H0, $\hat{p} > 0.73 + \Phi^{-1}(0.01)\sqrt{\frac{0.73 \times 0.27}{84}} = 0.6173$   M2A1
If $p = 0.68$ then $\hat{p} \sim N\left(0.68, \frac{0.68 \times 0.32}{84}\right)$   M1
$p(\hat{p} > 0.6173) = 0.891$   M1A1

5. (a) $\bar{x} = \frac{73.3 + 80.7}{2} = 77$   A1
so $\frac{x + y + 153}{4} = 77$, giving $y = 155 - x$   M1A1
(b) $t \times \frac{s_{n-1}}{\sqrt{n}} = 3.7$   M2
$t = 2.353$   A1
$s_{n-1} = \frac{3.7 \times \sqrt{4}}{2.353} = 3.14$   M1A1
(c) $\frac{4}{3}\left(\frac{73^2 + 80^2 + x^2 + y^2}{4} - 77^2\right) = 3.14^2$   M1A1
$x^2 + y^2 = 12016.5788$   M1
$x^2 + (155 - x)^2 = 12016.5788$   M1
$2x^2 - 310x + 12008 = 0$
$x = 76$, $y = 79$   A2

Appendix A - Table of probability distributions

Discrete

Discrete Uniform
  Notation: UD(n)
  PDF p(X = x): $\frac{1}{n}$
  Range: x = 1, 2, 3, ..., n
  Mean: $\frac{n+1}{2}$
  Variance: $\frac{n^2 - 1}{12}$
  CDF: $\frac{x}{n}$
  Equally likely scores. Eg the score when a dice is thrown.

Binomial
  Notation: B(n,p)
  PDF p(X = x): ${}^{n}C_{x}\,p^{x}q^{n-x}$
  Range: x = 0, 1, 2, ..., n
  Calculator: binompdf(n,p,x)
  Mean: np
  Variance: npq
  CDF: binomcdf(n,p,x)
  Number of successes in n trials, when the trials are independent and the probability of success is p. Eg the number of sixes in 10 throws of a dice.

Geometric
  Notation: Geo(p)
  PDF p(X = x): $pq^{x-1}$
  Range: x = 1, 2, 3, ...
  Calculator: geometpdf(p,x)
  Mean: $\frac{1}{p}$
  Variance: $\frac{q}{p^2}$
  CDF: geometcdf(p,x)
  The number of trials needed to obtain a success, when the probability of each success is p. Eg the number of throws of a dice needed before a six is thrown.

Poisson
  Notation: Po(m)
  PDF p(X = x): $\frac{m^{x}e^{-m}}{x!}$
  Range: x = 0, 1, 2, 3, ...
  Calculator: poissonpdf(m,x)
  Mean: m
  Variance: m
  CDF: poissoncdf(m,x)
  Number of successes in an unknown number of trials, when the mean is known. Eg the number of fish caught in a lake.

Hypergeometric
  Notation: Hyp(n,M,N)
  PDF p(X = x): $\frac{{}^{M}C_{x}\,{}^{N-M}C_{n-x}}{{}^{N}C_{n}}$
  Range: x = 0, 1, 2, ..., n
  Mean: np, where $p = \frac{M}{N}$
  Variance: $npq\,\frac{N-n}{N-1}$
  Number of "successes" when taking a sample of size n, without replacement, from a set of size N, M of which are "successes".

Negative Binomial
  Notation: NB(r,p)
  PDF p(X = x): ${}^{x-1}C_{r-1}\,p^{r}q^{x-r}$
  Range: x = r, r+1, ...
  Mean: $\frac{r}{p}$
  Variance: $\frac{rq}{p^2}$
  The number of trials needed to obtain r successes, when the probability of each success is p. Eg the number of throws of a dice needed before 5 sixes have been thrown.

Continuous

Continuous Uniform
  Notation: U(a,b)
  PDF: $\frac{1}{b-a}$
  Range: $a \le x \le b$
  Mean: $\frac{a+b}{2}$
  Variance: $\frac{(b-a)^2}{12}$
  Equally likely scores. Eg a random real number between 5 and 8.

Normal
  Notation: $N(\mu, \sigma^2)$
  PDF: $\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$
  Range: $-\infty < x < \infty$
  Calculator: normalcdf(a,b,$\mu$,$\sigma$)
  Mean: $\mu$
  Variance: $\sigma^2$
  CDF: normalcdf(0,b,$\mu$,$\sigma$)
  Standard "bell shaped curve". Eg the lengths of worms.

Exponential
  Notation: $Exp(\lambda)$
  PDF: $\lambda e^{-\lambda x}$
  Range: $0 \le x$
  Mean: $\frac{1}{\lambda}$
  Variance: $\frac{1}{\lambda^2}$
  CDF: $1 - e^{-\lambda x}$
  Typically the amount of time before a success. Eg the time before a customer comes into a shop. (Continuous form of the Geometric distribution.)

Appendix B - Further Notes on the use of TI calculators

The use of probability distributions can sometimes be simplified by setting up the SOLVER to evaluate any variable. For example, if working with the Normal distribution we can set up the solver with the variables L, U, M, S and P. Then by entering the known variables, we can solve for the unknown variable. For example in the example on page 8, we can enter L, U, M and S to solve for P, or in the example on page 19, we can enter L, U, M and P to solve for U.
[note that S can be entered directly as …]

Similarly we can set up the solver to solve for any variable for the Student-t or χ² distributions.

TI84 calculators with an updated operating system (any TI84 calculator can have its operating system updated through TI Connect) not only have the χ²-GOF test added but also the inverse of the Student-t distribution. This can simplify working so that the solver is not needed for examples like those on page 28.

The use of the APP CtlgHelp is allowed and strongly recommended. It gives a help system so that the order of the parameters does not need remembering for each distribution.
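
The solver idea described in this appendix can be sketched away from the calculator. The Python below is only an illustration, not the calculator's own method: it assumes the solver equation has the form normalcdf(L,U,M,S) - P = 0, and the function names `normalcdf` and `solve` are hypothetical helpers written for this sketch. The numbers come from question 3(a) of the Markscheme for Practice Paper 2, where W ~ N(0.3, 3.2).

```python
from math import sqrt
from statistics import NormalDist


def normalcdf(lower, upper, mean, sd):
    """P(lower < X < upper) for X ~ N(mean, sd^2), in the style of the TI normalcdf."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def solve(f, lo, hi, tol=1e-10):
    """Bisection root finder (an assumed stand-in for the SOLVER).

    f must change sign between lo and hi.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign between lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


M, S = 0.3, sqrt(3.2)  # W ~ N(0.3, 3.2), so S is the square root of the variance

# Known L, U, M, S: evaluate P directly (1e99 plays the role of infinity).
p = normalcdf(0, 1e99, M, S)
print(round(p, 3))  # 0.567, the markscheme value for p(W > 0)

# Known L, M, S, P: solve normalcdf(L, U, M, S) - P = 0 for U.
target = normalcdf(-1e99, 2.0, M, S)
u = solve(lambda U: normalcdf(-1e99, U, M, S) - target, -50, 50)
print(round(u, 6))  # recovers the upper limit 2.0
```

The same pattern works for any one unknown among L, U, M, S and P, provided the search interval brackets the answer.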