Confidence Interval

Distribution of total and sample mean

Sample Statistics & Data display
We calculate statistics from a sample. These reflect what is happening in the population as a whole: the statistics of the sample estimate the parameters of the population.

Notation
Population (parameters): mean \( \mu \), standard deviation \( \sigma \), variance \( \sigma^2 \)
Sample (statistics): mean \( \bar{x} \), standard deviation \( s \), variance \( s^2 \)

Example: 25 people are in a lift. Their weights have a mean of 65 kg and a SD of 7 kg. Find the mean and SD of the total load.
\( E(T) = n\mu = 25 \times 65 = 1625 \) kg
\( SD(T) = \sqrt{n}\,\sigma = \sqrt{25} \times 7 = 5 \times 7 = 35 \) kg

If we repeat an experiment n times, the total T is the sum of n independent random variables, each with mean \( \mu \) and standard deviation \( \sigma \):
\( E(T) = n\mu \qquad VAR(T) = n\sigma^2 \qquad SD(T) = \sqrt{n}\,\sigma \)

Example: A fruit and vegetable market accepts deliveries of crates of apples. Each crate has a weight that is normally distributed with a mean of 21 kg and a standard deviation of 0.4 kg. The crates are delivered in groups of 18 on pallets that weigh exactly 30 kg.
a) Calculate the mean total weight of a pallet with 18 crates of apples.
b) Calculate the standard deviation of the total weight of a pallet with 18 crates of apples.

Central Limit Theorem
Consider samples of size n from a population X with a mean of \( \mu \) and a SD of \( \sigma \). The sample mean \( \bar{X} \) has
mean \( E(\bar{X}) = \mu \), variance \( VAR(\bar{X}) = \dfrac{\sigma^2}{n} \), standard deviation \( SD(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} \).
\( \dfrac{\sigma}{\sqrt{n}} \) is sometimes called the standard error of the sample mean \( \bar{x} \).
If n is large (n > 30), the distribution of the sample means is approximately normal.
The Central Limit Theorem states that the sample means can be expected to average out to the population mean, with a certain amount of spread about it. This spread is the standard error, i.e. the standard deviation of the sample mean.

Example: A sample of size 20 is taken from a box of beans. The mean length of the beans in the box is 19 cm with a SD of 2.5 cm.
a) What is the expected value of the sample mean?
b) What is the variance of the sample mean?
c) What is the standard error of the sample mean?
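The formulas for the total and the sample mean can be checked numerically. Below is a minimal Python sketch. It assumes the n values are independent, and the function names `total_mean_sd` and `sample_mean_stats` are hypothetical, chosen for illustration. It reproduces the lift example and the answers to the bean questions.

```python
import math


def total_mean_sd(n, mu, sigma):
    """Mean and SD of the total T = X1 + ... + Xn of n independent values."""
    # E(T) = n * mu, SD(T) = sqrt(n) * sigma
    return n * mu, math.sqrt(n) * sigma


def sample_mean_stats(n, mu, sigma):
    """Expected value, variance and standard error of the sample mean."""
    # E(X-bar) = mu, VAR(X-bar) = sigma^2 / n, SD(X-bar) = sigma / sqrt(n)
    return mu, sigma ** 2 / n, sigma / math.sqrt(n)


# Lift: 25 people, mean 65 kg, SD 7 kg
lift_mean, lift_sd = total_mean_sd(25, 65, 7)
print(lift_mean, lift_sd)  # 1625 35.0

# Beans: samples of 20, population mean 19 cm, SD 2.5 cm
bean_mean, bean_var, bean_se = sample_mean_stats(20, 19, 2.5)
print(bean_mean, bean_var, round(bean_se, 3))  # 19 0.3125 0.559
```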
a) The expected value is \( E(\bar{X}) = \mu = 19 \) cm.
b) The variance is \( \dfrac{\sigma^2}{n} = \dfrac{(2.5)^2}{20} = 0.3125 \) cm².
c) The standard error is the standard deviation of the sample mean: \( \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.5}{\sqrt{20}} = 0.559 \) cm.

We need to know the difference between the mean, variance and standard deviation of the population, of the total of n values, and of the sample mean.

Summary table (from p. 182)
Population, \( X \): mean \( \mu \), variance \( \sigma^2 \), standard deviation \( \sigma \)
Total of n values, \( T \): mean \( n\mu \), variance \( n\sigma^2 \), standard deviation \( \sqrt{n}\,\sigma \)
Sample mean, \( \bar{X} \): mean \( \mu \), variance \( \dfrac{\sigma^2}{n} \), standard deviation \( \dfrac{\sigma}{\sqrt{n}} \)

Probabilities for the total
When we deal with the sum of several variables, we use
\( E(T) = n\mu \qquad VAR(T) = n\sigma^2 \qquad SD(T) = \sqrt{n}\,\sigma \)
to find the probability that the sum lies within a given range. The distribution of the sum is normal and shaped like the bell curve. The probability is the area under the curve between the lower and upper bounds.

Example: A sample of 16 items is taken from a population X with a mean \( \mu = 34 \) and a SD \( \sigma = 4 \). Calculate the probability that the total T of the 16 items is below 530.
\( E(T) = n\mu = 16 \times 34 = 544 \)
\( SD(T) = \sqrt{n}\,\sigma = \sqrt{16} \times 4 = 16 \)
Calculator: Lower: -1E99, Upper: 530, \( \sigma = 16 \), \( \mu = 544 \)
\( P(T < 530) = 0.19079 = 0.1908 \) (4 dp)

Example: A lift is licensed to carry a maximum of 25 passengers. It is overloaded when the total passenger load exceeds 1700 kg. The weights of single passengers chosen at random have a mean of 65 kg and a standard deviation of 7 kg. Calculate the probability that the lift is overloaded, assuming it is carrying 25 passengers.

Probabilities for the sample mean
Sometimes we need the probability of where the sample mean is likely to lie in relation to the population mean. The sample mean has a smaller spread than individual values because its standard deviation is smaller. For this we use
\( VAR(\bar{X}) = \dfrac{\sigma^2}{n} \qquad SD(\bar{X}) = \dfrac{\sigma}{\sqrt{n}} \)

Probability for samples
A sample of size 36 is taken from a normally distributed population with a mean of 40 and a standard deviation of 12.
Calculate the probability that the sample mean is
a) less than 41
b) between 37 and 42.

Confidence Intervals
Remember that we calculate statistics from a sample to estimate the parameters of the population. Each sample mean will be slightly different from every other sample mean. It is therefore better to give an interval that we are confident contains the population mean. How confident we are is our degree (level) of confidence.
The spread of the values that the sample means take shows how accurate the estimate is, and the interval built from it is called the confidence interval. The spread on either side of the mean is the standard deviation of the sample mean, which is called the standard error.

Using the calculator to find confidence intervals
Construct a 95% confidence interval, given n = 25, \( \bar{x} = 28.3 \), \( \sigma = 4.38 \).
The central 95% of the area under the normal curve lies between the interval boundaries, with 0.475 on each side of the mean. The calculator only measures area from the far left, so for the calculator Area = 0.5 + 0.475 = 0.975. We use the calculator to find z, the number of SDs.

Calculating the Sample Size
Sometimes we want a certain level of confidence that the mean of a sample we are going to take will lie within given boundaries. The margin of error e is the distance between either end point of the interval and the sample mean:
\( e = z \times \dfrac{\sigma}{\sqrt{n}} \)
E.g. for 30 m < μ < 34 m, the confidence interval is 32 ± 2 m, so the margin of error is 2 m.

Example: A certain make of scientific calculator is known to have a voltage rating with a standard deviation of 0.05 V. The mean voltage of 40 of these calculators is 3.02 V.
a. Construct a 90% confidence interval for the average voltage.
b.
Explain the meaning of this confidence interval.

Construct a 95% confidence interval, given n = 25, \( \bar{x} = 28.3 \), \( \sigma = 4.38 \).
(Diagram: normal curve centred at 28.3, with the central 95%, 0.475 on each side, between 26.58 and 30.02.)
From the calculator, z = 1.96.
\( \bar{x} - z\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\dfrac{\sigma}{\sqrt{n}} \)
\( 28.3 - \dfrac{(1.96)(4.38)}{\sqrt{25}} < \mu < 28.3 + \dfrac{(1.96)(4.38)}{\sqrt{25}} \)
\( 28.3 - 1.717 < \mu < 28.3 + 1.717 \)
\( 26.58 < \mu < 30.02 \)

Using the Calculator to check your answer
Eg #1, Ex 14.1: Construct a 95% confidence interval, given n = 25, \( \bar{x} = 28.3 \), \( \sigma = 4.38 \).
In Stats mode, use INTR (F4), Z, 1-S (F1) and Var (F2) for a sample with one mean, enter the values, then press EXE.
Result: \( 26.58 < \mu < 30.02 \)

WB Eg 11: The time taken for an individual to walk to work is to be estimated. On 15 occasions the times in minutes were 18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19.
a) Find the sample mean and SD.
b) Assuming a normal distribution and that the sample is sufficiently large, calculate a 95% confidence interval for the mean time to walk to work.
a) Using the calculator: \( \bar{x} = 16.73 \) (2 dp), \( \sigma = 2.17 \) (2 dp). This is the calculator's \( \sigma_n \) value, which divides by n; the version dividing by n − 1 gives \( s = 2.25 \).
b) For 95%, the calculator area is 0.475 + 0.5 = 0.975, so z = 1.96.
\( 16.73 - \dfrac{(1.96)(2.17)}{\sqrt{15}} < \mu < 16.73 + \dfrac{(1.96)(2.17)}{\sqrt{15}} \)
\( 16.73 - 1.10 < \mu < 16.73 + 1.10 \)
\( 15.6 < \mu < 17.8 \) minutes

Interpreting Confidence Intervals
\( \bar{x} = 16.73 \), \( \sigma = 2.17 \), \( 15.6 < \mu < 17.8 \) minutes
We are 95% confident that the interval from 15.6 to 17.8 minutes contains the true mean walking time. In other words, 95% of intervals constructed in this way contain the true mean.

Exercise: p. 75, 4.01

Confidence Intervals for Proportions
Another parameter of the population is the population proportion, written p or π. This is the probability of success over a large number of trials, which should be similar to the proportion of successes in the population as a whole.
The best estimate of the population proportion is the sample proportion
\( p = \dfrac{x}{n} \), where x = number of successes and n = number of trials.
X is the random variable for the number of successes in the sample. Its expected value is \( E(X) = np \), and X has approximately a normal distribution.

Example: A random sample of 80 households showed that 30% owned PCs.
Construct a 95% confidence interval for π, the proportion of households that own a PC.
\( 0.3 - 1.96\sqrt{\dfrac{(0.3)(0.7)}{80}} < \pi < 0.3 + 1.96\sqrt{\dfrac{(0.3)(0.7)}{80}} \)
\( 0.1996 < \pi < 0.4004 \)
We are 95% confident that the interval 19.96% to 40.04% contains the true proportion of households that own a PC.

Example: In a sample of 210 people with high blood pressure, a particular drug is found to be effective for 150 of them. Construct a 95% confidence interval for π, the proportion of all patients with high blood pressure for whom this drug is effective.
\( p = \dfrac{150}{210} = 0.71429 \qquad q = 1 - p = \dfrac{60}{210} = 0.28571 \qquad z = 1.96 \)
\( p - z\sqrt{\dfrac{pq}{n}} < \pi < p + z\sqrt{\dfrac{pq}{n}} \)
\( 0.71429 - 1.96\sqrt{\dfrac{(0.71429)(0.28571)}{210}} < \pi < 0.71429 + 1.96\sqrt{\dfrac{(0.71429)(0.28571)}{210}} \)
\( 0.653 < \pi < 0.775 \), i.e. \( 65.3\% < \pi < 77.5\% \)

Example: The main purpose of a recent survey was to estimate the proportion of all adult NZers who are opposed to tipping for service in restaurants. The survey used a random sample of 663 adult New Zealanders, of whom 292 indicated that they are opposed to tipping for service.
a) State clearly the parameter of interest in this survey. (A)
b) Calculate a 90% confidence interval for the proportion of all adult NZers who oppose tipping. (A)
c) Analyse the effect of increasing the number of adults surveyed on the width of this confidence interval. (E)
d) Suppose 50 independent random samples of adult NZers are taken and a 90% confidence interval is constructed from the results of each sample. Analyse the phrase "90% confidence" by making reference to these 50 confidence intervals. (E)
d) We would expect about 90% of the 50 confidence intervals, i.e. about 45 of them, to contain the true population proportion. Any single interval either contains it or does not.
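The one-sample calculations in these notes can also be sketched in Python: finding z from the inverse normal, the intervals \( \bar{x} \pm z\sigma/\sqrt{n} \) and \( p \pm z\sqrt{pq/n} \), and rearranging the margin of error to find n. This is an illustration alongside the calculator method, not a replacement for it. The function names are hypothetical, and the inverse normal comes from the standard library's `statistics.NormalDist`. It assumes σ is known, or that the sample is large enough to use the sample SD in its place. The walking-time example uses `statistics.pstdev`, which divides by n like the calculator's \( \sigma_n \). `statistics.stdev` divides by n − 1 and would give a slightly wider interval.

```python
import math
from statistics import NormalDist, mean, pstdev


def z_value(confidence):
    """Two-sided z for a confidence level, e.g. 0.95 -> about 1.96."""
    # Same idea as the calculator: inverse normal of area 0.5 + confidence / 2
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def ci_mean(xbar, sigma, n, confidence):
    """Interval xbar +/- z * sigma / sqrt(n)."""
    e = z_value(confidence) * sigma / math.sqrt(n)  # margin of error
    return xbar - e, xbar + e


def ci_proportion(p, n, confidence):
    """Interval p +/- z * sqrt(pq / n), with q = 1 - p."""
    e = z_value(confidence) * math.sqrt(p * (1 - p) / n)
    return p - e, p + e


def sample_size_mean(sigma, e, confidence):
    """Smallest whole n with z * sigma / sqrt(n) <= e, i.e. n = (sigma z / e)^2 rounded up."""
    return math.ceil((z_value(confidence) * sigma / e) ** 2)


def sample_size_proportion(p, e, confidence):
    """Smallest whole n with z * sqrt(pq / n) <= e, i.e. n = pq z^2 / e^2 rounded up."""
    return math.ceil(p * (1 - p) * z_value(confidence) ** 2 / e ** 2)


# n = 25, xbar = 28.3, sigma = 4.38, 95%
print([round(x, 2) for x in ci_mean(28.3, 4.38, 25, 0.95)])  # [26.58, 30.02]

# Walking times, using the divide-by-n SD to match the worked answer
times = [18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19]
lo, hi = ci_mean(mean(times), pstdev(times), len(times), 0.95)
print(round(lo, 1), round(hi, 1))  # 15.6 17.8

# Drug effective for 150 of 210 patients, 95%
print([round(x, 3) for x in ci_proportion(150 / 210, 210, 0.95)])  # [0.653, 0.775]

print(sample_size_mean(4, 2, 0.95))              # 16
print(sample_size_mean(14, 2, 0.90))             # 133
print(sample_size_proportion(0.5, 0.04, 0.90))   # 423
```

Using `math.ceil` follows the rule of always rounding a sample size up, so the margin of error is never larger than required.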
a) Motel occupancy rates for July 1997 from a random sample of 35 motels gave the following statistics: sample size 35, sample mean 0.572, sample standard deviation 0.065. Calculate a 95% confidence interval for the mean occupancy rate for July 1997 for the population sampled. (A)
b) What would be the effect of increasing the level of confidence on the width of this confidence interval? (M)
c) The mean occupancy rate for the same population for July 1996 is 0.585. It is claimed that the mean occupancy rate for July 1997 is the same as the mean occupancy rate for July 1996. Using the confidence interval calculated in (a) at the 95% level of confidence, demonstrate whether the random sample gives us evidence against this claim. (M)
d) Calculate the number of motels that would need to be sampled to estimate the mean occupancy rate for July 1997 to within 0.015 of its true value at the 95% level of confidence. (M)

Confidence interval for the difference between two means
If two populations are similar, we would expect the difference between their means to be about zero. If the populations are different, we would expect the means to differ. So if the confidence interval for the difference between the two means does not contain 0, this suggests the two population means are different.
We use \( \bar{x}_1 - \bar{x}_2 \) to estimate \( \mu_1 - \mu_2 \).
Notation:
Population 1: mean \( \mu_1 \), SD \( \sigma_1 \), sample size \( n_1 \), sample mean \( \bar{x}_1 \)
Population 2: mean \( \mu_2 \), SD \( \sigma_2 \), sample size \( n_2 \), sample mean \( \bar{x}_2 \)
\( E(D) = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 \)
\( VAR(D) = VAR(\bar{X}_1 - \bar{X}_2) = VAR(\bar{X}_1) + VAR(\bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \)
\( SD(D) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \)
Confidence interval (on the formula sheet):
\( (\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \)

Example: A random sample of 30 objects is taken from a normally distributed population with a SD of 6. Another sample of 50 objects is taken from a population with a SD of 8.
The mean of the first sample is 115, and that of the second is 108.
1) Construct a 96% confidence interval for \( \mu_1 - \mu_2 \).
2) Explain whether it is likely that the two groups have the same mean.
\( (115 - 108) \pm 2.054\sqrt{\dfrac{6^2}{30} + \dfrac{8^2}{50}} = 7 \pm 3.23 \)
\( 3.77 < \mu_1 - \mu_2 < 10.23 \)
This is the 96% confidence interval for the difference between the two means. The interval does not contain 0, so it is unlikely that the two means are equal. We can say this with 96% confidence.

Example: Students are told to measure the area of a classroom. Their estimates are approximately normally distributed with SD = 0.15 m². 31 students measured one classroom and obtained a mean of 29.76 m², while 26 students measured another classroom and obtained a mean of 31.23 m². What is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first?
\( (31.23 - 29.76) \pm 1.96\sqrt{\dfrac{0.15^2}{26} + \dfrac{0.15^2}{31}} = 1.47 \pm 0.078 \)
\( 1.392 < \mu_2 - \mu_1 < 1.548 \) m²
This is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first. Zero is not in the interval, so we are 95% confident that the second classroom is larger than the first.

Interpretation
If the confidence interval includes zero, we cannot say that there is a difference between the two population means. If zero is not included, we are confident that there is a difference between them.
We need to assume that the samples are large enough, that they are independently selected, and that the populations they are selected from are normally distributed.
For an interval \( a < \mu_2 - \mu_1 < b \):
• If both a and b are positive, it is reasonable to conclude that μ2 is larger than μ1 by between a and b units. It is unlikely that the two means are the same.
• If both a and b are negative, it is reasonable to conclude that μ2 is smaller than μ1 by between -b and -a units. It is unlikely that the two means are the same.
• If a and b have opposite signs, it is reasonable to conclude that μ2 is smaller than μ1 by up to -a units, or larger than μ1 by up to b units, or somewhere in between.
This includes the possibility that the two means are equal.

True or false?
A 99% confidence interval for the difference between two means is calculated from sample data: \( -3.5 < \mu_2 - \mu_1 < 9.4 \).
a. There is a 99% probability that the means are equal because the interval includes 0.
b. 99% of intervals calculated in the same way will include the difference of the two means.

Example: Below is a random sample of times for male and female competitors to complete the annual Mountain Biking Race.
Male: sample size 30, mean 57 min, standard deviation 10 min
Female: sample size 30, mean 65 min, standard deviation 14 min
a) Calculate a 95% confidence interval for the difference between the mean time for males to complete the race and the mean time for females to complete the race.
b) In last year's race, a similar 95% confidence interval for the difference between µm and µf was calculated and found to be \( -6.25 < \mu_m - \mu_f < 1.36 \). Based on this confidence interval, demonstrate whether there is a significant difference between the mean race times for males and females.
b) 0 is in the 95% interval \( -6.25 < \mu_m - \mu_f < 1.36 \), so there is no evidence of a significant difference between the mean race times for males and females.

Example: The summary statistics for the lengths (in mm) of the snapper surveyed in each region are shown below.
Reserve: sample size 897, sample mean 360.18, sample standard deviation 94.48
Non-reserve: sample size 47, sample mean 257.09, sample standard deviation 59.35
a) Calculate a 95% confidence interval for the difference between the mean length of snapper in the reserve and the non-reserve regions.
b) It is claimed that the 'average snapper' in the reserve is at least 130 mm longer than the 'average snapper' in the non-reserve region. Use the 95% confidence interval from a) to analyse the validity of this claim.
a) \( (360.18 - 257.09) \pm 1.96\sqrt{\dfrac{94.48^2}{897} + \dfrac{59.35^2}{47}} = 103.09 \pm 18.06 \), so \( 85.03 < \mu_R - \mu_N < 121.15 \)
b) We are 95% confident that the mean length of snapper in the reserve exceeds that in the non-reserve region by between 85.03 mm and 121.15 mm. 130 mm lies above this whole interval, so the sample gives evidence against the claim.
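The difference-of-means interval \( (\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \) and the zero check used in the interpretation can be sketched as follows. As in the notes, the sketch assumes independent samples that are large enough to use the normal distribution. For the snapper data, the sample SDs stand in for σ1 and σ2. The names `ci_difference` and `interpret` are hypothetical.

```python
import math
from statistics import NormalDist


def ci_difference(xbar1, sd1, n1, xbar2, sd2, n2, confidence):
    """Interval for mu1 - mu2: (xbar1 - xbar2) +/- z * sqrt(sd1^2/n1 + sd2^2/n2)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SD of the difference
    d = xbar1 - xbar2
    return d - z * se, d + z * se


def interpret(lo, hi):
    """Plain-language reading of an interval for a difference of means."""
    if lo > 0 or hi < 0:
        return "0 not in interval: a difference between the means is suggested"
    return "0 in interval: no difference between the means is suggested"


# 30 objects (SD 6, mean 115) vs 50 objects (SD 8, mean 108), 96%
lo, hi = ci_difference(115, 6, 30, 108, 8, 50, 0.96)
print(round(lo, 2), round(hi, 2), interpret(lo, hi))

# Classroom areas: second minus first, 95%
lo, hi = ci_difference(31.23, 0.15, 26, 29.76, 0.15, 31, 0.95)
print(round(lo, 3), round(hi, 3))

# Snapper: reserve minus non-reserve, 95%
lo, hi = ci_difference(360.18, 94.48, 897, 257.09, 59.35, 47, 0.95)
print(round(lo, 2), round(hi, 2))
```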
Interpretation of Confidence Intervals
A company produces two different models of batteries, 'power' and 'super'. 95 people who have used both 'power' and 'super' batteries were interviewed to find out which of the two models they prefer to use in their torches. Of the 95 people, 63 said that they prefer to use the 'power' model in their torches.
a) Find a 95% confidence interval for the proportion of all people who have used both 'power' and 'super' batteries and prefer to use the 'power' model in their torches.
\( 0.568 < \pi < 0.758 \)
b) Write a clear description that gives the meaning of this confidence interval.
We are 95% confident that between 56.8% and 75.8% of all people who have used both models prefer the 'power' model. 95% of intervals constructed in this way contain the true proportion.

Calculating Sample Size
Given a particular level of confidence, we can calculate the sample size n that gives the required margin of error e:
\( e = z \times \dfrac{\sigma}{\sqrt{n}} \)
Example: 95% confidence interval, σ = 4, margin of error e = 2. How big must the sample be?
First find z, the number of SDs: for 95%, the calculator area is 0.475 + 0.5 = 0.975, so z = 1.96.
\( 2 = 1.96 \times \dfrac{4}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \dfrac{1.96 \times 4}{2} = 3.92 \;\Rightarrow\; n = 15.366 \)
so n = 16 (always round up).

Example: A random variable is known to have a standard deviation of 14. What sample size would be required to be 90% confident that an estimate of the mean is within 2 units of its true value?
For 90%, the calculator area is 0.45 + 0.5 = 0.95, so z = 1.6449.
\( 2 = 1.6449 \times \dfrac{14}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \dfrac{1.6449 \times 14}{2} = 11.514 \;\Rightarrow\; n = 132.58 \)
so n = 133.

Calculate sample size for proportion
A pilot survey of a few tax returns has shown that approximately 12% of all taxpayers are in the 'high-income' category. If the Inland Revenue Department wishes to estimate this percentage to within 1%, with 96% confidence, how many tax returns should it sample?

Calculating sample size for proportion
A market research company wishes to estimate the percentage of people in a certain age bracket who read a current-affairs magazine. The degree of confidence required for this estimate is 90%.
What sample size should be taken to estimate the percentage to within 4%?
e = 4% = 0.04. p is not stated, so use p = 0.5 and q = 0.5. This gives the largest possible value of pq, so the sample is large enough whatever the true p.
For 90%, the calculator area is 0.45 + 0.5 = 0.95, so z = 1.645.
It is easier to rearrange the formula first:
\( e = z\sqrt{\dfrac{pq}{n}} \;\Rightarrow\; \dfrac{e^2}{z^2} = \dfrac{pq}{n} \;\Rightarrow\; n = \dfrac{pq\,z^2}{e^2} \)
\( n = \dfrac{(0.5)(0.5)(1.645)^2}{(0.04)^2} = 422.8 \)
The minimum sample size is 423.

Calculating Sample Size
What size of sample should be taken from a population of packets of butter, if the standard deviation of the weights of packets is 4 g and the mean weight is to be estimated to within 0.5 g with 95% confidence?
1) Use inverse normal (σ = 1, μ = 0) to find the z value.
2) \( n = \left(\dfrac{\sigma z}{e}\right)^2 \)

Sample size for population proportion
Radio Sport wishes to conduct an opinion poll on whether the captain of the New Zealand netball team should be replaced. The degree of confidence required for this poll is 95%. What size sample should be used to estimate the percentage to within 5%?
1) Use inverse normal (σ = 1, μ = 0) to find the z value.
2) \( n = \dfrac{pq\,z^2}{e^2} \)

Sample size for proportion and sample mean
• An opinion poll with a level of confidence of 95% and an estimated value of p of 0.5 has a margin of error of 4%. How many people would have taken part in the poll?
• A sample of containers of car parts has a mean weight of 40 kg and a standard deviation of 5 kg. How many containers would need to have been in the sample to ensure, at the 95% level of confidence, that the sample mean was within 0.5 kg of the population mean?

Confidence Interval Revision
• Sample mean: confidence interval for μ
• Sample proportion: confidence interval for p
• Difference of means: confidence interval for μ1 - μ2
• The margin of error is half the width of the confidence interval.
• Sample size for a sample mean: \( n = \left(\dfrac{\sigma z}{e}\right)^2 \)
• Sample size for a sample proportion: \( n = \dfrac{pq\,z^2}{e^2} \)
• Sample size for a difference of means when the two σ and the two n are the same: \( n = \dfrac{2\sigma^2 z^2}{e^2} \)

Meaning of confidence interval
• Mean (99%): 99% of such intervals include the population mean.
• Proportion (99%): 99% of such intervals include the population proportion.
• Difference of means (99%): 99% of such intervals include the difference between the two population means.
• Confidence interval for a difference of means: if 0 is included in the confidence interval, no difference between the two means is suggested. If 0 is not included, a difference between the two means is suggested.

Confidence Interval Revision
• Mean: A sample of 120 wire cables is tested. The mean breaking strain was found to be 5.4 tonnes with a standard deviation of 1.3 tonnes. Calculate a 95% confidence interval for the mean breaking strain of this type of wire cable.
• Proportion: A sample opinion poll of 200 students is taken, and 130 students are found to support the idea of extending the opening hours of the library. Calculate a 99% confidence interval for the proportion of all students in the school in favour of extending the library hours.
• Difference between two means: A sample of 150 Longlife batteries showed a mean capability of 140 photos with a standard deviation of 12 photos. A sample of 200 Lastshot batteries showed a mean capability of 120 photos with a standard deviation of 8 photos. Find a 95% confidence interval for the difference in the mean lifetime between the two brands of batteries.

Sample size (use solver)
• The owner of a camera shop knows that 65% of the customers return to his store. How large a sample would the shop owner have to take to be 95% confident that the sample proportion is within 5% of the true value?
• What size of sample should be taken from a population of packets of butter, if the standard deviation of the weights of packets is 4 g and the mean weight is to be estimated to within 0.5 g with 95% confidence?

We need to know the difference between \( T = X_1 + X_2 \) and \( Y = 2X \). T is the sum of two random variables, which can take different values.
\( T = X_1 + X_2 \)
\( E(T) = E(X_1) + E(X_2) = 2\mu \)
\( VAR(T) = VAR(X_1) + VAR(X_2) = \sigma^2 + \sigma^2 = 2\sigma^2 \)
\( SD(T) = \sqrt{2}\,\sigma \)
Y represents the outcome of X multiplied by 2:
\( Y = 2X \)
\( E(Y) = E(2X) = 2E(X) = 2\mu \)
\( VAR(Y) = VAR(2X) = 2^2\,VAR(X) = 4\sigma^2 \)
\( SD(Y) = 2\sigma \)
i.e. Y is two identical values rather than two independent ones, so it has the same mean as T but a larger spread.

Normal Distribution
• About 68% of the data is within 1 standard deviation either side of the mean: data is likely to be in this region.
• About 95% of the data is within 2 standard deviations either side of the mean: data is very likely to be in this region.
• About 99.7% of the data is within 3 standard deviations either side of the mean: data is almost certain to be in this region.

T is the sum of n independent random variables, which might take different values:
\( T = X_1 + X_2 + X_3 + \dots + X_n \)
\( E(T) = n\mu \qquad VAR(T) = n\sigma^2 \qquad SD(T) = \sqrt{n}\,\sigma \)
Y is the outcome of the same variable multiplied by n:
\( Y = nX \qquad E(Y) = n\mu \qquad VAR(Y) = n^2\sigma^2 \qquad SD(Y) = n\sigma \)

• 25 people are in a lift. They have a mean weight of 65 kg and a SD of 7 kg. Find the mean and SD of the total load.
• The apples in the baskets have a mean weight of 1.2 g each and a SD of 0.3 g each. Find the mean and SD of the weight of a basket of 20 apples.
• The mean petrol usage for a car is 7 litres per day, with a standard deviation of 0.3 litres. Petrol costs $1.96 per litre. What are the mean and SD of the cost of petrol per day?
• 1 kg of apples costs $1.20. A basket of apples produced by the ABC factory has a mean weight of 2.5 kg and a SD of 3 kg. Find the mean and SD of the cost of one basket of apples.
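A short simulation shows the difference between adding independent values and multiplying one value. This Python sketch is an illustration only. The population X ~ N(10, 2), the seed and the number of trials are arbitrary choices, and the helper names `sum_mean_sd` and `scale_mean_sd` are hypothetical. The simulated SD of X1 + X2 should come out close to \( \sqrt{2}\,\sigma \approx 2.83 \), and the SD of 2X close to \( 2\sigma = 4 \). The petrol question is treated as Y = cX with c = 1.96, the price per litre.

```python
import math
import random
from statistics import pstdev


def sum_mean_sd(n, mu, sigma):
    """T = X1 + ... + Xn, n independent values: SD grows like sqrt(n)."""
    return n * mu, math.sqrt(n) * sigma


def scale_mean_sd(c, mu, sigma):
    """Y = cX, one value multiplied by c: SD is multiplied by |c|."""
    return c * mu, abs(c) * sigma


# Simulation check with X ~ N(10, 2): compare X1 + X2 with 2X
rng = random.Random(1)
trials = 100_000
sums = [rng.gauss(10, 2) + rng.gauss(10, 2) for _ in range(trials)]
doubles = [2 * rng.gauss(10, 2) for _ in range(trials)]

# Simulated SD next to the formula value
print(round(pstdev(sums), 2), round(sum_mean_sd(2, 10, 2)[1], 2))
print(round(pstdev(doubles), 2), round(scale_mean_sd(2, 10, 2)[1], 2))

# Petrol: 7 L/day (SD 0.3 L) at $1.96 per litre
cost_mean, cost_sd = scale_mean_sd(1.96, 7, 0.3)
print(round(cost_mean, 2), round(cost_sd, 3))  # 13.72 0.588
```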
